WO2010017518A2 - Connecting microrna genes to the core transcriptional regulatory circuitry of embryonic stem cells - Google Patents
- Publication number
- WO2010017518A2 (application PCT/US2009/053214)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mirna
- cell
- mir
- cells
- mmu
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1089—Design, preparation, screening or analysis of libraries using computer algorithms
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
Definitions
- Embryonic stem (ES) cells hold significant potential for clinical therapies because of their distinctive capacity to both self-renew and differentiate into a wide range of specialized cell types. Understanding the transcriptional regulatory circuitry of ES cells and early cellular differentiation is fundamental to understanding human development and realizing the therapeutic potential of these cells. Transcription factors that control ES cell pluripotency and self-renewal have been identified (Chambers and Smith, 2004; Niwa, 2007; Silva and Smith, 2008) and a draft of the core regulatory circuitry by which these factors exert their regulatory effects on protein-coding genes has been described (Boyer et al., 2005; Loh et al., 2006; Lee et al., 2006; Boyer et al. 2006).
- miRNAs contribute to the control of early development. However, little is known about the function and regulation of miRNAs in ES cells. Furthermore, although numerous miRNAs have been identified in various mammalian species, there is much less information available regarding miRNA gene transcriptional start sites and promoter regions.
Summary of the Invention
- the invention relates in part to promoters and high probability transcriptional start sites for genes, e.g., microRNA genes, and methods for identification thereof.
- the invention provides a method of identifying a genomic region containing a high probability transcriptional start site for a microRNA (miRNA) gene, the method comprising: (a) identifying a genomic region comprising a candidate transcriptional start site for an miRNA gene based at least in part on enrichment for histone H3 trimethylated at its lysine residue (H3K4me3) within such region; and (b) assigning a score to said region based at least in part on (i) its proximity to one or more annotated mature miRNA sequences, (ii) expressed sequence tag (EST) data, and/or (iii) conservation of the region between multiple species, wherein the following factors, if present, contribute positively to the score: (I) proximity of the region to one or more annotated mature miRNA sequences, (II) identification of the region as containing the
- the region is between 100 base pairs (bp) and 10 kilobases (10 kB) in length. In some embodiments the region is between 100 base pairs (bp) and 5 kilobases (5 kB) in length. In some embodiments the region is between 100 base pairs (bp) and 1 kilobase (1 kB) in length. In some embodiments the method comprises: (i) identifying a plurality of genomic regions containing candidate transcriptional start sites for miRNA genes from the genomes of at least two cell types of different cell lineages; and (ii) identifying genomic regions that are conserved between the at least two cell types, wherein such conservation indicates an increased likelihood that the genomic region comprises a transcriptional start site.
- the method comprises: (i) identifying a plurality of genomic regions containing candidate transcriptional start sites for miRNA genes from the genomes of at least two different differentiated cell types; and (ii) identifying genomic regions that are conserved between the at least two cell types, wherein such conservation indicates an increased likelihood that the genomic region comprises a transcriptional start site.
- the method comprises: (i) identifying a plurality of genomic regions containing candidate transcriptional start sites for miRNA genes from the genomes of cells derived from each at least two different mammalian species; and (ii) identifying genomic regions that are conserved between the cells derived from each at least two different mammalian species, wherein such conservation indicates an increased likelihood that the genomic region comprises a transcriptional start site.
- the cells from the at least two different mammalian organisms are of the same cell type or lineage. In some embodiments the cells are from mouse and human.
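The comparison of candidate regions between two cell types described above can be sketched as an interval-overlap check. This is a minimal illustration, not the method of the application: the `Region` tuple layout, the function names, and the example coordinates are all hypothetical, and a cross-species comparison would additionally require mapping both sets of coordinates onto a common reference first.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; a hypothetical layout.
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """Return True if two regions share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def shared_regions(first: List[Region], second: List[Region]) -> List[Region]:
    """Return the regions of `first` that overlap any region of `second`."""
    return [r for r in first if any(overlaps(r, s) for s in second)]


# Made-up candidate regions for two cell types.
es_cells = [("chr1", 1000, 2000), ("chr2", 500, 900)]
fibroblasts = [("chr1", 1500, 2500), ("chr3", 100, 400)]
print(shared_regions(es_cells, fibroblasts))
```

Regions that survive this filter would be the ones treated as more likely to contain a transcriptional start site.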
- the invention further provides a computer-readable medium having instructions stored thereon for performing at least step (b) or step (c) of the method when provided with suitable data.
- the invention provides a computer-readable medium having information stored thereon, wherein the information describes a plurality of regions comprising high probability miRNA gene transcriptional start sites, wherein said information describes regions comprising high probability mammalian miRNA gene transcriptional start sites for at least 100 miRNA genes or at least 75% of the miRNA genes in a selected mammalian species.
- the high probability miRNA gene transcriptional start sites are identified by a method comprising steps of: (a) identifying a genomic region comprising a candidate transcriptional start site for an miRNA gene based at least in part on enrichment for histone H3 trimethylated at its lysine residue (H3K4me3) within such region; and (b) assigning a score to said region based at least in part on (i) its proximity to one or more annotated mature miRNA sequences, (ii) expressed sequence tag (EST) data, and/or (iii) conservation of the region between multiple species, wherein the following factors, if present, contribute positively to the score: (I) proximity of the region to one or more annotated mature miRNA sequences, (II) identification of the region as containing the start site of a known transcript that spans a miRNA or of an EST that spans a miRNA, and (III) conservation of the region between multiple mammalian species; and the following factors, if present, contribute negatively
- the miRNA transcriptional start sites are high probability mammalian, e.g., human, miRNA gene transcriptional start sites.
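The positive scoring factors (I) to (III) listed above can be sketched as a simple additive score. This is a hedged illustration only: the `Candidate` fields, the unit weights, and the function name are assumptions, the 5,000 bp default is borrowed from the 5 kb window mentioned in the figure descriptions, and the negative factors are omitted because their text is truncated in this copy.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate start-site region (hypothetical field names)."""
    distance_to_mirna_bp: int          # distance to the annotated mature miRNA
    starts_spanning_transcript: bool   # start of a transcript or EST spanning the miRNA
    conserved_across_species: bool     # conserved between multiple mammalian species


def score(candidate: Candidate, near_bp: int = 5000) -> int:
    """Add one point (an assumed weight) for each positive factor present."""
    total = 0
    if candidate.distance_to_mirna_bp <= near_bp:
        total += 1  # factor (I): proximity to a mature miRNA
    if candidate.starts_spanning_transcript:
        total += 1  # factor (II): spanning transcript or EST
    if candidate.conserved_across_species:
        total += 1  # factor (III): cross-species conservation
    return total


print(score(Candidate(3000, True, False)))
```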
- the invention further provides a method comprising steps of: (i) electronically accessing a computer-readable medium of the invention; and (ii) extracting or analyzing information therefrom.
- the invention provides a computer-readable medium having information stored thereon, wherein the information describes a regulatory network comprising relationships between one or more key ES cell transcription factors, at least 20 ES cell transcription factor target genes, and at least some targets of the ES cell transcription factor target genes, wherein the ES cell transcription factor target genes include at least some genes that encode proteins and at least some genes that encode miRNAs.
- the key ES cell transcription factors are selected from: Oct4, Nanog, Sox2, and Tcf3.
- the key ES cell transcription factors are Oct4, Nanog, Sox2, and Tcf3.
- the information stored on the computer-readable medium further comprises information describing relationships between Polycomb group proteins and at least some of the key ES cell transcription factor target genes.
- the invention further comprises a method comprising steps of: (i) electronically accessing said computer-readable medium; and (ii) extracting or analyzing information therefrom.
- the invention further provides an isolated nucleic acid comprising a region comprising a high probability transcriptional start site for a mammalian miRNA gene.
- the region is identified according to a method comprising steps of: (a) identifying a genomic region comprising a candidate transcriptional start site for an miRNA gene based at least in part on enrichment for histone H3 trimethylated at its lysine residue (H3K4me3) within such region; and (b) assigning a score to said region based at least in part on (i) its proximity to one or more annotated mature miRNA sequences, (ii) expressed sequence tag (EST) data, and/or (iii) conservation of the region between multiple species, wherein the following factors, if present, contribute positively to the score: (I) proximity of the region to one or more annotated mature miRNA sequences, (II) identification of the region as containing the start site of a known transcript that spans a miRNA or of an EST that spans a miRNA, and (III) conservation of the region between multiple mammalian species; and the following factors, if present, contribute negatively to the score: (IV) if the H
- the region comprises or consists of a transcription start site (TSS) listed in Table S6 or S7 and wherein, optionally, the isolated nucleic acid comprises no more than 1 kB, 2 kB, 5 kB, 8 kB, or 10 kB of genomic sequence on the 5' side, the 3' side, or both sides of the TSS.
- the region comprises at least 50 continuous nucleic acids of a transcription start site (TSS) listed in Table S6 or S7 and wherein, optionally, the isolated nucleic acid comprises no more than 1 kB, 2 kB, 5 kB, 8 kB, or 10 kB of genomic sequence on the 5' side, the 3' side, or both sides of the TSS.
- the isolated nucleic acid further comprises a miRNA sequence.
- the invention further provides a composition comprising such an isolated nucleic acid and a transcription factor, wherein the transcription factor is one that binds to the region in at least some cell types.
- the invention further provides a nucleic acid construct, e.g., an isolated nucleic acid construct, comprising such an isolated nucleic acid.
- the isolated nucleic acid comprises a promoter and the construct comprises a heterologous nucleic acid operably linked to the promoter.
- the isolated nucleic acid comprises a promoter and the construct comprises a sequence encoding a polypeptide or microRNA.
- the polypeptide is a reporter polypeptide of use to detect and/or quantify expression from the promoter.
- the reporter polypeptide comprises a fluorescent protein.
- the invention further provides a host cell or transgenic non-human mammal, e.g., a mouse, containing such a nucleic acid construct.
- the invention further provides a method of identifying an agent with potential to modulate expression of a miRNA, the method comprising: (i) providing a nucleic acid construct comprising a miRNA promoter operably linked to a heterologous nucleic acid; and (ii) determining whether a test agent affects expression of the heterologous nucleic acid, wherein if the test agent affects expression of the heterologous nucleic acid, the test agent is identified as an agent with potential to modulate expression of the miRNA.
- the heterologous nucleic acid encodes a reporter protein.
- the nucleic acid construct is in a cell and the method comprises contacting the cell with the test agent. In some embodiments the miRNA is listed in Table S6 or Table S7. In some embodiments, the method further comprises: (iii) contacting cells with the agent; (iv) measuring expression of the miRNA or of a target gene of the miRNA; and (v) determining whether contacting the cells with the agent alters expression of the miRNA or miRNA target gene relative to expression that would be expected in the absence of the agent.
- the invention further provides a method of identifying a miRNA that acts as a determinant of cell fate decisions, wherein the miRNA is one that is selectively expressed in cells of one or more differentiated cell types or lineages, the method comprising determining whether the promoter of the miRNA is repressed by a Polycomb group protein in ES and/or iPS cells, wherein if the promoter of the miRNA is repressed by a Polycomb group protein in ES and/or iPS cells, the miRNA is identified as a determinant of cell fate decisions.
- determining whether the promoter is repressed by a Polycomb group protein comprises determining whether the promoter is bound by a Polycomb group protein.
- the miRNA is listed in Table S6 or S7.
- the invention further provides a method of identifying a polymorphism or mutation in a mammalian species, the method comprising: (i) obtaining the sequence of a genomic region containing a miRNA promoter in a plurality of individuals of the species; and (ii) determining whether the sequence of the region varies between individuals, wherein variations within the sequence define polymorphisms or mutations.
- the miRNA is listed in Table S6 or S7.
- the invention further provides a method of identifying a polymorphism or mutation associated with increased or decreased risk of developing a disease, the method comprising: (i) analyzing the sequence of a genomic region containing a miRNA promoter in a plurality of individuals with the disease; and (ii) determining whether a correlation exists between the presence of particular polymorphic variant(s) or mutation(s) within the region in individuals and presence of the disease.
- the disease is associated with aberrant (e.g., increased or decreased) miRNA expression.
- the disease is cancer.
- the miRNA is listed in Table S6 or Table S7.
- the miRNA promoter is one that is bound by a Polycomb group protein in ES and/or iPS cells.
- the invention further provides a method of modulating the differentiation of a pluripotent mammalian stem cell, the method comprising: modulating the level or activity of a miRNA in the pluripotent stem cell, wherein the miRNA is encoded by a gene whose promoter is bound by a key embryonic stem (ES) cell transcription factor in ES and/or iPS cells.
- the pluripotent stem cell is an induced pluripotent stem (iPS) cell.
- the method comprises decreasing the level or activity of a miRNA in the cell.
- the method comprises contacting the cell with an oligonucleotide complementary to the miRNA.
- the method comprises expressing an oligonucleotide complementary to the miRNA (or miRNA precursor) in the cell. In some embodiments the method comprises increasing the level or activity of a miRNA in the cell. In some embodiments the method comprises introducing the miRNA or a miRNA precursor containing the miRNA into the cell, or expressing the miRNA or a miRNA precursor in the cell. In some embodiments the method comprises modulating the binding of a transcription factor to the promoter of the gene that encodes the miRNA. In some embodiments the miRNA is one whose promoter is bound by a Polycomb group protein in ES and/or iPS cells. In some embodiments the method further comprises administering the cell to an individual.
- the pluripotent stem cell is a human cell.
- the invention further provides a mammalian cell, e.g., a human cell, wherein the differentiation state of the cell has been modulated according to such a method.
- the invention provides a method of treating an individual comprising: administering such a cell to the individual.
- the method comprises (i) obtaining a cell from an individual; (ii) reprogramming the cell in vitro; and (iii) administering the cell to the individual.
- the cell is differentiated in vitro.
- the invention further provides a method of modulating the in vitro reprogramming of a differentiated mammalian somatic cell, the method comprising: modulating the level or activity of a miRNA in the differentiated mammalian somatic cell, wherein the miRNA is encoded by a gene whose promoter is bound by a key embryonic stem (ES) cell transcription factor in ES and/or iPS cells.
- the method comprises reprogramming the somatic cell to a pluripotent state.
- the method comprises reprogramming the somatic cell to a pluripotent state and then differentiating the reprogrammed pluripotent cell to a desired cell type or lineage.
- the method comprises reprogramming the somatic cell from a first at least partially differentiated state to a second at least partially differentiated state. In some embodiments the method comprises reprogramming the somatic cell from a first cell type to a second cell type, wherein the first and second cell types are in different cell lineages. In some embodiments the method comprises decreasing the level or activity of a miRNA in the cell. In some embodiments the method comprises contacting the cell with an oligonucleotide complementary to the miRNA. In some embodiments the method comprises increasing the level or activity of a miRNA in the cell. In some embodiments the method comprises introducing the miRNA or a miRNA precursor containing the miRNA into the cell, or expressing the miRNA or a miRNA precursor in the cell.
- the method comprises modulating the binding of a transcription factor to the promoter of the gene that encodes the miRNA.
- the miRNA is one whose promoter is bound by a Polycomb group protein in ES and/or iPS cells.
- the somatic cell is a human cell.
- the invention further provides a reprogrammed mammalian somatic cell, wherein the in vitro reprogramming of the cell has been modulated according to the method.
- the invention further provides a method of treating an individual comprising: administering the cell to the individual.
- the method comprises (i) obtaining a cell from an individual; (ii) reprogramming the cell in vitro; and (iii) administering the cell to the individual.
- the invention further provides a method of modulating the differentiation state of a mammalian somatic cell, the method comprising: modulating the level or activity of a miRNA in the mammalian somatic cell, wherein the miRNA is one that is expressed in a cell type or cell lineage specific manner and is encoded by a gene whose promoter is bound by a Polycomb group protein in ES and/or iPS cells.
- the somatic cell is a human cell.
- the method comprises decreasing the level or activity of a miRNA in the cell.
- the method comprises contacting the cell with an oligonucleotide complementary to the miRNA.
- the method comprises increasing the level or activity of a miRNA in the cell.
- the method comprises introducing the miRNA or a miRNA precursor containing the miRNA into the cell, or expressing the miRNA or a miRNA precursor in the cell. In some embodiments the method comprises modulating the binding of a transcription factor to the promoter of the gene that encodes the miRNA.
- the invention further provides a mammalian somatic cell, wherein the differentiation state of the cell has been modulated according to the method.
- the invention further provides a method of treating an individual comprising: administering the cell to the individual.
- the method comprises (i) obtaining a cell from an individual; (ii) modulating the differentiation state of the cell in vitro; and (iii) administering the cell to the individual. In some embodiments the modulating promotes differentiation of the cell to a desired cell type or lineage.
- FIG. 1 High-Resolution Genome-wide Mapping of Core ES Cell Transcription Factors with ChIP-seq.
- (A) Summary of binding data for Oct4, Sox2, Nanog, and Tcf3. 14,230 sites are cobound genome wide and mapped to either promoter-proximal (TSS ± 8 kb, dark green, 27% of binding sites), genic (>8 kb from TSS, middle green, 30% of binding sites), or intergenic (light green, 43% of binding sites).
- TSS promoter-proximal
- the promoter-proximal binding sites are associated with 3,289 genes.
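The three-way split of binding sites described in this caption (promoter-proximal within 8 kb of a TSS, genic, intergenic) can be sketched as below. The sketch assumes a single gene on a single chromosome and uses hypothetical parameter names; a real annotation would compare each site against every gene on its chromosome.

```python
def classify_site(site: int, tss: int, gene_start: int, gene_end: int,
                  window: int = 8000) -> str:
    """Classify one binding-site position relative to one gene (simplified)."""
    if abs(site - tss) <= window:
        return "promoter-proximal"   # within 8 kb of the TSS
    if gene_start <= site <= gene_end:
        return "genic"               # inside the gene, more than 8 kb from the TSS
    return "intergenic"


# Made-up gene spanning 10,000-60,000 with its TSS at 10,000.
for position in (10500, 40000, 100000):
    print(position, classify_site(position, 10000, 10000, 60000))
```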
- FIG. 2 (A) Description of algorithm for miRNA promoter identification. A library of candidate transcriptional start sites was generated with histone H3 lysine 4 trimethyl (H3K4me3) location analysis data from multiple tissues ([Barski et al., 2007], [Guenther et al., 2007] and [Mikkelsen et al., 2007]). Candidates were scored to assess likelihood that they represent true miRNA promoters. Based on scores, a list of mouse and human miRNA promoters was assembled. Additional details can be found in Example 7. (B) Examples of identified miRNA promoter regions.
- A map of H3K4me3 enrichment is displayed in regions neighboring selected human and mouse miRNAs for multiple cell types: human ES cells (hES), REH human pro-B cell line (B cell), primary human hepatocytes (Liver), primary human T cells (T cell), mouse ES cells (mES), neural precursor cells (NPCs), and mouse embryonic fibroblasts (MEFs).
- miRNA promoter coordinates were confirmed by distance to mature miRNA genomic sequence, conservation, and EST data (shown as solid line where available). Predicted transcriptional start site and direction of transcription are noted by an arrow, with mature miRNA sequences indicated (red). CpG islands, commonly found at promoters, are indicated (green). Dotted lines denote presumed transcripts.
- FIG. 3 Oct4, Sox2, Nanog, and Tcf3 Occupancy and Regulation of miRNA Promoters.
- (A) Oct4 (blue), Sox2 (purple), Nanog (orange), and Tcf3 (red) binding is shown at four murine miRNA genes as in Figure 1A.
- H3K4me3 enrichment in ES cells is indicated by shading across genomic region. Presumed transcripts are shown as dotted lines. Coordinates for the mmu-mir-290-295 cluster are derived from NCBI build 37.
- (B) Oct4 ChIP enrichment ratios (ChIP-enriched versus total genomic DNA) are shown across human miRNA promoter region for the hsa-mir-302 cluster.
- H3K4me3 enrichment in ES cells is indicated by shading across genomic region.
- (C) Schematic of miRNAs with conserved binding by the core transcription factors in ES cells. Transcription factors are represented by dark blue circles and miRNAs are represented by purple hexagons.
- (D) Quantitative RT-PCR analysis of RNA extracted from ZHBTc4 cells in the presence or absence of doxycycline treatment. Fold change was calculated for each pri-miRNA for samples from 12 hr and 24 hr of doxycycline treatment relative to those from untreated cells. Transcript levels were normalized to Gapdh levels. Error bars indicate standard deviation derived from triplicate PCR reactions.
- D Most human and mouse miRNA promoters show evidence of H3K4me3 enrichment in multiple tissues.
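The quantitative RT-PCR fold change described above (pri-miRNA levels normalized to Gapdh, treated versus untreated) can be illustrated with the widely used 2^(-ΔΔCt) calculation. The caption does not say which formula was used, so this is an assumption; the function name and Ct values are made up, and the formula further assumes roughly 100% amplification efficiency.

```python
def fold_change(ct_target_treated: float, ct_ref_treated: float,
                ct_target_untreated: float, ct_ref_untreated: float) -> float:
    """Relative expression by the 2^(-ddCt) method (assumed, not from the source)."""
    # Normalize each sample to the reference gene (Gapdh in the caption).
    delta_treated = ct_target_treated - ct_ref_treated
    delta_untreated = ct_target_untreated - ct_ref_untreated
    # Compare treated to untreated.
    return 2.0 ** -(delta_treated - delta_untreated)


# Made-up Ct values: the target crosses threshold two cycles later after treatment.
print(fold_change(26.0, 18.0, 24.0, 18.0))
```

A value below 1 indicates reduced transcript levels after treatment relative to untreated cells.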
- FIG. 4 Regulation of Oct4/Sox2/Nanog/Tcf3-Bound miRNAs during Differentiation.
- (A) Pie charts showing relative contributions of miRNAs to the complete population of miRNAs in mES cells (red), MEFs (blue), and NPCs (green) based on quantification of miRNAs by small RNA sequencing. A full list of the miRNAs identified can be found in Table S9.
- (B) Normalized frequency of detection of individual mature miRNAs whose promoters are occupied by Oct4/Sox2/Nanog/Tcf3 in mouse. The red line in the center and right panels shows the level of detection in ES cells.
- (C) Histogram of changes in frequency of detection.
- FIG. 5 Polycomb Represses Lineage-Specific miRNAs in ES Cells.
- (A) Suz12 (light green) and H3K27me3 (dark green, Mikkelsen et al., 2007) binding are shown for two miRNA genes in murine ES cells. Predicted start sites (arrow), CpG islands (green bar), presumed miRNA primary transcript (dotted line), and mature miRNA (red bar) are shown.
- (B) Expression analysis of miRNAs from mES cells based on quantitative small RNA sequencing. Cumulative distributions for Polycomb-bound miRNAs (green line) and all miRNAs (gray line) are shown.
- (C) Expression analysis of miRNAs occupied by Suz12 in mES cells.
- The Polycomb group (PcG) protein Suz12 is represented by a green circle.
- FIG. 6 miRNA Modulation of the Gene Regulatory Network in ES Cells.
- (A) An incoherent feed-forward motif (Alon, 2007) involving miRNA repression of a transcription factor target gene is illustrated (left). Transcription factors are represented by dark blue circles, miRNAs by purple hexagons, protein-coding genes by pink rectangles, and proteins by orange ovals. Selected instances of this network motif identified in ES cells based on data from Sinkkonen et al., 2008 or data in Figure S11 are shown (right).
- (B) A second model of an incoherent feed-forward motif (Alon, 2007) involving protein repression of an miRNA is illustrated (left).
- Lin28 blocks the maturation of primary pri-Let-7g (Viswanathan et al., 2008). Lin28 and the Let-7g gene are occupied by Oct4/Sox2/Nanog/Tcf3. Targetscan prediction (Grimson et al., 2007) of Lin28 by mature Let-7g is noted (purple dashed line, right).
- (C) A coherent feed-forward motif (Alon, 2007) involving miRNA repression of a transcriptional repressor that regulates a transcription factor target gene is illustrated (left).
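The three-node feed-forward motifs in these panels can be found in a signed network with a short search. The sketch below is illustrative only: the edge dictionary, node names, and function name are hypothetical, and it follows the usual convention that a loop is coherent when the sign of the indirect path (X to Y to Z) matches the sign of the direct edge (X to Z). The four-node arrangement with an intermediate repressor is not covered.

```python
from typing import Dict, List, Tuple

# Signed edges: +1 for activation, -1 for repression. Placeholder node names.
edges: Dict[Tuple[str, str], int] = {
    ("TF", "miRNA"): +1,      # transcription factor activates the miRNA gene
    ("TF", "target"): +1,     # transcription factor activates the target gene
    ("miRNA", "target"): -1,  # miRNA represses the target
}


def feed_forward_loops(
    network: Dict[Tuple[str, str], int]
) -> List[Tuple[str, str, str, str]]:
    """List (x, y, z, kind) for every X->Y, Y->Z, X->Z triple in the network."""
    found = []
    for (x, y), sign_xy in network.items():
        for (y2, z), sign_yz in network.items():
            if y2 != y or z in (x, y) or (x, z) not in network:
                continue
            indirect = sign_xy * sign_yz
            kind = "coherent" if network[(x, z)] == indirect else "incoherent"
            found.append((x, y, z, kind))
    return found


print(feed_forward_loops(edges))
```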
- FIG. 7 Multilevel Regulatory Network Controlling ES Cell Identity. Updated map of ES cell regulatory circuitry is shown. Interconnected autoregulatory loop is shown to the left. Active genes are shown at the top right, and inactive genes are shown at the bottom right. Transcription factors are represented by dark blue circles, and Suz12 by a green circle. Gene promoters are represented by red rectangles, gene products by orange circles, and miRNA promoters are represented by purple hexagons.
- Figure S1 Comparison of ChIP-seq and RT-PCR data for Oct4 and Suz12.
- Figure S2 Promoters for known genes occupied by Oct4/Sox2/Nanog/Tcf3 in mES cells. a. Overlap of genes whose promoters are within 8kb of sites enriched for Oct4, Sox2, Nanog, or Tcf3. Not shown are the Nanog:Oct4 overlap (289) and Sox2:Tcf3 overlap (26). Red line delineates genes considered occupied by Oct4/Sox2/Nanog/Tcf3. b. Enrichment for selected GO-terms previously reported to be associated with Oct4/Sox2/Nanog binding (Boyer et al., 2005) was tested on the sets of genes occupied at high-confidence for 1 to 4 of the tested DNA binding factors. Hypergeometric p-value is shown for genes annotated for DNA binding (blue), Regulation of Transcription (green) and Development (red).
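A hypergeometric enrichment p-value of the kind mentioned for the GO-term analysis can be computed directly from binomial coefficients. This is a minimal sketch with hypothetical argument names and made-up counts; it reports the upper tail, that is, the probability of seeing at least the observed number of annotated genes in the occupied set by chance.

```python
from math import comb


def hypergeom_pvalue(population: int, annotated: int, sample: int, hits: int) -> float:
    """P(X >= hits) when `sample` genes are drawn without replacement
    from `population` genes, of which `annotated` carry the GO term."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Made-up counts: 10 genes, 5 annotated, 5 occupied, all 5 occupied genes annotated.
print(hypergeom_pvalue(10, 5, 5, 5))
```

`math.comb` requires Python 3.8 or later and returns 0 when the second argument exceeds the first, which keeps impossible terms out of the sum.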
- Figure S3 Comparison of ChIP-seq and ChIP-chip genome wide data for Oct4, Nanog and Tcf3.
- (lower) Binding derived from ChIP-chip enrichment ratios (Cole et al., 2008)
- b. Poor probe density prevents detection of ~1/3 of ChIP-seq binding events on Agilent genome-wide tiling arrays.
- Top panel shows the fraction of regions that are occupied by Oct4/Sox2/Nanog/Tcf3 at high-confidence in mES cells as identified by ChIP-seq that are enriched for Oct4 (blue), Nanog (orange) and Tcf3 (red) on Agilent genome-wide microarrays (Cole et al., 2008). Numbers on the x-axis define the boundaries used to classify probe densities for the histogram. Bottom panel illustrates a histogram of the microarray probe densities of the enriched regions identified. c. Comparison of motif association. At the set of genome-wide ChIP-chip probe positions, we examined the association between an Oct4 DNA motif and ChIP-chip and ChIP-seq enrichment.
- Probes/bins were considered positive if they were associated with a high scoring motif within a 200 bp window (+/-100 bp). The background motif occurrence for all probe positions is 8.2% (leftmost group). 1297 ChIP-seq bins and 421 ChIP-chip probes are included in the top categories, respectively.
- Figure S4 High resolution analysis of Oct4/Sox2/Nanog/Tcf3 binding based on Meta-analysis. a-d. Short sequence reads for a. Oct4, b. Sox2, c. Nanog, d. Tcf3 mapping within 250bp of 2000 highly enriched regions where the peak of binding was found within 50bp of a high quality Oct4/Sox2 motif were collected.
- Figure S5 a. Flowchart describing the method used to identify the promoters for primary miRNA transcripts in human and mouse. For a full description, see Example 7.
- b. Two examples of identification of miRNA promoters. Top, Initial identification of possible start sites based on H3K4me3 enriched regions from four cell types. Enrichment of H3K4me3-modified nucleosomes is shown as shades of gray. Red bar represents the position of the mature miRNA. Black bars below the graph are regions enriched for H3K4me3. Initial scores are shown below the black bars.
- Middle, Identification of candidate start sites <5kb upstream of the mature miRNA (yellow shaded area).
- Bottom, Identification of candidate start sites that either initiate overlapping (left) or non-overlapping (right) transcripts.
- EST and transcript data are shown. Scores associated with identified genes are shown in bold.
- Figure S6 Summary of miRNA promoter classification. a. Promoters assigned to mature miRNAs were classified by the dominant feature of their scoring. Green: miRNAs that were found to have overlapping ESTs or genes confirming their promoters. Orange: miRNAs that were found to have a candidate start site within 5kb of the mature miRNA.
- Gray: miRNAs with either no candidates within 250kb of the mature miRNA or where all candidates had a score less than zero (see Fig. S5b, right).
- Yellow: miRNAs for which the closest candidate start site was selected solely on the basis of its proximity.
- b. The basis of miRNA promoter identification, including gene or EST evidence (green), distance of <5 kilobases to mature miRNA (orange), or nearest possible promoter to miRNA (yellow), tended to be conserved between human and mouse.
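The four color classes in this caption can be read as a small decision rule. The sketch below is one possible reading, not the procedure of Example 7: the function and argument names are hypothetical, and the order in which the rules are checked (no usable candidate first, then EST or gene evidence, then the 5 kb window, then nearest candidate) is an assumption.

```python
from typing import List, Tuple


def classify_promoter(candidates: List[Tuple[int, float]],
                      has_spanning_est_or_gene: bool) -> str:
    """Classify a miRNA by the dominant feature of its promoter assignment.

    `candidates` holds (distance to the mature miRNA in bp, score) pairs.
    """
    # Keep candidates within 250 kb whose score is not below zero.
    usable = [(d, s) for d, s in candidates if d <= 250_000 and s >= 0]
    if not usable:
        return "gray"    # no candidate within 250 kb, or all scores below zero
    if has_spanning_est_or_gene:
        return "green"   # overlapping EST or gene confirms the promoter
    if min(d for d, _ in usable) <= 5_000:
        return "orange"  # candidate start site within 5 kb
    return "yellow"      # closest candidate chosen on proximity alone


print(classify_promoter([(3000, 1.0)], False))
```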
- Figure S7 Regulation of miRNAs by Oct4.
- a. In an engineered murine cell line (Niwa et al., 2000), endogenous Oct4 is deleted, and Oct4 expression is maintained by a Dox-repressible transgene.
- b. By 24 hours of Dox treatment, Oct4 mRNA levels are reduced as shown by reverse transcription (RT)-PCR.
- c. 24 hours following Dox treatment, cells remain ES-like by morphology. d. 24 hours following Dox treatment, Sox2 protein can still be detected by immunofluorescence.
- e. Changes in levels of Oct4/Sox2/Nanog/Tcf3-occupied mature miRNAs based on Solexa sequencing of small RNAs. Fold change was calculated by comparing normalized read counts from untreated cells and cells 24 hours after Dox treatment. A full list of miRNA reads can be found in Table S9. Details about the normalization procedure are contained in Example 7.
- FIG. 1 Figure S8. Regulation of miRNAs by Tcf3.
- Tcf3 was knocked down in V6.5 mES cells using lentiviral vectors containing shRNAs.
- a RT-PCR confirmation of knockdown at 72 hours post-infection using Taqman probes against Tcf3 (relative to levels in cells infected with GFP control lentivirus).
- b Schematic of the position of RT-PCR probes used to measure the levels of pri-miRNA transcripts in Figure 3d and part c.
- c Results of quantitative reverse transcriptase (RT)-PCR analysis of probes designed to several pri-miRNAs occupied by Oct4/Sox2/Nanog/Tcf3. Changes in the level of primary transcript compared to GFP control lentivirus are shown.
- * p < 0.05
- ** p < 0.001 using a two-sample t-test assuming equal variance. Standard deviation is indicated with error bars.
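For readers unfamiliar with the test named in this legend, the statistic of a two-sample t-test assuming equal variance can be sketched as below. The function name and the measurements are hypothetical. The sketch returns only the t statistic and the degrees of freedom; converting these to a p-value requires the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a two-sample t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    df = n_a + n_b - 2
    # Pooled variance: the sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    if pooled_var == 0:
        raise ValueError("both samples have zero variance")
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical relative pri-miRNA levels: knockdown versus GFP control.
t_stat, dof = pooled_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```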
- FIG. 1 Maps of RNA genes occupied by the core master regulators in ES cells are expressed in induced Pluripotent Stem cells (iPS).
- RNA was extracted from MEFs (columns 1-3), mES cells (columns 4, 5) and iPS cells (column 6) and hybridized to microarrays with LNA probes targeting all known miRNAs. Differentially expressed miRNAs enriched in either MEFs or mES cells are shown (FDR <10%; see Example 7. iPS cells were not used to determine differential expression). Data were Z-score normalized, and cell types were clustered hierarchically (top). Active miRNA promoters associated with Oct4/Sox2/Nanog/Tcf3 are listed to the right.
- FIG. PcG occupied miRNAs are generally expressed in a tissue specific manner. Mature miRNAs derived from genes occupied by Suz12 and H3K27me3-modified nucleosomes were compared to the list of tissue specific miRNAs derived from the miRNA expression atlas (Landgraf et al., 2007). Vertical axis represents tissue-specificity and miRNAs with specificity score >1 are shown. miRNAs bound by Oct4/Sox2/Nanog/Tcf3 and expressed in mES cells are not shown (largely ES cell specific miRNAs). Among the tissue-specific miRNAs there is significant enrichment (p <0.005 by hypergeometric distribution) for miRNAs occupied by Suz12 (green).
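The Z-score normalization mentioned in this legend can be illustrated with a short sketch. It assumes that each miRNA (row) is standardized across the samples (columns) using the population standard deviation; the legend does not state these details, and the probe names and values are hypothetical. The hierarchical clustering step is not shown.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Standardize each row (one miRNA across samples) to mean 0 and unit variance."""
    normalized = {}
    for name, values in matrix.items():
        mu = mean(values)
        sd = pstdev(values)
        if sd == 0:
            # A flat row carries no information about differences between samples.
            normalized[name] = [0.0] * len(values)
        else:
            normalized[name] = [(v - mu) / sd for v in values]
    return normalized


# Hypothetical log-intensities for two probes across three samples.
signals = {"probe-1": [1.0, 2.0, 3.0], "probe-2": [5.0, 5.0, 5.0]}
z = zscore_rows(signals)
```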
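An enrichment p-value "by hypergeometric distribution", as cited in this legend, is the probability of seeing at least the observed overlap between two sets by chance. A minimal sketch follows; the function name and the numbers are illustrative and are not taken from the data.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` items without replacement
    from `population` items, of which `successes` are marked.

    In the setting of the legend, `population` would be all miRNAs considered,
    `successes` the Suz12-occupied miRNAs, `draws` the tissue-specific miRNAs,
    and `observed` the number that are both.
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(max(observed, 0), upper + 1)
    )
    return favourable / total


# Illustrative numbers only: 10 items, 5 marked, 5 drawn, all 5 drawn are marked.
p_value = hypergeom_upper_tail(10, 5, 5, 5)
```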
- the invention relates at least in part to microRNAs and microRNA genes.
- the invention relates to identification of promoters for miRNA genes, e.g., in mammalian cells.
- the invention relates to the regulation and role(s) of miRNAs in pluripotency and differentiation, e.g., in mammalian cells.
- the invention integrates miRNAs and their target genes into the core regulatory circuitry of pluripotency and self-renewal, e.g., in ES cells, induced pluripotent stem (iPS) cells, etc.
- the invention provides a method of identifying a promoter of a miRNA gene.
- the invention provides computer-readable medium having computer-executable instructions stored thereon for performing at least part of the method, e.g., step (b) and/or (c) when provided with suitable data.
- computer systems comprising the computer-readable medium and a processor for performing the instructions.
- the system comprises means for inputting and/or outputting or displaying data and/or results.
- the invention provides the recognition that in vivo chromatin signatures can be used to identify promoters and/or high probability transcriptional start sites (TSSs), e.g., in mammalian cells.
- Such in vivo chromatin signatures can comprise enrichment for histone H3 trimethylated at its lysine 4 residue (H3K4me3).
- Inventive methods for identifying promoters and/or TSSs are exemplified herein using miRNA genes.
- the invention provides methods to identify high probability transcriptional start sites (TSSs) for other genes, e.g., genes for other non-coding RNAs (which may be short or long), or protein-coding genes.
- the invention provides a method of identifying a genomic region containing a high probability transcriptional start site (TSS) for a gene, the method comprising: (a) identifying a genomic region comprising a candidate transcriptional start site for a gene based at least in part on enrichment for histone H3 trimethylated at its lysine 4 residue (H3K4me3) within such region; and (b) assigning a score to said region based at least in part on (i) its proximity to one or more annotated RNA sequences, (ii) expressed sequence tag (EST) data, and/or (iii) conservation of the region between multiple species, wherein the following factors, if present, contribute positively to the score: (I) proximity of the region to one or more annotated RNA sequences, (II) identification of the region as containing the start site of a known transcript that spans an RNA or of an EST that spans an RNA, and (III) conservation of the region between multiple mammalian species; and the following factors,
- Computer-readable medium having computer-executable instructions stored thereon for performing at least part of the method, e.g., step (b) and/or (c) when provided with suitable data are provided.
- computer systems comprising the computer-readable medium and a processor for performing the instructions.
- the system comprises means for inputting and/or outputting or displaying data and/or results.
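The scoring step of the method above could be implemented along the following lines. This is a sketch under stated assumptions, not the scoring scheme of the invention: the unit weights, the `Candidate` fields, and the tie-breaking rule are hypothetical, and the negative factors are represented only by a generic `penalty` placeholder. The 5 kb and 250 kb distances and the rejection of scores below zero follow the figure legends.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    distance_bp: int      # distance from the H3K4me3-enriched region to the mature miRNA
    spans_mirna: bool     # a known transcript or EST starting here spans the miRNA
    conserved: bool       # region is conserved between species
    penalty: float = 0.0  # placeholder for negative factors (hypothetical)


def score(candidate, near_bp=5_000, max_bp=250_000):
    """Return a score for a candidate start site, or None if it lies too far away."""
    if candidate.distance_bp > max_bp:
        return None
    total = 0.0
    if candidate.distance_bp < near_bp:
        total += 1.0  # proximity to the annotated miRNA
    if candidate.spans_mirna:
        total += 1.0  # transcript or EST evidence
    if candidate.conserved:
        total += 1.0  # cross-species conservation
    return total - candidate.penalty


def best_candidate(candidates):
    """Pick the highest-scoring candidate with a non-negative score; nearest wins ties."""
    scored = [(score(c), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s is not None and s >= 0]
    if not scored:
        return None
    return max(scored, key=lambda pair: (pair[0], -pair[1].distance_bp))[1]


# Hypothetical candidates for a single miRNA.
candidates = [
    Candidate("site-a", 3_000, False, True),
    Candidate("site-c", 40_000, True, True, penalty=3.0),
    Candidate("site-d", 300_000, True, True),
]
chosen = best_candidate(candidates)
```

Returning `None` when no candidate qualifies corresponds to the miRNAs for which no promoter could be assigned.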
- the invention provides genomic regions comprising promoters of human and mouse mammalian miRNA genes (see, e.g., Table S6 and Table S7). Identification of the promoters of miRNA genes is of great scientific and practical interest for a number of reasons.
- the invention provides such methods. Modulating miRNA expression in turn modulates expression of miRNA target genes (e.g., genes whose expression is inhibited by the miRNA). By modulating expression or activity of a particular miRNA, expression of multiple target genes can be modulated.
- the invention provides a number of regulatory interactions such as autoregulatory loops, coherent and incoherent feed-forward loops, and various other network motifs, etc., that are of use in controlling gene expression.
- the invention provides genomic regions comprising miRNA promoters that are bound by key ES cell transcription factors (e.g., Oct4, Nanog, Sox2, and/or Tcf3) in ES cells and/or are bound by Polycomb group protein(s) in ES cells.
- the invention provides computer-readable media containing information describing the genomic regions and/or TF binding sites. Further provided are methods comprising accessing the information and, optionally, retrieving or analyzing it.
- the invention also discloses miRNAs whose promoters are, in ES cells, bound by at least one key ES cell transcription factor (see, e.g., Tables S6 and S7).
- a miRNA or miRNA gene whose promoter is, in ES cells, bound by at least one key ES cell transcription factor is referred to as an "ESTF-bound miRNA” or "ESTF-bound miRNA gene", respectively.
- a promoter that, in ES cells, is bound by at least one key ES cell transcription factor is referred to as an "ESTF-bound promoter".
- a miRNA promoter disclosed herein is bound by one of the afore-mentioned TFs, while in other embodiments a promoter is bound by 2, 3, or 4 of the transcription factors (TFs), i.e., it is "co-occupied" by multiple TFs.
- miRNA precursors e.g., stem-loop structures
- miRNA precursors comprising the miRNAs.
- One of skill in the art will be able to consult databases such as miRBase (http://microrna.sanger.ac.uk/sequences/), which contains sequences of miRNA precursors corresponding to known miRNAs.
- the invention also provides genomic regions containing miRNA promoters that are bound by Polycomb group protein(s) in ES cells and, optionally, also bound by one or more key ES cell transcription factors.
- the invention also provides miRNAs whose promoters are bound by Polycomb group protein(s) in ES cells. Such binding is typically associated with repression of the miRNA gene.
- the invention provides the recognition that miRNAs that were bound by Polycomb group protein(s) in ES cells were among the transcripts that are specifically induced in differentiated cell types. Based at least in part on this recognition, the invention provides methods of identifying miRNAs that serve as key determinants of cell fate decisions, e.g., as "master regulators" controlling cell identity.
- the subset of miRNAs that are both cell-type specific and whose promoters are bound by Polycomb group proteins in ES cells are of great interest in this regard.
- These miRNAs, which are repressed in pluripotent cells and are expressed in differentiated cell types, are candidates for playing key roles in specifying cell fate. Modulation of such miRNAs has the potential to modulate reprogramming in a variety of contexts.
- derepressing miRNA(s) whose promoter(s) are bound by Polycomb and that are expressed specifically or selectively in one or more cell lineages or cell types is of use to direct the differentiation of pluripotent cells along such cell lineages or to such cell types.
- increasing expression or activity of miRNA(s) whose promoters are bound by Polycomb and that are expressed specifically or selectively in one or more cell lineages or cell types is of use to direct the differentiation of pluripotent cells along such cell lineages or to such cell types.
- such lineage is neuronal, ectodermal, mesodermal, endodermal, etc.
- the invention provides isolated nucleic acids comprising the genomic regions (e.g., any genomic region identified as a TSS in Table S6 or S7).
- the nucleic acid further comprises a miRNA sequence, e.g., the corresponding miRNA sequence listed in Table S6 or S7.
- the nucleic acid further comprises a sequence that encodes a miRNA precursor, e.g., the precursor of the corresponding miRNA sequence listed in Table S6 or S7.
- the nucleic acid comprises up to 100 bp, up to 500 bp, up to 1 kb, up to 5 kb, up to 8 kb, or up to 10 kb of genomic sequence on either the 5' side, the 3' side, or both sides of the identified genomic region.
- the invention further provides isolated nucleic acids at least 20 or at least 25 bp in length, whose sequence falls within or overlaps an identified genomic region.
- the isolated nucleic acid comprises a binding site for Oct4, Nanog, Sox2, Tcf3, or any 2, 3, or all 4 of these TFs.
- the invention further provides nucleic acid constructs, e.g., plasmids or other vectors (e.g., expression vectors), comprising any of the afore-mentioned isolated nucleic acid sequences.
- the nucleic acid construct comprises a heterologous nucleic acid sequence, i.e., a sequence not normally found adjacent to the isolated nucleic acid sequence in the genome of the organism from which it was derived.
- the heterologous nucleic acid and promoter are operably linked, whereby the promoter and heterologous nucleic acid are positioned with respect to one another so that the promoter directs expression of the heterologous nucleic acid, in cells that contain appropriate TFs.
- the heterologous nucleic acid encodes a reporter molecule, e.g., a fluorescent protein or protein having enzymatic activity that can be used to assess expression from the miRNA promoter or a selectable marker such as a drug resistance marker or nutritional marker.
- the invention further provides cells, e.g., isolated cells, containing the construct, and transgenic animals, cells of which contain the construct, e.g., integrated into the genome. Cells could be of any cell type or lineage. In some embodiments the cells are ES cells or iPS cells.
- Constructs of the invention and cells containing them are of use, e.g., in methods to detect and/or quantify expression directed by the miRNA gene promoter and/or in methods to identify agents (e.g., small molecules such as organic compounds having molecular weight less than about 1 kD, or less than about 1.5 or 2 kD; polypeptides, peptides, nucleic acids, etc.) that modulate expression from such promoters.
- agents may, for example, inhibit or promote binding of a TF to the promoter.
- Methods of identifying modulators of ESTF bound or Polycomb group protein bound miRNA are thus aspects of the invention.
- Such agents may be designed or may be isolated by screening, e.g., compound libraries.
- Polymorphisms and mutations in promoter regions can alter expression of an operably linked nucleic acid and may be associated with disease.
- the invention provides methods of identifying polymorphisms or mutations associated with disease, e.g., in humans.
- Certain of the methods comprise analyzing sequences of the genomic regions in a plurality of individuals and determining whether particular sequence variant(s) are associated with disease, e.g., whether particular sequence variant(s) occur with greater frequency in individuals suffering from a disease than in control individuals, e.g., individuals not suffering from the disease (and typically matched for parameters such as age, etc.). Once such polymorphisms or mutations are identified, they may be used to provide diagnostic or prognostic information and/or in developing therapies for the disease.
- the methods are of use, in certain embodiments, to identify polymorphisms and/or mutations associated with cancer, e.g., cancer of the breast, prostate, kidney, lung, liver, gastrointestinal tract (e.g., colon), testis, stomach, pancreas, thyroid, brain or other nervous system tissue, connective tissue, skin, and/or hematopoietic system (e.g., leukemia, lymphoma).
- the disease is one that is associated with altered or aberrant or inappropriate differentiation or dedifferentiation or development.
- identification of the promoters of miRNA genes provides a means to explore the underlying basis for alterations in miRNA expression that may be associated with a condition
- the invention provides methods comprising modulating the expression or activity of a miRNA whose promoter is, in ES cells, bound by a key ES cell transcription factor.
- Modulating the expression or activity of a miRNA can involve causing or facilitating a qualitative or quantitative change, alteration, or modification in the expression or activity of the miRNA. Such alteration may, for example, be an increase or decrease in miRNA level or activity within a cell.
- the invention provides cells wherein miRNA expression or activity has been modulated according to the inventive methods. The methods are of use, e.g., in reprogramming cells, e.g., in vitro.
- Such reprogramming could involve reprogramming differentiated cells to a pluripotent state, reprogramming cells from a first at least partly differentiated state to a second at least partly differentiated state, modulating, e.g., promoting or inhibiting differentiation of pluripotent cells to a partly or fully differentiated state (e.g., to a cell lineage or cell type of interest), etc.
- a variety of methods for modulating miRNA level or activity are known in the art. Any suitable method can be used in the present invention.
- Cells can be contacted in vitro with molecules that are taken up and modulate miRNA expression or activity or molecules can be administered to individuals.
- miRNA or miRNA precursors can be introduced into cells to increase miRNA level and result in an increase in miRNA-mediated inhibition of miRNA target gene expression.
- Nucleic acids that encode miRNA or miRNA precursors can be introduced into cells and stably or transiently expressed therein. miRNA can be inhibited by antisense-based approaches.
- an miRNA is inhibited by introducing an oligonucleotide (e.g., synthetic oligonucleotides, which may be chemically synthesized) that is complementary to the miRNA or miRNA precursor into a cell (e.g., in vitro) or administering such oligonucleotides to an organism.
- the oligonucleotide need not be perfectly complementary to the miRNA or miRNA precursor, e.g., it may have 1, 2, 3, 4, 5, or more mismatches and/or be at least 70%, at least 80%, or at least 90% complementary to the miRNA.
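The mismatch count and percent complementarity referred to above can be checked computationally. The sketch below assumes that the oligonucleotide and the miRNA have equal length, that both are written 5' to 3', and that only Watson-Crick pairs count (G:U wobble pairs are treated as mismatches). The sequences are made up for illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementarity(mirna, oligo):
    """Return (percent_complementary, mismatches) for an antisense oligonucleotide.

    Both sequences are given 5'->3'; the oligonucleotide pairs with the miRNA
    in antiparallel orientation, so it is read in reverse.
    """
    mirna = mirna.upper().replace("T", "U")
    oligo = oligo.upper().replace("T", "U")
    if len(mirna) != len(oligo):
        raise ValueError("this sketch only handles sequences of equal length")
    paired = sum(COMPLEMENT.get(m) == o for m, o in zip(mirna, reversed(oligo)))
    return 100.0 * paired / len(mirna), len(mirna) - paired


# Hypothetical 10-nt target and two candidate antisense oligonucleotides.
target = "UAGCUUAUCA"
perfect = complementarity(target, "UGAUAAGCUA")
one_mismatch = complementarity(target, "AGAUAAGCUA")
```

An oligonucleotide could then be accepted if, for example, the returned percentage is at least 70%, 80%, or 90%, matching the thresholds listed above.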
- the oligonucleotide is at least about 19 nt in length.
- the oligonucleotide is between about 17 nt and about 50 nt in length. It will be appreciated that such oligonucleotides may contain one or more non-standard nucleotides, modified nucleotides (e.g., having modified bases and/or sugars) or nucleotide analogs, and/or have a modified backbone and/or be attached or have attached thereto, one or more non-nucleic acid moieties. In some embodiments, the oligonucleotide has one or more modifications, e.g., to provide RNase protection and/or pharmacologic properties such as enhanced tissue and cellular uptake.
- the oligonucleotide differs from normal RNA by having partial or complete 2'-O-methylation of sugar, phosphorothioate backbone and/or a cholesterol moiety at the 3'-end.
- USPTO Patent Applications 20080171715 and 20070213292 and PCT publications WO/2006/137941 and WO/2008/025025 disclose a variety of compounds and methods of use to modulate miRNA. Certain agents that modulate miRNA level or activity are available from a variety of commercial suppliers (e.g., Thermo Scientific (Dharmacon), Ambion, etc.).
- miRNA gene promoters that are, in ES cells, bound by at least one key ES cell transcription factor are also bound by such factor(s) in iPS cells.
- Results and regulatory circuitry derived with ES cells, and compositions and methods of the invention are applicable in the context of other pluripotent cells, e.g., iPS cells.
- Methods of the invention may be applied in the context of a variety of mammalian and avian species. Mammals of interest include rodents (e.g., mice, rats, rabbits), primates (e.g., human, monkeys, apes), domesticated animals such as caprine, ovine, bovine, porcine, canine, feline species.
- Methods of the invention may, as appropriate, be applied to somatic cells that are at least partly differentiated along a cell lineage of interest. In some embodiments, the methods are applied to terminally differentiated somatic cells.
- Mammalian somatic cells useful in various embodiments of the present invention include, for example, fibroblasts, neurons, glial cells, pancreatic islet cells, epidermal cells, epithelial cells, endothelial cells, hepatocytes, hair follicle cells, keratinocytes, hematopoietic cells, melanocytes, chondrocytes, lymphocytes (B and T lymphocytes), erythrocytes, macrophages, monocytes, mononuclear cells, cardiac muscle cells, skeletal muscle cells, Sertoli cells, granulosa cells, etc., and precursor cells that are committed or partly differentiated along a cell lineage leading to any of the afore-mentioned cell types.
- adult stem cells are used.
- precursor cells such as neural precursor cells, hematopoietic precursor cells, or muscle precursor cells are used.
- methods of the invention can be practiced on primary cells, non-immortalized cells, immortalized cells, genetically modified cells, cells that are considered "wild type” or "normal", cells obtained from an individual suffering from a disease, etc.
- Certain methods of the invention are practiced on pluripotent cells, e.g., ES cells or iPS cells. Methods for generating such cells are known in the art.
- iPS cells can be generated by introducing genes encoding transcription factors Oct4, Sox2, c-Myc and Klf4 (with c-Myc being dispensable), or Oct4, Nanog, Sox2, and Lin28 into somatic cells, e.g., via retroviral infection.
- transient transfection is used to introduce the reprogramming factors.
- the reprogramming factors are introduced using an approach that avoids the use of viruses as vectors.
- a non-integrating episomal vector may be used.
- a single multiprotein expression vector that comprises the coding sequences of two or more of the reprogramming factors (e.g., Klf4, Oct4 and Sox2) linked with 2A peptides is used.
- a recombinase-excisable virus (or non-virus expression cassette) is used. See, e.g., Soldner, F., et al. Cell. 136(5):964-77, 2009.
- molecules such as histone deacetylase inhibitors, methyltransferase inhibitors, Wnt pathway agonists, molecules that enhance expression of endogenous genes such as Oct4, Sox2, or molecules that can substitute for one or more reprogramming factors (e.g., Klf4), may be used.
- methods of the invention are performed in vivo.
- cells are obtained from an individual and subjected to a method of the invention ex vivo (outside the body).
- an agent that modulates expression or activity of an ESTF-bound miRNA is administered to an individual to treat a condition.
- the condition is cancer.
- an agent that modulates activity or expression of an miRNA whose promoter is bound by Polycomb and that is expressed specifically or selectively in one or more cell lineages or cell types is of use to treat a disease, e.g., a disease associated with altered or aberrant or inappropriate differentiation or dedifferentiation or development.
- the condition is cancer.
- the present invention further provides methods for treating a condition in an individual in need of treatment for a condition.
- somatic cell(s) are obtained and reprogrammed using a method of the invention, e.g., (i) an ESTF-bound miRNA or miRNA gene is modulated in reprogramming the cell to pluripotency; and/or (ii) the cell is reprogrammed to pluripotency and an ESTF-bound miRNA or miRNA gene is modulated in differentiating the resulting pluripotent cell to a desired cell type or lineage; and/or (iii) an ESTF-bound miRNA or miRNA gene is modulated in differentiating the somatic cell to a desired cell type or lineage without necessarily reprogramming the cell to pluripotency as an intermediate step.
- the invention further provides embodiments of the afore-mentioned methods applied to Polycomb group protein bound miRNA and miRNA genes.
- the reprogrammed cells may be expanded in culture.
- cells are obtained from the individual to whom they or their progeny are eventually administered after manipulation ex vivo.
- Pluripotent reprogrammed cells e.g., reprogrammed cells and/or their progeny that retain the property of pluripotency
- Pluripotent reprogrammed cells may be maintained under conditions suitable for the cells to develop into cells of a desired cell type or cell lineage.
- the cells are differentiated in vitro using protocols known in the art.
- the reprogrammed cells of a desired cell type are introduced into the individual to treat the condition.
- the somatic cells obtained from the individual contain a mutation in one or more genes.
- the somatic cells obtained from the individual are first treated to repair or compensate for the defect, e.g., by introducing one or more wild type copies of the gene(s) into the cells such that the resulting cells express the wild type version of the gene.
- the cells are then introduced into the individual.
- the somatic cells obtained from the individual are engineered to express one or more genes following their removal from the individual.
- the cells may be engineered by introducing a gene or expression cassette comprising a gene into the cells.
- the introduced gene may be one that is useful for purposes of identifying, selecting, and/or generating a reprogrammed cell.
- the introduced gene(s) contribute to initiating and/or maintaining the reprogrammed state or differentiating the cell to a desired cell type or lineage.
- the methods of the present invention can be used to treat, prevent, or stabilize a neurological disease such as Alzheimer's disease, Parkinson's disease, Huntington's disease, or ALS, lysosomal storage diseases, multiple sclerosis, or a spinal cord injury, diseases associated with muscle atrophy or dysfunction or damage.
- human hematopoietic stem cells derived from cells reprogrammed according to the present invention may be used in medical treatments requiring bone marrow transplantation or replenishment of hematopoietic cells. Such cells are also of use to treat anemia, diseases that compromise the immune system such as AIDS, etc.
- somatic cells obtained from an individual suffering from a disease and reprogrammed and/or differentiated in vitro using a method of the invention are used as an in vitro model system to study the disease and/or to identify agents (e.g., small molecules) useful for treating the disease.
- small molecules can be screened to identify compounds that may be of use to treat a disease.
- cells used in an inventive method herein are usually descendants of the original cells obtained from a subject.
- Reprogrammed cells that produce a growth factor or hormone such as insulin, etc. may be administered to a mammal for the treatment or prevention of endocrine disorders.
- Reprogrammed epithelial cells may be administered to repair damage to the lining of a body cavity or organ, such as a lung, gut, exocrine gland, or urogenital tract. It is also contemplated that reprogrammed cells may be administered to a mammal to treat damage or deficiency of cells in an organ such as the bladder, brain, esophagus, fallopian tube, heart, intestines, gallbladder, kidney, liver, lung, ovaries, pancreas, prostate, spinal cord, spleen, stomach, testes, thymus, thyroid, trachea, ureter, urethra, or uterus.
- Cells may be combined with a matrix to form a tissue or organ in vitro or in vivo that may be used to repair or replace a tissue or organ in a recipient mammal.
- methods of the invention can be used to treat individuals in need of a functional organ.
- somatic cells are obtained from an individual in need of a functional organ, and reprogrammed using the methods of the invention to produce reprogrammed somatic cells.
- Such reprogrammed somatic cells are then cultured under conditions suitable for development of the reprogrammed somatic cells into a desired organ, which is then introduced into the individual.
- the invention also relates in part to methods of performing chromatin immunoprecipitation (ChIP) experiments and to improvements in such methods.
- the invention also relates in part to methods for analysis of data obtained from chromatin immunoprecipitation (ChIP) experiments and improvements in such methods.
- the methods comprise sequencing of genomic DNA bound by a transcription factor in pluripotent cells and/or analysis of data obtained from such sequencing.
- the methods allow for identification of sites to which a transcription factor binds to within a resolution of 25 base pairs (bp).
- the methods are of use to map binding sites of any TF of interest, e.g., using ChIP followed by sequencing (ChIP-Seq).
- such methods are of particular use in conjunction with high throughput DNA sequencing methods, e.g., methods that employ parallel sequencing of large numbers (e.g., millions) of fragments and/or obtaining large numbers of short reads (e.g., less than 100 bp in length).
- sequencing techniques can comprise sequencing by synthesis (e.g., using Solexa technology), sequencing by ligation (e.g., using SOLiD technology from Applied Biosystems), 454 technology, or pyrosequencing.
- thousands, tens of thousands or more sequencing reactions are performed in parallel, generating millions or even billions of bases of DNA sequence per "run". See, e.g., Shendure J & Ji H. Nat Biotechnol., 26(10): 1135-45, 2008, for a non-limiting discussion of some of these technologies. It will be appreciated that sequencing technologies are evolving and improving rapidly.
- MicroRNAs are crucial for normal embryonic stem (ES) cell self-renewal and cellular differentiation, but how miRNA gene expression is controlled by the key transcriptional regulators of ES cells has not been established.
- the key ES cell transcription factors are associated with promoters for most miRNAs that are preferentially expressed in ES cells and with promoters for a set of silent miRNA genes.
- This silent set of miRNA genes is co-occupied by Polycomb Group proteins in ES cells and expressed in a tissue-specific fashion in differentiated cells.
- Embryonic stem (ES) cells hold significant potential for clinical therapies because of their distinctive capacity to both self-renew and differentiate into a wide range of specialized cell types. Understanding the transcriptional regulatory circuitry of ES cells and early cellular differentiation is fundamental to understanding human development and realizing the therapeutic potential of these cells.
- MicroRNAs are also likely to play key roles in ES cell gene regulation ([Kanellopoulou et al., 2005], [Murchison et al., 2005] and [Wang et al., 2007]), but little is known about how miRNAs participate in the core regulatory circuitry controlling self-renewal and pluripotency in ES cells.
- Several lines of evidence indicate that miRNAs contribute to the control of early development.
- miRNAs appear to regulate the expression of a significant percentage of all genes in a wide array of mammalian cell types ([Lewis et al., 2005], [Lim et al., 2005], [Krek et al., 2005] and [Farh et al., 2005]).
- a subset of miRNAs is preferentially expressed in ES cells or embryonic tissue ([Houbaviy et al., 2003], [Suh et al., 2004], [Houbaviy et al., 2005] and [Mineno et al., 2006]).
- Dicer-deficient mice fail to develop (Bernstein et al., 2003), and ES cells deficient in miRNA-processing enzymes show defects in differentiation and proliferation ([Kanellopoulou et al., 2005], [Murchison et al., 2005] and [Wang et al., 2007]).
- Specific miRNAs have been shown to participate in mammalian cellular differentiation and embryonic development (Stefani and Slack, 2008). However, how transcription factors and miRNAs function together in the regulatory circuitry that controls early development has not yet been examined.
- A major limitation in connecting miRNA genes to this regulatory circuitry has been the sparse annotation of miRNA gene transcriptional start sites and promoter regions. Mature miRNAs, which specify posttranscriptional gene repression, arise from larger transcripts that are then processed (Bartel, 2004). Over 400 mature miRNAs have been confidently identified in the human genome (Landgraf et al., 2007), but only a minority of the primary transcripts have been identified and annotated. Prior attempts to connect ES cell transcriptional regulators to miRNA genes have searched for transcription factor binding sites only close to the annotated mature miRNA sequences ([Boyer et al., 2005], [Loh et al., 2006] and [Lee et al., 2006]).
- Example 1 High-resolution genome-wide location analysis in ES cells with ChIP-seq
- Oct4, Sox2, Nanog, and Tcf3 were found to co-occupy 14,230 sites in the genome (Figures 1A, S1, and S2 and Tables S1-S3 (Tables S1-S3 are available in Marson, et al., 2008)). Approximately one quarter of these occurred within 8 kb of the transcription start site of 3,289 annotated genes, another one quarter occurred within genes but more than 8 kb from the start site, and almost half occurred in intergenic regions distal from annotated start sites (Example 7).
- Binding of the four factors at sites surrounding the Sox2 gene (Figure 1B) exemplified two key features of the data: all four transcription factors co-occupied the identified binding sites and the resolution was sufficient to determine the DNA sequence associated with these binding events to a resolution of <25 bp.
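The three-way breakdown of co-occupied sites described above (within 8 kb of a start site, within a gene but farther than 8 kb, or intergenic) can be expressed as a simple classification rule. The sketch below is illustrative only: the `Gene` record, the category labels, and the coordinates are hypothetical, and the actual procedure is the one given in Example 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"

    @property
    def tss(self):
        """Transcription start site: left end on the plus strand, right end on the minus strand."""
        return self.start if self.strand == "+" else self.end


def classify_site(chrom, position, genes, window=8_000):
    """Assign a binding site to one of three classes relative to annotated genes."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    if any(abs(position - g.tss) <= window for g in same_chrom):
        return "promoter_proximal"
    if any(g.start <= position <= g.end for g in same_chrom):
        return "intragenic_distal"
    return "intergenic"


# Hypothetical annotation with one gene on each strand.
genes = [Gene("chr1", 10_000, 50_000, "+"), Gene("chr1", 100_000, 140_000, "-")]
labels = [classify_site("chr1", p, genes) for p in (12_000, 30_000, 70_000, 135_000)]
```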
- Composite analysis of all bound regions provided higher resolution and suggested how these factors occupy their common DNA-sequence motif ( Figure S4, Table S4). Knowledge of these binding sites provided data necessary to map these key transcription factors to the promoters of miRNA genes.
- This set of miRNAs occupied by Oct4/Sox2/Nanog/Tcf3 represented roughly 20% of annotated mammalian miRNAs, similar to the 20% of protein-coding genes that were bound at their promoters by these key transcription factors (Table S2).
- Oct4 and Nanog are silenced as ES cells begin to differentiate ([Chambers and Smith, 2004] and [Niwa, 2007]). If Oct4/Sox2/Nanog/Tcf3 are required for activation or repression of their target miRNAs, the targets should be differentially expressed when ES cells are compared to a differentiated cell type.
- Solexa sequencing of 18-30 nucleotide transcripts in ES cells, MEFs, and NPCs was performed to obtain quantitative information on the abundance of miRNAs in pluripotent cells relative to two differentiated cell types (Figure 4A and Table S1). In each cell type examined, a small subset of miRNAs predominated, with pronounced changes in miRNA abundance observed among the cell types (Example 7).
- Oct4/Sox2/Nanog/Tcf3-occupied miRNAs were, in general, preferentially expressed in embryonic stem cells (Figure 4C). Whereas most miRNAs are unchanged in expression in ES cells relative to MEFs or NPCs, a significant portion of Oct4/Sox2/Nanog/Tcf3-occupied miRNAs are 100-fold more abundant in ES cells than in MEFs (p < 5 × 10^-15), and 1000-fold more abundant in ES cells than in NPCs (p < 5 × 10^-9).
- tissue-specific expression pattern of miRNAs repressed by Polycomb in ES cells is consistent with these miRNAs serving as determinants of cell-fate decisions in a manner analogous to the developmental regulators whose genes are repressed by Polycomb in ES cells ([Lee et al, 2006], [Bernstein et al., 2006] and [Boyer et al., 2006]). Such a function in cell-fate determination would require that these miRNAs remain silenced in pluripotent ES cells.
- this second group of miRNAs were co-occupied by Polycomb group proteins, which are also known to silence key lineage-specific, protein-coding developmental regulators.
- miRNA polycistrons, which encode the most abundant miRNAs in ES cells and which are silenced during early cellular differentiation ([Houbaviy et al., 2003], [Houbaviy et al., 2005] and [Suh et al., 2004]), were occupied at their promoters by Oct4, Sox2, Nanog, and Tcf3.
- the most abundant in murine ES cells was the mir-290-295 cluster, which contains multiple mature miRNAs with seed sequences similar or identical to those of the miRNAs in the mir-302 cluster and the mir-17-92 cluster. miRNAs with the same seed sequence also predominate in human embryonic stem cells (Laurent et al., 2008).
- miRNAs in this family have been implicated in cell proliferation ([O'Donnell et al., 2005], [He et al., 2005] and [Voorhoeve et al., 2006]), consistent with the impaired self-renewal phenotype observed in miRNA-deficient ES cells ([Kanellopoulou et al., 2005], [Murchison et al., 2005] and [Wang et al., 2007]).
- miRNAs contribute to the rapid degradation of maternal transcripts in early zygotic development (Giraldez et al., 2006), and mRNA expression data suggest that this miRNA family also promotes the clearance of transcripts in early mammalian development (Farh et al., 2005). [0088] In addition to promoting the rapid clearance of transcripts as cells transition from one state to another during development, miRNAs also likely contribute to the control of cell identity by fine-tuning the expression of genes.
- miR-430, the zebrafish homolog of the mammalian mir-290-295 family, serves to precisely tune the levels of Nodal antagonists Lefty1 and Lefty2 relative to Nodal, a subtle modulation of protein levels that has pronounced effects on embryonic development (Choi et al., 2007). Recently, a list of 250 murine ES cell mRNAs that appear to be under the control of miRNAs in the miR-290-295 cluster was reported (Sinkkonen et al., 2008). This study reports that Lefty1 and Lefty2 are evolutionarily conserved targets of the miR-290-295 miRNA family.
- miRNAs also maintain the expression of de novo DNA methyltransferases 3a and 3b (Dnmt3a and Dnmt3b), perhaps by dampening the expression of the transcriptional repressor Rbl2, helping to poise ES cells for efficient methylation of Oct4 and other pluripotency genes during differentiation.
- core ES cell transcription factors appear to promote the active expression of Lefty1 and Lefty2 but also fine-tune the expression of these important signaling proteins by activating a family of miRNAs that target the Lefty1 and Lefty2 3'UTRs.
- This network motif, whereby a regulator exerts both positive and negative effects on its target, termed "incoherent feed-forward" regulation (Alon, 2007), provides a mechanism to fine-tune the steady-state level or kinetics of a target's activation (Figure 6A).
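The buffering effect of incoherent feed-forward regulation can be illustrated with a toy steady-state model. This is only a sketch under stated assumptions: the linear production and degradation terms, the function name `steady_state_target`, and every rate constant are hypothetical and are not taken from the source. A regulator at level `x` drives production of both a target and a miRNA, and the miRNA adds to the target's degradation rate.

```python
def steady_state_target(x, with_mirna=True,
                        beta_z=1.0, delta_z=0.1,
                        beta_m=1.0, delta_m=0.1, k=0.05):
    """Steady-state target level in a toy incoherent feed-forward loop.

    All parameters are hypothetical illustration values:
      beta_z, delta_z : target production and basal degradation rates
      beta_m, delta_m : miRNA production and degradation rates
      k               : strength of miRNA-mediated target degradation
    """
    # miRNA steady state is proportional to the regulator level.
    mirna = (beta_m * x / delta_m) if with_mirna else 0.0
    # Target production rises with x, but so does its miRNA-driven decay.
    return beta_z * x / (delta_z + k * mirna)


for x in (1.0, 2.0):
    print(x, steady_state_target(x), steady_state_target(x, with_mirna=False))
```

In this toy model, doubling the regulator level doubles the target level without the miRNA arm but changes it only slightly with the miRNA arm, which is one way to picture the fine-tuning described above.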
- V6.5 (C57BL/6-129) murine ES cells were grown under typical ES cell conditions (see Example 7) on irradiated MEFs. For location analysis, cells were grown for one passage off of MEFs on gelatinized tissue-culture plates. NPCs derived from V6.5 ES cells and MEFs prepared and cultured from DR-4 strain mice were grown using standard protocols as previously described (see Example 7). ZHBTc4 cells harboring a doxycycline-repressible Oct4 allele (Niwa et al., 2000), a gift from A. Smith, were cultured under standard ES cell conditions on gelatin. Cultures were treated with 2 µg/ml doxycycline
- Purified immunoprecipitated DNA was prepared for sequencing according to a modified version of the Solexa Genomic DNA protocol. Fragmented DNA was end repaired and subjected to 18 cycles of linker-mediated (LM)-PCR using oligos purchased from
- Real-time PCR primers were designed using the standard specifications of PrimerExpress (Applied Biosystems) to amplify regions within the ~200 nt immediately upstream of the tested miRNA hairpins or in the middle of the mir-290-295 polycistron but outside of any miRNA hairpin regions (Example 7 and Figure S8B). Primers were used in SYBR Green quantitative PCR assays on the Applied Biosystems 7500 Real Time PCR system. Expression levels were calculated relative to Gapdh mRNA levels, which were quantified in parallel by Taqman analysis. Detailed methods and primer sequences can be found in Example 7. [00109] References
- Ben-Porath et al. 2008 I. Ben-Porath, M.W. Thomson, V.J. Carey, R. Ge, G.W. Bell, A. Regev and R.A. Weinberg, An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors, Nat. Genet. 40 (2008), pp. 499-507. Bernstein et al. 2006 B.E. Bernstein, T.S. Mikkelsen, X. Xie, M. Kamal, D.J. Huebert, J. Cuff, B. Fry, A. Meissner, M. Wernig and K. Plath et al., A bivalent chromatin structure marks key developmental genes in embryonic stem cells, Cell 125 (2006), pp. 315-326.
- Chambers and Smith 2004 I. Chambers and A. Smith, Self-renewal of teratocarcinoma and embryonic stem cells, Oncogene 23 (2004), pp. 7150-7160.
- Giraldez et al. 2006 A.J. Giraldez, Y. Mishima, J. Rihel, R.J. Grocock, S. Van Dongen, K.
- Robertson et al. 2007 G. Robertson, M. Hirst, M. Bainbridge, M. Bilenky, Y. Zhao, T. Zeng,
- Module map of stem cell genes guides creation of epithelial cancer stem cells, Cell Stem Cell 2
- Tcf3 functions as a steady state limiter of transcriptional programs of mouse embryonic stem cell self renewal, Stem Cells. (2008)
- Example 7 Additional Experimental Procedures, Results, and Discussion [00110] Contents of Example 7
- Table S9 miRNA expression in murine ES, neural precursors, embryonic fibroblasts and Oct4-repressible ZHBTc4 cells [00140] Table S10 Regions enriched for Suz12 in mouse ES cells [00141] Table S11 miRNA microarray expression data
- Top track for each data set illustrates the normalized number of reads assigned to each 25bp bin. Bars in the second track identify regions of the genome enriched at p < 10^-9. mES_chromatin_ChIPseq.mm8.WIG.gz - ChIP-seq data for H3K4me3, H3K79me2, H3K36me3 and Suz12 in mES cells. Top track for each data set illustrates the normalized number of reads assigned to each 25bp bin. Bars in the second track identify regions of the genome enriched at p < 10^-9.
- Human embryonic stem (ES) cells were obtained from WiCell (Madison, WI; NIH Code WA09) and grown as described. Cell culture conditions and harvesting have been described previously (Boyer et al., 2005; Lee et al., 2006; Guenther et al., 2007). Quality control for the H9 cells included immunohistochemical analysis of pluripotency markers, alkaline phosphatase activity, teratoma formation, and formation of embryoid bodies and has been previously published as supplemental material (Boyer et al., 2005; Lee et al., 2006).
- V6.5 (C57BL/6-129) murine ES cells were grown under typical ES cell culture conditions on irradiated mouse embryonic fibroblasts (MEFs) as previously described (Boyer et al., 2006). Briefly, cells were grown on gelatinized tissue culture plates in Dulbecco's modified Eagle medium supplemented with 15% fetal bovine serum (characterized from Hyclone), 1000 U/ml leukemia inhibitory factor (LIF, Chemicon; ESGRO ESG1106), nonessential amino acids, L-glutamine, Penicillin/Streptomycin and β-mercaptoethanol.
- LIF leukemia inhibitory factor
- Oct4-bound genomic DNA was enriched from whole cell lysate using an epitope specific goat polyclonal antibody purchased from Santa Cruz (sc-8628) and compared to a reference whole cell extract (Boyer et al., 2005). Regions occupied with high confidence for this antibody identified by ChIP-seq in mES cells are listed in Table S3 and by ChIP-chip on genome-wide tiling arrays in hES cells are on Table S8. Oct4 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_regulator_ChIPseq.mm8.WIG.gz
- Sox2-bound genomic DNA was enriched from whole cell lysate using an affinity purified goat polyclonal antibody purchased from R&D Systems (AF2018) and compared to a reference whole cell extract (Boyer et al., 2005). Regions occupied with high confidence for this antibody identified by ChIP-seq in mES cells are listed in Table S3.
- Sox2 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_regulator_ChIPseq.mm8.WIG.gz
- Nanog-bound genomic DNA was enriched from whole cell lysate using an affinity purified rabbit polyclonal antibody purchased from Bethyl Labs (BL1662) and compared to a reference whole cell extract (Boyer et al., 2005). Regions bound with high confidence for this antibody are listed in Table S3.
- Nanog ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_regulator_ChIPseq.mm8.WIG.gz
- Tcf3-bound genomic DNA was enriched from whole cell lysate using an epitope specific goat polyclonal antibody purchased from Santa Cruz (sc-8635) and compared to a reference whole cell extract (Cole et al., 2008). Regions occupied with high confidence for this antibody identified by ChIP-seq in mES cells are listed in Table S3. Tcf3 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_regulator_ChIPseq.mm8.WIG.gz
- Suz12-bound genomic DNA was enriched from whole cell lysate using an affinity purified rabbit polyclonal antibody purchased from Abcam (AB12073) and compared to a reference whole cell extract (Lee et al., 2006). Regions bound with high confidence for this antibody are listed in Table S10.
- Suz12 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_chomatin_ChIPseq.mm8.WIG.gz
- H3K4me3-modified nucleosomes were enriched from whole cell lysate using an epitope-specific rabbit polyclonal antibody purchased from Abcam (AB8580) (Santos-Rosa et al., 2002; Guenther et al., 2007). Samples were analyzed using ChIP-seq. Comparison of these data with ChIP-seq data published previously (Mikkelsen et al., 2007) showed near identity in profile and bound regions (Table S5).
- H3K4me3 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_chomatin_ChIPseq.mm8.WIG.gz
- H3K79me2-modified nucleosomes were isolated from mES whole cell lysate using Abcam antibody AB3594 (Guenther et al., 2007). Chromatin immunoprecipitations against H3K36me3 were compared to reference WCE DNA obtained from mES cells. Samples were analyzed using ChIP-seq and were used for visual validation of predicted miRNA promoter association with mature miRNA sequences only (Figure 2).
- H3K79me2 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_chomatin_ChIPseq.mm8.WIG.gz
- H3K36me3-modified nucleosomes were isolated from mES whole cell lysate using rabbit polyclonal antibody purchased from Abcam (AB9050) (Guenther et al., 2007). Chromatin immunoprecipitations against H3K36me3 were compared to reference WCE DNA obtained from mES cells. Samples were analyzed using ChIP-seq and were used for visual validation of predicted miRNA promoter association with mature miRNA sequences only (Figure 2).
- H3K36me3 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_chomatin_ChIPseq.mm8.WIG.gz [00158] Chromatin Immunoprecipitation
- Protocols describing all materials and methods have been previously described (Lee et al. 2007) and can be downloaded from http://web.wi.mit.edu/young/hES_PRC. [00160] Briefly, we performed independent immunoprecipitations for each analysis.
- Embryonic stem cells were grown to a final count of 5x10 - 1x10 cells for each location analysis experiment. Cells were chemically crosslinked by the addition of one-tenth volume of fresh 11% formaldehyde solution for 15 minutes at room temperature. Cells were rinsed twice with 1x PBS and harvested using a silicon scraper and flash frozen in liquid nitrogen. Cells were stored at -80 °C prior to use.
- Immunoprecipitated (ChIP) DNA was prepared for sequencing according to a modified version of the Illumina/Solexa Genomic DNA protocol. Fragmented DNA was prepared for ligation of Solexa linkers by repairing the ends and adding a single adenine nucleotide overhang to allow for directional ligation. A 1:100 dilution of the Adaptor Oligo Mix (Illumina) was used in the ligation step. A subsequent PCR step with limited (18) amplification cycles added additional linker sequence to the fragments to prepare them for annealing to the Genome Analyzer flow-cell.
- a narrow range of fragment sizes was selected by separation on a 2% agarose gel and excision of a band between 150-300 bp (representing shear fragments between 50 and 200 nt in length and ~100 bp of primer sequence).
- the DNA was purified from the agarose and diluted to 10 nM for loading on the flow cell.
- the DNA library (2-4 pM) was applied to the flow-cell (8 samples per flow-cell) using the Cluster Station device from Illumina.
- the concentration of library applied to the flow-cell was calibrated such that polonies generated in the bridge amplification step originate from single strands of DNA.
- Multiple rounds of amplification reagents were flowed across the cell in the bridge amplification step to generate polonies of approximately 1,000 strands in 1 µm diameter spots. Double stranded polonies were visually checked for density and morphology by staining with a 1:5000 dilution of SYBR Green I (Invitrogen) and visualizing with a microscope under fluorescent illumination. Validated flow-cells were stored at 4 °C until sequencing. [00169] Sequencing
- ChIP-seq requires significantly less uniqueness to map reads to the genome and so should be able to detect binding across a much larger fraction of the genome (~70% as reported in Mikkelsen et al., 2007).
- Agilent probe density across the ChIP-seq enriched regions we found a broad range of probe densities, with almost half of all high-confidence targets in regions with fewer than 3 probes per kb (Figure S3b). While the portions of the genome tiled at > 3 probes per kb had strong overlaps, enriched regions of the genome with lower probe densities were much more difficult to identify by ChIP-chip.
- DNA motif discovery was performed on the genomic regions that were enriched at high-confidence by anti-Oct4 chromatin immunoprecipitation. In order to obtain maximum resolution, a modified version of the ChIP-seq read mapping algorithm was used. Genomic bins were reduced in size from 25 bp to 10 bp. Furthermore, a read extension that placed greater weight towards the middle of the 200 bp extension was used. This model placed 1/3 count in the 8 bins from 0-40 and 160-200 bp, 2/3 counts in the 8 bins from 40-80 and 120-160 bp and 1 count in the 4 bins from 80-120 bp.
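The weighted read extension described above can be sketched as follows. This is a minimal illustration, assuming each read has already been reduced to the leftmost genomic coordinate of its 200 bp extension; the names `bin_weight` and `add_extended_read` are hypothetical, and strand handling and read mapping are omitted.

```python
from collections import defaultdict
from fractions import Fraction

BIN_SIZE = 10    # bp per genomic bin, as stated in the text
EXTENSION = 200  # bp covered by each extended read


def bin_weight(offset):
    """Weight for the bin that starts `offset` bp into the extension."""
    if 80 <= offset < 120:
        return Fraction(1)        # 4 central bins
    if 40 <= offset < 80 or 120 <= offset < 160:
        return Fraction(2, 3)     # 8 intermediate bins
    return Fraction(1, 3)         # 8 outer bins (0-40 and 160-200 bp)


def add_extended_read(density, fragment_start):
    """Add one extended read to a bin-indexed density map (hypothetical helper)."""
    for offset in range(0, EXTENSION, BIN_SIZE):
        density[(fragment_start + offset) // BIN_SIZE] += bin_weight(offset)


density = defaultdict(Fraction)
add_extended_read(density, 1000)
print(sum(density.values()))  # total weight contributed by one read
```

Exact fractions are used so that the per-read total (8 × 1/3 + 8 × 2/3 + 4 × 1 = 12) can be checked without floating-point error.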
- MEME uses the individual nucleotide frequencies within input sequences to model expected motif frequencies. This simple model might result in the discovery of motifs which are enriched because of non-random di-, tri-, etc. nucleotide frequencies. Consequently, three different sets of control sequences of identical length were used to ensure the specificity of the motif discovery results. First, the sequences immediately flanking each input sequence were used as control sequences. Second, randomly selected sequences having the same distribution of distances from transcription start sites as the Oct4 input sequences were used as control sequences. Third, sequences from completely random genomic regions were used as control sequences. Each of these sets of control sequences was also examined using MEME.
- the motif discovered from actual Oct4 bound sequences was not identified in the control sequences.
- the motif discovery process was repeated using different numbers and lengths of sequences, but the same motif was discovered for a wide array of input sequences.
- when motif discovery was repeated with the top 500 Sox2, Nanog, and Tcf3 binding peaks, the same motif was identified.
- the motif occurs within 100 bp of the peak of ChIP-seq density at more than 90% of the top regions enriched in each experiment, while occurring in the same span at 24-28% of control regions, and within 25 bp of the ChIP-seq peak at more than 80% of regions versus 9-11% of control regions.
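The proximity statistic quoted above (the share of regions with a motif occurrence within a given distance of the ChIP-seq peak) could be computed along these lines. This is a sketch with hypothetical names and toy coordinates; it assumes peak positions and motif match positions are already available per region and does not reproduce the source's data.

```python
def fraction_with_motif_near_peak(peaks, motif_positions, max_dist):
    """Fraction of regions with at least one motif within max_dist bp of the peak.

    peaks           : dict mapping region id -> peak coordinate (bp)
    motif_positions : dict mapping region id -> list of motif coordinates (bp)
    """
    hits = sum(
        any(abs(pos - peak) <= max_dist for pos in motif_positions.get(region, []))
        for region, peak in peaks.items()
    )
    return hits / len(peaks)


# Toy coordinates, for illustration only.
peaks = {"r1": 1000, "r2": 5000, "r3": 9000, "r4": 200}
motifs = {"r1": [1050], "r2": [5300], "r3": [9010, 9500]}
print(fraction_with_motif_near_peak(peaks, motifs, 100))
print(fraction_with_motif_near_peak(peaks, motifs, 25))
```

Running the same function on enriched regions and on each set of control regions gives the two percentages being compared.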
- H3K4me3 enriched sites from as many sources as possible as a collection of promoters.
- H3K4me3 sites were identified in ES cells (H9), hepatocytes, a pro-B cell line (REH cells) (Guenther et al., 2007) and T cells (Barski et al., 2007).
- Mouse H3K4me3 sites were identified from ES cells (V6.5), neural precursors, and embryonic fibroblasts (Mikkelsen et al., 2007).
- a scoring system was derived empirically to select the most likely start sites for each miRNA. Each possible site was given a bonus if it was either the start of a known transcript that spanned the miRNA or of an EST that spanned the miRNA. Scores were reduced if the H3K4me3 enriched region was assignable instead to a transcript or EST that did not overlap the miRNA. Additional positive scores were given to enriched sites within 5kb of the miRNA, while additional negative scores were given based on the number of intervening H3K4me3 sites between the test region and the miRNA.
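The scoring rules above can be sketched as a small function. The source states only the direction of each adjustment, so the numeric weights, the `CandidateSite` fields, and the function names below are hypothetical placeholders rather than the values actually used.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    spans_mirna: bool         # start of a known transcript or EST that spans the miRNA
    assigned_elsewhere: bool  # assignable to a transcript/EST not overlapping the miRNA
    distance_bp: int          # distance between the enriched site and the miRNA
    intervening_sites: int    # H3K4me3 sites lying between the site and the miRNA


def score_site(site, span_bonus=10, elsewhere_penalty=10,
               proximity_bonus=5, intervening_penalty=5):
    """Score one H3K4me3-enriched site as a candidate miRNA start site.

    All weights are illustrative assumptions.
    """
    score = 0
    if site.spans_mirna:
        score += span_bonus
    if site.assigned_elsewhere:
        score -= elsewhere_penalty
    if site.distance_bp <= 5000:   # "within 5kb" read as inclusive
        score += proximity_bonus
    score -= intervening_penalty * site.intervening_sites
    return score


def best_site(sites):
    """Return the highest-scoring candidate."""
    return max(sites, key=score_site)


near = CandidateSite(True, False, 2000, 0)
far = CandidateSite(False, True, 20000, 2)
print(score_site(near), score_site(far))
```

Keeping the weights as keyword arguments makes it easy to see how the ranking shifts when the empirically derived values are substituted.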
- each enriched region was tested for conservation between human and mouse using the UCSC liftover program (Hinrichs et al., 2006). If two test regions overlapped, they were considered to be conserved (21%). In the cases where human and mouse disagreed on the quality of a site, if the site had an EST or gene overlapping the miRNA, that site was given a high score in both species. Alternatively, if one species had a non-overlapping site, that site was considered to be an unlikely promoter in both species. Finally, for miRNAs where a likely promoter was identified in only one species, we manually checked the homologous region of the other genome to search for regions enriched for H3K4me3 -modified nucleosomes that may have fallen below the high-confidence threshold.
- Predicted miRNA genes can be visualised on the UCSC browser by uploading the supplemental files: mouse_miRNA_track.mm8.bed and human_miRNA_track.hg17.bed
- This region is notable in that, while it excludes the largest peaks of Oct4/Sox2/Nanog/Tcf3, it does contain a smaller (yet significantly enriched) region located over the promoter (small peak at the promoter in Figure 3a).
- This promoter proximal construct showed 5-10x higher maximal expression in ES cells relative to more differentiated cells. Expression of this construct was dependent on a small portion of the construct that included the TATAA box and a proximal site of Oct4/Sox2/Nanog/Tcf3 occupancy.
- Immunoprecipitated DNA and whole cell extract DNA were purified by treatment with RNAse A, proteinase K and multiple phenol:chloroform:isoamyl alcohol extractions. Purified DNA was blunted and ligated to linker and amplified using a two-stage PCR protocol. Amplified DNA was labeled and purified using Bioprime random primer labeling kits (Invitrogen): immunoenriched DNA was labeled with Cy5 fluorophore, whole cell extract DNA was labeled with Cy3 fluorophore.
- the human promoter array was purchased from Agilent Technology
- the array consists of 115 slides each containing ~44,000 60mer oligos designed to cover the non-repeat portion of the human genome. The design of these arrays is discussed in detail elsewhere (Lee et al., 2006).
- Agilent controls is a set of negative control spots that contain 60-mer sequences that do not cross-hybridize to human genomic DNA. We calculated the median intensity of these negative control spots in each channel and then subtracted this number from the intensities of all other features.
- Cy3-enriched DNA channel was then divided by the median of the control oligonucleotides from the Cy5-enriched DNA channel. This yielded a normalization factor that was applied to each intensity in the Cy5 DNA channel.
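The background subtraction and channel normalization described above can be sketched as follows. The passage is fragmentary, so this is only one plausible reading: the negative-control median is subtracted per channel, and the ratio of Cy3 to Cy5 control-oligonucleotide medians is used to scale the Cy5 channel. The function name, the index arguments, and the toy intensities are hypothetical.

```python
from statistics import median


def normalize_channels(cy5, cy3, negative_idx, control_idx):
    """Background-subtract both channels, then rescale Cy5 to match Cy3.

    cy5, cy3     : lists of raw feature intensities (same order)
    negative_idx : indices of negative control spots (assumed known)
    control_idx  : indices of control oligonucleotides used for scaling (assumed)
    """
    # Median of the negative control spots is the background for each channel.
    bg5 = median(cy5[i] for i in negative_idx)
    bg3 = median(cy3[i] for i in negative_idx)
    cy5_sub = [v - bg5 for v in cy5]
    cy3_sub = [v - bg3 for v in cy3]

    # Normalization factor: Cy3 control median divided by Cy5 control median.
    factor = (median(cy3_sub[i] for i in control_idx)
              / median(cy5_sub[i] for i in control_idx))
    cy5_norm = [v * factor for v in cy5_sub]
    return cy5_norm, cy3_sub, factor


# Toy intensities, for illustration only.
cy5 = [110, 90, 100, 500, 300]
cy3 = [55, 45, 50, 250, 150]
print(normalize_channels(cy5, cy3, negative_idx=[0, 1, 2], control_idx=[3, 4]))
```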
- This error model functions by converting the intensity information in both channels to an X score which is dependent on both the absolute value of intensities and background noise in each channel using an f-score calculated as described (Boyer et al., 2005) for promoter regions or using a score of 0.3 for tiled arrays.
- IP:control ratios below 1 represent noise (as the immunoprecipitation should only result in enrichment of specific signals) and the distribution of noise among ratios above 1 is the reflection of the distribution of noise among ratios below 1.
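One way to picture the reflection assumption is to mirror the below-1 ratios around 1 and measure the spread of the resulting symmetric noise distribution. This is an illustrative sketch only: working in log2 space and summarizing the noise by a standard deviation are assumptions made here, not details given in the source, and the function name is hypothetical.

```python
import math


def noise_sd_from_reflection(ratios):
    """Estimate noise spread by reflecting log2 ratios below 1 around zero."""
    below = [math.log2(r) for r in ratios if 0 < r < 1]
    if not below:
        raise ValueError("no ratios below 1 to estimate noise from")
    # Mirror the noise-only half; the mirrored set is symmetric with mean 0.
    mirrored = below + [-v for v in below]
    return math.sqrt(sum(v * v for v in mirrored) / len(mirrored))


print(noise_sd_from_reflection([0.5, 0.25, 2.0, 8.0]))
```

Ratios above 1 can then be judged against this noise spread rather than against the full, signal-contaminated distribution.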
- Candidate bound probe sets were required to pass one of two additional filters: two of the three probes in a probe set must each have single probe p-values < 0.005 or the centre probe in the probe set has a single probe p-value < 0.001 and one of the flanking probes has a single point p-value < 0.1. These two filters cover situations where a binding event occurs midway between two probes and each weakly detects the event or where a binding event occurs very close to one probe and is very weakly detected by a neighboring probe. Individual probe sets that passed these criteria and were spaced closely together were collapsed into bound regions if the center probes of the probe sets were within 1000 bp of each other.
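The two filters and the collapsing step can be sketched as below. This assumes each threshold is a strict upper bound, that "within 1000 bp" is inclusive, and that neighboring probe sets are chained together when consecutive centers are close enough; the function names are hypothetical.

```python
def passes_filter(p_left, p_center, p_right):
    """Apply the two alternative probe-set filters to three single-probe p-values."""
    probes = (p_left, p_center, p_right)
    # Filter 1: at least two of the three probes are individually significant.
    two_of_three = sum(p < 0.005 for p in probes) >= 2
    # Filter 2: a strong center probe supported weakly by one flanking probe.
    center_supported = p_center < 0.001 and min(p_left, p_right) < 0.1
    return two_of_three or center_supported


def collapse_probe_sets(center_positions, max_gap=1000):
    """Merge passing probe sets whose center probes lie within max_gap bp."""
    regions = []
    for pos in sorted(center_positions):
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1][1] = pos          # extend the current bound region
        else:
            regions.append([pos, pos])    # start a new bound region
    return [tuple(r) for r in regions]


print(passes_filter(0.004, 0.2, 0.003))
print(collapse_probe_sets([100, 900, 1800, 5000]))
```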
- Enriched regions were compared relative to transcript start and stop coordinates of known genes compiled from four different databases: RefSeq (Pruitt et al., 2005), Mammalian Gene Collection (MGC) (Gerhard et al., 2004), Ensembl (Hubbard et al., 2005), and University of California Santa Cruz (UCSC) Known Genes (genome.ucsc.edu) (Kent et al., 2002). All human coordinate information was downloaded in January 2005 from the UCSC Genome Browser (hg17, NCBI build 35). Mouse data was downloaded in June of 2007 (mm8, NCBI build 36).
- For miRNA start sites, two separate windows were used to evaluate overlaps. For chromatin marks and non-sequence specific proteins, miRNA promoters were considered bound if they were within 1 kb of an enriched sequence. For sequence specific factors such as Oct4, we used a more relaxed region of 8 kb surrounding the promoter, consistent with previous work we have published (Boyer et al., 2005). A full list of the high confidence start sites bound to promoters can be found in Tables S6 and S7.
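The two overlap windows can be expressed as a single check. This sketch assumes the 8 kb window means up to 8 kb on either side of the start site and that enriched regions are given as (start, end) coordinate pairs on the same chromosome; the function name is hypothetical.

```python
def promoter_bound(tss, enriched_regions, sequence_specific):
    """Return True if any enriched region falls within the window around the TSS.

    sequence_specific=True  -> 8 kb window (factors such as Oct4)
    sequence_specific=False -> 1 kb window (chromatin marks, non-sequence-specific proteins)
    """
    window = 8000 if sequence_specific else 1000
    return any(start - window <= tss <= end + window
               for start, end in enriched_regions)


regions = [(15000, 15500)]
print(promoter_bound(10000, regions, sequence_specific=True))
print(promoter_bound(10000, regions, sequence_specific=False))
```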
- ES cells were differentiated along the neural lineage using standard protocols.
- V6.5 ES cells were differentiated into neural progenitor cells (NPCs) through embryoid body formation for 4 days and selection in ITSFn media for 5-7 days, and maintained in FGF2 and EGF2 (R&D Systems) (Okabe et al., 1996).
- NPCs neural progenitor cells
- Mouse embryonic fibroblasts were prepared from DR-4 strain mice as previously described (Tucker et al., 1997). Cells were cultured in Dulbecco's modified Eagle medium supplemented with 10% cosmic calf serum, β-mercaptoethanol, nonessential amino acids, L-glutamine and penicillin/streptomycin.
- Murine induced pluripotent stem (iPS) cells were generated as described in Wernig et al., 2007. iPS cells were cultured under the same conditions as mES cells.
- RT-PCR (Superscript II, Invitrogen) was performed with 5' primer (CAAGCAGAAGACGGCATA) (SEQ ID NO: 3). Splicing of overlapping ends PCR (SOEPCR) was performed (Phusion, NEB) with 5' primer and 3' PCR primer (AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA) (SEQ ID NO: 4), generating cDNA with extended 3' adaptor sequence.
- PCR product (40 µl) was denatured (85 °C, 10 min, formamide loading dye), and the differently sized strands were purified on a 90% formamide, 8% acrylamide gel, yielding single-stranded DNA suitable for Solexa sequencing.
- the DNA library (2-4 pM) was applied to the flow-cell (8 samples per flow-cell) using the Cluster Station device from Illumina.
- the concentration of library applied to the flow-cell was calibrated such that polonies generated in the bridge amplification step originate from single strands of DNA.
- Multiple rounds of amplification reagents were flowed across the cell in the bridge amplification step to generate polonies of approximately 1,000 strands in 1 µm diameter spots. Double stranded polonies were visually checked for density and morphology by staining with a 1:5000 dilution of SYBR Green I (Invitrogen) and visualizing with a microscope under fluorescent illumination. Validated flow-cells were stored at 4 °C until sequencing. [00233] Sequencing and Analysis
- Flow-cells were removed from storage and subjected to linearization and annealing of sequencing primer on the Cluster Station. Primed flow-cells were loaded into the Illumina Genome Analyzer 1G. After the first base was incorporated in the Sequencing-by-Synthesis reaction the process was paused for a key quality control checkpoint. A small section of each lane was imaged and the average intensity value for all four bases was compared to minimum thresholds. Flow-cells with low first base intensities were re-primed and if signal was not recovered the flow-cell was aborted. Flow-cells with signal intensities meeting the minimum thresholds were resumed and sequenced for 36 cycles.
- RNA from murine embryonic stem cells (mES, V6.5), mouse embryonic fibroblasts (MEFs) and murine induced pluripotent (iPS) cells was extracted with RNeasy (Qiagen) reagents.
- 5 µg total RNA from treated and control samples were labeled with Hy3™ and Hy5™ fluorescent label, using the miRCURY™ LNA Array labeling kit (Exiqon, Denmark) following the procedure described by the manufacturer.
- the labeled samples were mixed pair-wise and hybridized to the miRNA arrays printed using miRCURY™ LNA oligoset version 8.1 (Exiqon, Denmark). Each miRNA was printed in duplicate, on CodeLink slides (GE), using GeneMachines Omnigrid 100.
- the hybridization was performed at 60 °C overnight using the Agilent Hybridization system - SureHyb, after which the slides were washed using the miRCURY™ LNA washing buffer kit (Exiqon, Denmark) following the procedure described by the manufacturer. The slides were then scanned using Axon 4000B scanner and the image analysis was performed using Genepix Pro 6.0.
- Threshold z-value to remove outliers 10, 000
- Oct4 is a critical regulator of ES cell pluripotency and disruption of Oct4 leads to rapid differentiation of the ES cells (Niwa et al., 2000).
- a doxycycline regulated promoter (Niwa et al., 2000 and Figure S7a).
- Oct4 mRNA is rapidly lost in these cells upon doxycycline induction (Figure S7b).
- Tcf3 is a terminal component of the canonical Wnt pathway in ES cells and has been integrated into the core circuitry regulating ES cells. Recent reports have indicated that Tcf3 depletion causes impaired differentiation in ES cells and upregulation of pluripotency genes, including Oct4, Sox2 and Nanog (Cole et al., Genes and Dev 2008; Tam et al., Stem Cells 2008; Yi et al., Stem Cells 2008). Genes encoding several key pluripotency factors were observed to increase in expression, albeit only mildly, but other genes decreased in expression or remained expressed at the same level. The different regulatory effects at different target genes may depend on the proteins associated with Tcf3 at each promoter.
- Tcf3 knockdown experiments were performed essentially as in Cole et al., 2008 with minor modifications.
- Lentivirus was produced according to Open Biosystems Trans-lentiviral shRNA Packaging System (TLP4614).
- the shRNA constructs targeting murine Tcf3 were designed using an siRNA rules-based algorithm consisting of sequence, specificity, and position scoring for optimal hairpins that consist of a 21-base stem and a 6-base loop (RMM4534-NM-009332).
- a knockdown control virus targeting EGFP was produced from vector obtained from the RNAi Consortium. V6.5 mES cells were plated at ~30% confluence on the day of infection.
- any targets of the miRNA cluster that are also occupied by the 4 factors would represent feed forward targets.
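The definition above amounts to a set intersection between the cluster's mRNA targets and the genes occupied by the four factors. The sketch below uses Lefty1 and Lefty2, which the text names as both activated by the core factors and targeted by the miRNA family, alongside clearly hypothetical placeholder gene names; it does not reproduce the source's target lists.

```python
def feed_forward_targets(mirna_targets, occupied_genes):
    """Genes that are both miRNA-cluster targets and occupied by the core factors."""
    return sorted(set(mirna_targets) & set(occupied_genes))


# "geneA" to "geneC" are hypothetical placeholders, not genes from the source.
mirna_targets = {"Lefty1", "Lefty2", "geneA", "geneB"}
occupied_genes = {"Lefty1", "Lefty2", "geneC"}
print(feed_forward_targets(mirna_targets, occupied_genes))
```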
- promoters for 64 are occupied by Oct4/Sox2/Nanog/Tcf3. This is approximately 50% more interactions
- Oct4/Sox2/Nanog/Tcf3 only 5 are occupied by domains of Suz12 binding >500bp (larger region sizes have been correlated with gene silencing, Lee et al., 2006). This may be because
- PcG bound proteins are not functional targets of mir-290-295 in mES cells. Alternatively these proteins are not expressed in ES cells following Dicer deletion and are thus excluded from the target list (Sinkkonen et al., 2008), but may be targets at other stages of development. In the latter case, the miRNAs may serve as a redundant silencing mechanism, along with Polycomb group complexes, to help prevent even low levels of expression of the developmental regulators in ES cells.
- Transactivation of miR-34a by p53 broadly influences gene expression and promotes apoptosis. Mol Cell 26, 745-752.
- Tcf3 is an integral component of the core regulatory circuitry of embryonic stem cells. Genes Dev 22, 746-755.
- MicroRNA-34b and MicroRNA-34c are targets of p53 and cooperate in control of cell proliferation and adhesion-independent growth. Cancer Res 67, 8433-8438.
- c-Myc-regulated microRNAs modulate E2F1 expression. Nature 435, 839-843.
- RNA Carbodiimide mediated cross-linking of RNA to nylon membranes improves the detection of siRNA, miRNA and piRNA by northern blot. Nucleic Acids Res 35, e60 (2007).
- RefSeq a curated non-redundant sequence database of genomes, transcripts and proteins.
- the invention includes embodiments that relate analogously to any intervening value or range defined by any two values in the series, and that the lowest value may be taken as a minimum and the greatest value may be taken as a maximum.
- Numerical values include values expressed as percentages. For any embodiment of the invention in which a numerical value is prefaced by "about" or "approximately", the invention includes an embodiment in which the exact value is recited. For any embodiment of the invention in which a numerical value is not prefaced by "about" or "approximately", the invention includes an embodiment in which the value is prefaced by "about" or "approximately".
- Columns: miRNA name; position; score; TSS position; H3K4me3; CpG; Intervening Sites; GENE/EST; Proximal; Conserved; Oct4; Sox2; Nanog; Tcf3; Suz12; H3K27me3
- non-overlap gene
- chr ⁇ 97213431 chr ⁇ 97212625- mmu-mir-138-2 97213519 (+) 10 97214300 ⁇ 5kb non- chr ⁇ 10819749 overlap
- non-overlap chr10 4282386 gene 53
- chr11 7788919
- chr11 1198308 61-119830936 chr11 1198629 mmu-mir-338 (-) 20 40-119863140 ENW CpG GENIC K27me3 chr12 1091064
- chr12 1101721
- chr14 6058613 chr14 6060279 EST BI mmu-mir-15a 8-60586216 (-) 15 2-60602992 FNEW CpG 0 696529 Cons
- chr17 1753072 EST BU mmu-let-7e (+) 20 6-17533550
- chr17 5587671 non-
- Columns: miRNA name; position; score; TSS position; H3K4me3; CpG Island; Intervening Sites; GENE/EST; Proximal; Conserved; Oct4; Suz12; H3K27me3
- chr1 168839603 chr1 16888162 hsa-mir-214 168839685 (-) 0 8-168882427 TB - 0 ND hsa-mir-199a- chr1 168845341 chr1 16888162
- chr1 216679897 chr1 21683290 gene RAB3- hsa-mir-194-1 216679974 (-) -10 5-216835678 ELBT CpG 0 GAP 150 K27 chr2 32668885- chr2 32493280- hsa-mir-558 32668959 (+) 20 32493480 TELB CpG 0 GENIC chr2 47516470- chr2 47508001- hsa-mir-559 47516552 (+) 20 47508201 EL CpG 0 GENIC K>
- chr2 219691865 chr2 21969246 non-overlap hsa-mir-375 219691941 (-) 7-219693466 TLE CpG 0 EST ⁇ 5kb Suz12 K27 chr2 219984343 chr2 21998486 hsa-mir-153-1 219984421 (-) 10 7-219985266 T CpG 0 ⁇ 5kb chr2 219984343 chr2 21999954 hsa-mir-153-1 219984421 (-) 18 8-219999748 E CpG 2 GENIC K27 chil?
- chr6 72143378- chr6 72262454- hsa-mir-30c-2 72143459 (-) -1 72262653 T - 1 ND
- non-overlap gene POLR3D,
- chr9 20706109- chr ⁇ 20674113- EST BP231045 hsa-mir-491 20706184 (+) 15 20674313 ELT CpG 0 , Cons GENIC ⁇ 5kb Cons ND chr ⁇ 21502109- chr ⁇ 21549673- hsa-mir-31 21502187 (-) 10 21549873 ETL CpG 0 EST DA246725 ND
- chr12 52714007 chr12 5271132 hsa-mir-615 52714092 (+) 9 1-52711758 E CpG 1 ⁇ 5kb Suz12 K27
- chr13 49521111 chr13 4955404 hsa-mir-16-1 49521195 (-) 25 0-49554240 BTEL CpG 0 GENIC ⁇ 5kb Cons - K27
- chr14 1004409 hsa-mir-380 (+) 0 72-100441559 E - 0 chr14 10056182
- chr14 10056287 6-100562955 chr14 1004409 hsa-mir-329-1 (+) 0 72-100441559 E - 0 chr14 10056319
- chr14 10058200 7-100582089 chr14 1004409 hsa-mir-381 (+) 0 72-100441559 E - 0 chr14 10058254
- chr14 10059584 9-100595926 chr14 1004409 hsa-mir-154 (+) 0 72-100441559 E - 0 chr14 10059667
- chr14 1004409 hsa-mir-377 (+) 0 72-100441559 E - 0 chr14 10060058
- chr17 76714272 chr17 7675331 gene FLJ44861 hsa-mir-338 76714348 (-) 6 1-76755359 BEL CpG 1 , Cons GENIC ⁇ 5kb Cons - - K27 hsa-mir-133a- chr18 17659661
- chr19 4721702- chr19 4720051- hsa-mir-7-3 4721781 (+) 32 4720251 E - 0 GENIC ⁇ 5kb Cons
- chr19 6446959- chr19 6440459- hsa-mir-220b 6447045 (+) 0 6441187 TB - 0 ND chr19 10690085 chr19 1068965 hsa-mir-638 10690183 (+) 30 4-10689854 LTEB CpG 0 GENIC ⁇ 5kb hsa-mir-199a- chr19 10789095 chr19 1078933
- hsa-mir-516a- chr19 58956203 chr19 5877185
Abstract
The present invention provides, among other things, promoters for mouse and human microRNA genes and methods of use thereof.
Description
CONNECTING MICRORNA GENES TO THE CORE TRANSCRIPTIONAL REGULATORY CIRCUITRY OF EMBRYONIC STEM CELLS
Related Applications
[0001] This application claims priority to, and the benefit of, U.S. Provisional Application No. 61/188,211, filed August 7, 2008. The entire contents of the aforementioned application are incorporated herein by reference.
Government Funding Statement
[0002] The United States Government has provided grant support utilized in the development of the present invention. Grants 5-R01-HD045022, 5-R37-CA084198, 5-R37-CA087869, and HG002668 from the National Institutes of Health have supported development of this invention. The Government has certain rights in the invention.
Background of the Invention
[0003] Embryonic stem (ES) cells hold significant potential for clinical therapies because of their distinctive capacity to both self-renew and differentiate into a wide range of specialized cell types. Understanding the transcriptional regulatory circuitry of ES cells and early cellular differentiation is fundamental to understanding human development and realizing the therapeutic potential of these cells. Transcription factors that control ES cell pluripotency and self-renewal have been identified (Chambers and Smith, 2004; Niwa, 2007; Silva and Smith, 2008) and a draft of the core regulatory circuitry by which these factors exert their regulatory effects on protein-coding genes has been described (Boyer et al., 2005; Loh et al., 2006; Lee et al., 2006; Boyer et al. 2006). Several lines of evidence indicate that microRNAs (miRNAs) contribute to the control of early development. However, little is known about the function and regulation of miRNAs in ES cells. Furthermore, although numerous miRNAs have been identified in various mammalian species, there is much less information available regarding miRNA gene transcriptional start sites and promoter regions.
Summary of the Invention
[0004] The invention relates in part to promoters and high probability transcriptional start sites for genes, e.g., microRNA genes, and methods for identification thereof. In one aspect, the invention provides a method of identifying a genomic region containing a high probability transcriptional start site for a microRNA (miRNA) gene, the method comprising: (a) identifying a genomic region comprising a candidate transcriptional start site for an miRNA gene based at least in part on enrichment for histone H3 trimethylated at its lysine 4 residue (H3K4me3) within such region; and (b) assigning a score to said region based at least in part on (i) its proximity to one or more annotated mature miRNA sequences, (ii) expressed sequence tag (EST) data, and/or (iii) conservation of the region between multiple species, wherein the following factors, if present, contribute positively to the score: (I) proximity of the region to one or more annotated mature miRNA sequences, (II) identification of the region as containing the start site of a known transcript that spans a miRNA or of an EST that spans a miRNA, and (III) conservation of the region between multiple mammalian species; and the following factors, if present, contribute negatively to the score: (IV) if the H3K4me3 enriched region is assignable instead to a transcript or EST that does not overlap the miRNA; (V) intervening H3K4me3 sites between the region and the miRNA; and (c) identifying the genomic region as containing a high probability transcriptional start site for a microRNA (miRNA) gene based at least in part on the score. In some embodiments the region is between 100 base pairs (bp) and 10 kilobases (10 kB) in length. In some embodiments the region is between 100 base pairs (bp) and 5 kilobases (5 kB) in length. In some embodiments the region is between 100 base pairs (bp) and 1 kilobase (1 kB) in length.
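The scoring step (b) and the thresholding step (c) can be illustrated with a minimal Python sketch. The source names the factors and the sign of their contribution but gives no numeric values here, so every weight, the 5 kb proximity cutoff, the threshold, and all names (`Candidate`, `score_candidate`, `is_high_probability`) are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights: the text states only which factors add to or
# subtract from the score, not their magnitudes.
WEIGHTS = {
    "proximal": 10,             # (I) close to an annotated mature miRNA
    "spanning_transcript": 10,  # (II) start of a transcript/EST spanning the miRNA
    "conserved": 5,             # (III) conserved between mammalian species
    "other_transcript": -10,    # (IV) assignable to a non-overlapping transcript/EST
    "intervening_site": -5,     # (V) applied once per intervening H3K4me3 site
}


@dataclass
class Candidate:
    """One H3K4me3-enriched region considered as a miRNA start site."""
    distance_to_mirna: int              # bp between region and mature miRNA
    starts_spanning_transcript: bool    # factor (II)
    conserved: bool                     # factor (III)
    assigned_to_non_overlapping: bool   # factor (IV)
    intervening_h3k4me3_sites: int      # factor (V)


def score_candidate(c: Candidate, proximal_cutoff: int = 5000) -> int:
    """Sum positive and negative contributions for one candidate region."""
    total = 0
    if c.distance_to_mirna <= proximal_cutoff:  # assumed cutoff
        total += WEIGHTS["proximal"]
    if c.starts_spanning_transcript:
        total += WEIGHTS["spanning_transcript"]
    if c.conserved:
        total += WEIGHTS["conserved"]
    if c.assigned_to_non_overlapping:
        total += WEIGHTS["other_transcript"]
    total += WEIGHTS["intervening_site"] * c.intervening_h3k4me3_sites
    return total


def is_high_probability(c: Candidate, threshold: int = 0) -> bool:
    """Step (c): accept the region when its score meets an assumed threshold."""
    return score_candidate(c) >= threshold
```

With these placeholder weights, a nearby, conserved region that starts a spanning EST scores highest, while a distant region with intervening H3K4me3 sites that belongs to another transcript scores below the threshold.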
In some embodiments the method comprises: (i) identifying a plurality of genomic regions containing candidate transcriptional start sites for miRNA genes from the genomes of at least two cell types of different cell lineages; and (ii) identifying genomic regions that are conserved between the at least two cell types, wherein such conservation indicates an increased likelihood that the genomic region comprises a transcriptional start site. In some embodiments the method comprises: (i) identifying a plurality of genomic regions containing candidate transcriptional start sites for miRNA genes from the genomes of at least two different differentiated cell types; and (ii) identifying genomic regions that are conserved between the at least two cell types, wherein such conservation indicates an increased likelihood that the genomic region comprises a transcriptional start site. In some embodiments the method comprises: (i) identifying a plurality of genomic regions containing candidate transcriptional start sites for miRNA genes from the genomes of cells derived from each of at least two different mammalian species; and (ii) identifying genomic regions that are conserved between the cells derived from each of the at least two different mammalian species, wherein such conservation indicates an increased likelihood that the genomic region comprises a transcriptional start site. In some embodiments the cells from the at least two different mammalian organisms are of the same cell type or lineage. In some embodiments the cells are from mouse and human. The invention further provides a computer-readable medium having instructions stored thereon for performing at least step (b) or step (c) of the method when provided with suitable data.
[0005] In another aspect, the invention provides a computer-readable medium having information stored thereon, wherein the information describes a plurality of regions comprising high probability miRNA gene transcriptional start sites, wherein said information describes regions comprising high probability mammalian miRNA gene transcriptional start sites for at least 100 miRNA genes or at least 75% of the miRNA genes in a selected mammalian species.
[0006] In some embodiments, the high probability miRNA gene transcriptional start sites are identified by a method comprising steps of: (a) identifying a genomic region comprising a candidate transcriptional start site for an miRNA gene based at least in part on enrichment for histone H3 trimethylated at its lysine 4 residue (H3K4me3) within such region; and (b) assigning a score to said region based at least in part on (i) its proximity to one or more annotated mature miRNA sequences, (ii) expressed sequence tag (EST) data, and/or (iii) conservation of the region between multiple species, wherein the following factors, if present, contribute positively to the score: (I) proximity of the region to one or more annotated mature miRNA sequences, (II) identification of the region as containing the start site of a known transcript that spans a miRNA or of an EST that spans a miRNA, and (III) conservation of the region between multiple mammalian species; and the following factors, if present, contribute negatively to the score: (IV) if the H3K4me3 enriched region is assignable instead to a transcript or EST that does not overlap the miRNA; (V) intervening H3K4me3 sites between the region and the miRNA; and (c) identifying the region as containing a high probability transcriptional start site for a microRNA (miRNA) gene based at least in part on the score. In some embodiments the miRNA transcriptional start sites are high probability mammalian, e.g., human, miRNA gene transcriptional start sites. The invention further provides a method comprising steps of: (i) electronically accessing a computer-readable medium of the invention; and (ii) extracting or analyzing information therefrom.
[0007] In another aspect, the invention provides a computer-readable medium having information stored thereon, wherein the information describes a regulatory network comprising relationships between one or more key ES cell transcription factors, at least 20 ES cell transcription factor target genes, and at least some targets of the ES cell transcription factor target genes, wherein the ES cell transcription factor target genes include at least some genes that encode proteins and at least some genes that encode miRNAs. In some embodiments the key ES cell transcription factors are selected from: Oct4, Nanog, Sox2, and Tcf3. In some embodiments the key ES cell transcription factors are Oct4, Nanog, Sox2, and Tcf3. In some embodiments the information stored on the computer-readable medium further comprises information describing relationships between Polycomb group proteins and at least some of the key ES cell transcription factor target genes. The invention further comprises a method comprising steps of: (i) electronically accessing said computer-readable medium; and (ii) extracting or analyzing information therefrom.
[0008] The invention further provides an isolated nucleic acid comprising a region comprising a high probability transcriptional start site for a mammalian miRNA gene.
In some embodiments, the region is identified according to a method comprising steps of: (a) identifying a genomic region comprising a candidate transcriptional start site for an miRNA gene based at least in part on enrichment for histone H3 trimethylated at its lysine 4 residue (H3K4me3) within such region; and (b) assigning a score to said region based at least in part on (i) its proximity to one or more annotated mature miRNA sequences, (ii) expressed sequence tag (EST) data, and/or (iii) conservation of the region between multiple species, wherein the following factors, if present, contribute positively to the score: (I) proximity of the region to one or more annotated mature miRNA sequences, (II) identification of the region as containing the start site of a known transcript that spans a miRNA or of an EST that spans a miRNA, and (III) conservation of the region between multiple mammalian species; and the following factors, if present, contribute negatively to the score: (IV) if the H3K4me3 enriched region is assignable instead to a transcript or EST that does not overlap the miRNA; (V) intervening H3K4me3 sites between the region and the miRNA; and (c) identifying the region as containing a high probability transcriptional start site for a microRNA (miRNA) gene based at least in part on the score. In some embodiments, the region comprises or consists of a transcription start site (TSS) listed in Table S6 or S7 and wherein, optionally, the isolated nucleic acid comprises no more than 1 kB, 2 kB, 5 kB, 8 kB, or 10 kB of genomic sequence on the 5' side, the 3' side, or both sides of the TSS. In some embodiments the region comprises at least 50 continuous nucleic acids of a transcription start site (TSS) listed in Table S6 or S7 and wherein, optionally, the isolated nucleic acid comprises no more than 1 kB, 2 kB, 5 kB, 8 kB, or 10 kB of genomic sequence on the 5' side, the 3' side, or both sides of the TSS. In some embodiments the isolated nucleic acid further comprises a miRNA sequence. The invention further provides a composition comprising such an isolated nucleic acid and a transcription factor, wherein the transcription factor is one that binds to the region in at least some cell types. The invention further provides a nucleic acid construct, e.g., an isolated nucleic acid construct, comprising such an isolated nucleic acid. In some embodiments, the isolated nucleic acid comprises a promoter and the construct comprises a heterologous nucleic acid operably linked to the promoter. In some embodiments the isolated nucleic acid comprises a promoter and the construct comprises a sequence encoding a polypeptide or microRNA. In some embodiments the polypeptide is a reporter polypeptide of use to detect and/or quantify expression from the promoter. In some embodiments the reporter polypeptide comprises a fluorescent protein. The invention further provides a host cell or transgenic non-human mammal, e.g., a mouse, containing such a nucleic acid construct.
[0009] The invention further provides a method of identifying an agent with potential to modulate expression of a miRNA, the method comprising: (i) providing a nucleic acid construct comprising a miRNA promoter operably linked to a heterologous nucleic acid; and (ii) determining whether a test agent affects expression of the heterologous nucleic acid, wherein if the test agent affects expression of the heterologous nucleic acid, the test agent is identified as an agent with potential to modulate expression of the miRNA. In some embodiments the heterologous nucleic acid encodes a reporter protein. In some embodiments the nucleic acid construct is in a cell and the method comprises contacting the cell with the test agent. In some embodiments the miRNA is listed in Table S6 or Table S7. In some embodiments, the method further comprises: (iii) contacting cells with the agent; (iv) measuring expression of the miRNA or of a target gene of the miRNA; and (v) determining whether contacting the cells with the agent alters expression of the miRNA or miRNA target gene relative to expression that would be expected in the absence of the agent.
[0010] The invention further provides a method of identifying a miRNA that acts as a determinant of cell fate decisions, wherein the miRNA is one that is selectively expressed in cells of one or more differentiated cell types or lineages, the method comprising determining whether the promoter of the miRNA is repressed by a Polycomb group protein in ES and/or iPS cells, wherein if the promoter of the miRNA is repressed by a Polycomb group protein in ES and/or iPS cells, the miRNA is identified as a determinant of cell fate decisions. In some embodiments determining whether the promoter is repressed by a Polycomb group protein comprises determining whether the promoter is bound by a Polycomb group protein. In some embodiments the miRNA is listed in Table S6 or S7.
[0011] The invention further provides a method of identifying a polymorphism or mutation in a mammalian species, the method comprising: (i) obtaining the sequence of a genomic region containing a miRNA promoter in a plurality of individuals of the species; and (ii) determining whether the sequence varies between individuals within the region, wherein variations within the sequence define polymorphisms or mutations. In some embodiments the miRNA is listed in Table S6 or S7.
[0012] The invention further provides a method of identifying a polymorphism or mutation associated with increased or decreased risk of developing a disease, the method comprising: (i) analyzing the sequence of a genomic region containing a miRNA promoter in a plurality of individuals with the disease; and (ii) determining whether a correlation exists between the presence of particular polymorphic variant(s) or mutation(s) within the region in individuals and presence of the disease. In some embodiments the disease is associated with aberrant (e.g., increased or decreased) miRNA expression. In some embodiments the disease is cancer. In some embodiments the miRNA is listed in Table S6 or Table S7. In some embodiments the miRNA promoter is one that is bound by a Polycomb group protein in ES and/or iPS cells.
[0013] The invention further provides a method of modulating the differentiation of a pluripotent mammalian stem cell, the method comprising: modulating the level or activity of a miRNA in the pluripotent stem cell, wherein the miRNA is encoded by a gene whose promoter is bound by a key embryonic stem (ES) cell transcription factor in ES and/or iPS cells. In some embodiments the pluripotent stem cell is an induced pluripotent stem (iPS) cell. In some embodiments the method comprises decreasing the level or activity of a miRNA in the cell. In some embodiments the method comprises contacting the cell with an oligonucleotide complementary to the miRNA. In some embodiments the method comprises expressing an oligonucleotide complementary to the miRNA (or miRNA precursor) in the cell. In some embodiments the method comprises increasing the level or activity of a miRNA in the cell. In some embodiments the method comprises introducing the miRNA or a miRNA precursor containing the miRNA into the cell, or expressing the miRNA or a miRNA precursor in the cell. In some embodiments the method comprises modulating the binding of a transcription factor to the promoter of the gene that encodes the miRNA. In some embodiments the miRNA is one whose promoter is bound by a Polycomb group protein in ES and/or iPS cells. In some embodiments the method further comprises administering the cell to an individual. In some embodiments the pluripotent stem cell is a human cell. The invention further provides a mammalian cell, e.g., a human cell, wherein the differentiation state of the cell has been modulated according to such a method. In another aspect, the invention provides a method of treating an individual comprising: administering such a cell to the individual. In some embodiments the method comprises (i) obtaining a cell from an individual; (ii) reprogramming the cell in vitro; and (iii) administering the cell to the individual. In some embodiments the cell is differentiated in vitro.
[0014] The invention further provides a method of modulating the in vitro reprogramming of a differentiated mammalian somatic cell, the method comprising: modulating the level or activity of a miRNA in the differentiated mammalian somatic cell, wherein the miRNA is encoded by a gene whose promoter is bound by a key embryonic stem (ES) cell transcription factor in ES and/or iPS cells. In some embodiments the method comprises reprogramming the somatic cell to a pluripotent state. In some embodiments the method comprises reprogramming the somatic cell to a pluripotent state and then differentiating the reprogrammed pluripotent cell to a desired cell type or lineage. In some embodiments the method comprises reprogramming the somatic cell from a first at least partially differentiated state to a second at least partially differentiated state. In some embodiments the method comprises reprogramming the somatic cell from a first cell type to a second cell type, wherein the first and second cell types are in different cell lineages. In some embodiments the method comprises decreasing the level or activity of a miRNA in the cell. In some embodiments the method comprises contacting the cell with an oligonucleotide complementary to the miRNA. In some embodiments the method comprises increasing the level or activity of a miRNA in the cell. In some embodiments the method comprises introducing the miRNA or a miRNA precursor containing the miRNA into the cell, or expressing the miRNA or a miRNA precursor in the cell. In some embodiments the method comprises modulating the binding of a transcription factor to the promoter of the gene that encodes the miRNA. In some embodiments the miRNA is one whose promoter is bound by a Polycomb group protein in ES and/or iPS cells. In some embodiments the somatic cell is a human cell. The invention further provides a reprogrammed mammalian somatic cell, wherein the in vitro reprogramming of the cell has been modulated according to the method. The invention further provides a method of treating an individual comprising: administering the cell to the individual. In some embodiments the method comprises (i) obtaining a cell from an individual; (ii) reprogramming the cell in vitro; and (iii) administering the cell to the individual.
[0015] The invention further provides a method of modulating the differentiation state of a mammalian somatic cell, the method comprising: modulating the level or activity of a miRNA in the mammalian somatic cell, wherein the miRNA is one that is expressed in a cell type or cell lineage specific manner and is encoded by a gene whose promoter is bound by a Polycomb group protein in ES and/or iPS cells. In some embodiments the somatic cell is a human cell. In some embodiments the method comprises decreasing the level or activity of a miRNA in the cell. In some embodiments the method comprises contacting the cell with an oligonucleotide complementary to the miRNA. In some embodiments the method comprises increasing the level or activity of a miRNA in the cell. In some embodiments the method comprises introducing the miRNA or a miRNA precursor containing the miRNA into the cell, or expressing the miRNA or a miRNA precursor in the cell. In some embodiments the method comprises modulating the binding of a transcription factor to the promoter of the gene that encodes the miRNA. The invention further provides a mammalian somatic cell, wherein the differentiation state of the cell has been modulated according to the method.
[0016] The invention further provides a method of treating an individual comprising: administering the cell to the individual. In some embodiments the method comprises (i) obtaining a cell from an individual; (ii) modulating the differentiation state of the cell in vitro; and (iii) administering the cell to the individual. In some embodiments the modulating promotes differentiation of the cell to a desired cell type or lineage.
Brief Description of the Drawings
[0017] Figure 1. High-Resolution Genome-wide Mapping of Core ES Cell Transcription Factors with ChIP-seq. (A) Summary of binding data for Oct4, Sox2, Nanog, and Tcf3. 14,230 sites are cobound genome wide and mapped to either promoter-proximal (TSS ± 8 kb, dark green, 27% of binding sites), genic (>8 kb from TSS, middle green, 30% of binding sites), or intergenic (light green, 43% of binding sites). The promoter-proximal binding sites are associated with 3,289 genes. (B) (Upper) Binding of Oct4 (blue), Sox2 (purple), Nanog (orange), and Tcf3 (red) across 37.5 kb of mouse chromosome 3 surrounding the Sox2 gene (black box below the graph, arrow indicates transcription start site). Short sequences uniquely and perfectly mapping to the genome were extended to 200 bp (maximum fragment length) and scored in 25 bp bins. The scores of the bins were then normalized to the total number of reads mapped. Oct4/Sox2 DNA-binding motifs (Loh et al., 2006) were mapped across the genome and are shown as gray boxes below the graph. Height of the box reflects the quality of the motif. (Lower) Detailed analysis of three enriched regions (Chromosome 3: 34,837,600-34,838,300, 34,845,300-34,846,000, and 34,859,900-34,860,500) at the Sox2 gene indicated with dotted boxes above. The 5' most bases from ChIP-seq were separated by strand and binned into 25 bp regions. Sense (darker tone) and antisense (light tone) of each of the four factors tested are directed toward the binding site, which in each case occurs at a high-confidence Oct4/Sox2 DNA-binding motif indicated below.
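The read-extension and binning procedure described for panel (B) can be sketched in Python as follows. This is only an illustration under stated assumptions: reads are given as hypothetical `(five_prime_position, strand)` pairs on one chromosome, a bin receives one count for every extended read that overlaps it, and normalization is a plain division by the total number of mapped reads (the source does not specify a scaling constant). The function name `binned_coverage` is not from the source.

```python
from collections import Counter


def binned_coverage(reads, total_mapped, fragment_len=200, bin_size=25):
    """Extend each read to fragment_len and count overlaps per bin.

    reads: iterable of (five_prime_pos, strand), strand being '+' or '-'.
    Returns a dict mapping bin start coordinate to normalized score.
    """
    counts = Counter()
    for pos, strand in reads:
        if strand == "+":
            # Forward-strand reads extend downstream from the 5' base.
            start, end = pos, pos + fragment_len
        else:
            # Reverse-strand reads extend toward lower coordinates.
            start, end = pos - fragment_len + 1, pos + 1
        start = max(start, 0)
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size
        for b in range(first_bin, last_bin + 1):
            counts[b * bin_size] += 1
    # Assumed normalization: raw bin count divided by total mapped reads.
    return {b: n / total_mapped for b, n in counts.items()}
```

Splitting the input by strand before calling the function gives the separate sense and antisense profiles described for the lower panel, in which case `fragment_len` would be set to 1 so that only the 5' most base is binned.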
[0018] Figure 2. Identification of miRNA Promoters. (A) Description of algorithm for miRNA promoter identification. A library of candidate transcriptional start sites was generated with histone H3 lysine 4 trimethyl (H3K4me3) location analysis data from multiple tissues ([Barski et al., 2007], [Guenther et al., 2007] and [Mikkelsen et al., 2007]). Candidates were scored to assess likelihood that they represent true miRNA promoters. Based on scores, a list of mouse and human miRNA promoters was assembled. Additional details can be found in Example 7. (B) Examples of identified miRNA promoter regions. A map of H3K4me3 enrichment is displayed in regions neighboring selected human and mouse miRNAs for multiple cell types: human ES cells (hES), REH human pro-B cell line (B cell), primary human hepatocytes (Liver), primary human T cells (T cell), mouse ES cells (mES), neural precursor cells (NPCs), and mouse embryonic fibroblasts (MEFs). miRNA promoter coordinates were confirmed by distance to mature miRNA genomic sequence, conservation, and EST data (shown as solid line where available). Predicted transcriptional start site and direction of transcription are noted by an arrow, with mature miRNA sequences indicated (red). CpG islands, commonly found at promoters, are indicated (green). Dotted lines denote presumed transcripts. (C) Confirmation of predicted transcription start sites for active miRNAs using chromatin modifications. Normalized ChIP-seq counts for H3K4me3 (red), H3K79me2 (blue), and H3K36me3 (green) are shown for two miRNA genes where EST data were unavailable. Predicted start site (arrow), CpG islands (green bar), presumed transcript (dotted lines), and miRNA positions (red bar) are shown.
[0019] Figure 3. Oct4, Sox2, Nanog, and Tcf3 Occupancy and Regulation of miRNA Promoters. (A) Oct4 (blue), Sox2 (purple), Nanog (orange), and Tcf3 (red) binding is shown at four murine miRNA genes as in Figure 1A. H3K4me3 enrichment in ES cells is indicated by shading across genomic region. Presumed transcripts are shown as dotted lines. Coordinates for the mmu-mir-290-295 cluster are derived from NCBI build 37. (B) Oct4 ChIP enrichment ratios (ChIP-enriched versus total genomic DNA) are shown across human miRNA promoter region for the hsa-mir-302 cluster. H3K4me3 enrichment in ES cells is indicated by shading across genomic region. (C) Schematic of miRNAs with conserved binding by the core transcription factors in ES cells. Transcription factors are represented by dark blue circles and miRNAs are represented by purple hexagons. (D) Quantitative RT-PCR analysis of RNA extracted from ZHBTc4 cells in the presence or absence of doxycycline treatment. Fold-change was calculated for each pri-miRNA for samples from 12 hr and 24 hr of doxycycline treatment relative to those from untreated cells. Transcript levels were normalized to Gapdh levels. Error bars indicate standard deviation derived from triplicate PCR reactions. (D) Most human and mouse miRNA promoters show evidence of H3K4me3 enrichment in multiple tissues.
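The fold-change calculation described for the quantitative RT-PCR panel can be sketched as follows. The source states only that transcript levels were normalized to Gapdh, compared with untreated cells, and that error bars come from triplicate reactions; the use of threshold-cycle (Ct) values with the common 2^(-ΔΔCt) formula, and the function names, are assumptions made for this illustration.

```python
import statistics


def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_untreated, ct_ref_untreated):
    """Relative expression by the 2^(-ddCt) formula (assumed method).

    The reference gene would be Gapdh in the experiment described above.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_untreated = ct_target_untreated - ct_ref_untreated
    return 2.0 ** -(delta_treated - delta_untreated)


def summarize_triplicates(treated, untreated):
    """Mean and sample standard deviation over replicate PCR reactions.

    treated: list of (ct_target, ct_ref) pairs, one per replicate.
    untreated: single (ct_target, ct_ref) pair used as the baseline.
    """
    values = [fold_change(t, r, untreated[0], untreated[1])
              for t, r in treated]
    return statistics.mean(values), statistics.stdev(values)
```

For example, a pri-miRNA whose Ct rises by two cycles after doxycycline treatment while Gapdh is unchanged would be reported at one quarter of its untreated level.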
[0020] Figure 4. Regulation of Oct4/Sox2/Nanog/Tcf3-Bound miRNAs during Differentiation. (A) Pie charts showing relative contributions of miRNAs to the complete population of miRNAs in mES cells (red), MEFs (blue), and NPCs (green) based on quantification of miRNAs by small RNA sequencing. A full list of the miRNAs identified can be found in Table S9. (B) Normalized frequency of detection of individual mature miRNAs whose promoters are occupied by Oct4/Sox2/Nanog/Tcf3 in mouse. The red line in the
center and right panels shows the level of detection in ES cells. (C) Histogram of changes in frequency of detection. Changes for miRNAs whose promoters are occupied by Oct4, Sox2, Nanog, and Tcf3 in mouse are shown as bars (red for ES enriched, blue for MEF enriched, and green for NPC enriched). The background frequency for nonoccupied miRNAs is shown as a gray line.
[0021] Figure 5. Polycomb Represses Lineage-Specific miRNAs in ES Cells. (A) Suz12 (light green) and H3K27me3 (dark green, Mikkelsen et al., 2007) binding are shown for two miRNA genes in murine ES cells. Predicted start sites (arrow), CpG islands (green bar), presumed miRNA primary transcript (dotted line), and mature miRNA (red bar) are shown. (B) Expression analysis of miRNAs from mES cells based on quantitative small RNA sequencing. Cumulative distributions for Polycomb-bound miRNAs (green line) and all miRNAs (gray line) are shown. (C) Expression analysis of miRNAs occupied by Suz12 in mES cells. Relative counts are shown for mES cells (red), NPCs (orange), and MEFs (yellow). H3K27me3 (green line) and H3K4me3 (blue line) mapped reads are shown for mES cells, MEFs, and NPCs (Mikkelsen et al., 2007). (D) Schematic of a subset of miRNA genes occupied by Suz12 in both mES and hES cells as in Figure 3C. miRNA genes where Oct4/Sox2/Nanog/Tcf3 are also present are indicated. Cells known to selectively express these miRNAs based on computational predictions (Farh et al., 2005) or experimental confirmation (Landgraf et al., 2007) are indicated. The Polycomb group (PcG) protein Suz12 is represented by a green circle.
[0022] Figure 6. miRNA Modulation of the Gene Regulatory Network in ES Cells.
[0023] (A) An incoherent feed-forward motif (Alon, 2007) involving miRNA repression of a transcription factor target gene is illustrated (left). Transcription factors are represented by dark blue circles, miRNAs by purple hexagons, protein-coding genes by pink rectangles, and proteins by orange ovals. Selected instances of this network motif identified in ES cells based on data from Sinkkonen et al., 2008 or data in Figure S11 are shown (right). (B) A second model of an incoherent feed-forward motif (Alon, 2007) involving protein repression of an miRNA is illustrated (left). In ES cells, Lin28 blocks the maturation of pri-Let-7g (Viswanathan et al., 2008). Lin28 and the Let-7g gene are occupied by Oct4/Sox2/Nanog/Tcf3. Targetscan prediction (Grimson et al., 2007) of Lin28 targeting by mature Let-7g is noted (purple dashed line, right). (C) A coherent feed-forward motif (Alon, 2007) involving miRNA repression of a transcriptional repressor that regulates a transcription
factor target gene is illustrated (left). This motif is found in ES cells, where mir-290-295 miRNAs repress Rbl2, indirectly maintaining the expression of Dnmt3a and Dnmt3b, which are also occupied at their promoters by Oct4/Sox2/Nanog/Tcf3 (right).
[0024] Figure 7. Multilevel Regulatory Network Controlling ES Cell Identity. An updated map of ES cell regulatory circuitry is shown. The interconnected autoregulatory loop is shown to the left. Active genes are shown at the top right, and inactive genes are shown at the bottom right. Transcription factors are represented by dark blue circles, and Suz12 by a green circle. Gene promoters are represented by red rectangles, gene products by orange circles, and miRNA promoters by purple hexagons.
[0025] Figure S1. Comparison of ChIP-seq and RT-PCR data for Oct4 and Suz12. a. ChIP-seq reads for Oct4 enrichment were compared to RT-PCR results previously described in Loh et al. 71 probes that were identified as enriched are shown in blue and 39 regions identified as not enriched are shown in gray. The maximum number of ChIP-seq reads assigned within the region is shown on the vertical axis. The red line denotes the threshold of binding with p < 10^-9. Ambiguous RT-PCR results were excluded. b. ChIP-seq reads for Suz12 enrichment were compared to RT-PCR results previously described in Boyer et al. as in a. ChIP-seq data within 200bp of 68 probes identified as enriched by RT-PCR and confirmed by ChIP-chip are shown in blue and 18 probes identified as not enriched are shown in gray.
[0026] Figure S2. Promoters for known genes occupied by Oct4/Sox2/Nanog/Tcf3 in mES cells. a. Overlap of genes whose promoters are within 8kb of sites enriched for Oct4, Sox2, Nanog, or Tcf3. Not shown are the Nanog:Oct4 overlap (289) and Sox2:Tcf3 overlap (26). The red line delineates genes considered occupied by Oct4/Sox2/Nanog/Tcf3. b. Enrichment for selected GO terms previously reported to be associated with Oct4/Sox2/Nanog binding (Boyer et al., 2005) was tested on the sets of genes occupied at high confidence by 1 to 4 of the tested DNA binding factors. The hypergeometric p-value is shown for genes annotated for DNA binding (blue), Regulation of Transcription (green) and Development (red).
[0027] Figure S3. Comparison of ChIP-seq and ChIP-chip genome-wide data for Oct4, Nanog and Tcf3. a. Binding of Oct4 (blue), Nanog (orange) and Tcf3 (red) across 17kb surrounding the Wnt8a and Nodal genes (black below the graph, arrow indicates transcription start site) as in Figure 1B. (upper) Binding derived from ChIP-seq data plotted as
reads per million. (lower) Binding derived from ChIP-chip enrichment ratios (Cole et al., 2008). b. Poor probe density prevents detection of ~1/3 of ChIP-seq binding events on Agilent genome-wide tiling arrays. Top panel shows the fraction of regions that are occupied by Oct4/Sox2/Nanog/Tcf3 at high confidence in mES cells as identified by ChIP-seq that are enriched for Oct4 (blue), Nanog (orange) and Tcf3 (red) on Agilent genome-wide microarrays (Cole et al., 2008). Numbers on the x-axis define the boundaries used to classify probe densities for the histogram. Bottom panel illustrates a histogram of the microarray probe densities of the enriched regions identified. c. Comparison of motif association. At the set of genome-wide ChIP-chip probe positions, we examined the association between an Oct4 DNA motif and ChIP-chip and ChIP-seq enrichment. Probes/bins were considered positive if they were associated with a high-scoring motif within a 200 bp window (+/-100 bp). The background motif occurrence for all probe positions is 8.2% (leftmost group). 1297 ChIP-seq bins and 421 ChIP-chip probes are included in the top categories, respectively.
[0028] Figure S4. High resolution analysis of Oct4/Sox2/Nanog/Tcf3 binding based on meta-analysis. a-d. Short sequence reads for a. Oct4, b. Sox2, c. Nanog, d. Tcf3 mapping within 250bp of 2000 highly enriched regions where the peak of binding was found within 50bp of a high quality Oct4/Sox2 motif were collected. Composite profiles were created at base pair resolution for forward and reverse strand reads centered on the Oct4/Sox2 motif (aligned at +1). The difference between the number of positive and negative strand reads is shown for each base pair (circles). The best fit line is shown for each factor (see Example 7). e-h. Zoomed-in region of a-d showing 20bp surrounding the Oct4/Sox2 motif. The dashed line indicates the position where the best fit line crosses the x-axis. For reference, the motif is shown below each graph. i.
Summary of meta-analysis for Oct4, Sox2, Nanog and Tcf3. Arrows indicate the nucleotide where each transcription factor switches from a positive strand bias to a negative strand bias. The octamer and HMG box motifs are indicated.
[0029] Figure S5. Algorithm for Identification of miRNA promoters. a. Flowchart describing the method used to identify the promoters for primary miRNA transcripts in human and mouse. For a full description, see Example 7. b. Two examples of identification of miRNA promoters. Top, initial identification of possible start sites based on H3K4me3-enriched regions from four cell types. Enrichment of H3K4me3-modified nucleosomes is shown as shades of gray. The red bar represents the position of the mature miRNA. Black bars below the graph are regions enriched for H3K4me3. Initial scores are shown below the black
bars. The region on the far right was excluded from the analysis (score = X) since it is downstream of the mature miRNA. Middle, identification of candidate start sites <5kb upstream of the mature miRNA (yellow shaded area). Bottom, identification of candidate start sites that initiate either overlapping (left) or non-overlapping (right) transcripts. EST and transcript data are shown. Scores associated with identified genes are shown in bold.
[0030] Figure S6. Summary of miRNA promoter classification. a. Promoters assigned to mature miRNAs were classified by the dominant feature of their scoring. Green: miRNAs that were found to have overlapping ESTs or genes confirming their promoters. Orange: miRNAs that were found to have a candidate start site within 5kb of the mature miRNA. Gray: miRNAs with either no candidates within 250kb of the mature miRNA or where all candidates had a score less than zero (see Fig. S5b, right). Yellow: miRNAs for which the closest candidate start site was selected solely on the basis of its proximity. b. The basis of miRNA promoter identification, including gene or EST evidence (green), distance of <5 kilobases to the mature miRNA (orange), and nearest possible promoter to the miRNA (yellow), tended to be conserved between human and mouse.
[0031] Figure S7. Regulation of miRNAs by Oct4. a. In an engineered murine cell line (Niwa et al., 2000), endogenous Oct4 is deleted, and Oct4 expression is maintained by a Dox-repressible transgene. b. By 24 hours of Dox treatment, Oct4 mRNA levels are reduced as shown by reverse transcription (RT)-PCR. c. 24 hours following Dox treatment, cells remain ES-like by morphology. d. 24 hours following Dox treatment, Sox2 protein can still be detected by immunofluorescence. e. Changes in levels of Oct4/Sox2/Nanog/Tcf3-occupied mature miRNAs based on Solexa sequencing of small RNAs. Fold change was calculated by comparing normalized read counts from untreated cells and cells 24 hours after Dox treatment. A full list of miRNA reads can be found in Table S9. Details about the normalization procedure are contained in Example 7.
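The fold-change comparison described for panel e can be illustrated with a minimal sketch. The normalization actually used is the one described in Example 7; the reads-per-million scaling, the pseudocount, and all function and variable names below are assumptions made only for illustration.

```python
import math

def reads_per_million(counts):
    """Scale raw read counts so each library sums to one million (assumed normalization)."""
    total = sum(counts.values())
    return {name: value * 1e6 / total for name, value in counts.items()}

def log2_fold_change(untreated, treated, pseudocount=1.0):
    """Return log2(treated / untreated) per miRNA after normalization.

    The pseudocount (a hypothetical choice) avoids division by zero for
    miRNAs that are not detected in one of the two samples.
    """
    u = reads_per_million(untreated)
    t = reads_per_million(treated)
    return {
        name: math.log2((t.get(name, 0.0) + pseudocount) / (u[name] + pseudocount))
        for name in u
    }
```

A negative value indicates that a miRNA is detected less frequently after treatment, and a positive value indicates that it is detected more frequently.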
[0032] Figure S8. Regulation of miRNAs by Tcf3. Tcf3 was knocked down in V6.5 mES cells using lentiviral vectors containing shRNAs. a. RT-PCR confirmation of knockdown at 72 hours post-infection using Taqman probes against Tcf3 (relative to levels in cells infected with GFP control lentivirus). b. Schematic of the position of RT-PCR probes used to measure the levels of pri-miRNA transcripts in Figure 3d and part c. c. Results of quantitative reverse transcriptase (RT)-PCR analysis of probes designed to several pri-miRNAs occupied by Oct4/Sox2/Nanog/Tcf3. Changes in the level of primary transcript compared to GFP control
lentivirus are shown. * = p < 0.05, ** = p < 0.001 using a two-sample t-test assuming equal variance. Standard deviation is indicated with error bars.
[0033] Figure S9. miRNA genes occupied by the core master regulators in ES cells are expressed in induced pluripotent stem (iPS) cells. RNA was extracted from MEFs (columns 1-3), mES cells (columns 4, 5) and iPS cells (column 6) and hybridized to microarrays with LNA probes targeting all known miRNAs. Differentially expressed miRNAs enriched in either MEFs or mES cells are shown (FDR < 10%, see Example 7; iPS cells were not used to determine differential expression). Data were Z-score normalized, and cell types were clustered hierarchically (top). Active miRNA promoters associated with Oct4/Sox2/Nanog/Tcf3 are listed to the right.
[0034] Figure S10. PcG-occupied miRNAs are generally expressed in a tissue-specific manner. Mature miRNAs derived from genes occupied by Suz12 and H3K27me3-modified nucleosomes were compared to the list of tissue-specific miRNAs derived from the miRNA expression atlas (Landgraf et al., 2007). The vertical axis represents tissue specificity, and miRNAs with specificity score >1 are shown. miRNAs bound by Oct4/Sox2/Nanog/Tcf3 and expressed in mES cells are not shown (largely ES cell-specific miRNAs). Among the tissue-specific miRNAs there is significant enrichment (p < 0.005 by hypergeometric distribution) for miRNAs occupied by Suz12 (green).
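The hypergeometric enrichment p-values cited in the legends above (Figures S2 and S10) correspond to an upper-tail probability of the hypergeometric distribution. The following is a minimal sketch of that calculation; the function and parameter names are illustrative assumptions, and the counts used in the study are not reproduced here.

```python
from math import comb

def hypergeom_enrichment_p(population, successes, draws, observed):
    """Probability of seeing at least `observed` successes.

    population: total number of items (e.g., all miRNAs considered)
    successes:  items in the population carrying the property (e.g., occupied)
    draws:      size of the subset examined (e.g., tissue-specific miRNAs)
    observed:   items in the subset that carry the property
    """
    upper = min(successes, draws)
    lower = max(observed, 0)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(lower, upper + 1)
    )
    return tail / comb(population, draws)
```

A small p-value indicates that the overlap between the two sets is larger than would be expected if the subset had been drawn at random from the population.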
Detailed Description of the Invention
[0035] The invention relates at least in part to microRNAs and microRNA genes. In some aspects, the invention relates to identification of promoters for miRNA genes, e.g., in mammalian cells. In some aspects, the invention relates to the regulation and role(s) of miRNAs in pluripotency and differentiation, e.g., in mammalian cells. The invention integrates miRNAs and their target genes into the core regulatory circuitry of pluripotency and self-renewal, e.g., in ES cells, induced pluripotent stem (iPS) cells, etc. The expanded understanding of this regulatory circuitry allows, among other things, improved control over reprogramming and differentiation of mammalian cells in vitro and improved ability to modulate cell state, e.g., in methods of in vitro cell reprogramming and differentiation.
[0036] In one aspect, the invention provides a method of identifying a promoter of a miRNA gene. The method is based at least in part on the in vivo chromatin signature of promoters and does not require active transcription of the miRNA or isolation of primary
miRNA transcripts. In one aspect, the invention provides a method of identifying a genomic region containing a high probability transcriptional start site (TSS) for a microRNA (miRNA) gene, the method comprising: (a) identifying a genomic region comprising a candidate transcriptional start site for an miRNA gene based at least in part on enrichment for histone H3 trimethylated at its lysine 4 residue (H3K4me3) within such region; and (b) assigning a score to said region based at least in part on (i) its proximity to one or more annotated mature miRNA sequences, (ii) expressed sequence tag (EST) data, and/or (iii) conservation of the region between multiple species, wherein the following factors, if present, contribute positively to the score: (I) proximity of the region to one or more annotated mature miRNA sequences, (II) identification of the region as containing the start site of a known transcript that spans a miRNA or of an EST that spans a miRNA, and (III) conservation of the region between multiple mammalian species; and the following factors, if present, contribute negatively to the score: (IV) the H3K4me3-enriched region being assignable instead to a transcript or EST that does not overlap the miRNA; and (V) intervening H3K4me3 sites between the region and the miRNA; and (c) identifying the genomic region as containing a high probability transcriptional start site for a microRNA (miRNA) gene based at least in part on the score. The invention provides a computer-readable medium having computer-executable instructions stored thereon for performing at least part of the method, e.g., step (b) and/or (c) when provided with suitable data. Further provided are computer systems comprising the computer-readable medium and a processor for performing the instructions. Optionally the system comprises means for inputting and/or outputting or displaying data and/or results.
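By way of illustration only, steps (b) and (c) of the method described above could be sketched as follows. The unit weights, the zero cutoff, the class and function names, and the use of a 5 kb proximity window (taken from the Figure S5 legend) are assumptions made for this sketch; the scoring actually used is the one described in Example 7.

```python
from dataclasses import dataclass

@dataclass
class CandidateRegion:
    """An H3K4me3-enriched region considered as a candidate start site."""
    distance_to_mirna_bp: int       # distance to the nearest annotated mature miRNA
    spans_mirna: bool               # starts a known transcript or EST that spans the miRNA
    conserved: bool                 # conserved between multiple mammalian species
    assigned_elsewhere: bool        # assignable to a transcript/EST not overlapping the miRNA
    intervening_h3k4me3_sites: int  # enriched sites lying between the region and the miRNA

def score_region(region, near_bp=5000):
    """Add one point per positive factor (I-III); subtract for negative factors (IV-V)."""
    score = 0.0
    if region.distance_to_mirna_bp <= near_bp:
        score += 1.0                                  # factor (I)
    if region.spans_mirna:
        score += 1.0                                  # factor (II)
    if region.conserved:
        score += 1.0                                  # factor (III)
    if region.assigned_elsewhere:
        score -= 1.0                                  # factor (IV)
    score -= 1.0 * region.intervening_h3k4me3_sites   # factor (V)
    return score

def best_candidate(regions):
    """Return the highest-scoring region with a positive score, or None (step (c))."""
    scored = [(score_region(r), r) for r in regions]
    positive = [pair for pair in scored if pair[0] > 0]
    if not positive:
        return None
    return max(positive, key=lambda pair: pair[0])[1]
```

In this sketch, a miRNA whose candidate regions all score at or below zero is left without an assigned start site.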
[0037] The invention provides the recognition that in vivo chromatin signatures can be used to identify promoters and/or high probability transcriptional start sites (TSSs), e.g., in mammalian cells. Such in vivo chromatin signatures can comprise enrichment for histone H3 trimethylated at its lysine 4 residue (H3K4me3). Inventive methods for identifying promoters and/or TSSs are exemplified herein using miRNA genes. In some embodiments, the invention provides methods to identify high probability transcriptional start sites (TSSs) for other genes, e.g., genes for other non-coding RNAs (which may be short or long), or protein-coding genes. In one aspect, the invention provides a method of identifying a genomic region containing a high probability transcriptional start site (TSS) for a gene, the method comprising: (a) identifying a genomic region comprising a candidate transcriptional start site
for a gene based at least in part on enrichment for histone H3 trimethylated at its lysine 4 residue (H3K4me3) within such region; and (b) assigning a score to said region based at least in part on (i) its proximity to one or more annotated RNA sequences, (ii) expressed sequence tag (EST) data, and/or (iii) conservation of the region between multiple species, wherein the following factors, if present, contribute positively to the score: (I) proximity of the region to one or more annotated RNA sequences, (II) identification of the region as containing the start site of a known transcript that spans an RNA or of an EST that spans an RNA, and (III) conservation of the region between multiple mammalian species; and the following factors, if present, contribute negatively to the score: (IV) the H3K4me3-enriched region being assignable instead to a transcript or EST that does not overlap the RNA; and (V) intervening H3K4me3 sites between the region and the RNA; and (c) identifying the genomic region as containing a high probability transcriptional start site for a gene based at least in part on the score. Computer-readable media having computer-executable instructions stored thereon for performing at least part of the method, e.g., step (b) and/or (c) when provided with suitable data, are provided. Further provided are computer systems comprising the computer-readable medium and a processor for performing the instructions. Optionally the system comprises means for inputting and/or outputting or displaying data and/or results.
[0038] The invention provides genomic regions comprising promoters of mammalian (human and mouse) miRNA genes (see, e.g., Table S6 and Table S7). Identification of the promoters of miRNA genes is of great scientific and practical interest for a number of reasons. Knowledge of the promoter facilitates methods of modulating expression of the miRNAs themselves and facilitates methods of identifying agents that modulate expression or activity of the miRNAs.
The invention provides such methods. Modulating miRNA expression in turn modulates expression of miRNA target genes (e.g., genes whose expression is inhibited by the miRNA). By modulating expression or activity of a particular miRNA, expression of multiple target genes can be modulated. The invention provides a number of regulatory interactions, such as autoregulatory loops, coherent and incoherent feed-forward loops, and various other network motifs, that are of use in controlling gene expression.
[0039] The invention provides genomic regions comprising miRNA promoters that are bound by key ES cell transcription factors (e.g., Oct4, Nanog, Sox2, and/or Tcf3) in ES cells and/or are bound by Polycomb group protein(s) in ES cells.
[0040] The invention provides computer-readable media containing information describing the genomic regions and/or TF binding sites. Further provided are methods comprising accessing the information and, optionally, retrieving or analyzing it. [0041] The invention also discloses miRNAs whose promoters are, in ES cells, bound by at least one key ES cell transcription factor (see, e.g., Tables S6 and S7). For purposes of the present invention, a miRNA or miRNA gene whose promoter is, in ES cells, bound by at least one key ES cell transcription factor is referred to as an "ESTF-bound miRNA" or "ESTF-bound miRNA gene", respectively. A promoter that, in ES cells, is bound by at least one key ES cell transcription factor is referred to as an "ESTF-bound promoter". In some embodiments, a miRNA promoter disclosed herein is bound by one of the afore-mentioned TFs, while in other embodiments a promoter is bound by 2, 3, or 4 of the transcription factors (TFs), i.e., it is "co-occupied" by multiple TFs. Further provided are miRNA precursors (e.g., stem-loop structures) comprising the miRNAs. One of skill in the art will be able to consult databases such as miRBase (http://microrna.sanger.ac.uk/sequences/), which contains sequences of miRNA precursors corresponding to known miRNAs. [0042] The invention also provides genomic regions containing miRNA promoters that are bound by Polycomb group protein(s) in ES cells and, optionally, also bound by one or more key ES cell transcription factors. The invention also provides miRNAs whose promoters are bound by Polycomb group protein(s) in ES cells. Such binding is typically associated with repression of the miRNA gene. The invention provides the recognition that miRNAs that were bound by Polycomb group protein(s) in ES cells were among the transcripts that are specifically induced in differentiated cell types. 
Based at least in part on this recognition, the invention provides methods of identifying miRNAs that serve as key determinants of cell fate decisions, e.g., as "master regulators" controlling cell identity. The subset of miRNAs that are both cell-type specific and whose promoters are bound by Polycomb group proteins in ES cells are of great interest in this regard. These miRNAs, which are repressed in pluripotent cells and are expressed in differentiated cell types, are candidates for playing key roles in specifying cell fate. Modulation of such miRNAs has the potential to modulate reprogramming in a variety of contexts. For example, in some embodiments, derepressing miRNA(s) whose promoter(s) are bound by Polycomb and that are expressed specifically or selectively in one or more cell lineages or cell types is of use to direct the differentiation of pluripotent cells along such cell lineages or to such cell types. In
some embodiments, increasing expression or activity of miRNA(s) whose promoters are bound by Polycomb and that are expressed specifically or selectively in one or more cell lineages or cell types is of use to direct the differentiation of pluripotent cells along such cell lineages or to such cell types. In some embodiments, such lineage is neuronal, ectodermal, mesodermal, endodermal, etc.
[0043] The invention provides isolated nucleic acids comprising the genomic regions (e.g., any genomic region identified as a TSS in Table S6 or S7). In some embodiments the nucleic acid further comprises a miRNA sequence, e.g., the corresponding miRNA sequence listed in Table S6 or S7. In some embodiments the nucleic acid further comprises a sequence that encodes a miRNA precursor, e.g., the precursor of the corresponding miRNA sequence listed in Table S6 or S7. In some embodiments the nucleic acid comprises up to 100 bp, up to 500 bp, up to 1 kb, up to 5 kb, up to 8 kb, or up to 10 kb of genomic sequence on either the 5' side, the 3' side, or both sides of the identified genomic region. The invention further provides isolated nucleic acids at least 20 or at least 25 bp in length, whose sequence falls within or overlaps an identified genomic region. In some embodiments the isolated nucleic acid comprises a binding site for Oct4, Nanog, Sox2, Tcf3, or any 2, 3, or all 4 of these TFs.
[0044] The invention further provides nucleic acid constructs, e.g., plasmids or other vectors (e.g., expression vectors), comprising any of the afore-mentioned isolated nucleic acid sequences. In some embodiments, the nucleic acid construct comprises a heterologous nucleic acid sequence, i.e., a sequence not normally found adjacent to the isolated nucleic acid sequence in the genome of the organism from which it was derived. In some embodiments, the heterologous nucleic acid and promoter are operably linked, whereby the promoter and heterologous nucleic acid are positioned with respect to one another so that the promoter directs expression of the heterologous nucleic acid in cells that contain appropriate TFs.
In some embodiments the heterologous nucleic acid encodes a reporter molecule, e.g., a fluorescent protein or protein having enzymatic activity that can be used to assess expression from the miRNA promoter or a selectable marker such as a drug resistance marker or nutritional marker. The invention further provides cells, e.g., isolated cells, containing the construct, and transgenic animals, cells of which contain the construct, e.g., integrated into the genome. Cells could be of any cell type or lineage. In some embodiments the cells are ES cells or iPS cells.
[0045] Constructs of the invention and cells containing them are of use, e.g., in methods to detect and/or quantify expression directed by the miRNA gene promoter and/or in methods to identify agents (e.g., small molecules such as organic compounds having molecular weight less than about 1 kD, or less than about 1.5 or 2 kD; polypeptides, peptides, nucleic acids, etc.) that modulate expression from such promoters. Such agents may, for example, inhibit or promote binding of a TF to the promoter. Methods of identifying modulators of ESTF bound or Polycomb group protein bound miRNA are thus aspects of the invention. Such agents may be designed or may be isolated by screening, e.g., compound libraries. It may also be of interest to determine whether compounds (e.g., proposed or approved pharmaceutical agents or other agents to which individuals, e.g., humans, may be exposed in the environment) have effects on activity of the miRNA promoters disclosed herein. [0046] Polymorphisms and mutations in promoter regions can alter expression of an operably linked nucleic acid and may be associated with disease. The invention provides methods of identifying polymorphisms or mutations associated with disease, e.g., in humans. Certain of the methods comprise analyzing sequences of the genomic regions in a plurality of individuals and determining whether particular sequence variant(s) are associated with disease, e.g., whether particular sequence variant(s) occur with greater frequency in individuals suffering from a disease than in control individuals, e.g., individuals not suffering from the disease (and typically matched for parameters such as age, etc.). Once such polymorphisms or mutations are identified, they may be used to provide diagnostic or prognostic information and/or in developing therapies for the disease. 
The methods are of use, in certain embodiments, to identify polymorphisms and/or mutations associated with cancer, e.g., cancer of the breast, prostate, kidney, lung, liver, gastrointestinal tract (e.g., colon), testis, stomach, pancreas, thyroid, brain or other nervous system tissue, connective tissue, skin, and/or hematopoietic system (e.g., leukemia, lymphoma). In some embodiments, the disease is one that is associated with altered, aberrant, or inappropriate differentiation, dedifferentiation, or development. Furthermore, identification of the promoters of miRNA genes provides a means to explore the underlying basis for alterations in miRNA expression that may be associated with a condition, e.g., a disease. It may be of interest to determine whether miRNA promoter activity is altered in cells or organisms suffering from a condition of interest.
[0047] The invention provides methods comprising modulating the expression or activity of a miRNA whose promoter is, in ES cells, bound by a key ES cell transcription factor. "Modulating the expression or activity of a miRNA" can involve causing or facilitating a qualitative or quantitative change, alteration, or modification in the expression or activity of the miRNA. Such alteration may, for example, be an increase or decrease in miRNA level or activity within a cell. The invention provides cells wherein miRNA expression or activity has been modulated according to the inventive methods. The methods are of use, e.g., in reprogramming cells, e.g., in vitro. Such reprogramming could involve reprogramming differentiated cells to a pluripotent state, reprogramming cells from a first at least partly differentiated state to a second at least partly differentiated state, modulating, e.g., promoting or inhibiting, differentiation of pluripotent cells to a partly or fully differentiated state (e.g., to a cell lineage or cell type of interest), etc.
[0048] A variety of methods for modulating miRNA level or activity are known in the art. Any suitable method can be used in the present invention. Cells can be contacted in vitro with molecules that are taken up and modulate miRNA expression or activity or molecules can be administered to individuals. miRNA or miRNA precursors can be introduced into cells to increase miRNA level and result in an increase in miRNA-mediated inhibition of miRNA target gene expression. Nucleic acids that encode miRNA or miRNA precursors can be introduced into cells and stably or transiently expressed therein. miRNA can be inhibited by antisense-based approaches. In some embodiments, an miRNA is inhibited by introducing an oligonucleotide (e.g., synthetic oligonucleotides, which may be chemically synthesized) that is complementary to the miRNA or miRNA precursor into a cell (e.g., in vitro) or administering such oligonucleotides to an organism. It will be appreciated that the oligonucleotide need not be perfectly complementary to the miRNA or miRNA precursor, e.g., it may have 1, 2, 3, 4, 5, or more mismatches and/or be at least 70%, at least 80%, or at least 90% complementary to the miRNA. In some embodiments the oligonucleotide is at least about 19 nt in length. In some embodiments the oligonucleotide is between about 17 nt and about 50 nt in length. It will be appreciated that such oligonucleotides may contain one or more non-standard nucleotides, modified nucleotides (e.g., having modified bases and/or sugars) or nucleotide analogs, and/or have a modified backbone and/or be attached or have attached thereto, one or more non-nucleic acid moieties. In some embodiments, the oligonucleotide has one or more modifications, e.g., to provide
RNase protection and/or pharmacologic properties such as enhanced tissue and cellular uptake. In some embodiments the oligonucleotide differs from normal RNA by having partial or complete 2'-O-methylation of sugar, a phosphorothioate backbone, and/or a cholesterol moiety at the 3'-end. USPTO Patent Applications 20080171715 and 20070213292 and PCT publications WO/2006/137941 and WO/2008/025025 disclose a variety of compounds and methods of use to modulate miRNA. Certain agents that modulate miRNA level or activity are available from a variety of commercial suppliers (e.g., Thermo Scientific (Dharmacon), Ambion, etc.).
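As an illustration of the complementarity criteria mentioned above (a number of mismatches, or a percentage such as at least 70%, 80%, or 90%), the following minimal sketch compares a candidate inhibitory oligonucleotide with the perfect reverse complement of a target miRNA. It assumes an ungapped, full-length comparison of sequences of equal length and treats G:U wobble pairs as mismatches; the function names are hypothetical, and any sequences supplied to it are the user's own.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def to_rna(sequence):
    """Upper-case a sequence and write T as U so DNA and RNA inputs compare equally."""
    return sequence.upper().replace("T", "U")

def reverse_complement(rna):
    """Return the perfectly complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(to_rna(rna)))

def percent_complementarity(oligo, mirna):
    """Return (percent complementary positions, mismatch count) for equal-length sequences."""
    oligo = to_rna(oligo)
    perfect = reverse_complement(mirna)
    if len(oligo) != len(perfect):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(1 for a, b in zip(oligo, perfect) if a == b)
    return 100.0 * matches / len(perfect), len(perfect) - matches
```

An oligonucleotide with one mismatch against a 21 nt target would, under these assumptions, be about 95% complementary and would therefore satisfy the 90% threshold.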
[0049] The inventors found that miRNA gene promoters that are, in ES cells, bound by at least one key ES cell transcription factor are also bound by such factor(s) in iPS cells. Results and regulatory circuitry derived with ES cells, and compositions and methods of the invention, are applicable in the context of other pluripotent cells, e.g., iPS cells.
[0050] Methods of the invention may be applied in the context of a variety of mammalian and avian species. Mammals of interest include rodents (e.g., mice, rats, rabbits), primates (e.g., human, monkeys, apes), and domesticated animals such as caprine, ovine, bovine, porcine, canine, and feline species. Methods of the invention may, as appropriate, be applied to somatic cells that are at least partly differentiated along a cell lineage of interest. In some embodiments, the methods are applied to terminally differentiated somatic cells. Mammalian somatic cells useful in various embodiments of the present invention include, for example, fibroblasts, neurons, glial cells, pancreatic islet cells, epidermal cells, epithelial cells, endothelial cells, hepatocytes, hair follicle cells, keratinocytes, hematopoietic cells, melanocytes, chondrocytes, lymphocytes (B and T lymphocytes), erythrocytes, macrophages, monocytes, mononuclear cells, cardiac muscle cells, skeletal muscle cells, Sertoli cells, granulosa cells, and precursor cells that are committed or partly differentiated along a cell lineage leading to any of the afore-mentioned cell types. In some embodiments of the invention, adult stem cells are used. In some embodiments, precursor cells, such as neural precursor cells, hematopoietic precursor cells, or muscle precursor cells, are used.
[0051] In various embodiments, methods of the invention can be practiced on primary cells, non-immortalized cells, immortalized cells, genetically modified cells, cells that are considered "wild type" or "normal", cells obtained from an individual suffering from a disease, etc.
[0052] Certain methods of the invention are practiced on pluripotent cells, e.g., ES cells or iPS cells. Methods for generating such cells are known in the art. For example, iPS cells can be generated by introducing genes encoding transcription factors Oct4, Sox2, c-Myc and Klf4 (with c-Myc being dispensable), or Oct4, Nanog, Sox2, and Lin28 into somatic cells, e.g., via retroviral infection. See, e.g., Meissner, A., et al., Nat Biotechnol., 25(10):1177-81 (2007); Yu, J., et al., Science, 318(5858):1917-20 (2007); and Nakagawa, M., et al., Nat Biotechnol., 26(1):101-6 (2008) for further discussion of somatic cell reprogramming. Modifications of the originally described methods, including use of proteins or small molecules to replace one or more of the factors, and/or to enhance reprogramming speed and/or efficiency are of use in the present invention. In some embodiments, transient transfection is used to introduce the reprogramming factors. In some embodiments, the reprogramming factors are introduced using an approach that avoids the use of viruses as vectors. For example, a non-integrating episomal vector may be used. In some embodiments a single multiprotein expression vector that comprises the coding sequences of two or more of the reprogramming factors (e.g., Klf4, Oct4 and Sox2) linked with 2A peptides is used. In some embodiments, a recombinase-excisable virus (or non-virus expression cassette) is used. See, e.g., Soldner, F., et al. Cell. 136(5):964-77, 2009. In some embodiments, molecules such as histone deacetylase inhibitors, methyltransferase inhibitors, Wnt pathway agonists, molecules that enhance expression of endogenous genes such as Oct4, Sox2, or molecules that can substitute for one or more reprogramming factors (e.g., Klf4), may be used. See, e.g., PCT/US2008/010249 (WO/2009/032194) and PCT/US2008/004516 (WO/2008/124133); Lyssiotis, et al., Proc Natl Acad Sci U S A. 106(22):8912-7, 2009.
[0053] In some embodiments, methods of the invention are performed in vivo. In some embodiments, cells are obtained from an individual and subjected to a method of the invention ex vivo (outside the body).
[0054] In some embodiments, an agent that modulates expression or activity of an ESTF-bound miRNA is administered to an individual to treat a condition. In some embodiments, the condition is cancer.
[0055] In some embodiments, an agent that modulates activity or expression of an miRNA whose promoter is bound by Polycomb and that is expressed specifically or selectively in one or more cell lineages or cell types is of use to treat a disease, e.g., a disease
associated with altered or aberrant or inappropriate differentiation or dedifferentiation or development. In some embodiments, the condition is cancer.
[0056] The present invention further provides methods for treating a condition in an individual in need of such treatment. In certain embodiments, somatic cell(s) are obtained and reprogrammed using a method of the invention, e.g., (i) an ESTF-bound miRNA or miRNA gene is modulated in reprogramming the cell to pluripotency; and/or (ii) the cell is reprogrammed to pluripotency and an ESTF-bound miRNA or miRNA gene is modulated in differentiating the resulting pluripotent cell to a desired cell type or lineage; and/or (iii) an ESTF-bound miRNA or miRNA gene is modulated in differentiating the somatic cell to a desired cell type or lineage without necessarily reprogramming the cell to pluripotency as an intermediate step. The invention further provides embodiments of the afore-mentioned methods applied to Polycomb group protein bound miRNA and miRNA genes. The invention further provides embodiments of the afore-mentioned methods applied to ES cells.
[0057] The reprogrammed cells may be expanded in culture. In certain embodiments of methods of treatment of the invention, cells are obtained from the individual to whom they or their progeny are eventually administered after manipulation ex vivo. [0058] Pluripotent reprogrammed cells (e.g., reprogrammed cells and/or their progeny that retain the property of pluripotency) may be maintained under conditions suitable for the cells to develop into cells of a desired cell type or cell lineage. In some embodiments, the cells are differentiated in vitro using protocols known in the art. The reprogrammed cells of a desired cell type are introduced into the individual to treat the condition. In certain embodiments, the somatic cells obtained from the individual contain a mutation in one or more genes. In these instances, in certain embodiments the somatic cells obtained from the individual are first treated to repair or compensate for the defect, e.g., by introducing one or more wild type copies of the gene(s) into the cells such that the resulting cells express the wild type version of the gene. The cells are then introduced into the individual. [0059] In certain embodiments, the somatic cells obtained from the individual are engineered to express one or more genes following their removal from the individual. The cells may be engineered by introducing a gene or expression cassette comprising a gene into the cells. The introduced gene may be one that is useful for purposes of identifying, selecting, and/or generating a reprogrammed cell. In certain embodiments the introduced
gene(s) contribute to initiating and/or maintaining the reprogrammed state or differentiating the cell to a desired cell type or lineage.
[0060] In certain embodiments the methods of the present invention can be used to treat, prevent, or stabilize a neurological disease such as Alzheimer's disease, Parkinson's disease, Huntington's disease, or ALS, lysosomal storage diseases, multiple sclerosis, or a spinal cord injury, diseases associated with muscle atrophy or dysfunction or damage. In some embodiments, human hematopoietic stem cells derived from cells reprogrammed according to the present invention may be used in medical treatments requiring bone marrow transplantation or replenishment of hematopoietic cells. Such cells are also of use to treat anemia, diseases that compromise the immune system such as AIDS, etc. In certain embodiments somatic cells obtained from an individual suffering from a disease and reprogrammed and/or differentiated in vitro using a method of the invention are used as an in vitro model system to study the disease and/or to identify agents (e.g., small molecules) useful for treating the disease. For example, small molecules can be screened to identify compounds that may be of use to treat a disease. It will be understood that cells used in an inventive method herein are usually descendants of the original cells obtained from a subject. [0061] Reprogrammed cells that produce a growth factor or hormone such as insulin, etc., may be administered to a mammal for the treatment or prevention of endocrine disorders. Reprogrammed epithelial cells may be administered to repair damage to the lining of a body cavity or organ, such as a lung, gut, exocrine gland, or urogenital tract. It is also contemplated that reprogrammed cells may be administered to a mammal to treat damage or deficiency of cells in an organ such as the bladder, brain, esophagus, fallopian tube, heart, intestines, gallbladder, kidney, liver, lung, ovaries, pancreas, prostate, spinal cord, spleen, stomach, testes, thymus, thyroid, trachea, ureter, urethra, or uterus.
[0062] Cells may be combined with a matrix to form a tissue or organ in vitro or in vivo that may be used to repair or replace a tissue or organ in a recipient mammal. In certain embodiments, methods of the invention can be used to treat individuals in need of a functional organ. In the methods, somatic cells are obtained from an individual in need of a functional organ, and reprogrammed using the methods of the invention to produce reprogrammed somatic cells. Such reprogrammed somatic cells are then cultured under conditions suitable for development of the reprogrammed somatic cells into a desired organ, which is then introduced into the individual.
[0063] The invention also relates in part to methods of performing chromatin immunoprecipitation (ChIP) experiments and to improvements in such methods. The invention also relates in part to methods for analysis of data obtained from chromatin immunoprecipitation (ChIP) experiments and improvements in such methods. In some embodiments the methods comprise sequencing of genomic DNA bound by a transcription factor in pluripotent cells and/or analysis of data obtained from such sequencing. In some embodiments, the methods allow for identification of sites to which a transcription factor binds to within a resolution of 25 base pairs (bp). The methods are of use to map binding sites of any TF of interest, e.g., using ChIP followed by sequencing (ChIP-Seq). In some embodiments such methods are of particular use in conjunction with high throughput DNA sequencing methods, e.g., methods that employ parallel sequencing of large numbers (e.g., millions) of fragments and/or obtaining large numbers of short reads (e.g., less than 100 bp in length). Such sequencing techniques can comprise sequencing by synthesis (e.g., using Solexa technology), sequencing by ligation (e.g., using SOLiD technology from Applied Biosystems), 454 technology, or pyrosequencing. In some embodiments thousands, tens of thousands or more sequencing reactions are performed in parallel, generating millions or even billions of bases of DNA sequence per "run". See, e.g., Shendure J & Ji H. Nat Biotechnol., 26(10): 1135-45, 2008, for a non-limiting discussion of some of these technologies. It will be appreciated that sequencing technologies are evolving and improving rapidly.
[0064] It is contemplated that the various embodiments described herein are applicable to the various aspects of the invention. It is also contemplated that the various embodiments of the invention and elements thereof can be combined with one or more other such embodiments and/or elements whenever appropriate. It is contemplated that various embodiments of the invention may be applied to any of the miRNA genes or promoters disclosed herein, or to any subset of the miRNA genes or promoters disclosed herein (e.g., those active in one or more cell lineages or cell types).
[0065] The publication entitled "Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells" (Marson, A., et al., Cell, Vol. 134, 1-13, August 8, 2008) is incorporated herein by reference.
[0066] All references, e.g., patent applications, patents, scientific articles, and other publications, cited herein are incorporated herein by reference. All databases and electronic
resources cited herein are incorporated herein by reference to the extent permitted by applicable patent laws and regulations.
Examples
Overview of the Examples
[0067] MicroRNAs (miRNAs) are crucial for normal embryonic stem (ES) cell self-renewal and cellular differentiation, but how miRNA gene expression is controlled by the key transcriptional regulators of ES cells has not been established. We describe here a new map of the transcriptional regulatory circuitry of ES cells that incorporates both protein-coding and miRNA genes, and which is based on high-resolution ChIP-seq data, systematic identification of miRNA promoters, and quantitative sequencing of short transcripts in multiple cell types. We find that the key ES cell transcription factors are associated with promoters for most miRNAs that are preferentially expressed in ES cells and with promoters for a set of silent miRNA genes. This silent set of miRNA genes is co-occupied by Polycomb Group proteins in ES cells and expressed in a tissue-specific fashion in differentiated cells. These data reveal how key ES cell transcription factors promote the miRNA expression program that contributes to normal self-renewal and cellular differentiation, and integrate miRNAs and their targets into an expanded model of the regulatory circuitry controlling ES cell identity.
[0068] Embryonic stem (ES) cells hold significant potential for clinical therapies because of their distinctive capacity to both self-renew and differentiate into a wide range of specialized cell types. Understanding the transcriptional regulatory circuitry of ES cells and early cellular differentiation is fundamental to understanding human development and realizing the therapeutic potential of these cells. Transcription factors that control ES cell pluripotency and self-renewal have been identified ([Chambers and Smith, 2004] and [Niwa, 2007]), and a draft of the core regulatory circuitry by which these factors exert their regulatory effects on protein-coding genes has been described ([Boyer et al., 2005], [Loh et al., 2006], [Lee et al., 2006], [Boyer et al., 2006], [Jiang et al., 2008], [Cole et al., 2008], [Kim et al., 2008] and [Tam et al., 2008]). MicroRNAs (miRNAs) are also likely to play key roles in ES cell gene regulation ([Kanellopoulou et al., 2005], [Murchison et al., 2005] and [Wang et al., 2007]), but little is known about how miRNAs participate in the core regulatory circuitry controlling self-renewal and pluripotency in ES cells.
[0069] Several lines of evidence indicate that miRNAs contribute to the control of early development. miRNAs appear to regulate the expression of a significant percentage of all genes in a wide array of mammalian cell types ([Lewis et al., 2005], [Lim et al., 2005], [Krek et al., 2005] and [Farh et al., 2005]). A subset of miRNAs is preferentially expressed in ES cells or embryonic tissue ([Houbaviy et al., 2003], [Suh et al., 2004], [Houbaviy et al., 2005] and [Mineno et al., 2006]). Dicer-deficient mice fail to develop (Bernstein et al., 2003), and ES cells deficient in miRNA-processing enzymes show defects in differentiation and proliferation ([Kanellopoulou et al., 2005], [Murchison et al., 2005] and [Wang et al., 2007]). Specific miRNAs have been shown to participate in mammalian cellular differentiation and embryonic development (Stefani and Slack, 2008). However, how transcription factors and miRNAs function together in the regulatory circuitry that controls early development has not yet been examined.
[0070] The major limitation in connecting miRNA genes to the core transcriptional circuitry of ES cells has been sparse annotation of miRNA gene transcriptional start sites and promoter regions. Mature miRNAs, which specify posttranscriptional gene repression, arise from larger transcripts that are then processed (Bartel, 2004). Over 400 mature miRNAs have been confidently identified in the human genome (Landgraf et al., 2007), but only a minority of the primary transcripts have been identified and annotated. Prior attempts to connect ES cell transcriptional regulators to miRNA genes have searched for transcription factor binding sites only close to the annotated mature miRNA sequences ([Boyer et al., 2005], [Loh et al., 2006] and [Lee et al., 2006]). Additionally, studies of the core transcriptional circuitry of ES cells have compared transcription factor occupancy to mRNA expression data but have not examined systematically miRNA expression in ES cells and differentiated cell types, limiting our knowledge of transcriptional regulation of miRNA genes in these cells ([Boyer et al., 2005], [Loh et al., 2006], [Lee et al., 2006], [Boyer et al., 2006], [Jiang et al., 2008], [Cole et al., 2008], [Kim et al., 2008] and [Tam et al., 2008]).
[0071] To incorporate miRNA gene regulation into the model of transcriptional regulatory circuitry of ES cells, we began by generating new, high-resolution, genome-wide maps of binding sites for key ES cell transcription factors using massive parallel sequencing of chromatin immunoprecipitation (ChIP-seq). These data reveal highly overlapping occupancy of Oct4, Sox2, Nanog, and Tcf3 at the transcriptional start sites of miRNA transcripts, which we systematically mapped based on a method that uses chromatin
landmarks and transcript data. We then carried out quantitative sequencing of short transcripts in ES cells, neural precursor cells (NPCs), and mouse embryonic fibroblasts (MEFs), which revealed that Oct4, Sox2, Nanog, and Tcf3 occupy the promoters of most miRNAs that are preferentially or uniquely expressed in ES cells. Our data also revealed that a subset of the Oct4/Sox2/Nanog/Tcf3-occupied miRNA genes are silenced in ES cells by Polycomb group proteins but are expressed later in development in specific lineages. High-resolution transcription factor location analysis, systematic mapping of the primary miRNA transcriptional start sites in mouse and human, and quantitative sequencing of miRNAs in three different cell types provide a valuable data resource for studies of the gene expression program in ES and other cells and the regulatory mechanisms that control cell fate. The data also produce an expanded model of ES cell core transcriptional regulatory circuitry that now incorporates transcriptional regulation of miRNAs, and posttranscriptional regulation mediated by miRNAs, into the molecular understanding of pluripotency and early cellular differentiation.
Example 1: High-resolution genome-wide location analysis in ES cells with ChIP-seq
[0072] To connect miRNA genes to the core transcriptional circuitry of ES cells, we first generated high-resolution genome-wide maps of Oct4, Sox2, Nanog, and Tcf3 occupancy (Figure 1; similar data were recently described for Oct4, Sox2, and Nanog [Chen et al., 2008]). ChIP-seq allowed us to map transcription factor binding sites and histone modifications across the entire genome at high resolution ([Barski et al., 2007], [Johnson et al., 2007], [Mikkelsen et al., 2007] and [Robertson et al., 2007]), and we optimized the protocol to allow for robust analysis of transcription factor binding in murine ES cells (Example 7). Oct4, Sox2, Nanog, and Tcf3 were found to co-occupy 14,230 sites in the genome (Figures 1A, S1, and S2 and Tables S1-S3 (Tables S1-S3 are available in Marson, et al., 2008)). Approximately one quarter of these occurred within 8 kb of the transcription start site of 3,289 annotated genes, another one quarter occurred within genes but more than 8 kb from the start site, and almost half occurred in intergenic regions distal from annotated start sites (Example 7). Binding of the four factors at sites surrounding the Sox2 gene (Figure 1B) exemplified two key features of the data: all four transcription factors co-occupied the identified binding sites and the resolution was sufficient to determine the DNA sequence associated with these binding events to a resolution of <25 bp. Composite analysis of all
bound regions provided higher resolution and suggested how these factors occupy their common DNA-sequence motif (Figure S4, Table S4). Knowledge of these binding sites provided data necessary to map these key transcription factors to the promoters of miRNA genes.
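The three-way partition of co-occupied sites described above (within 8 kb of a start site, within a gene but more than 8 kb from the start site, or intergenic) can be illustrated with a minimal sketch. The function names, the input layout, and the toy coordinates are hypothetical and are not taken from Example 7; strand is ignored for simplicity.

```python
from bisect import bisect_left

PROXIMAL_WINDOW_BP = 8_000  # the 8 kb window stated in the text


def nearest_tss_distance(position, sorted_tss):
    """Distance (bp) from a site to the closest TSS in a sorted, non-empty list."""
    i = bisect_left(sorted_tss, position)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(position - t) for t in candidates)


def classify_site(chrom, position, tss_by_chrom, gene_spans_by_chrom):
    """Label a site as promoter-proximal, intragenic-distal, or intergenic."""
    tss = tss_by_chrom.get(chrom, [])
    if tss and nearest_tss_distance(position, tss) <= PROXIMAL_WINDOW_BP:
        return "promoter-proximal"
    for start, end in gene_spans_by_chrom.get(chrom, []):
        if start <= position <= end:
            return "intragenic-distal"
    return "intergenic"


# Toy annotation (invented coordinates, for illustration only).
toy_tss = {"chr3": [1_000, 50_000]}
toy_genes = {"chr3": [(1_000, 30_000), (50_000, 90_000)]}
for pos in (5_000, 20_000, 40_000):
    print(pos, classify_site("chr3", pos, toy_tss, toy_genes))
```

A production analysis would typically use strand-aware annotation and an interval index rather than a linear scan over gene spans.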
Example 2: Identification of miRNA Promoters
[0073] Imperfect knowledge of the start sites of primary miRNA transcripts has limited our ability to identify the transcription factor binding events that control miRNA gene expression in vertebrates. Previous strategies to identify the 5' ends of primary miRNAs have been hampered because they relied on isolation of transient primary miRNA transcript, required knowledge of the specific cell type in which each given miRNA is transcribed, or focused only on potential start sites proximal to mature miRNAs ([Lee et al, 2002], [Fukao et al., 2007], [Mikkelsen et al., 2007], [Zhou et al., 2007] and [Barrera et al., 2008]). To identify systematically transcriptional start sites for miRNA genes in the mouse and human genomes, we took advantage of the recent observation that histone H3 is trimethylated at its lysine 4 residue (H3K4me3) at the transcriptional start sites of most genes in the genome, even when genes are not productively transcribed, and the knowledge that this covalent modification is restricted to sites of transcription initiation ([Barski et al., 2007] and [Guenther et al., 2007]). We used the genomic coordinates of the H3K4me3-enriched loci derived from multiple cell types (Table S5; [Barski et al., 2007], [Guenther et al., 2007] and [Mikkelsen et al., 2007]) to create a library of candidate transcription start sites in both human and mouse (Figures 2 and S5).
[0074] High-confidence promoters were identified for over 80% of miRNAs in both mouse and human (Figures 2, S5, and Tables S6 and S7). These promoters were associated with 185 murine primary microRNA transcripts (pri-miRNAs) (specifying 336 mature miRNAs) and 294 human pri-miRNAs (specifying 441 mature miRNAs) (Tables S6 and S7). To identify promoters for miRNA genes, the association of candidate transcriptional start sites with regions encoding mature miRNAs was scored based on proximity to annotated mature miRNA sequences (Landgraf et al., 2007), available EST data, and conservation between species (Figures 2A, S5, and S6 and Example 7). Notably, existing EST data provided evidence that the predicted transcripts do in fact originate at the identified start sites and continue through the annotated loci of mature miRNAs (Figures 2B, S5, and S6).
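By way of non-limiting illustration, the scoring of candidate start sites by proximity, EST support, and conservation could be sketched as follows. The weights, the distance cutoff, and all names here are assumptions chosen for illustration; the scoring actually used is described in Example 7 and is not reproduced by this sketch.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    distance_bp: int    # distance from the H3K4me3-enriched site to the mature miRNA
    est_support: bool   # an EST links the candidate site to the miRNA locus
    conserved: bool     # the candidate site is conserved between species


def score(c, max_distance_bp=250_000):
    """Hypothetical additive score; higher is better, 0.0 means rejected."""
    if c.distance_bp > max_distance_bp:
        return 0.0
    proximity = 1.0 - c.distance_bp / max_distance_bp
    return proximity + (1.0 if c.est_support else 0.0) + (0.5 if c.conserved else 0.0)


def best_candidate(candidates):
    """Return the top-scoring candidate, or None if none scores above zero."""
    if not candidates:
        return None
    top = max(candidates, key=score)
    return top if score(top) > 0 else None


# Toy candidates (invented values, for illustration only).
near = Candidate("site_near", 5_000, est_support=False, conserved=False)
supported = Candidate("site_est", 40_000, est_support=True, conserved=True)
print(best_candidate([near, supported]).name)
```

In this toy case the more distant site is preferred because transcript evidence and conservation outweigh proximity alone, which mirrors the reason the text gives for not relying on proximity to the mature miRNA by itself.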
[0075] Four additional lines of evidence indicated that this approach identified genuine transcriptional start sites for miRNA genes. In addition to the chromatin signature of promoters, a high fraction of these regions contained CpG islands, a DNA sequence element often associated with promoters (Figure 2B and Tables S6 and S7). Second, in some instances where evidence of primary miRNA transcripts, which may be present only transiently before processing, was not available in published databases at the identified transcriptional start sites, chromatin marks associated with transcriptional elongation including nucleosomes methylated at H3 lysine 36 (H3K36me3) and H3 lysine 79 (H3K79me2) provided evidence that such transcripts are actively produced (Figure 2C and Mikkelsen et al., 2007). Third, most miRNA promoters showed evidence of H3K4me3 enrichment in multiple tissues, as observed at the promoters of most protein-coding genes ([Barski et al., 2007], [Guenther et al., 2007] and [Heintzman et al., 2007]) (Figure 2D). Finally, there was a high degree of correlation (8/10) between the identified miRNA transcriptional start sites and those that have been mapped previously by other methods (Example 7).
Example 3: Occupancy of miRNA Promoters by Core ES Cell Transcription Factors
[0076] The binding sites of the ES cell transcription factors Oct4, Sox2, Nanog, and Tcf3 were next mapped to these high-confidence miRNA promoters (Figure 3). In murine ES cells, Oct4, Sox2, Nanog, and Tcf3 co-occupied the promoters for 55 distinct miRNA transcription units, which included three clusters of miRNAs that are expressed as large polycistrons, thus suggesting that these regulators have the potential to directly control the transcription of 81 distinct mature miRNAs (Figure 3A and Table S6). This set of miRNAs occupied by Oct4/Sox2/Nanog/Tcf3 represented roughly 20% of annotated mammalian miRNAs, similar to the 20% of protein-coding genes that were bound at their promoters by these key transcription factors (Table S2).
[0077] To determine if transcription factor occupancy of miRNA promoters is conserved across species, we performed genome-wide location analysis for Oct4 in human ES cells using microarray-based analysis. We found extensive conservation of the set of miRNA genes that were occupied at their promoters by Oct4, as exemplified by the mir-302 cluster (Figures 3A and 3B and Tables S7 and S8). Transcription factor occupancy does not mean necessarily that the adjacent gene is regulated by that factor; conserved transcription factor
occupancy of a promoter, however, suggests gene regulation by that factor. Thus, our data identified a set of miRNA genes that are bound at their promoters by key ES cell transcription factors in mouse and human cells (Figure 3C), suggesting that core ES cell transcription factor occupancy of these particular miRNA genes has functional significance. [0078] The dependence of Oct4/Sox2/Nanog/Tcf3-bound miRNA genes on the core ES cell transcription factors was assessed by examining changes in miRNA expression following perturbation of individual transcription factors. First, we studied the effects of Oct4 depletion in ZHBTc4 ES cells, which allow for conditional repression of Oct4 with doxycycline treatment (Niwa et al., 2000). miRNA expression was examined at 12 and 24 hr after doxycycline treatment, when Oct4 was silenced but the cells remained ES-like morphologically and still expressed Sox2 (Figure S7). Effects on miRNA regulation were measured globally using quantitative sequencing of short RNAs (18-30 nucleotides) in these cells (Morin et al., 2008). Although mature miRNAs have exceptionally long half-lives, small reductions in the levels of the majority of Oct4/Sox2/Nanog/Tcf3-bound miRNAs were observed (Figure S7E). To measure more directly the transcriptional activity of Oct4/Sox2/Nanog/Tcf3-occupied miRNA genes, the levels of primary miRNA transcripts were assessed by quantitative PCR (Figures 3D and S7). All five of the primary miRNAs examined here were reduced significantly upon Oct4 depletion (Figure 3D). [0079] Tcf3 has been shown to repress the expression of key pluripotency protein-coding genes in ES cells under standard culture conditions ([Cole et al., 2008], [Tam et al., 2008] and [Yi et al., 2008]). Following shRNA-mediated knockdown of Tcf3 in ES cells, qPCR revealed that levels of primary transcripts for ES cell miRNA genes pri-mir-290-295 and the pri-mir-302 cluster were elevated, though this effect was modest (Figure S8).
In summary, the observation that these key ES cell miRNAs were generally downregulated in response to Oct4 depletion and upregulated upon Tcf3 depletion, as would be predicted based on published effects of these factors on protein-coding genes, demonstrates that the core transcriptional circuitry of ES cells can play a functional role in the regulation of miRNA genes.
Example 4: Regulation of Oct4/Sox2/Nanog/Tcf3-Bound miRNA Genes during Differentiation
[0080] Oct4 and Nanog are silenced as ES cells begin to differentiate ([Chambers and Smith, 2004] and [Niwa, 2007]). If Oct4/Sox2/Nanog/Tcf3 are required for activation or repression of their target miRNAs, the targets should be differentially expressed when ES cells are compared to a differentiated cell type. To test this hypothesis, Solexa sequencing of 18-30 nucleotide transcripts in ES cells, MEFs, and NPCs was performed to obtain quantitative information on the abundance of miRNAs in pluripotent cells relative to two differentiated cell types (Figure 4A and Table S1). In each cell type examined, a small subset of miRNAs predominated, with pronounced changes in miRNA abundance observed among the cell types (Example 7).
[0081] When the abundance of all miRNAs bound by Oct4/Sox2/Nanog/Tcf3 in ES cells was examined in the three cell types, approximately half of the miRNAs dropped more than an order of magnitude in abundance in MEFs and NPCs relative to ES cells, as predicted (Figure 4B). A small number of the Oct4/Sox2/Nanog/Tcf3-occupied miRNAs, which will be further discussed below, were scarce in ES cells and showed increased abundance in MEFs and NPCs.
[0082] Oct4/Sox2/Nanog/Tcf3-occupied miRNAs were, in general, preferentially expressed in embryonic stem cells (Figure 4C). Whereas most miRNAs are unchanged in expression in ES cells relative to MEFs or NPCs, a significant portion of Oct4/Sox2/Nanog/Tcf3-occupied miRNAs are 100-fold more abundant in ES cells than in MEFs (p < 5 x 10^-15), and 1000-fold more abundant in ES cells than in NPCs (p < 5 x 10^-9). This group of Oct4/Sox2/Nanog/Tcf3-bound miRNAs that was significantly more abundant in ES cells than in NPCs and MEFs was also found to be actively expressed in induced pluripotent stem (iPS) cells (generated as described in Wernig et al., 2007), at levels comparable to those in ES cells (Figure S9). This is consistent with our evidence (Figures 3, S7, and S8) that core ES cell transcription factors regulate the expression of these miRNAs in pluripotent cells.
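Fold differences in miRNA abundance between cell types, such as those reported above, can be computed from short-RNA read counts. The following is a minimal sketch that assumes a simple reads-per-million normalization and a pseudocount; the normalization actually applied to the sequencing data is not specified in this section, and the miRNA labels and counts below are invented placeholders.

```python
import math


def reads_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {name: 1e6 * n / total for name, n in counts.items()}


def log10_fold_change(counts_a, counts_b, pseudocount=1.0):
    """log10 of normalized abundance in library A over library B, per miRNA."""
    rpm_a = reads_per_million(counts_a)
    rpm_b = reads_per_million(counts_b)
    names = set(rpm_a) | set(rpm_b)
    return {
        n: math.log10((rpm_a.get(n, 0.0) + pseudocount)
                      / (rpm_b.get(n, 0.0) + pseudocount))
        for n in names
    }


# Placeholder libraries (invented counts, for illustration only).
es_counts = {"miR-A": 9_000, "miR-B": 1_000}
mef_counts = {"miR-A": 10, "miR-B": 9_990}
lfc = log10_fold_change(es_counts, mef_counts)
# A value above 2 corresponds to more than 100-fold higher abundance in library A.
print({name: round(value, 2) for name, value in sorted(lfc.items())})
```

The pseudocount keeps the ratio finite for miRNAs absent from one library, at the cost of slightly compressing fold changes for low-abundance species.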
Example 5: Polycomb Group Proteins Co-Occupy Tissue-Specific miRNAs that are Silenced in ES Cells
[0083] Previous studies have revealed that core ES cell transcription factors occupy a set of transcriptionally active genes but also occupy, with Polycomb group proteins, a set of transcriptionally repressed genes that are poised for expression upon cellular differentiation ([Lee et al., 2006], [Bernstein et al., 2006] and [Boyer et al., 2006]). As noted above, Oct4/Sox2/Nanog/Tcf3 similarly occupied a group of miRNA genes that were transcriptionally inactive in ES cells but were activated selectively in particular differentiated cell types (Figure 4B). We reasoned that Polycomb complexes might also co-occupy Oct4/Sox2/Nanog/Tcf3-occupied miRNA genes that are inactive in ES cells. Indeed, new ChIP-seq data for the Polycomb group protein Suz12 in murine ES cells supported this hypothesis (Figure 5A and Tables S6, S7, and S10). As expected, these promoters were also enriched for nucleosomes with histone H3K27me3, a chromatin modification catalyzed by Polycomb group proteins (Figure 5A and Table S6 and Mikkelsen et al., 2007). In keeping with the repressive function of the Polycomb group proteins reported at protein-coding genes, miRNAs occupied at their promoters by Suz12 in ES cells were significantly less abundant in ES cells compared to all other miRNAs (Figure 5B). Approximately one quarter of the Oct4/Sox2/Nanog/Tcf3-occupied miRNAs belonged to the repressed set of miRNA genes bound by Suz12 in murine ES cells (Tables S6 and S7).
[0084] To further examine the behavior of this set of miRNAs during embryonic cell-fate commitment, we returned to our quantitative sequencing data of short transcripts in ES cells, MEFs, and NPCs (Figure 5C). Notably, miRNAs that were bound by Polycomb group proteins in ES cells were among the transcripts that were specifically induced in each of the differentiated cell types. For example, transcript levels of miR-9, a miRNA previously identified in neural cells and that promotes neural differentiation ([Lagos-Quintana et al., 2002] and [Krichevsky et al., 2006]), were significantly elevated in NPCs relative to ES cells, but this miRNA remained scarce in MEFs. Similarly, miR-218 and miR-34b/34c expression was induced in MEFs but remained at low levels in NPCs (Figure 5C). Consistent with Polycomb-mediated repression of these lineage-specific miRNAs, the repressive chromatin mark deposited by Polycomb group proteins, H3K27me3, was selectively lost at the promoters of the miRNAs in the cells in which they were induced (Figure 5C and Mikkelsen et al., 2007).
[0085] The tissue-specific expression pattern of miRNAs repressed by Polycomb in ES cells is consistent with these miRNAs serving as determinants of cell-fate decisions in a manner analogous to the developmental regulators whose genes are repressed by Polycomb in ES cells ([Lee et al., 2006], [Bernstein et al., 2006] and [Boyer et al., 2006]). Such a function in cell-fate determination would require that these miRNAs remain silenced in pluripotent ES cells. Indeed, the miRNAs that were repressed in ES cells by Polycomb group proteins appear to be induced, later in development, in a highly restricted subset of differentiated tissues specific to each miRNA (Figure S10), unlike the majority of miRNAs identified in mouse (Landgraf et al., 2007). The miRNAs with promoters bound by Polycomb group proteins in ES cells were significantly enriched (p < 0.005) among the set of the most tissue-specific mammalian miRNAs (Figure S10 and Landgraf et al., 2007). This suggests a model whereby Polycomb group proteins repress a set of tissue-specific miRNA genes in ES cells, a subset of which are co-occupied by Oct4, Sox2, Nanog, and Tcf3 (Figure 5D).
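The statistical test underlying the enrichment reported above is not stated in this section. One common way to assess whether one set of miRNAs is over-represented within another is a one-sided hypergeometric test, sketched below as an assumption rather than as the method actually used; the function name and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """One-sided hypergeometric tail: P(X >= observed).

    population: all miRNAs considered
    successes:  miRNAs in the category of interest (e.g., most tissue-specific)
    draws:      miRNAs in the tested set (e.g., Polycomb-bound)
    observed:   overlap between the tested set and the category
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Toy numbers (invented, for illustration only): 10 miRNAs, 5 in the category,
# 5 in the tested set, all 5 of which fall in the category.
print(hypergeom_enrichment_p(10, 5, 5, 5))
```

Where SciPy is available, the survival function of `scipy.stats.hypergeom` gives the same tail probability and scales better to large sets.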
Example 6: Discussion
[0086] Here we provide new high-resolution, genome-wide maps of core ES cell transcription factors, identify promoter regions for most miRNA genes, and deduce the association of the ES cell transcription factors with these miRNA genes. We also provide quantitative sequence data of short RNAs in ES cells, NPCs, and MEFs to examine changes in miRNA transcription. The key transcriptional regulators in ES cells collectively occupied the promoters of many of the miRNAs that were most abundant in ES cells, including those that were downregulated as ES cells differentiate. In addition, these factors also occupied the promoters of a second, smaller set of miRNAs that were repressed in ES cells and were selectively expressed in specific differentiated cell types. In ES cells, this second group of miRNAs was co-occupied by Polycomb group proteins, which are also known to silence key lineage-specific, protein-coding developmental regulators. Together these data reveal two key groups of miRNAs that are direct targets of Oct4/Sox2/Nanog/Tcf3: one group of miRNAs that is preferentially expressed in pluripotent cells and a second, Polycomb-occupied group that is silenced in ES cells and is poised to contribute to cell-fate decisions during mammalian development.
miRNA Contribution to ES Cell Identity
[0087] Several miRNA polycistrons, which encode the most abundant miRNAs in ES cells and which are silenced during early cellular differentiation ([Houbaviy et al., 2003], [Houbaviy et al., 2005] and [Suh et al., 2004]), were occupied at their promoters by Oct4, Sox2, Nanog, and Tcf3. The most abundant in murine ES cells was the mir-290-295 cluster, which contains multiple mature miRNAs with seed sequences similar or identical to those of the miRNAs in the mir-302 cluster and the mir-17-92 cluster. miRNAs with the same seed sequence also predominate in human embryonic stem cells (Laurent et al., 2008). miRNAs in this family have been implicated in cell proliferation ([O'Donnell et al., 2005], [He et al., 2005] and [Voorhoeve et al., 2006]), consistent with the impaired self-renewal phenotype observed in miRNA-deficient ES cells ([Kanellopoulou et al., 2005], [Murchison et al., 2005] and [Wang et al., 2007]). Additionally, the zebrafish homolog of this miRNA family, miR-430, contributes to the rapid degradation of maternal transcripts in early zygotic development (Giraldez et al., 2006), and mRNA expression data suggest that this miRNA family also promotes the clearance of transcripts in early mammalian development (Farh et al., 2005).
[0088] In addition to promoting the rapid clearance of transcripts as cells transition from one state to another during development, miRNAs also likely contribute to the control of cell identity by fine-tuning the expression of genes. miR-430, the zebrafish homolog of the mammalian mir-290-295 family, serves to precisely tune the levels of Nodal antagonists Lefty1 and Lefty2 relative to Nodal, a subtle modulation of protein levels that has pronounced effects on embryonic development (Choi et al., 2007). Recently, a list of 250 murine ES cell mRNAs that appear to be under the control of miRNAs in the miR-290-295 cluster was reported (Sinkkonen et al., 2008). This study reports that Lefty1 and Lefty2 are evolutionarily conserved targets of the miR-290-295 miRNA family. These miRNAs also maintain the expression of de novo DNA methyltransferases 3a and 3b (Dnmt3a and Dnmt3b), perhaps by dampening the expression of the transcriptional repressor Rbl2, helping to poise ES cells for efficient methylation of Oct4 and other pluripotency genes during differentiation.
[0089] Knowledge of how the core transcriptional circuitry of ES cells connects to both miRNAs and protein-coding genes reveals recognizable network motifs downstream of Oct4/Sox2/Nanog/Tcf3, involving both transcriptional and posttranscriptional regulation, that provide new insights into how this circuitry controls ES cell identity (Figure 6). Lefty1 and Lefty2, both actively expressed in ES cells, are directly occupied at their promoters by Oct4/Sox2/Nanog/Tcf3. mir-290-295, which is also directly occupied by Oct4/Sox2/Nanog/Tcf3, depends on Oct4 for proper expression (Figure 3D). Therefore, core ES cell transcription factors appear to promote the active expression of Lefty1 and Lefty2 but also fine-tune the expression of these important signaling proteins by activating a family of miRNAs that target the Lefty1 and Lefty2 3'UTRs. This network motif whereby a regulator exerts both positive and negative effects on its target, termed "incoherent feed-forward" regulation (Alon, 2007), provides a mechanism to fine-tune the steady-state level or kinetics of a target's activation (Figure 6A). Over a quarter of the proposed targets of the miR-290-295 miRNAs (Sinkkonen et al., 2008) are likely under the direct transcriptional control of Oct4/Sox2/Nanog/Tcf3 based on our binding maps, suggesting that these miRNAs could participate broadly in tuning the effects of ES cell transcription factors (Figure 6A and Example 7).
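The overlap described above (miRNA targets whose promoters are also occupied by the core factors) amounts to a set intersection. The following is a minimal sketch under stated assumptions: the function name `feed_forward_candidates` and the gene `GeneX` are hypothetical, the gene sets are toy examples, and the code only detects the loop topology; it does not by itself establish whether a loop is coherent or incoherent.

```python
def feed_forward_candidates(tf_bound_genes, tf_bound_mirnas, mirna_targets):
    """Map each TF-bound miRNA to its targets that are also TF-bound.

    tf_bound_genes:  genes whose promoters are occupied by the factors
    tf_bound_mirnas: miRNA loci whose promoters are occupied by the factors
    mirna_targets:   dict of miRNA locus -> proposed target genes
    """
    loops = {}
    for mirna, targets in mirna_targets.items():
        if mirna not in tf_bound_mirnas:
            continue  # the factors do not regulate this miRNA directly
        shared = sorted(set(targets) & set(tf_bound_genes))
        if shared:
            loops[mirna] = shared
    return loops


# Toy input: Lefty1, Lefty2 and Dnmt3a are named in the text; GeneX is hypothetical.
tf_bound_genes = {"Lefty1", "Lefty2", "Dnmt3a"}
tf_bound_mirnas = {"mir-290-295"}
mirna_targets = {"mir-290-295": ["Lefty1", "Lefty2", "GeneX"]}

loops = feed_forward_candidates(tf_bound_genes, tf_bound_mirnas, mirna_targets)
fraction = len(loops["mir-290-295"]) / len(mirna_targets["mir-290-295"])
print(loops, f"{fraction:.2f}")
```

Classifying each candidate as coherent or incoherent would additionally require the sign of each edge (activation or repression), which this sketch does not model.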
[0090] The miRNA expression program directly downstream of Oct4/Sox2/Nanog/Tcf3 could help poise ES cells for rapid and efficient differentiation, consistent with the phenotype of miRNA-deficient cells ([Kanellopoulou et al., 2005], [Murchison et al., 2005] and [Wang et al., 2007]). Oct4/Sox2/Nanog/Tcf3 likely contributes to this poising by their occupancy of the Let-7g promoter. Mature Let-7 transcripts are scarce in ES cells but were among the most abundant miRNAs in both MEFs and NPCs (Figure 3). Primary pri-Let-7g transcript is abundant in ES cells, but its maturation is blocked by Lin28 (Viswanathan et al., 2008). We now report that the promoters of both Let-7g and Lin28 are occupied by Oct4/Sox2/Nanog/Tcf3, suggesting that the core ES cell transcription factors promote the transcription of both primary pri-Let-7g and Lin28, which blocks the maturation of Let-7g. Indeed, proper expression of pri-Let-7g is dependent on Oct4 (Figure 3D). In this way Let-7 and Lin28 participate in an incoherent feed-forward circuit downstream of Oct4/Sox2/Nanog/Tcf3 to contribute to rapid cellular differentiation (Figure 6B). Notably, ectopic expression of Lin28 in human fibroblasts promotes the induction of pluripotency (Yu et al., 2007), suggesting that blocked maturation of pri-Let-7 transcripts plays an important role in the pluripotent state. Additionally, Dnmt3a and Dnmt3b, which are indirectly upregulated by the miR-290-295 miRNAs (Sinkkonen et al., 2008), are also occupied at their promoters by Oct4/Sox2/Nanog/Tcf3, providing examples of "coherent" regulation of important target genes by ES cell transcription factors and the ES cell miRNAs maintained by those transcription factors (Figure 6C).
[0091] Multilayer Regulatory Circuitry of ES Cell Identity
[0092] The regulatory circuitry we present for miRNAs in ES cells can now be integrated into the model of core regulatory circuitry of pluripotency that we have proposed previously ([Boyer et al., 2005], [Lee et al., 2006] and [Cole et al., 2008]), as illustrated in Figure 7. Our data reveal that Oct4, Sox2, Nanog, and Tcf3 occupy the promoters of two key sets of miRNAs, similar to the two sets of protein-coding genes regulated by these factors: one set that is actively expressed in pluripotent ES cells and another that is silenced in these cells by Polycomb group proteins and whose later expression might serve to facilitate establishment or maintenance of differentiated cell states.
[0093] The expanded circuit diagram presented here integrates transcription factor occupancy of miRNA genes and existing data on miRNA targets into our model of the molecular control of the pluripotent state. These data suggest that miRNAs that are activated in ES cells by Oct4/Sox2/Nanog/Tcf3 serve to modulate the direct effects of these transcription factors, participating in incoherent feed-forward regulation to tune levels of key genes and modifying the gene expression program to help poise ES cells for efficient differentiation. Thus, the core ES cell transcription factors and the miRNAs under their control coordinately contribute transcriptional and posttranscriptional gene regulation to the network that maintains ES cell identity.
[0094] Concluding Remark
[0095] Knowledge of how protein-coding genes are controlled by key ES cell transcription factors and chromatin regulators has provided important insights into the molecular control of ES cell identity and cellular reprogramming (Jaenisch and Young, 2008). This knowledge also has begun to shed light on human disease, as elements of the ES cell gene expression program are recapitulated in cancer cells ([Wong et al., 2008] and [Ben-Porath et al., 2008]). We now connect miRNA genes to the core circuitry of ES cells with high-resolution genome-wide ChIP-seq data and quantitative sequencing of short transcripts in multiple cell types. This information should prove useful as investigators continue to probe the role of miRNAs in pluripotency, cell state, disease, and regenerative medicine.
[0096] Experimental Procedures for Examples 1-6
[0097] Cell Culture
[0098] V6.5 (C57BL/6-129) murine ES cells were grown under typical ES cell conditions (see Example 7) on irradiated MEFs. For location analysis, cells were grown for one passage off of MEFs on gelatinized tissue-culture plates. NPCs derived from V6.5 ES cells and MEFs prepared and cultured from DR-4 strain mice were grown using standard protocols as previously described (see Example 7). ZHBTc4 cells harboring a doxycycline-repressible Oct4 allele (Niwa et al., 2000), a gift from A. Smith, were cultured under standard ES cell conditions on gelatin. Cultures were treated with 2 μg/ml doxycycline (SIGMA, D-9891) for 12 hr or 24 hr.
[0099] ChIP-seq
[00100] Detailed descriptions of antibodies, antibody specificity, and ChIP methods used in this study have been published previously and are provided in Example 7.
[00101] Purified immunoprecipitated DNA was prepared for sequencing according to a modified version of the Solexa Genomic DNA protocol. Fragmented DNA was end repaired and subjected to 18 cycles of linker-mediated (LM)-PCR using oligos purchased from Illumina. Amplified fragments between 150 and 300 bp were isolated by agarose gel electrophoresis and purified. High-quality samples were confirmed by the appearance of a smooth smear of fragments from 100-1000 bp with a peak distribution between 150 and 300 bp. Three nanograms of linker-ligated DNA was applied to the flow-cell using the Solexa Cluster Station fluidics device. Samples were then subjected to 26 bases of sequencing according to Illumina's standard protocols.
[00102] Images acquired from the Solexa sequencer were processed through the bundled
Solexa image extraction pipeline and aligned to both mouse NCBI build 36 and 37 using
ELAND. Analysis is described in Example 7.
[00103] Whole-Genome Array Design and Data Extraction
[00104] The design of the oligonucleotide-based whole-genome array set and data extraction methods are described in Lee et al., 2006. The microarrays used for location analysis in this study were manufactured by Agilent Technologies (http://www.agilent.com).
[00105] Quantitative Short RNA Sequencing
[00106] A previously described method of cloning 18-30 nt transcripts (Lau et al., 2001) was modified to allow for Solexa (Illumina) sequencing (unpublished data). RNA extraction was performed using Trizol, followed by RNeasy purification (QIAGEN) for mES cells, MEFs and NPCs. Single-stranded cDNA libraries of short transcripts were generated using size-selected RNA from mES cells, NPCs, MEFs, and ZHBTc4 cells (see Example 7). Single-stranded DNA samples were resuspended in 10 mM Tris (EB buffer)/0.1% Tween and used as indicated in the standard Solexa sequencing protocol (Illumina).
[00107] Quantitative PCR of Primary miRNAs
[00108] Real-time PCR primers were designed using the standard specifications of PrimerExpress (Applied Biosystems) to amplify regions within the ~200 nt immediately upstream of the tested miRNA hairpins or in the middle of the mir-290-295 polycistron but outside of any miRNA hairpin regions (Example 7 and Figure S8B). Primers were used in SYBR Green quantitative PCR assays on the Applied Biosystems 7500 Real Time PCR system. Expression levels were calculated relative to Gapdh mRNA levels, which were quantified in parallel by Taqman analysis. Detailed methods and primer sequences can be found in Example 7.
[00109] References
Alon, 2007 U. Alon, Network motifs: theory and experimental approaches, Nat. Rev. Genet. 8 (2007), pp. 450-461.
Barrera et al., 2008 L.O. Barrera, Z. Li, A.D. Smith, K.C. Arden, W.K. Cavenee, M.Q. Zhang, R.D. Green and B. Ren, Genome-wide mapping and analysis of active promoters in mouse embryonic stem cells and adult organs, Genome Res. 18 (2008), pp. 46-59.
Barski et al., 2007 A. Barski, S. Cuddapah, K. Cui, T.Y. Roh, D.E. Schones, Z. Wang, G. Wei, I. Chepelev and K. Zhao, High-resolution profiling of histone methylations in the human genome, Cell 129 (2007), pp. 823-837.
Bartel, 2004 D. P. Bartel, MicroRNAs: genomics, biogenesis, mechanism, and function, Cell 116 (2004), pp. 281-297.
Ben-Porath et al., 2008 I. Ben-Porath, M.W. Thomson, V.J. Carey, R. Ge, G.W. Bell, A. Regev and R.A. Weinberg, An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors, Nat. Genet. 40 (2008), pp. 499-507.
Bernstein et al., 2006 B.E. Bernstein, T.S. Mikkelsen, X. Xie, M. Kamal, D.J. Huebert, J. Cuff, B. Fry, A. Meissner, M. Wernig and K. Plath et al., A bivalent chromatin structure marks key developmental genes in embryonic stem cells, Cell 125 (2006), pp. 315-326.
Bernstein et al., 2003 E. Bernstein, S.Y. Kim, M.A. Carmell, E.P. Murchison, H. Alcorn, M.Z. Li, A.A. Mills, S.J. Elledge, K.V. Anderson and G.J. Hannon, Dicer is essential for mouse development, Nat. Genet. 35 (2003), pp. 215-217.
Boyer et al., 2005 L.A. Boyer, T.I. Lee, M.F. Cole, S.E. Johnstone, S.S. Levine, J.P. Zucker, M.G. Guenther, R.M. Kumar, H.L. Murray and R.G. Jenner et al., Core transcriptional regulatory circuitry in human embryonic stem cells, Cell 122 (2005), pp. 947-956.
Boyer et al., 2006 L.A. Boyer, K. Plath, J. Zeitlinger, T. Brambrink, L.A. Medeiros, T.I. Lee, S.S. Levine, M. Wernig, A. Tajonar and M.K. Ray et al., Polycomb complexes repress developmental regulators in murine embryonic stem cells, Nature 441 (2006), pp. 349-353.
Chambers and Smith, 2004 I. Chambers and A. Smith, Self-renewal of teratocarcinoma and embryonic stem cells, Oncogene 23 (2004), pp. 7150-7160.
Chen et al., 2008 X. Chen, H. Xu, P. Yuan, F. Fang, M. Huss, V.B. Vega, E. Wong, Y.L. Orlov, W. Zhang and J. Jiang et al., Integration of external signaling pathways with the core transcriptional network in embryonic stem cells, Cell 133 (2008), pp. 1106-1117.
Choi et al., 2007 W.Y. Choi, A.J. Giraldez and A.F. Schier, Target protectors reveal dampening and balancing of Nodal agonist and antagonist by miR-430, Science 318 (2007), pp. 271-274.
Cole et al., 2008 M.F. Cole, S.E. Johnstone, J.J. Newman, M.H. Kagey and R.A. Young, Tcf3 is an integral component of the core regulatory circuitry of embryonic stem cells, Genes Dev. 22 (2008), pp. 746-755.
Farh et al., 2005 K.K. Farh, A. Grimson, C. Jan, B.P. Lewis, W.K. Johnston, L.P. Lim, C.B. Burge and D.P. Bartel, The widespread impact of mammalian MicroRNAs on mRNA repression and evolution, Science 310 (2005), pp. 1817-1821.
Fukao et al., 2007 T. Fukao, Y. Fukuda, K. Kiga, J. Sharif, K. Hino, Y. Enomoto, A.
Kawamura, K. Nakamura, T. Takeuchi and M. Tanabe, An evolutionarily conserved mechanism for microRNA-223 expression revealed by microRNA gene profiling, Cell 129
(2007), pp. 617-631.
Giraldez et al., 2006 A.J. Giraldez, Y. Mishima, J. Rihel, R.J. Grocock, S. Van Dongen, K.
Inoue, A.J. Enright and A. F. Schier, Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs, Science 312 (2006), pp. 75-79.
Guenther et al., 2007 M.G. Guenther, S.S. Levine, L.A. Boyer, R. Jaenisch and R.A. Young, A chromatin landmark and transcription initiation at most promoters in human cells, Cell 130 (2007), pp. 77-88.
Grimson et al., 2007 A. Grimson, K.K. Farh, W.K. Johnston, P. Garrett-Engele, L.P. Lim and D.P. Bartel, MicroRNA targeting specificity in mammals: determinants beyond seed pairing, Mol. Cell 27 (2007), pp. 91-105.
He et al., 2005 L. He, J.M. Thomson, M.T. Hemann, E. Hernando-Monge, D. Mu, S. Goodson, S. Powers, C. Cordon-Cardo, S.W. Lowe, G.J. Hannon and S.M. Hammond, A microRNA polycistron as a potential human oncogene, Nature 435 (2005), pp. 828-833.
Heintzman et al., 2007 N.D. Heintzman, R.K. Stuart, G. Hon, Y. Fu, C.W. Ching, R.D. Hawkins, L.O. Barrera, S. Van Calcar, C. Qu and K.A. Ching et al., Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome, Nat. Genet. 39 (2007), pp. 311-318.
Houbaviy et al., 2003 H.B. Houbaviy, M.F. Murray and P.A. Sharp, Embryonic stem cell-specific MicroRNAs, Dev. Cell 5 (2003), pp. 351-358.
Houbaviy et al., 2005 H.B. Houbaviy, L. Dennis, R. Jaenisch and P.A. Sharp, Characterization of a highly variable eutherian microRNA gene, RNA 11 (2005), pp. 1245-1257.
Jaenisch and Young, 2008 R. Jaenisch and R. Young, Stem cells, the molecular circuitry of pluripotency and nuclear reprogramming, Cell 132 (2008), pp. 567-582.
Jiang et al., 2008 J. Jiang, Y.S. Chan, Y.H. Loh, J. Cai, G.Q. Tong, C.A. Lim, P. Robson, S. Zhong and H.H. Ng, A core Klf circuitry regulates self-renewal of embryonic stem cells, Nat. Cell Biol. 10 (2008), pp. 353-360.
Johnson et al., 2007 D.S. Johnson, A. Mortazavi, R.M. Myers and B. Wold, Genome-wide mapping of in vivo protein-DNA interactions, Science 316 (2007), pp. 1497-1502.
Kanellopoulou et al., 2005 C. Kanellopoulou, S.A. Muljo, A.L. Kung, S. Ganesan, R. Drapkin, T. Jenuwein, D.M. Livingston and K. Rajewsky, Dicer-deficient mouse embryonic stem cells are defective in differentiation and centromeric silencing, Genes Dev. 19 (2005), pp. 489-501.
Kim et al., 2008 J. Kim, J. Chu, X. Shen, J. Wang and S.H. Orkin, An extended transcriptional network for pluripotency of embryonic stem cells, Cell 132 (2008), pp. 1049-1061.
Krek et al., 2005 A. Krek, D. Grun, M.N. Poy, R. Wolf, L. Rosenberg, E.J. Epstein, P. MacMenamin, I. da Piedade, K.C. Gunsalus, M. Stoffel and N. Rajewsky, Combinatorial microRNA target predictions, Nat. Genet. 37 (2005), pp. 495-500.
Krichevsky et al., 2006 A.M. Krichevsky, K.C. Sonntag, O. Isacson and K.S. Kosik, Specific microRNAs modulate embryonic stem cell-derived neurogenesis, Stem Cells 24 (2006), pp. 857-864.
Lagos-Quintana et al., 2002 M. Lagos-Quintana, R. Rauhut, A. Yalcin, J. Meyer, W. Lendeckel and T. Tuschl, Identification of tissue-specific microRNAs from mouse, Curr. Biol. 12 (2002), pp. 735-739.
Landgraf et al., 2007 P. Landgraf, M. Rusu, R. Sheridan, A. Sewer, N. Iovino, A. Aravin, S. Pfeffer, A. Rice, A.O. Kamphorst and M. Landthaler et al., A mammalian microRNA expression atlas based on small RNA library sequencing, Cell 129 (2007), pp. 1401-1414.
Lau et al., 2001 N.C. Lau, L.P. Lim, E.G. Weinstein and D.P. Bartel, An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans, Science 294 (2001), pp. 858-862.
Laurent et al., 2008 L.C. Laurent, J. Chen, I. Ulitsky, F.J. Mueller, C. Lu, R. Shamir, J.B. Fan and J.F. Loring, Comprehensive microRNA profiling reveals a unique human embryonic stem cell signature dominated by a single seed sequence, Stem Cells 26 (2008), pp. 1506-1516.
Lee et al., 2006 T.I. Lee, R.G. Jenner, L.A. Boyer, M.G. Guenther, S.S. Levine, R.M. Kumar, B. Chevalier, S.E. Johnstone, M.F. Cole and K. Isono et al., Control of developmental regulators by Polycomb in human embryonic stem cells, Cell 125 (2006), pp. 301-313.
Lee et al., 2002 Y. Lee, K. Jeon, J.T. Lee, S. Kim and V.N. Kim, MicroRNA maturation: stepwise processing and subcellular localization, EMBO J. 21 (2002), pp. 4663-4670.
Lewis et al., 2005 B.P. Lewis, C.B. Burge and D.P. Bartel, Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets, Cell 120 (2005), pp. 15-20.
Lim et al., 2005 L.P. Lim, N.C. Lau, P. Garrett-Engele, A. Grimson, J.M. Schelter, J. Castle, D.P. Bartel, P.S. Linsley and J.M. Johnson, Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs, Nature 433 (2005), pp. 769-773.
Loh et al., 2006 Y.H. Loh, Q. Wu, J.L. Chew, V.B. Vega, W. Zhang, X. Chen, G. Bourque, J. George, B. Leong and J. Liu et al., The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells, Nat. Genet. 38 (2006), pp. 431-440.
Mikkelsen et al., 2007 T.S. Mikkelsen, M. Ku, D.B. Jaffe, B. Issac, E. Lieberman, G. Giannoukos, P. Alvarez, W. Brockman, T.K. Kim and R.P. Koche et al., Genome-wide maps of chromatin state in pluripotent and lineage-committed cells, Nature 448 (2007), pp. 553-560.
Mineno et al., 2006 J. Mineno, S. Okamoto, T. Ando, M. Sato, H. Chono, H. Izu, M. Takayama, K. Asada, O. Mirochnitchenko, M. Inouye and I. Kato, The expression profile of microRNAs in mouse embryos, Nucleic Acids Res. 34 (2006), pp. 1765-1771.
Morin et al., 2008 R.D. Morin, M.D. O'Connor, M. Griffith, F. Kuchenbauer, A. Delaney, A.L. Prabhu, Y. Zhao, H. McDonald, T. Zeng and M. Hirst et al., Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells, Genome Res. 18 (2008), pp. 610-621.
Murchison et al., 2005 E.P. Murchison, J.F. Partridge, O.H. Tam, S. Cheloufi and G.J. Hannon, Characterization of Dicer-deficient murine embryonic stem cells, Proc. Natl. Acad. Sci. USA 102 (2005), pp. 12135-12140.
Niwa, 2007 H. Niwa, How is pluripotency determined and maintained?, Development 134
(2007), pp. 635-646.
Niwa et al., 2000 H. Niwa, J. Miyazaki and A.G. Smith, Quantitative expression of Oct-3/4 defines differentiation, dedifferentiation or self-renewal of ES cells, Nat. Genet. 24 (2000), pp.
372-376.
O'Donnell et al., 2005 K.A. O'Donnell, E.A. Wentzel, K.I. Zeller, C.V. Dang and J.T. Mendell, c-Myc-regulated microRNAs modulate E2F1 expression, Nature 435 (2005), pp. 839-843.
Robertson et al., 2007 G. Robertson, M. Hirst, M. Bainbridge, M. Bilenky, Y. Zhao, T. Zeng, G. Euskirchen, B. Bernier, R. Varhol and A. Delaney et al., Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing, Nat. Methods 4 (2007), pp. 651-657.
Sinkkonen et al., 2008 L. Sinkkonen, T. Hugenschmidt, P. Berninger, D. Gaidatzis, F. Mohn, C.G. Artus-Revel, M. Zavolan, P. Svoboda and W. Filipowicz, MicroRNAs control de novo DNA methylation through regulation of transcriptional repressors in mouse embryonic stem cells, Nat. Struct. Mol. Biol. 15 (2008), pp. 259-267.
Stefani and Slack, 2008 G. Stefani and F.J. Slack, Small non-coding RNAs in animal development, Nat. Rev. Mol. Cell Biol. 9 (2008), pp. 219-230.
Suh et al., 2004 M.R. Suh, Y. Lee, J.Y. Kim, S.K. Kim, S.H. Moon, J.Y. Lee, K.Y. Cha, H.M. Chung, H.S. Yoon and S.Y. Moon et al., Human embryonic stem cells express a unique set of microRNAs, Dev. Biol. 270 (2004), pp. 488-498.
Tam et al., 2008 W.L. Tam, C.Y. Lim, J. Han, J. Zhang, Y.S. Ang, H.H. Ng, H. Yang and B. Lim, Tcf3 regulates embryonic stem cell pluripotency and self-renewal by the transcriptional control of multiple lineage pathways, Stem Cells (2008) 10.1634/stemcells.2007-1115 Published online May 8, 2008.
Viswanathan et al., 2008 S.R. Viswanathan, G.Q. Daley and R.I. Gregory, Selective blockade of microRNA processing by Lin28, Science 320 (2008), pp. 97-100.
Voorhoeve et al., 2006 P.M. Voorhoeve, C. le Sage, M. Schrier, A.J. Gillis, H. Stoop, R. Nagel, Y.P. Liu, J. van Duijse, J. Drost and A. Griekspoor et al., A genetic screen implicates miRNA-372 and miRNA-373 as oncogenes in testicular germ cell tumors, Cell 124 (2006), pp. 1169-1181.
Wang et al., 2007 Y. Wang, R. Medvid, C. Melton, R. Jaenisch and R. Blelloch, DGCR8 is essential for microRNA biogenesis and silencing of embryonic stem cell self-renewal, Nat. Genet. 39 (2007), pp. 380-385.
Wernig et al., 2007 M. Wernig, A. Meissner, R. Foreman, T. Brambrink, M. Ku, K.
Hochedlinger, B. E. Bernstein and R. Jaenisch, In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state, Nature 448 (2007), pp. 318-324.
Wong et al., 2008 D.J. Wong, H. Liu, T. W. Ridky, D. Cassarino, E. Segal and H. Y. Chang,
Module map of stem cell genes guides creation of epithelial cancer stem cells, Cell Stem Cell 2
(2008), pp. 333-344.
Yi et al., 2008 F. Yi, L. Pereira and B.J. Merrill, Tcf3 functions as a steady state limiter of transcriptional programs of mouse embryonic stem cell self renewal, Stem Cells (2008) 10.1634/stemcells.2008-0229v1 Published online May 15, 2008.
Yu et al., 2007 J. Yu, M. A. Vodyanik, K. Smuga-Otto, J. Antosiewicz-Bourget, J. L. Frane, S.
Tian, J. Nie, G.A. Jonsdottir, V. Ruotti and R. Stewart et al, Induced pluripotent stem cell lines derived from human somatic cells, Science 318 (2007), pp. 1917-1920.
Zhou et al., 2007 X. Zhou, J. Ruan, G. Wang and W. Zhang, Characterization and identification of microRNA core promoters in four model species, PLoS Comput. Biol. 3 (2007), p. e37, 10.1371/journal.pcbi.0030037.
Example 7: Additional Experimental Procedures, Results, and Discussion
[00110] Contents of Example 7
[00111] Growth Conditions and Quality Control for Human Embryonic Stem Cells
[00112] Growth Conditions and Quality Control for Murine Embryonic Stem Cells and Oct4-repressible ZHBTc4 cells
[00113] Antibodies
[00114] Chromatin Immunoprecipitation
[00115] ChIP-seq Sample Preparation and Analysis
[00116] Confirmation of ChIP-seq enrichment by RT-PCR
[00117] Comparison of ChIP-seq and ChIP-chip enrichment
[00118] Identifications of regions enriched for Oct4/Sox2/Nanog/Tcf3
[00119] DNA Motif Discovery and High-resolution Binding-Site Analysis
[00120] Identification of miRNA start sites in Human and Mouse Genomes
[00121] ChIP-chip Sample Preparation and Analysis
[00122] Comparing Enriched Regions to Known Genes and miRNAs
[00123] Growth Conditions for Neural Precursors, Mouse Embryonic Fibroblasts, and Induced Pluripotent Stem Cells
[00125] Analysis of Mature miRNA Frequency by Solexa Sequencing
[00126] miRNA Microarray Expression Analysis
[00127] Tissue Specificity of miRNAs
[00128] Functional Regulation of miRNAs by Oct4 and Tcf3 in mES cells.
[00129] Identification of Oct4/Sox2/Nanog/Tcf3 occupied Feed-Forward Loops
[00130] Index of Tables. Please see Marson, et al. (2008) for Tables. Tables S6 and S7 are provided herein.
[00131] Table S1 Summary of Solexa experiments.
[00132] Table S2 Gene occupancy for ChIP-seq data
[00133] Table S3 Regions enriched for Oct4/Sox2/Nanog/Tcf3 in mouse ES cells by ChIP-seq and associated genomic features
[00134] Table S4 Motif base frequency for Oct4/Sox2/Tcf3/Nanog motif
[00135] Table S5 Regions enriched for H3K4me3-modified nucleosomes in mouse ES cells by ChIP-seq and associated genomic features
[00136] Table S6 Mouse miRNA promoters and associated proteins and genomic features
[00137] Table S7 Human miRNA promoters and associated proteins and genomic features
[00138] Table S8 Regions enriched for Oct4 in human ES cells
[00139] Table S9 miRNA expression in murine ES, neural precursors, embryonic fibroblasts and Oct4-repressible ZHBTc4 cells
[00140] Table S10 Regions enriched for Suz12 in mouse ES cells
[00141] Table S11 miRNA microarray expression data
[00142] Index of Supplementary Files
The following files contain data formatted for upload into the UCSC genome browser (Kent et al., 2002). To upload the files, first copy the files onto a computer with internet access. Then use a web browser to go to http://genome.ucsc.edu/cgi-bin/hgCustom?hgsid=105256378 for mouse and http://genome.ucsc.edu/cgi-bin/hgCustom?hgsid=104842340 for human. In the "Paste URLs or Data" section, select "Browse..." on the right of the screen. Use the pop-up window to select the copied files, then select "Submit". The upload process may take some time.
[00143] mouse_miRNA_track.mm8.bed - Map of predicted miRNA genes in mouse. Transcripts with EST or gene evidence are shown as black lines. Presumed transcripts are shown as grey lines. Positions of the mature miRNAs are annotated as thicker lines.
human_miRNA_track.hg17.bed - Map of predicted miRNA genes in human. Transcripts with EST or gene evidence are shown as black lines. Presumed transcripts are shown as grey lines. Positions of the mature miRNAs are annotated as thicker lines.
mES_regulator_ChIPseq.mm8.WIG.gz - ChIP-seq data for Oct4, Sox2, Nanog and Tcf3 in mES cells. Top track for each data set illustrates the normalized number of reads assigned to each 25bp bin. Bars in the second track identify regions of the genome enriched at p < 10^-9.
mES_chromatin_ChIPseq.mm8.WIG.gz - ChIP-seq data for H3K4me3, H3K79me2, H3K36me3 and Suz12 in mES cells. Top track for each data set illustrates the normalized number of reads assigned to each 25bp bin. Bars in the second track identify regions of the genome enriched at p < 10^-9.
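The per-bin read counts shown in the top tracks can be pictured with the following minimal sketch. It assumes reads are assigned to 25 bp bins by their start coordinate and scaled to reads per million; this section does not specify the normalization or any read-extension step actually used, so both choices, along with the function name `binned_read_density`, are assumptions made for illustration.

```python
from collections import Counter


def binned_read_density(read_starts, total_reads, bin_size=25):
    """Count reads per fixed-width bin and scale to reads per million.

    read_starts: 0-based start coordinates of aligned reads on one chromosome
    total_reads: total aligned reads in the data set (normalization base)
    bin_size:    bin width in bp (25 bp, as in the track description)
    """
    # Key each bin by its start coordinate so the result maps onto the genome.
    counts = Counter((pos // bin_size) * bin_size for pos in read_starts)
    scale = 1_000_000 / total_reads  # assumed reads-per-million normalization
    return {start: n * scale for start, n in sorted(counts.items())}


# Toy coordinates for illustration only.
density = binned_read_density([0, 10, 24, 25, 60], total_reads=5)
print(density)
```

Enriched regions would then be called by comparing such densities against a background model, a step not sketched here.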
[00144] Growth Conditions and Quality Control for Human Embryonic Stem Cells
[00145] Human embryonic stem (ES) cells were obtained from WiCell (Madison, WI; NIH Code WA09) and grown as described. Cell culture conditions and harvesting have been described previously (Boyer et al., 2005; Lee et al., 2006; Guenther et al., 2007). Quality control for the H9 cells included immunohistochemical analysis of pluripotency markers, alkaline phosphatase activity, teratoma formation, and formation of embryoid bodies and has been previously published as supplemental material (Boyer et al., 2005; Lee et al., 2006).
[00146] Growth Conditions for Murine Embryonic Stem Cells and Oct-4 Repressible ZHBTc4 Cells
[00147] V6.5 (C57BL/6-129) murine ES cells were grown under typical ES cell culture conditions on irradiated mouse embryonic fibroblasts (MEFs) as previously described (Boyer et al., 2006). Briefly, cells were grown on gelatinized tissue culture plates in Dulbecco's modified Eagle medium supplemented with 15% fetal bovine serum (characterized from Hyclone), 1000 U/ml leukemia inhibitory factor (LIF, Chemicon; ESGRO ESG1106), nonessential amino acids, L-glutamine, Penicillin/Streptomycin and β-mercaptoethanol. Immunostaining was used to confirm expression of pluripotency markers, SSEA1 (Developmental Studies Hybridoma Bank) and Oct4 (Santa Cruz, SC-5279). For location analysis, cells were grown for one passage off of MEFs, on gelatinized tissue-culture plates.
[00148] Cells harboring a doxycycline-repressible Oct4 allele (ZHBTc4 cells, Niwa et al., 2000), a gift from A. Smith, were cultured under standard ES cell conditions, described above, on gelatin. Cultures were treated with 2 μg/ml doxycycline (SIGMA, D-9891) for 12 hr or 24 hr. For immunostaining, ZHBTc4 cells were treated for 24 hr with doxycycline. Treated cells and non-treated controls were fixed in 4% paraformaldehyde and stained with primary antibodies against Oct4 (Santa Cruz, SC-5279) and Sox2 (R&D Systems, MAB2018) and secondary Cy3-conjugated antibodies (Figure S7d). Oct4 shutdown was confirmed by reverse transcriptase (RT)-PCR of the Oct4 mRNA using the oligos: cacgagtggaaagcaactca and agatggtggtctggctgaac.
[00149] Antibodies
[00150] Oct4-bound genomic DNA was enriched from whole cell lysate using an epitope specific goat polyclonal antibody purchased from Santa Cruz (sc-8628) and compared to a reference whole cell extract (Boyer et al., 2005). Regions occupied with high confidence for this antibody, identified by ChIP-seq in mES cells, are listed in Table S3; those identified by ChIP-chip on genome-wide tiling arrays in hES cells are listed in Table S8. Oct4 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_regulator_ChIPseq.mm8.WIG.gz
[00151] Sox2-bound genomic DNA was enriched from whole cell lysate using an affinity purified goat polyclonal antibody purchased from R&D Systems (AF2018) and compared to a reference whole cell extract (Boyer et al., 2005). Regions occupied with high confidence for this antibody identified by ChIP-seq in mES cells are listed in Table S3. Sox2 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_regulator_ChIPseq.mm8.WIG.gz
[00152] Nanog-bound genomic DNA was enriched from whole cell lysate using an affinity purified rabbit polyclonal antibody purchased from Bethyl Labs (bl1662) and compared to a reference whole cell extract (Boyer et al., 2005). Regions bound with high confidence for this antibody are listed in Table S3. Nanog ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_regulator_ChIPseq.mm8.WIG.gz
[00153] Tcf3-bound genomic DNA was enriched from whole cell lysate using an epitope specific goat polyclonal antibody purchased from Santa Cruz (sc-8635) and compared to a reference whole cell extract (Cole et al., 2008). Regions occupied with high confidence for this antibody identified by ChIP-seq in mES cells are listed in Table S3. Tcf3 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_regulator_ChIPseq.mm8.WIG.gz
[00154] Suz12-bound genomic DNA was enriched from whole cell lysate using an affinity purified rabbit polyclonal antibody purchased from Abcam (AB12073) and compared to a reference whole cell extract (Lee et al., 2006). Regions bound with high confidence for this antibody are listed in Table S10. Suz12 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_chomatin_ChIPseq.mm8.WIG.gz
[00155] H3K4me3-modified nucleosomes were enriched from whole cell lysate using an epitope-specific rabbit polyclonal antibody purchased from Abcam (AB8580) (Santos-Rosa et al., 2002; Guenther et al., 2007). Samples were analyzed using ChIP-seq. Comparison of this data with ChIP-seq published previously (Mikkelsen et al., 2007) showed near identity in profile and bound regions (Table S5). H3K4me3 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_chomatin_ChIPseq.mm8.WIG.gz
[00156] H3K79me2-modified nucleosomes were isolated from mES whole cell lysate using Abcam antibody AB3594 (Guenther et al., 2007). Chromatin immunoprecipitations against H3K36me3 were compared to reference WCE DNA obtained from mES cells. Samples were analyzed using ChIP-seq and were used for visual validation of predicted miRNA promoter association with mature miRNA sequences only (Figure 2). H3K79me2 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_chomatin_ChIPseq.mm8.WIG.gz
[00157] H3K36me3-modified nucleosomes were isolated from mES whole cell lysate using rabbit polyclonal antibody purchased from Abcam (AB9050) (Guenther et al., 2007). Chromatin immunoprecipitations against H3K36me3 were compared to reference WCE DNA obtained from mES cells. Samples were analyzed using ChIP-seq and were used for visual validation of predicted miRNA promoter association with mature miRNA sequences only (Figure 2). H3K36me3 ChIP-seq data can be visualized on the UCSC browser by uploading supplemental file: mES_chomatin_ChIPseq.mm8.WIG.gz
[00158] Chromatin Immunoprecipitation
[00159] Protocols describing all materials and methods have been previously described (Lee et al. 2007) and can be downloaded from http://web.wi.mit.edu/young/hES_PRC.
[00160] Briefly, we performed independent immunoprecipitations for each analysis. Embryonic stem cells were grown to a final count of 5x10 - 1x10 cells for each location analysis experiment. Cells were chemically crosslinked by the addition of one-tenth volume of fresh 11% formaldehyde solution for 15 minutes at room temperature. Cells were rinsed twice with 1x PBS, harvested using a silicon scraper and flash frozen in liquid nitrogen. Cells were stored at -80°C prior to use.
[00161] Cells were resuspended, lysed in lysis buffers and sonicated to solubilize and shear crosslinked DNA. Sonication conditions vary depending on cells, culture conditions, crosslinking and equipment. We used a Misonix Sonicator 3000 and sonicated at approximately 28 watts for 10 x 30 second pulses (90 second pause between pulses). For ChIP of Oct4, Nanog, Tcf3 and Suz12 in murine ES cells, SDS was added to lysate after sonication to a final concentration of 0.1%. Samples were kept on ice at all times.
[00162] The resulting whole cell extract was incubated overnight at 4°C with 100 μl of Dynal Protein G magnetic beads that had been pre-incubated with approximately 10 μg of the appropriate antibody. Beads were washed 4-5 times with RIPA buffer and 1 time with TE containing 50 mM NaCl. For ChIP of Oct4, Nanog, Tcf3 and Suz12 in murine ES cells, the following 4 washes for 4 minutes each were used instead of RIPA buffer: 1X low salt (20 mM Tris pH 8.1, 150 mM NaCl, 2 mM EDTA, 1% Triton X-100, 0.1% SDS), 1X high salt (20 mM Tris pH 8.1, 500 mM NaCl, 2 mM EDTA, 1% Triton X-100, 0.1% SDS), 1X LiCl (10 mM Tris pH 8.1, 250 mM LiCl, 1 mM EDTA, 1% deoxycholate, 1% NP-40), and 1X TE + 50 mM NaCl. Bound complexes were eluted from the beads by heating at 65°C with occasional vortexing and crosslinking was reversed by overnight incubation at 65°C. Whole cell extract DNA (reserved from the sonication step) was also treated for crosslink reversal.
[00163] ChIP-Seq Sample Preparation and Analysis
[00164] All protocols for Illumina/Solexa sequence preparation, sequencing and quality control are provided by Illumina (http://www.illumina.com/pages.ilmn?ID=203). A brief summary of the technique and minor protocol modifications are described below.
[00165] Sample Preparation
[00166] Immunoprecipitated (ChIP) DNA was prepared for sequencing according to a modified version of the Illumina/Solexa Genomic DNA protocol. Fragmented DNA was prepared for ligation of Solexa linkers by repairing the ends and adding a single adenine nucleotide overhang to allow for directional ligation. A 1:100 dilution of the Adaptor Oligo Mix (Illumina) was used in the ligation step. A subsequent PCR step with limited (18) amplification cycles added additional linker sequence to the fragments to prepare them for annealing to the Genome Analyzer flow-cell. After amplification, a narrow range of fragment sizes was selected by separation on a 2% agarose gel and excision of a band between 150-300 bp (representing shear fragments between 50 and 200 nt in length and ~100 bp of primer sequence). The DNA was purified from the agarose and diluted to 10 nM for loading on the flow cell.
[00167] Polony generation on Solexa Flow-Cells
[00168] The DNA library (2-4 pM) was applied to the flow-cell (8 samples per flow-cell) using the Cluster Station device from Illumina. The concentration of library applied to the flow-cell was calibrated such that polonies generated in the bridge amplification step originate from single strands of DNA. Multiple rounds of amplification reagents were flowed across the cell in the bridge amplification step to generate polonies of approximately 1,000 strands in 1 μm diameter spots. Double stranded polonies were visually checked for density and morphology by staining with a 1:5000 dilution of SYBR Green I (Invitrogen) and visualizing with a microscope under fluorescent illumination. Validated flow-cells were stored at 4°C until sequencing.
[00169] Sequencing
[00170] Flow-cells were removed from storage and subjected to linearization and annealing of sequencing primer on the Cluster Station. Primed flow-cells were loaded into the Illumina Genome Analyzer 1G. After the first base was incorporated in the Sequencing-by-Synthesis reaction the process was paused for a key quality control checkpoint. A small section of each lane was imaged and the average intensity value for all four bases was compared to minimum thresholds. Flow-cells with low first base intensities were re-primed and if signal was not recovered the flow-cell was aborted. Flow-cells with signal intensities meeting the minimum thresholds were resumed and sequenced for 26 cycles.
[00171] Solexa Data Analysis
[00172] Images acquired from the Illumina/Solexa sequencer were processed through the bundled Solexa image extraction pipeline which identified polony positions, performed base-calling and generated QC statistics. Sequences were aligned using the bundled ELAND software using murine genome NCBI Build 36 and 37 (UCSC mm8, mm9) as the reference genome. Alignments to build 37 were used for analysis of the mmu-mir 290-295 cluster only, as that cluster is not represented on build 36. Only sequences perfectly and uniquely mapping to the genome were used. A summary of the number of reads used is shown in Table S1.
[00173] The analysis methods used were derived from previously published methods (Johnson et al., 2007, Mikkelsen et al., 2007). Sequences from all lanes for each chromatin IP were combined, extended 200 bp (maximum fragment length accounting for ~100 bp of primer sequence), and allocated into 25 bp bins. Genomic bins containing statistically significant ChIP-seq enrichment were identified by comparison to a Poissonian background model, using a p-value threshold of 10 . A list of the minimum number of counts in a genomic bin required for each sample to meet this threshold is provided in Table S1. Additionally, we used an empirical background model obtained from identical Solexa sequencing of DNA from whole cell extract (WCE) from matched cell samples (>5X normalized enrichment across the entire region, see below). A summary of the bound regions and their relation to gene targets can be found in Tables S2, S3, S5 and S10.
[00174] The p-value threshold was selected to minimize the expected false-positive rate.
Assuming background reads are spread randomly throughout the genome, the probability of observing a given number of counts can be modelled as a Poisson process where the expectation can be calculated as the number of mapped reads times the number of bins per read (8) divided by the total number of bins available (we assumed 50% as a very conservative estimate).
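The background calculation described above can be illustrated with a short Python sketch. It is not the code used for the analysis: the function names are hypothetical, and the read count, bin count and p-value in the usage note are placeholder assumptions rather than values from this document. The sketch computes the Poisson expectation per bin (mapped reads times bins per read, divided by the number of available bins) and then finds the smallest count whose upper-tail probability falls below a chosen threshold.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    # First term P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > 0.0 and term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i
    return total

def expected_counts_per_bin(mapped_reads, bins_per_read, available_bins):
    """Expectation under the assumption that reads fall randomly across bins."""
    return mapped_reads * bins_per_read / available_bins

def minimum_count(lam, p_threshold):
    """Smallest count k in one bin such that P(X >= k) < p_threshold."""
    k = 1
    while poisson_upper_tail(k, lam) >= p_threshold:
        k += 1
    return k
```

As a usage example under assumed inputs, `minimum_count(expected_counts_per_bin(5_000_000, 8, 52_000_000), 1e-8)` returns the per-bin count needed for a hypothetical sample of five million mapped reads; the 8 bins per read follow from a 200 bp extension split into 25 bp bins.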
[00175] The Poissonian background model assumes a random distribution of background reads; however, we have observed significant deviations from this expectation in ChIP-seq datasets. These non-random events can be detected as sites of enrichment using control IPs and create a significant number of false positive events for actual ChIP-seq experiments. To remove these regions, we compared genomic bins and regions that meet the statistical threshold for enrichment to an empirical distribution of reads obtained from Solexa sequencing of DNA from whole cell extract (WCE) from matched cell samples. We required that enriched regions have five-fold greater ChIP-seq density in the specific IP sample as compared with the non-specific WCE sample, normalized for the total number of reads. This served to filter out genomic regions that are biased to having a greater than expected background density of ChIP-seq reads. We observed that ~200-500 regions in the genome showed non-specific enrichment in these experiments.
[00176] Confirmation of ChIP-seq enrichment by RT-PCR
[00177] In order to ascertain the accuracy of our ChIP-seq data, we compared our results to RT (real time)-PCR studies done on Oct4 and Suz12 in mES cells. Previous studies had performed RT-PCR for Oct4 binding in mES cells (Loh et al., 2006). Loh and colleagues tested 71 binding events identified by ChIP-PET and confirmed enrichment by RT-PCR for 69 of these regions. Our Solexa data also identifies 69 of these 71 regions as enriched for Oct4 binding (Figure S1a). Loh et al. also used RT-PCR to test enrichment at a set of 10 regions with low ChIP-PET signal (clusters of 2). They found 9 of these regions were not enriched by RT-PCR and our Solexa data found 8 of the 10 regions to not be enriched (the region enriched according to the ChIP-PET data from Loh et al. was also identified as enriched in our Solexa data). Globally, 71 of 71 PCR fragments enriched by RT-PCR were confirmed by ChIP-seq while 34 of 39 sequences identified as not-enriched were found to be below our binding threshold. These discrepancies may represent false positives derived from sequence technologies (the majority of these sites were just below threshold in Loh et al.) or may represent false negatives in the RT-PCR experiment.
[00178] We also examined a quantitative PCR dataset for a Suz12 ChIP in mES cells (Boyer et al., 2005). This dataset identified 83 regions with Suz12 enrichment and 21 regions with no Suz12 enrichment as measured by RT-PCR. We examined sequence data for the regions amplified by RT-PCR (with the addition of 200 bases on either side of the PCR product to account for chromatin fragment size) to determine the level of agreement between our Suz12 ChIP-seq data and the RT-PCR data from Boyer et al. Of the 83 regions identified as enriched for Suz12 by RT-PCR, our Solexa data identified 63 (76%) as enriched. Of the 21 regions identified as not enriched for Suz12 by RT-PCR, our Solexa data found 18 (86%) to be not enriched. This analysis reveals that RT-PCR results do largely confirm our ChIP-seq data. Given that RT-PCR itself produces false positive and false negative results, we further analyzed the data by incorporating the ChIP-chip data from Boyer et al. Of the 68 regions identified as enriched by both RT-PCR and ChIP-chip, 63 (93%) were enriched in our Solexa data. Of the 5 regions not directly enriched above threshold in the Solexa data, 3 were very near to regions identified as enriched from the Solexa data and the difference may be due, at least in part, to the much shorter fragment sizes used in ChIP-seq than in either ChIP-chip or RT-PCR. Of the 18 regions identified as not enriched by both RT-PCR and ChIP-chip, all (100%) were also determined to not be enriched according to our Solexa data. The results of this analysis are depicted in Figure S1b. The strong agreement between our ChIP-seq data and the ChIP-PCR data from Loh et al. and Boyer et al., despite differences in protocols and cell lines, strongly demonstrates the accuracy of our ChIP-seq results using the Solexa sequencing platform.
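The Suz12 agreement percentages quoted above follow directly from the region counts. The short Python check below reproduces that arithmetic; the helper name `percent` and the dictionary labels are illustrative only and do not come from the original analysis.

```python
def percent(part, whole):
    """Agreement as a whole-number percentage."""
    return round(100 * part / whole)

# (regions in agreement with ChIP-seq, regions tested), taken from the text above.
suz12_checks = {
    "enriched by RT-PCR": (63, 83),
    "not enriched by RT-PCR": (18, 21),
    "enriched by RT-PCR and ChIP-chip": (63, 68),
    "not enriched by RT-PCR and ChIP-chip": (18, 18),
}

for label, (agree, tested) in suz12_checks.items():
    print(f"{label}: {agree}/{tested} = {percent(agree, tested)}%")
```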
[00179] Comparison of ChIP-seq and ChIP-chip enrichment
[00180] We took advantage of our previous genome wide analyses to investigate the similarity between genome-wide location analyses done either using ChIP-seq on the Illumina-Solexa sequencer or using tiling arrays purchased from Agilent Technology. Recent work by our lab has identified the binding sites for Oct4, Nanog and Tcf3 in mES cells using ChIP-chip. These arrays contain over 6 million unique features tiling the non-repeat portion of the mouse genome at an average probe density of 1 oligo / 250 bp. Using these arrays, we identified 11,090 sites as significantly enriched for Oct4, 15,172 sites for Nanog and 13,348 sites for Tcf3. These sites showed extensive overlap with the regions identified by ChIP-seq (Figure S3a). Globally, we found that 61% (6,737) of ChIP-chip Oct4 bound sites were also enriched in the ChIP-seq experiments (similar numbers were found for Nanog and Tcf3).
[00181] While the degree of overlap between ChIP-chip and ChIP-seq was relatively high, we wanted to understand the primary sources for the discrepancies between the two techniques. We first focused on the regions that were identified as enriched only in ChIP-seq. Direct examination of these regions revealed that the majority were in regions of the genome that were tiled poorly. In designing the 60 bp oligonucleotides used on our genome-wide arrays, we required that each probe had a large degree of "uniqueness" (see supplemental for Boyer et al., 2005 and Lee et al., 2005). This was designed to minimize the degree of cross hybridization and required flexibility in the positioning of the oligonucleotides across the genome. ChIP-seq requires significantly less uniqueness to map reads to the genome and so should be able to detect binding across a much larger fraction of the genome (~70% as reported in Mikkelsen et al., 2007). When we examined the Agilent probe density across the ChIP-seq enriched regions, we found a broad range of probe densities, with almost half of all high-confidence targets in regions with fewer than 3 probes per kb (Figure S3b). While the portions of the genome tiled at > 3 probes per kb had strong overlaps, enriched regions of the genome with lower probe densities were much more difficult to identify by ChIP-chip.
[00182] Finally, we compared our data to ChIP-seq data published by Mikkelsen et al. (2007). Previous studies on array platforms have often shown large differences between labs (see Boyer et al., 2005, Loh et al., 2006 for an example). We were curious if the protocol simplifications created by using a sequencing approach would allow for more reproducible data. Using the histone modification H3K4me3 as a test we found that there was an extremely high overlap between these two experiments. Of 19,632 H3K4me3 enriched regions that were identified, 96% (18,849) were overlapped in the comparison dataset.
[00183] Identifications of regions enriched for Oct4/Sox2/Nanog/Tcf3
[00184] The identification of enriched regions in ChIP-chip and ChIP-seq experiments is typically done using a threshold to make a binary determination of enriched or not enriched. Unfortunately, there is not actually a clear delineation between truly bound and unbound regions. Instead, enrichment is a continuum and the threshold is set to minimize false positives (high-confidence sites). This typically requires that thresholds be set at a level that allows a high false-negative rate (~30% for ChIP-chip, Lee et al).
When multiple factors are compared, focusing only on the intersection of the different data sets compounds this effect, leading to higher false negative rates and the loss of many critical target genes.
[00185] Oct4, Sox2, Nanog and Tcf3 co-occupy promoters throughout the genome (Cole, Figure 1) and cluster analysis of enriched sites reveals apparent co-enrichment for all 4 factors at >90% of sites (Frampton & Young, unpublished data). However, the overlap for any two factors at the cut-off for high-confidence enrichment is only about two thirds (Figure S2, Tables S2 and S3). Therefore many of these sites must have enrichment that is below the high-confidence threshold for at least some of the participating factors. Variability in the enrichment observed for each factor at different binding sites is common in the data (Figures 1b, 3, and S3).
[00186] To determine a threshold of binding for multiple factors, we used two complementary methods to examine high-confidence targets of the four regulators. First, the classes of genes enriched by different numbers of factors at high-confidence were compared to the known classes of targets based on gene ontology (Figure S2b, http://gostat.wehi.edu.au/cgi-bin/goStat.pl, Beissbarth and Speed, 2004). The highest confidence targets (those with high levels of immunoenrichment observed for all four factors) preferentially encoded factors involved in DNA binding, regulation of transcription and development as has been previously shown (Boyer et al., 2005). These gene ontology categories continued to be overrepresented among high-confidence targets of either 3 of the 4 factors or 2 of the 4 factors, albeit at lower levels, but were barely enriched among high confidence targets of only one factor.
[00187] As a second test, we examined how different numbers of overlapping high-confidence targets affected the overlap with our previous genome-wide studies using ChIP-chip. Because not all regions of the genome are tiled with equal density on the microarrays used for ChIP-chip, we first determined the minimum probe density required to confirm binding detected by ChIP-seq (Figure S3). At most genes with high probe density, the ChIP-seq and ChIP-chip data were very highly correlated. However, regions of the genome with microarray coverage of less than three probes per kilobase were generally unreliable in detecting this enrichment. These regions, which had low probe coverage on the microarrays, represent approximately 1/3 of all sites co-enriched for the four factors by ChIP-seq. In regions where probe density was greater than three probes per kilobase, the fraction of ChIP-seq sites confirmed by ChIP-chip experiments increased with additional factors co-binding, with a large fall off below 2 factors (data not shown). Based on these two analyses, we elected to choose targets occupied at high-confidence by 2 or more of the 4 factors tested for further analysis in this manuscript (Figures 1a and S2a (red line)).
[00188] While a majority of the miRNA promoters identified as occupied by Oct4/Sox2/Nanog/Tcf3 are not occupied by all four factors at high-confidence, it is interesting to note that all of the miRNA genes that share highly similar seeds to miR-302 are occupied at high confidence by all four factors (miR-302 cluster, miR-290 cluster and miR-106a cluster), similar to the promoters of core transcriptional regulators of ES cells. By comparison, promoters also occupied by Suz12 almost never showed high-confidence binding for all four factors (Table S4, see mmu-mir9-2 in Figure 3). Similar effects were observed for protein-coding genes in mES cells (Lee et al., 2006). Whether this is caused by reduced epitope availability in PcG bound regions or reflects reduced protein binding is unclear.
[00189] DNA Motif Discovery and High-resolution Binding-Site Analysis
[00190] DNA motif discovery was performed on the genomic regions that were enriched at high-confidence by anti-Oct4 chromatin immunoprecipitation. In order to obtain maximum resolution, a modified version of the ChIP-seq read mapping algorithm was used. Genomic bins were reduced in size from 25 bp to 10 bp. Furthermore, a read extension that placed greater weight towards the middle of the 200 bp extension was used. This model placed 1/3 count in the 8 bins from 0-40 and 160-200 bp, 2/3 counts in the 8 bins from 40-80 and 120-160 bp and 1 count in the 4 bins from 80-120 bp. This allowed increased precision for determination of the peak of ChIP-seq density in each Oct4 bound region. One-hundred bp of genomic sequence, centered at the 500 largest peaks of Oct4 ChIP-seq density, were submitted to the motif discovery tool MEME (Bailey and Elkan, 1995; Bailey et al., 2006) to search for over-represented DNA motifs. A single sixteen basepair motif was discovered by the MEME algorithm (Table S4, Figure S4i). This motif was significantly (p < 10^-100) over-represented in the Oct4 bound input sequences and occurred in 445 of the 500 one-hundred bp sequences.
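The weighted read extension described above can be sketched in Python as follows. This is an illustrative reading of the text, not the original implementation: the function name `add_weighted_read`, the use of the slice midpoint to pick a bin, and the dictionary-based density track are all assumptions made for the example.

```python
from collections import defaultdict
from fractions import Fraction

BIN_SIZE = 10  # bp, as stated for the high-resolution analysis

# Twenty 10 bp slices across the 200 bp extension, weighted toward the middle:
# 1/3 for 0-40 and 160-200 bp, 2/3 for 40-80 and 120-160 bp, 1 for 80-120 bp.
SLICE_WEIGHTS = (
    [Fraction(1, 3)] * 4 + [Fraction(2, 3)] * 4 + [Fraction(1)] * 4
    + [Fraction(2, 3)] * 4 + [Fraction(1, 3)] * 4
)

def add_weighted_read(density, five_prime, strand):
    """Spread one read over its 200 bp extension, in the read's direction."""
    step = 1 if strand == "+" else -1
    for i, weight in enumerate(SLICE_WEIGHTS):
        # Assumption: each slice is credited to the bin holding its midpoint.
        midpoint = five_prime + step * (i * BIN_SIZE + BIN_SIZE // 2)
        density[midpoint // BIN_SIZE] += weight

density = defaultdict(Fraction)
add_weighted_read(density, 1000, "+")
add_weighted_read(density, 1199, "-")
peak_bin = max(density, key=density.get)
```

With two reads from opposite strands flanking a 200 bp fragment, the highest density falls in the bins near the middle of the fragment, which is the behaviour the weighting is meant to produce.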
[00191] As a default, MEME uses the individual nucleotide frequencies within input sequences to model expected motif frequencies. This simple model might result in the discovery of motifs which are enriched because of non-random di-, tri-, etc. nucleotide frequencies. Consequently, three different sets of control sequences of identical length were used to ensure the specificity of the motif discovery results. First, the sequences immediately flanking each input sequence were used as control sequences. Second, randomly selected sequences having the same distribution of distances from transcription start sites as the Oct4 input sequences were used as control sequences. Third, sequences from completely random genomic regions were used as control sequences. Each of these sets of control sequences was also examined using MEME. For each of these controls, the motif discovered from actual Oct4 bound sequences was not identified in the control sequences. The motif discovery process was repeated using different numbers and lengths of sequences, and the same motif was discovered for a wide array of input sequences.
[00192] When motif discovery was repeated with the top 500 Sox2, Nanog, and Tcf3 binding peaks, the same motif was identified. Overall, the motif occurs within 100 bp of the peak of ChIP-seq density at more than 90% of the top regions enriched in each experiment, while occurring in the same span at 24-28% of control regions, and within 25 bp of the ChIP-seq peak at more than 80% of regions versus 9-11% of control regions.
[00193] We next attempted to determine the precise sites in the genome bound by Oct4, Sox2, Nanog, and Tcf3 at basepair resolution using composite analysis of the bound regions for each factor. In particular, we examined if the different factors tended to associate with specific sequences within the asymmetric DNA motif. A set of ~2,000 of the highest confidence bound regions was determined for each factor based on a count threshold twofold higher than the threshold for high-confidence regions shown in Table S1 (Poisson: p < 10^-9). Regions without a motif within 50 bp of the peak of ChIP-seq enrichment, typically ~10% of regions, were removed from this analysis. The distance from the first base of the central motif in each bound region to the 5' end of all reads within 250 bp was tabulated, separating reads mapping to the same strand as the motif from reads mapping to the opposite strand. The difference in ChIP-seq read frequency between reads mapping to the same strand as the motif and reads mapping to the opposite strand was calculated at every basepair within the 500 bp window (Figure S4). We made the assumption that the precise peak of the ChIP-seq distribution was the point at which this strand bias was equal to zero.
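The strand-bias tabulation for a single bound region can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function name `strand_bias`, the tuple layout of the reads, and the sign convention (offsets measured along the motif's own strand) are hypothetical choices, and a composite profile would sum these per-region tables over all regions for a factor.

```python
from collections import Counter

def strand_bias(motif_pos, motif_strand, reads, window=250):
    """Return {offset: same-strand reads minus opposite-strand reads}.

    motif_pos is the first base of the motif; reads is an iterable of
    (five_prime_position, strand) pairs. Offsets run from -window to +window.
    """
    same, opposite = Counter(), Counter()
    sign = 1 if motif_strand == "+" else -1
    for five_prime, strand in reads:
        offset = sign * (five_prime - motif_pos)
        if abs(offset) > window:
            continue  # outside the 500 bp window around the motif
        if strand == motif_strand:
            same[offset] += 1
        else:
            opposite[offset] += 1
    return {d: same[d] - opposite[d] for d in range(-window, window + 1)}

# Toy example with made-up reads around a "+" strand motif at position 100.
bias = strand_bias(100, "+", [(90, "+"), (90, "+"), (130, "-"), (500, "+")])
```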
[00194] To determine the precise position where the strand bias was equal to zero, we modelled the strand bias for each transcription factor with a simple function. We chose a function with 4 parameters (A, B, C, and M), one of which (M) was the point at which the curve crosses the x-axis.
[00195] $f(x) = A \times \arctan\left(\frac{x - M}{B}\right) \times C$ (1)
[00196] Least squares curve fitting was performed using GNUplot (http://www.gnuplot.info/) with approximated initial conditions (A = -1000, B = 100, C = 2, M = 10). The variability in M was determined by bootstrapping (n=25) using a random set of half of the ChIP-seq reads in each dataset and is shown in Figure S4.
[00197] Identification of miRNA start sites in Human and Mouse Genomes
[00198] To better understand the regulation of miRNAs, we sought to identify the sites of transcription initiation for all miRNAs in both human and mouse, at least to low resolution (~1 kb). Most methods used to identify promoters require active transcription of the miRNA and isolation of rare primary miRNA transcripts. We decided to use an approach based on in vivo chromatin signature of promoters. This approach has two principal advantages. First, the required data has been published by a variety of laboratories and is readily accessible and second, it does not require the productive transcription of the miRNA primary transcript.
[00199] Recent results using genome-wide location analysis of H3K4me3 indicate that between 60 and 80% of all protein-coding genes in any cell population have promoters enriched in methylated nucleosomes, even where the gene is not detected by typical transcription profiling (Guenther et al., 2007). Importantly, over 90% of the H3K4me3 enriched regions in these cells map to known or predicted promoters, suggesting that H3K4me3 can be used as a proxy for sites of active initiation. Our strategy to identify miRNA promoters, therefore, uses H3K4me3 enriched sites from as many sources as possible as a collection of promoters. In human, H3K4me3 sites were identified in ES cells (H9), hepatocytes, a pro-B cell line (REH cells) (Guenther et al., 2007) and T cells (Barski et al., 2007). Mouse H3K4me3 sites were identified from ES cells (V6.5), neural precursors, and embryonic fibroblasts (Mikkelsen et al., 2007). In total, we identified 34,793 high-confidence H3K4me3 enriched regions in human and 34,096 high-confidence regions enriched in mouse, collectively present at ~75% of all protein-coding genes.
[00200] The list of miRNAs identified in the miRNA atlas (Landgraf et al., 2007) was used as the basis for our identification. The total list consists of 496 miRNAs in human and 382 miRNAs in mouse.
~65% of the murine miRNAs can be found in both species. For each of these miRNAs, possible start sites were derived from all H3K4me3 enriched regions within 250 kb upstream of the miRNA as well as all known start sites for any miRNAs that were identified as being within known transcripts from RefSeq (Pruitt et al., 2005), Mammalian Gene Collection (MGC) (Gerhard et al., 2004), Ensembl (Hubbard et al., 2005), or University of California Santa Cruz (UCSC) Known Genes (genome.ucsc.edu) (Kent et al., 2002) for which EntrezGene (http://www.ncbi.nlm.nih.gov/entrez/) gene IDs had been generated. Where an annotated start site was found to overlap an H3K4me3 enriched region, the known start was used in place of the enriched region.
[00201] A scoring system was derived empirically to select the most likely start sites for each miRNA. Each possible site was given a bonus if it was either the start of a known transcript that spanned the miRNA or of an EST that spanned the miRNA. Scores were reduced if the H3K4me3 enriched region was assignable instead to a transcript or EST that did not overlap the miRNA. Additional positive scores were given to enriched sites within 5 kb of the miRNA, while additional negative scores were given based on the number of intervening H3K4me3 sites between the test region and the miRNA. Finally, each enriched region was tested for conservation between human and mouse using the UCSC liftover program (Hinrichs et al., 2006). If two test regions overlapped, they were considered to be conserved (21%). In the cases where human and mouse disagreed on the quality of a site, if the site had an EST or gene overlapping the miRNA, that site was given a high score in both species. Alternatively, if one species had a non-overlapping site, that site was considered to be an unlikely promoter in both species. Finally, for miRNAs where a likely promoter was identified in only one species, we manually checked the homologous region of the other genome to search for regions enriched for H3K4me3-modified nucleosomes that may have fallen below the high-confidence threshold. Start sites were considered to be likely if the total score was > 0 (Figures S5 and S6). In total, we identified likely start sites for ~85% of all miRNAs in both species (Tables S6 and S7). Predicted miRNA genes can be visualized on the UCSC browser by uploading the supplemental files: mouse_miRNA_track.mm8.bed and human_miRNA_track.hg17.bed
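The single-species part of this scoring scheme can be sketched in Python. The actual score values were derived empirically and are not given in this document, so every weight below is a placeholder assumption, as are the class and field names; only the direction of each adjustment and the "total score > 0" rule come from the text. The cross-species conservation adjustments are omitted.

```python
from dataclasses import dataclass

# Placeholder weights (assumptions for illustration, not the empirical values).
SPANNING_BONUS = 2
NON_OVERLAPPING_PENALTY = 2
PROXIMITY_BONUS = 1
INTERVENING_SITE_PENALTY = 1
PROXIMITY_LIMIT_KB = 5  # "within 5 kb of the miRNA", from the text

@dataclass
class CandidateSite:
    starts_spanning_transcript_or_est: bool      # transcript/EST spans the miRNA
    assigned_to_non_overlapping_transcript: bool  # belongs to an unrelated gene/EST
    distance_to_mirna_kb: float
    intervening_h3k4me3_sites: int

def score(site):
    total = 0
    if site.starts_spanning_transcript_or_est:
        total += SPANNING_BONUS
    if site.assigned_to_non_overlapping_transcript:
        total -= NON_OVERLAPPING_PENALTY
    if site.distance_to_mirna_kb <= PROXIMITY_LIMIT_KB:
        total += PROXIMITY_BONUS
    total -= INTERVENING_SITE_PENALTY * site.intervening_h3k4me3_sites
    return total

def is_likely_start_site(site):
    """A start site is considered likely when the total score is > 0."""
    return score(site) > 0
```

For example, a hypothetical site 3 kb upstream that starts a spanning EST scores 3 and is retained, while a site 120 kb away with two intervening H3K4me3 regions scores -2 and is rejected.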
[00202] Several lines of evidence suggest the high quality of these predictions. First, previous studies have found that miRNAs within 50 kb of each other are likely to be co-regulated (Lagos-Quintana et al., 2001; Lau et al., 2001). While the nature of these clusters was not presupposed in our analysis, nearly all miRNAs within a cluster end up identifying the same promoter region (see Figures 2, 3, and 5). The only exceptions to this are found in the large clusters of repeat derived miRNAs found in chromosome 12 of mouse and chromosome 14 in human where a single H3K4me3 enriched region splits the clusters. Second, consistent with the frequent association of CpG islands with the transcriptional start sites for protein-coding genes, ~50% of the miRNA promoters identified here overlap CpG islands (Tables S6 and S7). Finally, for miRNAs that were active in ES cells, histone modifications associated with elongation were able to "connect" the mature miRNAs to the predicted transcription start site (Figure 2).
[00203] To further ascertain the accuracy of our promoter predictions, we compared our predicted start sites to those identified in recent studies. Predictions were tested against mmu-mir-34b / mmu-mir-34c (Corney et al., 2007), hsa-mir-34a (Chang et al., 2007), mmu-mir-101a, mmu-mir-202, mmu-mir-22, mmu-mir-124a-1, mmu-mir-433 (Fukao et al., 2007), mmu-mir-290-295, hsa-mir-371-373 (Houbaviy et al., 2005) and hsa-mir-17/18a/19a/20a/19b-1/92a-1 (O'Donnell et al., 2005). Additional miRNA promoters in these manuscripts were not predicted strongly by the above algorithm. For these 23 miRNAs, H3K4me3 sites were identified within 1 kb of all but two of the sites. mmu-mir-202 was predicted about 20 kb upstream of the annotated start site, but this may reflect an H3K4me3 site absent from the tissues sampled. mmu-mir-433 is in the middle of a large cluster of miRNAs on mouse chromosome 12. The annotated TSS lies within the cluster between mir-433 and mir-431, suggesting the promoter may be incorrect. Overall, the accuracy of the promoter predictions is believed to be ~80% (8/10). Additional H3K4me3 data sets and EST data should allow for improved accuracy in predicting and validating these initiation sites.
[00204] Among the predicted miRNA promoters validated by previous studies is the start site for the mir-290-295/371-373 polycistron (Houbaviy et al., 2005). This miRNA cluster includes the most abundant miRNAs in murine embryonic stem cells, and the seed sequences specified by multiple mature miRNAs in this cluster form the basis of the network illustrations in Figures 6 and 7. In their study, Houbaviy and colleagues tested the ES cell specificity of this promoter using a heterologous reporter assay including a construct spanning from -2kb to +5kb surrounding the start site of mir-290-295. This region is notable in that, while it excludes the largest peaks of Oct4/Sox2/Nanog/Tcf3, it does contain a smaller (yet significantly enriched) region located over the promoter (small peak at the promoter in Figure 3a). This promoter-proximal construct showed 5-10x higher maximal expression in ES cells relative to more differentiated cells. Expression of this construct was dependent on a small portion of the construct that included the TATAA box and a proximal site of Oct4/Sox2/Nanog/Tcf3 occupancy.
[00205] ChIP-chip Sample Preparation and Analysis
[00206] Immunoprecipitated DNA and whole cell extract DNA were purified by treatment with RNAse A, proteinase K and multiple phenol:chloroform:isoamyl alcohol extractions. Purified DNA was blunted, ligated to linker and amplified using a two-stage PCR protocol. Amplified DNA was labeled and purified using Bioprime random primer labeling kits (Invitrogen): immunoenriched DNA was labeled with Cy5 fluorophore, and whole cell extract DNA was labeled with Cy3 fluorophore.
[00207] Labeled DNA was mixed (~5 μg each of immunoenriched and whole cell extract
DNA) and hybridized to arrays in Agilent hybridization chambers for up to 40 hours at 40°C.
Arrays were then washed and scanned.
[00208] Slides were scanned using an Agilent DNA microarray scanner BA. PMT settings were set manually to normalize bulk signal in the Cy3 and Cy5 channel. For efficient batch processing of scans, we used Genepix (version 6.0) software. Scans were automatically aligned and then manually examined for abnormal features. Intensity data were then extracted in batch.
[00209] 44k Human Whole Genome Array
[00210] The human promoter array was purchased from Agilent Technologies (www.agilent.com). The array consists of 115 slides, each containing ~44,000 60-mer oligos designed to cover the non-repeat portion of the human genome. The design of these arrays is discussed in detail elsewhere (Lee et al., 2006).
[00211] Data Normalization and Analysis
[00212] We used GenePix software (Axon) to obtain background-subtracted intensity values for each fluorophore for every feature on the whole genome arrays. Among the
Agilent controls is a set of negative control spots that contain 60-mer sequences that do not cross-hybridize to human genomic DNA. We calculated the median intensity of these negative control spots in each channel and then subtracted this number from the intensities of all other features.
[00213] To correct for different amounts of each sample of DNA hybridized to the chip, the negative control-subtracted median intensity value of control oligonucleotides from the
Cy3-enriched DNA channel was then divided by the median of the control oligonucleotides from the Cy5-enriched DNA channel. This yielded a normalization factor that was applied to each intensity in the Cy5 DNA channel.
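The two normalization steps described above (negative-control subtraction, then scaling of the Cy5 channel) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline; the function name `normalize_channels` and the index lists `negative_idx` and `control_idx` are hypothetical, and the sketch assumes the intensities have already been background-subtracted.

```python
from statistics import median

def normalize_channels(cy3, cy5, negative_idx, control_idx):
    """Sketch of the two-step normalization (names are illustrative).

    cy3, cy5     : per-feature background-subtracted intensities
    negative_idx : positions of the negative control spots
    control_idx  : positions of the control oligonucleotides
    """
    # Step 1: subtract the median negative-control intensity in each channel.
    neg3 = median(cy3[i] for i in negative_idx)
    neg5 = median(cy5[i] for i in negative_idx)
    cy3_sub = [v - neg3 for v in cy3]
    cy5_sub = [v - neg5 for v in cy5]

    # Step 2: normalization factor = median control (Cy3) / median control (Cy5).
    factor = (median(cy3_sub[i] for i in control_idx)
              / median(cy5_sub[i] for i in control_idx))

    # Step 3: apply the factor to every intensity in the Cy5 channel.
    cy5_norm = [v * factor for v in cy5_sub]
    return cy3_sub, cy5_norm, factor
```

After this step the control oligonucleotides have the same median in both channels, so a later log ratio reflects enrichment rather than differences in the amount of DNA hybridized.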
[00214] Next, we calculated the log of the ratio of intensity in the Cy3-enriched channel to intensity in the Cy5 channel for each probe and used a whole chip error model (Hughes et al., 2000) to calculate confidence values for each spot on each array (single probe p-value). This error model functions by converting the intensity information in both channels to an X score, which is dependent on both the absolute value of intensities and background noise in each channel, using an f-score calculated as described (Boyer et al., 2005) for promoter regions or using a score of 0.3 for tiled arrays. When available, replicate data were combined, using the X scores and ratios of individual replicates to weight each replicate's contribution to a combined X score and ratio. The X scores for the combined replicate are assumed to be normally distributed, which allows for calculation of a p-value for the enrichment ratio seen at each feature. P-values were also calculated based on a second model assuming that, for any range of signal intensities, IP:control ratios below 1 represent noise (as the immunoprecipitation should only result in enrichment of specific signals) and that the distribution of noise among ratios above 1 is the reflection of the distribution of noise among ratios below 1.
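The second, reflection-based noise model can be illustrated with a short sketch. It is a simplification under stated assumptions: the text applies the model within ranges of signal intensity, whereas this example treats all probes as one intensity range, and the function name `mirrored_noise_pvalue` is hypothetical.

```python
def mirrored_noise_pvalue(log_ratios, observed):
    """Empirical p-value for an observed log(IP/control) ratio.

    Ratios below 1 (log ratio < 0) are treated as pure noise, and the noise
    among ratios above 1 is assumed to be their mirror image about 0.
    Assumption: every probe belongs to one intensity range (no binning).
    """
    below = [r for r in log_ratios if r < 0]   # noise-only observations
    null = below + [-r for r in below]         # reflect the noise about 0
    if not null:
        raise ValueError("no ratios below 1 available to estimate noise")
    # Fraction of the mirrored noise distribution at or above the observation.
    hits = sum(1 for r in null if r >= observed)
    return hits / len(null)
```

A probe whose log ratio lies well beyond anything in the mirrored noise set receives an empirical p-value of 0 in this sketch; in practice the resolution is limited by the number of noise probes available.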
[00215] High-Confidence Enrichment
[00216] To automatically determine bound regions in the datasets, we developed an algorithm to incorporate information from neighboring probes. For each 60-mer, we calculated the average X score of the 60-mer and its two immediate neighbors. If a feature was flagged as abnormal during scanning, we assumed it gave a neutral contribution to the average X score. Similarly, if an adjacent feature was beyond a reasonable distance from the probe (1000 bp), we assumed it gave a neutral contribution to the average X score. The distance threshold of 1000 bp was determined based on the maximum size of labeled DNA fragments put into the hybridization. Since the maximum fragment size was approximately 550 bp, we reasoned that probes separated by 1000 or more bp would not be able to contribute reliable information about a binding event halfway between them.
[00217] This set of averaged values gave us a new distribution that was subsequently used to calculate p-values of average X (probe set p-values). If the probe set p-value was less than 0.001, the three probes were marked as potentially bound.
[00218] As most probes were spaced within the resolution limit of chromatin immunoprecipitation, we next required that multiple probes in the probe set provide evidence of a binding event. Candidate bound probe sets were required to pass one of two additional filters: two of the three probes in a probe set must each have single probe p-values < 0.005, or the center probe in the probe set must have a single probe p-value < 0.001 and one of the flanking probes a single probe p-value < 0.1. These two filters cover situations where a binding event occurs midway between two probes and each weakly detects the event, or where a binding event occurs very close to one probe and is very weakly detected by a neighboring probe. Individual probe sets that passed these criteria and were spaced closely together were collapsed into bound regions if the center probes of the probe sets were within 1000 bp of each other.
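The probe-set logic described above can be sketched in three small functions. This is an illustrative reading of the text, not the original implementation: the function names and the probe dictionary layout are hypothetical, the "neutral contribution" is assumed to be an X score of 0, and "within 1000 bp" is assumed to include a gap of exactly 1000 bp. The p-values themselves are taken as inputs, since the error model that produces them is described elsewhere.

```python
def averaged_x(probes, i, max_dist=1000, neutral=0.0):
    """Average X score of probe i and its two immediate neighbors.

    probes: list of dicts with keys 'pos', 'x', 'flagged', sorted by 'pos'.
    Flagged probes, missing neighbors, and neighbors 1000 bp or more away
    contribute a neutral value (assumed here to be 0).
    """
    total = 0.0
    for j in (i - 1, i, i + 1):
        usable = (
            0 <= j < len(probes)
            and not probes[j]["flagged"]
            and abs(probes[j]["pos"] - probes[i]["pos"]) < max_dist
        )
        total += probes[j]["x"] if usable else neutral
    return total / 3

def passes_filters(p_set, p_left, p_center, p_right):
    """Apply the probe-set threshold and the two additional filters."""
    if not p_set < 0.001:
        return False
    # Filter 1: at least two of the three probes have single probe p < 0.005.
    two_of_three = sum(p < 0.005 for p in (p_left, p_center, p_right)) >= 2
    # Filter 2: strong center probe plus weak support from one flank.
    center_plus_flank = p_center < 0.001 and min(p_left, p_right) < 0.1
    return two_of_three or center_plus_flank

def collapse_regions(centers, max_gap=1000):
    """Merge passing probe sets whose center probes lie within max_gap bp."""
    regions = []
    for pos in sorted(centers):
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1][1] = pos          # extend the current bound region
        else:
            regions.append([pos, pos])    # start a new bound region
    return [tuple(r) for r in regions]
```

Keeping the filters separate from the averaging step makes it easy to change a single threshold without touching the rest of the procedure.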
[00219] Comparing Enriched Regions to Known Genes and miRNAs
[00220] Enriched regions were compared relative to transcript start and stop coordinates of known genes compiled from four different databases: RefSeq (Pruitt et al., 2005), Mammalian Gene Collection (MGC) (Gerhard et al., 2004), Ensembl (Hubbard et al., 2005), and University of California Santa Cruz (UCSC) Known Genes (genome.ucsc.edu) (Kent et al., 2002). All human coordinate information was downloaded in January 2005 from the UCSC Genome Browser (hg17, NCBI build 35). Mouse data were downloaded in June 2007 (mm8, NCBI build 36). To convert bound transcription start sites to more useful gene names, we used conversion tables downloaded from UCSC and Ensembl to automatically assign EntrezGene (http://www.ncbi.nlm.nih.gov/entrez/) gene IDs and symbols to the RefSeq, MGC, Ensembl and UCSC Known Gene annotations. Comparisons of Oct4, Sox2, Nanog, Tcf3, H3K4me3 and Suz12 to annotated regions of the genomes can be found in Tables S3, S5, S8 and S10.
[00221] For miRNA start sites, two separate windows were used to evaluate overlaps. For chromatin marks and non-sequence-specific proteins, miRNA promoters were considered bound if they were within 1kb of an enriched sequence. For sequence-specific factors such as Oct4, we used a more relaxed region of 8kb surrounding the promoter, consistent with previous work we have published (Boyer et al., 2005). A full list of the high confidence start sites bound to promoters can be found in Tables S6 and S7.
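The two-window rule can be expressed as a simple interval check. The sketch below is only an illustration: the function name `promoter_is_bound` is hypothetical, and it assumes that "8kb surrounding the promoter" means up to 8 kb on either side of the start site, which the text does not state explicitly.

```python
def promoter_is_bound(tss, enriched_regions, sequence_specific):
    """Return True if a start site lies within the allowed window of any
    enriched region.

    tss               : start-site coordinate (bp)
    enriched_regions  : iterable of (start, end) tuples on the same chromosome
    sequence_specific : True for factors such as Oct4, False for chromatin
                        marks and non-sequence-specific proteins
    Assumption: the window is applied symmetrically on both sides.
    """
    window = 8000 if sequence_specific else 1000
    return any(start - window <= tss <= end + window
               for start, end in enriched_regions)
```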
[00222] Growth Conditions for Neural Precursors, Mouse Embryonic Fibroblasts, Induced Pluripotent Stem Cells.
[00223] To generate neural precursor cells, ES cells were differentiated along the neural lineage using standard protocols. V6.5 ES cells were differentiated into neural progenitor cells (NPCs) through embryoid body formation for 4 days and selection in ITSFn media for 5-7 days, and maintained in FGF2 and EGF2 (R&D Systems) (Okabe et al., 1996).
[00224] Mouse embryonic fibroblasts were prepared from DR-4 strain mice as previously described (Tucker et al., 1997). Cells were cultured in Dulbecco's modified Eagle medium supplemented with 10% cosmic calf serum, β-mercaptoethanol, nonessential amino acids, L-glutamine and penicillin/streptomycin. Murine induced pluripotent stem (iPS) cells were generated as described in Wernig et al., 2007. iPS cells were cultured under the same conditions as mES cells.
[00225] Analysis of Mature miRNA Frequency by Solexa Sequencing
[00226] Short RNA cloning
[00227] A previously described method of cloning 18-30nt transcripts (Lau et al., 2001) was modified to allow for Solexa (Illumina) sequencing (manuscript submitted). Single-stranded cDNA libraries of short transcripts were generated using size-selected RNA. RNA extraction was performed using Trizol, followed by RNeasy purification (Qiagen).
[00228] 5 μg of RNA was size selected and gel purified. 3' adaptor (pTCGTATGCCGTCTTCTGTTG [idT]) (SEQ ID NO: 1) was ligated to RNA with T4 RNA ligase and also, separately, with RNA ligase (Rnl2(1-249)k->Q). Ligation products were gel purified and mixed. 5' adaptor (GUUCAGAGUUCUACAGUCCGACGAUC) (SEQ ID NO: 2) was ligated with T4 RNA ligase.
[00229] RT-PCR (Superscript II, Invitrogen) was performed with 5' primer (CAAGCAGAAGACGGCATA) (SEQ ID NO: 3). Splicing of overlapping ends PCR (SOEPCR) was performed (Phusion, NEB) with 5' primer and 3' PCR primer (AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA) (SEQ ID NO: 4), generating cDNA with extended 3' adaptor sequence. PCR product (40 μl) was denatured (85°C, 10 min, formamide loading dye), and the differently sized strands were purified on a 90% formamide, 8% acrylamide gel, yielding single-stranded DNA suitable for Solexa sequencing.
[00230] The single-stranded DNA samples were resuspended in 10 mM Tris (EB buffer)/0.1% Tween and then used as indicated in the standard Solexa sequencing protocol (Illumina). Each library was run on one lane of the Solexa sequencer.
[00231] Polony generation on Solexa Flow-Cells
[00232] The DNA library (2-4 pM) was applied to the flow-cell (8 samples per flow-cell) using the Cluster Station device from Illumina. The concentration of library applied to the flow-cell was calibrated such that polonies generated in the bridge amplification step originate from single strands of DNA. Multiple rounds of amplification reagents were flowed across the cell in the bridge amplification step to generate polonies of approximately 1,000 strands in 1 μm diameter spots. Double stranded polonies were visually checked for density and morphology by staining with a 1:5000 dilution of SYBR Green I (Invitrogen) and visualizing with a microscope under fluorescent illumination. Validated flow-cells were stored at 4°C until sequencing.
[00233] Sequencing and Analysis
[00234] Flow-cells were removed from storage and subjected to linearization and annealing of sequencing primer on the Cluster Station. Primed flow-cells were loaded into the Illumina Genome Analyzer 1G. After the first base was incorporated in the Sequencing-by-Synthesis reaction, the process was paused for a key quality control checkpoint. A small section of each lane was imaged and the average intensity value for all four bases was compared to minimum thresholds. Flow-cells with low first base intensities were re-primed, and if signal was not recovered the flow-cell was aborted. Flow-cells with signal intensities meeting the minimum thresholds were resumed and sequenced for 36 cycles. Images acquired from the Illumina/Solexa sequencer were processed through the bundled Solexa image extraction pipeline, which identified polony positions, performed base-calling and generated QC statistics. Sequences were then assigned to a miRNA if they perfectly matched at least the first 20bp of the mature miRNA sequences downloaded from TargetScan (http://www.targetscan.org/). Mature miRNA frequencies were then normalized to each other by determining the expected frequency in mapped reads/million. A full list of the miRNAs detected can be found in Table S9.
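The read assignment and reads-per-million normalization described above can be sketched as follows. This is a hedged illustration, not the pipeline actually used: the function names are hypothetical, reads and reference sequences are assumed to use the same alphabet, and miRNAs that share their first 20 bases are not distinguished in this simplified version.

```python
from collections import Counter

def count_mirna_reads(reads, mature_mirnas, prefix_len=20):
    """Assign each read to a miRNA by a perfect match over the first
    `prefix_len` bases of the mature sequence.

    mature_mirnas: dict mapping miRNA name -> mature sequence.
    Simplification: if two miRNAs share the same prefix, only one is kept.
    """
    prefix_to_name = {seq[:prefix_len]: name
                      for name, seq in mature_mirnas.items()}
    counts = Counter()
    for read in reads:
        if len(read) < prefix_len:
            continue                      # too short to match 20 bases
        name = prefix_to_name.get(read[:prefix_len])
        if name is not None:
            counts[name] += 1
    return counts

def reads_per_million(counts):
    """Scale counts to frequency per million mapped reads."""
    total = sum(counts.values())
    return {name: n * 1_000_000 / total for name, n in counts.items()}
```

Because the denominator is the number of mapped reads, the resulting values describe relative abundance within a library and do not by themselves capture changes in total miRNA content between samples.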
[00235] In each cell type examined, a small subset of mature miRNA transcripts predominated (Figure 4a). Members of the miR-290-295 cluster, which encodes multiple miRNAs with the same seed sequence, constituted approximately two thirds of all mature miRNA transcripts in murine ES cells. Let-7 family members constituted roughly one quarter and one half of miRNAs in MEFs and NPCs, respectively. The miR-290-295 cluster, which dominated the expression profile of ES cells but was scarce in both MEFs and NPCs, is occupied at its promoters by Oct4, Sox2, Nanog and Tcf3 (Figure 3a), consistent with the hypothesis that these factors are important for maintaining the expression of the miR-290-295 miRNA cluster in ES cells.
[00236] Read Normalization
[00237] Rigorous comparison of the time points in the Oct4 depletion experiment required normalization of the total number of miRNAs in each cell. Normalization was performed using Northern analysis of the samples with multiple probes. First, the total RNA in each lane was normalized using both a probe against tRNA and a probe against the U6 RNA. This normalization was then applied to a series of individual mature miRNAs, including mmu-miR-16, -19b, -21, -291, and -293 (data not shown). On average, these samples showed very similar results to those obtained when the Solexa small RNA reads were normalized to total miRNA content, with a relatively large error of +/- 20%. We therefore made the approximation that the total miRNA content of the cells was largely unchanged within 24 hours and normalized the final data presented in Table S9 using the total numbers of miRNAs sequenced.
[00238] Northern blots of mature miRNAs
[00239] Northern blots were performed as in Houbaviy et al. (2003), except hybridization was carried out in Oligo Hyb (Ambion) at 37°C. Probe sequences were DNA oligos complementary to annotated miRBase miRNAs. Probes used in normalization were U6 snoRNA: 5'-GGGCCATGCTAATCTTCTCTGT-3' (SEQ ID NO: 5) (Houbaviy et al. 2005) and tRNA-gln: 5'-TGGAGGTTCCACCGAGAT-3' (SEQ ID NO: 6).
[00240] miRNA Microarray Expression Analysis
[00241] RNA from murine embryonic stem cells (mES, V6.5), mouse embryonic fibroblasts (MEFs) and murine induced pluripotent stem (iPS) cells was extracted with RNeasy (Qiagen) reagents. 5 μg of total RNA from treated and control samples was labeled with Hy3™ and Hy5™ fluorescent labels, using the miRCURY™ LNA Array labeling kit (Exiqon, Denmark) following the procedure described by the manufacturer. The labeled samples were mixed pair-wise and hybridized to the miRNA arrays printed using miRCURY™ LNA oligoset version 8.1 (Exiqon, Denmark). Each miRNA was printed in duplicate on CodeLink slides (GE) using a GeneMachines Omnigrid 100. The hybridization was performed at 60°C overnight using the Agilent Hybridization system (SureHyb), after which the slides were washed using the miRCURY™ LNA washing buffer kit (Exiqon, Denmark) following the procedure described by the manufacturer. The slides were then scanned using an Axon 4000B scanner and the image analysis was performed using Genepix Pro 6.0.
[00242] Data from experiments with mES cells, MEFs and iPS cells were then combined. Median signal intensities for all microarray probes were background subtracted and tabulated. The data were then quantile normalized by assigning each probe the average signal intensity for all probes of the same intensity rank across the six experiments. Signal
intensities were then floored at one unit and log transformed. Control probes were removed from further analysis.
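The quantile normalization, flooring and log transformation steps can be sketched as below. This is a minimal illustration under stated assumptions: ties are broken by probe order rather than averaged, and the base of the logarithm (base 2 here) is an assumption because the text does not specify it. Function names are hypothetical.

```python
import math

def quantile_normalize(columns):
    """Quantile-normalize arrays so that all share one intensity distribution.

    columns: list of equal-length lists, one per array (experiment), holding
    background-subtracted median intensities for the same probes.
    Each probe receives the mean intensity of all probes at its rank.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean intensity at each rank across the arrays.
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns)
                  for r in range(n)]
    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized

def floor_and_log(values, floor=1.0):
    """Floor intensities at one unit, then log-transform (base 2 assumed)."""
    return [math.log2(max(v, floor)) for v in values]
```

Flooring at one unit before the log transform keeps every transformed value at or above zero and avoids taking the logarithm of zero or negative background-subtracted intensities.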
[00243] We next looked to identify miRNA probes that were differentially enriched in mES cell and MEF samples and to compare them to data from iPS cells. Statistically significant differential expression between the two sample types was calculated using the online NIA Array Analysis Tool (http://lgsun.grc.nia.nih.gov/ANOVA/). Probes from 3 MEF samples and 2 mES cell samples were tested for differential expression using the following settings:
[00244]
Threshold z-value to remove outliers: 10,000
Error Model: Max(Average,Bayesian)
Error variance averaging window: 100
Proportion of highest error variances to be removed: 0.01
Bayesian degrees of freedom: 5
FDR threshold: 0.10
[00245] Of 1008 probes, 230 were determined to be differentially expressed between the MEF and mES samples.
[00246] For clustering and heat map display, expression data were Z-score normalized. Hierarchical clustering of genes and arrays was performed with centroid linkage and Spearman rank correlation distance using Gene Cluster 3.0 (http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm#ctv). Heatmaps were generated using Java Treeview (http://jtreeview.sourceforge.net/) with color saturation at 0.6 standard deviations. Complete miRNA microarray expression data, differential expression analysis results, and clustergram data are provided (Table S11).
[00247] Tissue-Specificity of miRNAs
[00248] To determine the global tissue-specificity for miRNAs we used data from the recent publication of the miRNA atlas (Landgraf et al., 2007). Specificity scores were taken from Table S34 Node 0 from Landgraf et al. (2007). Of the 45 distinct mature miRNAs with specificity scores >1 that are not bound only by Oct4/Sox2/Nanog/Tcf3, 16 were identified as Suz12 targets. These 16 represent over 40% of the distinct mature miRNAs whose promoters are occupied by Suz12 (p < 5 x 10^-4 for specificity scores > 1.3).
[00249] Functional regulation of miRNAs by Oct4 and Tcf3 in mES cells
[00250] The occupancy of the promoters of the ES cell-specific miRNAs by Oct4/Sox2/Nanog/Tcf3 implicates these critical factors as possible regulators of miRNA transcription. However, binding data alone cannot test the importance of the regulatory circuitry at any given gene. Gene regulation is complex, and factors associated with a gene are likely to be redundant and may play different roles at different genes. In order to understand the role of members of the core regulatory circuitry at the miRNA genes, we performed perturbation experiments on Oct4 and Tcf3.
[00251] miRNA regulation by Oct4
[00252] Oct4 is a critical regulator of ES cell pluripotency, and disruption of Oct4 leads to rapid differentiation of the ES cells (Niwa et al., 2000). In order to understand the role of Oct4 in regulating ES cell miRNAs, we utilized a system in which both endogenous copies of Oct4 had been disrupted and Oct4 is supplied by an exogenous transgene under the control of a doxycycline-regulated promoter (Niwa et al., 2000 and Figure S7a). Oct4 mRNA is rapidly lost in these cells upon doxycycline induction (Figure S7b). This system allowed us to examine the cells at early timepoints (12 and 24 hours) when the cells remain ES-like morphologically and still express Sox2 (Figure S7c); however, we must assume this state is only transient, as these cells are presumably in the early steps of differentiating.
[00253] We used two separate approaches to ascertain the role of Oct4 on miRNAs in these cells. First, we directly tested the levels of selected primary transcripts of the miRNA promoters occupied by Oct4/Sox2/Nanog/Tcf3. Real time PCR primers were designed within the ~200nt immediately upstream of the tested miRNA hairpins or in the middle of the mir-290-295 polycistron, but outside of any hairpin regions (Figure 3d and see below for sequences). Primers were used to test samples at 0, 12 and 24 hours post-exposure to doxycycline (Figure S8b). Within 24 hours, the levels of all five miRNAs tested were reduced significantly (p < 0.001). The reduction of pri-let-7g is particularly interesting since the production of mature let-7g, found at high levels in differentiated cell types, is inhibited by the Oct4 target Lin28 (see Figure 6).
[00254] As a second approach, we examined the levels of the mature miRNAs by quantitative sequencing of short RNAs. This approach has the advantage of covering all miRNAs expressed in mES cells but is limited by the long half-life of mature miRNAs (believed to be as long as 24 hours; D. Bartel, personal communication). Using the Solexa sequencer, we were able to identify 5.8 million reads across the six samples that matched known miRNAs. These samples were normalized based on Northern blot analysis (see above, data not shown). Using this methodology, we were able to observe changes in the relative abundance of Oct4/Sox2/Nanog/Tcf3-occupied miRNAs, with a general tendency for these miRNAs to decrease in relative abundance. Changes in miRNA expression were subtle, as expected, but some miRNAs were downregulated as much as 4-fold (Figure S7d, Table S9).
[00255] miRNA regulation by Tcf3
[00256] Tcf3 is a terminal component of the canonical Wnt pathway in ES cells and has been integrated into the core circuitry regulating ES cells. Recent reports have indicated that Tcf3 depletion causes impaired differentiation in ES cells and upregulation of pluripotency genes, including Oct4, Sox2 and Nanog (Cole et al., Genes and Dev 2008; Tam et al., Stem Cells 2008; Yi et al., Stem Cells 2008). Genes encoding several key pluripotency factors were observed to increase in expression, albeit only mildly, but other genes decreased in expression or remained expressed at the same level. The different regulatory effects at different target genes may depend on the proteins associated with Tcf3 at each promoter.
[00257] In order to understand the role of Tcf3 in regulating miRNAs in ES cells, we performed knockdown experiments on V6.5 ES cells using lentiviral shRNAs as described in Cole et al. Independent knockdowns resulted in over 70% depletion of endogenous Tcf3 (Figure S8a). Unlike Oct4-depleted cells, Tcf3-depleted ES cells are stable, allowing longer time points in these experiments. As described above, we investigated the effect of Tcf3 depletion on selected primary transcripts (Figure S8b,c). Depletion of Tcf3 resulted in small but significant increases in the levels of the primary transcripts for the mir-302 and mir-290-295 clusters by 72 hours post-infection. In addition, the primary transcript for the non-ES cell-specific mir-106-363 cluster decreased in steady state levels. Unlike the results we observed for Oct4, the primary transcript for let-7g did not show significant changes in expression.
[00258] Tcf3 knockdown
[00259] Tcf3 knockdown experiments were performed essentially as in Cole et al., 2008 with minor modifications. Lentivirus was produced according to the Open Biosystems Trans-lentiviral shRNA Packaging System (TLP4614). The shRNA constructs targeting murine Tcf3 were designed using an siRNA rules-based algorithm consisting of sequence, specificity, and position scoring for optimal hairpins that consist of a 21-base stem and a 6-base loop (RMM4534-NM-009332). A knockdown control virus targeting EGFP was produced from vector obtained from the RNAi Consortium. V6.5 mES cells were plated at ~30% confluence on the day of infection. Cells were seeded in mES media with 6 μg/mL polybrene (Sigma, H9268-10G) and Tcf3 knockdown or control virus was immediately added. After 24 h, infection media was removed and replaced with mES media with 2 μg/mL Puromycin (Sigma, P8833). RNA was harvested at 72 h after infection.
[00260] Knockdown efficiency was measured using real-time PCR to measure levels of Tcf3 mRNA (Figure S8a).
[00261] qRT-PCR of primary miRNAs
[00262] qPCR primers were designed using the standard specifications of PrimerExpress (Applied Biosystems) or Primer3 (Rozen and Skaletsky, 2000) for real time primer design. Primers were then used in SybrGreen quantitative PCR assays on the Applied Biosystems 7500 Real Time PCR system. Expression levels were calculated relative to Gapdh mRNA levels, which were quantified in parallel by Taqman analysis. The effects here were observed with high concentrations of cDNA template.
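One common way to express a transcript level relative to a reference gene such as Gapdh is the comparative Ct calculation. The text does not state which calculation was used, so the sketch below is an assumption for illustration only; it also assumes perfect amplification efficiency (a doubling per cycle), and the function names are hypothetical.

```python
def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Target level relative to the reference gene (comparative Ct method).

    Assumption: amplification efficiency is the same for both assays
    (2.0 means the product doubles every cycle).
    """
    return efficiency ** -(ct_target - ct_reference)

def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Ratio of reference-normalized expression, treated over control."""
    treated = relative_expression(ct_target_treated, ct_ref_treated)
    control = relative_expression(ct_target_control, ct_ref_control)
    return treated / control
```

With this convention, a value below 1 indicates that the primary transcript is reduced in the treated sample relative to the control after accounting for Gapdh.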
[00263] Identification of Oct4/Sox2/Nanog/Tcf3 occupied Feed-Forward Loops
[00264] To identify feed forward loops, we examined the recent data set identifying functional targets of the miR-290-295 cluster (Sinkkonen et al., 2008). In their study, Sinkkonen et al. identified miR-290-295 targets by looking at mRNAs that increase in level in a Dicer -/- cell line and overlapping that data set with mRNAs that decrease in expression when miR-290-295 mimic siRNA is added back to the cells. Because the promoter of the mir-290-295 polycistron is occupied by Oct4/Sox2/Nanog/Tcf3, any targets of the miRNA cluster that are also occupied by the 4 factors would represent feed forward targets. Of the 245 miR-290-295 cluster targets identified in the Sinkkonen et al. study (2008), promoters for 64 are occupied by Oct4/Sox2/Nanog/Tcf3. This is approximately 50% more interactions than would be expected at random (binomial p-value < 1 x 10^-4).
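The binomial enrichment test mentioned above can be written as an upper-tail probability. The sketch below is generic: the background probability that a randomly chosen promoter is occupied by the four factors is not given in this passage, so it is left as a parameter, and the function name `binomial_tail` is hypothetical.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    k : number of occupied targets observed (64 in the text)
    n : number of targets examined (245 in the text)
    p : background probability that a promoter is occupied
        (not stated in this passage; supply it from the genome-wide data)
    """
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))
```

For the comparison in the text, the call would take the form `binomial_tail(64, 245, background)`, where `background` is the genome-wide fraction of promoters occupied by Oct4/Sox2/Nanog/Tcf3.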
[00265] Interestingly, only a small minority of these genes are also occupied by significant quantities of the PRC2 subunit Suz12. Of the 64 targets whose promoters are occupied by Oct4/Sox2/Nanog/Tcf3, only 5 are occupied by domains of Suz12 binding >500bp (larger region sizes have been correlated with gene silencing; Lee et al., 2006). This may be because PcG-bound proteins are not functional targets of mir-290-295 in mES cells. Alternatively, these proteins are not expressed in ES cells following Dicer deletion and are thus excluded from the target list (Sinkkonen et al., 2008), but may be targets at other stages of development. In the latter case, the miRNAs may serve as a redundant silencing mechanism, along with Polycomb group complexes, to help prevent even low levels of expression of the developmental regulators in ES cells.
[00266] References for Example 7
[00267] Bailey, T. L., and Elkan, C. (1995). The value of prior knowledge in discovering motifs with MEME. Proc Int Conf Intell Syst Mol Biol 3, 21-29.
[00268] Bailey, T. L., Williams, N., Misleh, C., and Li, W. W. (2006). MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34, W369-373.
[00269] Barski, A., Cuddapah, S., Cui, K., Roh, T. Y., Schones, D. E., Wang, Z., Wei, G., Chepelev, I., and Zhao, K. (2007). High-resolution profiling of histone methylations in the human genome. Cell 129, 823-837.
[00270] Beissbarth, T., and Speed, T. P. (2004). GOstat: find statistically overrepresented
Gene Ontologies within a group of genes. Bioinformatics 20, 1464-1465.
[00271] Boyer, L. A., Lee, T. I., Cole, M. F., Johnstone, S. E., Levine, S. S., Zucker, J. P., Guenther, M. G., Kumar, R. M., Murray, H. L., Jenner, R. G., et al. (2005). Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122, 947-956.
[00272] Boyer, L. A., Plath, K., Zeitlinger, J., Brambrink, T., Medeiros, L. A., Lee, T. I., Levine, S. S., Wernig, M., Tajonar, A., Ray, M. K., et al. (2006). Polycomb complexes repress developmental regulators in murine embryonic stem cells. Nature 441, 349-353.
[00273] Calabrese, J. M., Seila, A. C., Yeo, G. W., and Sharp, P. A. (2007). RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. Proc Natl Acad Sci U S A 104, 18097-18102.
[00274] Chang, T. C., Wentzel, E. A., Kent, O. A., Ramachandran, K., Mullendore, M., Lee, K. H., Feldmann, G., Yamakuchi, M., Ferlito, M., Lowenstein, C. J., et al. (2007). Transactivation of miR-34a by p53 broadly influences gene expression and promotes apoptosis. Mol Cell 26, 745-752.
[00275] Cole, M. F., Johnstone, S. E., Newman, J. J., Kagey, M. H., and Young, R. A. (2008). Tcf3 is an integral component of the core regulatory circuitry of embryonic stem cells. Genes Dev 22, 746-755.
[00276] Corney, D. C., Flesken-Nikitin, A., Godwin, A. K., Wang, W., and Nikitin, A. Y. (2007). MicroRNA-34b and MicroRNA-34c are targets of p53 and cooperate in control of cell proliferation and adhesion-independent growth. Cancer Res 67, 8433-8438.
[00277] Fukao, T., Fukuda, Y., Kiga, K., Sharif, J., Hino, K., Enomoto, Y., Kawamura, A., Nakamura, K., Takeuchi, T., and Tanabe, M. (2007). An evolutionarily conserved mechanism for microRNA-223 expression revealed by microRNA gene profiling. Cell 129, 617-631.
[00278] Gerhard, D. S., Wagner, L., Feingold, E. A., Shenmen, C. M., Grouse, L. H.,
Schuler, G., Klein, S. L., Old, S., Rasooly, R., Good, P., et al. (2004). The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC).
Genome Res 14, 2121-2127.
[00279] Grimson, A., Farh, K. K., Johnston, W. K., Garrett-Engele, P., Lim, L. P., and Bartel, D. P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27, 91-105.
[00280] Guenther, M. G., Levine, S. S., Boyer, L. A., Jaenisch, R., and Young, R. A.
(2007). A chromatin landmark and transcription initiation at most promoters in human cells.
Cell 130, 77-88.
[00281] Hinrichs, A. S., Karolchik, D., Baertsch, R., Barber, G. P., Bejerano, G., Clawson,
H., Diekhans, M., Furey, T. S., Harte, R. A., Hsu, F., et al. (2006). The UCSC Genome
Browser Database: update 2006. Nucleic Acids Res 34, D590-598.
[00282] Hubbard, T., Andrews, D., Caccamo, M., Cameron, G., Chen, Y., Clamp, M.,
Clarke, L., Coates, G., Cox, T., Cunningham, F., et al. (2005). Ensembl 2005. Nucleic Acids
Res 33, D447-453.
[00283] Johnson, D., Mortazavi, A., Myers, R., and Wold, B. (2007). Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1441-2.
[00284] Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., and Haussler, D. (2002). The human genome browser at UCSC. Genome Res 12, 996-1006.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of novel genes coding for small expressed RNAs. Science 294, 853-858.
[00285] Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A., Kamphorst, A. O., Landthaler, M., et al. (2007). A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401-1414.
[00286] Lau, N. C., Lim, L. P., Weinstein, E. G., and Bartel, D. P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.
[00287] Lee, T. I., Jenner, R. G., Boyer, L. A., Guenther, M. G., Levine, S. S., Kumar, R. M., Chevalier, B., Johnstone, S. E., Cole, M. F., Isono, K., et al. (2006a). Control of developmental regulators by Polycomb in human embryonic stem cells. Cell 125, 301-313.
[00288] Lee, T. I., Johnstone, S. E., and Young, R. A. (2006b). Chromatin immunoprecipitation and microarray-based analysis of protein location. Nat Protoc 1, 729-
748.
[00289] Loh, Y. H., Wu, Q., Chew, J. L., Vega, V. B., Zhang, W., Chen, X., Bourque, G.,
George, J., Leong, B., Liu, J., et al. (2006). The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nat Genet 38, 431-440.
[00290] Mikkelsen, T. S., Ku, M., Jaffe, D. B., Issac, B., Lieberman, E., Giannoukos, G.,
Alvarez, P., Brockman, W., Kim, T. K., Koche, R. P., et al. (2007). Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553-560.
[00291] Niwa, H., Miyazaki, J., and Smith, A. G. (2000). Quantitative expression of Oct-3/4 defines differentiation, dedifferentiation or self-renewal of ES cells. Nat Genet 24, 372-376.
[00292] O'Donnell, K. A., Wentzel, E. A., Zeller, K. I., Dang, C. V., and Mendell, J. T. (2005). c-Myc-regulated microRNAs modulate E2F1 expression. Nature 435, 839-843.
[00293] Okabe, S., Forsberg-Nilsson, K., Spiro, A. C., Segal, M., and McKay, R. D. (1996). Development of neuronal precursor cells and functional postmitotic neurons from embryonic stem cells in vitro. Mech Dev 59, 89-102.
[00294] Pall, G. S., Codony-Servat, C., Byrne, J., Ritchie, L., and Hamilton, A. (2007). Carbodiimide-mediated cross-linking of RNA to nylon membranes improves the detection of siRNA, miRNA and piRNA by northern blot. Nucleic Acids Res 35, e60.
[00295] Pruitt, K. D., Tatusova, T., and Maglott, D. R. (2005). NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33, D501-504.
[00296] Rozen, S., and Skaletsky, H. (2000). Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 132, 365-386.
[00297] Santos-Rosa, H., Schneider, R., Bannister, A. J., Sherriff, J., Bernstein, B. E., Emre, N. C., Schreiber, S. L., Mellor, J., and Kouzarides, T. (2002). Active genes are tri-methylated at K4 of histone H3. Nature 419, 407-411.
[00298] Sinkkonen, L., Hugenschmidt, T., Berninger, P., Gaidatzis, D., Mohn, F., Artus-Revel, C. G., Zavolan, M., Svoboda, P., and Filipowicz, W. (2008). MicroRNAs control de novo DNA methylation through regulation of transcriptional repressors in mouse embryonic stem cells. Nat Struct Mol Biol 15, 259-267.
[00299] Tam, W. L., Lim, C. Y., Han, J., Zhang, J., Ang, Y. S., Ng, H. H., Yang, H., and Lim, B. (2008). Tcf3 Regulates Embryonic Stem Cell Pluripotency and Self-Renewal by the Transcriptional Control of Multiple Lineage Pathways. Stem Cells.
[00300] Tucker, K. L., Wang, Y., Dausman, J., and Jaenisch, R. (1997). A transgenic mouse strain expressing four drug-selectable marker genes. Nucleic Acids Res 25, 3745-3746.
[00301] Valoczi, A., et al. (2004). Sensitive and specific detection of microRNAs by northern blot analysis using LNA-modified oligonucleotide probes. Nucleic Acids Res 32, e175.
[00302] Voorhoeve, P. M., le Sage, C., Schrier, M., Gillis, A. J., Stoop, H., Nagel, R., Liu, Y. P., van Duijse, J., Drost, J., Griekspoor, A., et al. (2006). A genetic screen implicates miRNA-372 and miRNA-373 as oncogenes in testicular germ cell tumors. Cell 124, 1169-1181.
[00303] Wernig, M., Meissner, A., Foreman, R., Brambrink, T., Ku, M., Hochedlinger, K., Bernstein, B. E., and Jaenisch, R. (2007). In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 448, 318-324.
[00304] Yi, F., Pereira, L., and Merrill, B. J. (2008). Tcf3 Functions as a Steady State Limiter of Transcriptional Programs of Mouse Embryonic Stem Cell Self Renewal. Stem Cells.
* * *
[00305] One skilled in the art readily appreciates that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those inherent therein. The methods, systems and kits are representative of certain embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Modifications therein and other uses will occur to those skilled in the art. These modifications are encompassed within the spirit of the invention and are defined by the scope of the claims. It will be readily apparent to a person skilled in the art that varying substitutions and modifications may be made to the invention disclosed herein without departing from the scope and spirit of the invention.
[00306] The articles "a" and "an", unless clearly indicated to the contrary, should be understood to include the plural referents. Claims or descriptions that include "or" between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention also includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process. Furthermore, it is to be understood that the invention provides all variations, combinations, and permutations in which one or more limitations, elements, clauses, descriptive terms, etc., from one or more of the listed claims is introduced into another claim dependent on the same base claim (or, as relevant, any other claim) unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise. Where elements are presented as lists, e.g., in Markush group or similar format, it is to be understood that each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements, features, etc., certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements, features, etc. For purposes of simplicity those embodiments have not in every case been specifically set forth herein. It should also be understood that any embodiment of the invention can be explicitly excluded from the claims, regardless of whether the specific exclusion is recited in the specification.
[00307] Where ranges are given herein, the invention includes embodiments in which the endpoints are included, embodiments in which both endpoints are excluded, and embodiments in which one endpoint is included and the other is excluded. It should be assumed that both endpoints are included unless indicated otherwise. Furthermore, it is to be understood that unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or subrange within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise. It is also understood that where a series of numerical values is stated herein, the invention includes embodiments that relate analogously to any intervening value or range defined by any two values in the series, and that the lowest value may be taken as a minimum and the greatest value may be taken as a maximum. Numerical values, as used herein, include values expressed as percentages. For any embodiment of the invention in which a numerical value is prefaced by "about" or "approximately", the invention includes an embodiment in which the exact value is recited. For any embodiment of the invention in which a numerical value is not prefaced by "about" or "approximately", the invention includes an embodiment in which the value is prefaced by "about" or "approximately". "Approximately" or "about" is intended to encompass numbers that fall within a range of ±10% of a number, in some embodiments within ±5% of a number, in some embodiments within ±1% of a number, in some embodiments within ±0.5% of a number, in some embodiments within ±0.1% of a number, unless otherwise stated or otherwise evident from the context (except where such number would impermissibly exceed 100% of a possible value).
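By way of illustration only, the tolerance bands recited above for "about" or "approximately" can be expressed as a simple numeric check. The following is a minimal sketch in Python; the function names `within_about` and `satisfied_tolerances` are hypothetical and are not part of the disclosure, and the sketch does not model the stated exception for values that would exceed 100% of a possible value.

```python
def within_about(value, stated, tolerance=0.10):
    """Return True if `value` lies within +/- `tolerance` (a fraction) of `stated`.

    The default of 0.10 corresponds to the broadest band recited (+/-10%).
    """
    return abs(value - stated) <= tolerance * abs(stated)


# Tolerance bands recited in the paragraph, widest first (assumed ordering).
TOLERANCES = (0.10, 0.05, 0.01, 0.005, 0.001)


def satisfied_tolerances(value, stated):
    """List every recited tolerance band that `value` satisfies for `stated`."""
    return [t for t in TOLERANCES if within_about(value, stated, t)]


if __name__ == "__main__":
    # A measured value of 103 against a stated value of "about 100".
    print(within_about(103, 100))
    print(satisfied_tolerances(103, 100))
```

In this sketch a value satisfies a narrower embodiment only if it also satisfies every broader one, which mirrors the nested way the bands are recited.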
[00308] Certain claims are presented in dependent form for the sake of convenience, but Applicant reserves the right to rewrite any dependent claim in independent form to include the limitations of the independent claim and any other claim(s) on which such claim depends, and such rewritten claim is to be considered equivalent in all respects to the dependent claim in whatever form it is in (either amended or unamended) prior to being rewritten in independent format. It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one act, the order of the acts of the method is not necessarily limited to the order in which the acts of the method are recited, but the invention includes embodiments in which the order is so limited.
Table S6: Mouse miRNA promoters and associated proteins and genomic features
Column groups: PROMOTER, SCORING, BINDING
miRNA name | miRNA position | score | TSS position | H3K4me3 | CpG | Intervening Sites | GENE/EST | Proximal | Conserved | Oct4 | Sox2 | Nanog | Tcf3 | Suz12 | H3K27me3
mmu-mir-206 | chr1 20664158-20664237 (+) | 0 | chr1 20471175-20472525 | E | | 0 | | | | | | | | |
mmu-mir-133b | chr1 20667938-20668016 (+) | 0 | chr1 20471175-20472525 | E | | 0 | | | | | | | | |
mmu-mir-30a | chr1 23226539-23226621 (+) | -5 | chr1 23209700-23213275 | EFNW | CpG | 0 | | | | | | | | |
mmu-mir-30c-2 | chr1 23245979-23246060 (+) | -5 | chr1 23209700-23213275 | EFNW | CpG | | | | | | | | | |
mmu-mir-26b | chr1 74327521-74327597 (+) | 30 | chr1 74324705-74324905 | EWNF | CpG | 0 | GENIC | <5kb | | Oct4 | | Nanog | | |
mmu-mir-375 | chr1 74833858-74833934 (-) | 10 | chr1 74833934-74835100 | EW | CpG | 0 | | <5kb | | Oct4 | | | | Suz12 | K27me3
mmu-mir-149 | chr1 94680777-94680858 (+) | 19 | chr1 94617259-94617459 | known | | 1 | GENIC | | | | | | | |
mmu-mir-149 | chr1 94680777-94680858 (+) | 10 | chr1 94662155-94662355 | FENW | CpG | 0 | EST CD549945, CD549969 | | | | | | | | K27me3
mmu-mir-128a | chr1 130029905-130029980 (+) | 25 | chr1 129930751-129930951 | FNEW | CpG | | GENIC | | Cons | | | | | |
mmu-mir-135b | chr1 134025639-134025719 (+) | 19 | chr1 134018876-134019076 | ENWF | | | GENIC | | | Oct4 | Sox2 | Nanog | Tcf3 | Suz12 | K27me3
mmu-mir-181a-2 | chr1 139782869-139782950 (+) | 0 | chr1 139714910-139715099 | N | | | non-overlap gene; overlap in Human | | manual; Cons | | | | | |
mmu-mir-181b-1 | chr1 139783051-139783132 (+) | 0 | chr1 139714910-139715099 | N | | | non-overlap gene; overlap in Human | | manual; Cons | | | | | |
mmu-mir-488 | chr1 160342318-160342396 (+) | 24 | chr1 160198916-160199116 | NEW | | | GENIC | | Cons | Oct4 | Sox2 | | Tcf3 | | K27me3
mmu-mir-199a-2 | chr1 164054511-164054591 (+) | 10 | chr1 164052250-164061375 | F | | | | <5kb | | | | Nanog | | |
mmu-mir-214 | chr1 164060064-164060146 (+) | 10 | chr1 164052250-164061375 | F | | | | <5kb | | | | Nanog | | |
mmu-mir-350 | chr1 178609007-178609084 (-) | 0 | chr1 178636600-178640175 | N | | | | | | | | | | |
mmu-mir-194-1 | chr1 187014103-187014179 (+) | 0 | chr1 186908300-186912075 | EW | | | | | | Oct4 | Sox2 | Nanog | | |
mmu-mir-215 | chr1 187014388-187014466 (+) | 0 | chr1 186908300-186912075 | EW | | | | | | Oct4 | Sox2 | Nanog | | |
mmu-mir-205 | chr1 195208181-195208258 (-) | 0 | chr1 195208350-195211700 | | | | non-overlap EST | <5kb | | Oct4 | Sox2 | Nanog | | |
mmu-mir-29b-2 | chr1 196737762-196737847 (+) | -10 | chr1 196676100-196679875 | EFNW | CpG | 0 | non-overlap EST | | | | | | | |
mmu-mir-29c | chr1 196738278-196738355 (+) | -10 | chr1 196676100-196679875 | EFNW | CpG | 0 | non-overlap EST | | | | | | | |
mmu-mir-466f-1 | chr2 10384797-10384890 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466f-2 | chr2 10386528-10386621 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466f-3 | chr2 10389806-10389899 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-297a-5 | chr2 10390107-10390196 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467c | chr2 10391784-10391880 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466b-1 | chr2 10392072-10392153 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467a-3 | chr2 10394198-10394275 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466e-1 | chr2 10394483-10394566 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467a-4 | chr2 10396655-10396732 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466e-2 | chr2 10396941-10397024 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467a-1 | chr2 10399100-10399176 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466c-1 | chr2 10399387-10399470 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467a-2 | chr2 10401535-10401612 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466c-2 | chr2 10401822-10401905 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467a-5 | chr2 10403992-10404069 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466b-4 | chr2 10404280-10404361 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467a-11 | chr2 10406456-10406533 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466b-5 | chr2 10406742-10406823 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467a-6 | chr2 10408905-10408982 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466c-3 | chr2 10409192-10409275 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | | | | | | | | Suz12 | K27me3
mmu-mir-467a-8 | chr2 10411367-10411444 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466b-6 | chr2 10411655-10411736 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467a-12 | chr2 10413837-10413914 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466b-8 | chr2 10414123-10414204 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467a-9 | chr2 10416250-10416327 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466b-2 | chr2 10416538-10416619 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467a-10 | chr2 10418679-10418756 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466b-7 | chr2 10418966-10419047 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467a-7 | chr2 10421130-10421207 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466b-3 | chr2 10421418-10421498 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467e | chr2 10423574-10423660 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-467d | chr2 10425483-10425567 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466a | chr2 10425771-10425853 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-297c | chr2 10426869-10426966 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-669c | chr2 10427162-10427244 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-669a-4 | chr2 10427751-10427846 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-297b | chr2 10429529-10429620 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466d-1 | chr2 10429820-10429915 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466d-2 | chr2 10431601-10431682 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466g | chr2 10432448-10432527 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-466h | chr2 10432744-10432824 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-297a-3 | chr2 10433673-10433774 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | | | | | | | | | Suz12 | K27me3
mmu-mir-297a-4 | chr2 10434920-10435017 (+) | 20 | chr2 10290334-10290534 | EWFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-511 | chr2 14178757-14178835 (+) | 25 | chr2 14147067-14147267 | known | | 0 | GENIC | | Cons | | | | | |
mmu-mir-126 | chr2 26413364-26413442 (+) | 19 | chr2 26402991-26403191 | EW | | 1 | GENIC | | | | | | | |
mmu-mir-219-2 | chr2 29667644-29667728 (-) | 15 | chr2 29666575-29669250 | EW | CpG | 0 | | <5kb | Cons | | | | | Suz12 | K27me3
mmu-mir-199b | chr2 32140484-32140564 (+) | 9 | chr2 32136375-32138625 | F | | 1 | | <5kb | | | | | | |
mmu-mir-199b | chr2 32140484-32140564 (+) | 15 | chr2 32138775-32141975 | FEW | | 0 | | <5kb | Cons | | | | | |
mmu-mir-181a-1 | chr2 38674740-38674825 (+) | 0 | chr2 38645205-38645407 | FN | | | non-overlap EST; overlap in Human | | manual | Oct4 | Sox2 | Nanog | Tcf3 | |
mmu-mir-181b-2 | chr2 38675844-38675923 (+) | 0 | chr2 38645205-38645407 | FN | | | non-overlap EST; overlap in Human | | manual | Oct4 | Sox2 | Nanog | Tcf3 | |
mmu-mir-10b | chr2 74526903-74526984 (+) | 25 | chr2 74512731-74512931 | FNEW | CpG | | GENIC | | Cons | | | Nanog | | Suz12 | K27me3
mmu-mir-130a | chr2 84541945-84542025 (-) | 0 | chr2 84542025-84546150 | ENFW | CpG | 0 | non-overlap EST | <5kb | | Oct4 | | Nanog | | |
mmu-mir-129-2 | chr2 94042206-94042288 (-) | 10 | chr2 94041175-94045400 | EFNW | CpG | 0 | | <5kb | | Oct4 | | | | Suz12 | K27me3
mmu-mir-674 | chr2 116876582-116876658 (+) | 0 | chr2 116827650-116829025 | N | | | | | | | | | | |
mmu-mir-147 | chr2 122332244-122332322 (+) | 35 | chr2 122329227-122329427 | EF | CpG | | GENIC | <5kb | Cons | Oct4 | | | | |
mmu-mir-103-2 | chr2 130979497-130979576 (+) | 20 | chr2 130953840-130954040 | EFNW | CpG | | GENIC | | | | | | | |
mmu-mir-499 | chr2 155314324-155314402 (+) | 19 | chr2 155302552-155302752 | known | | | GENIC | | | | | | Tcf3 | |
mmu-mir-297a-6 | chr2 169010090-169010171 (-) | 0 | chr2 169051150-169052700 | E | | | | | | | | | | |
mmu-mir-296 | chr2 173909987-173910062 (-) | -15 | chr2 173925825-173933325 | ENWF | CpG | | non-overlap gene Gnas | | | Oct4 | Sox2 | Nanog | Tcf3 | Suz12 | K27me3
mmu-mir-298 | chr2 173910440-173910524 (-) | -15 | chr2 173925825-173933325 | ENWF | CpG | | non-overlap gene Gnas | | | Oct4 | Sox2 | Nanog | Tcf3 | Suz12 | K27me3
mmu-mir-1-1 | chr2 180318455-180318535 (+) | 20 | chr2 180315487-180315687 | EW | | | EST CX729663 | <5kb | | | | | | | K27me3
mmu-mir-133a-2 | chr2 180327799-180327876 (+) | 10 | chr2 180315487-180315687 | EW | | | EST CX729663 | | | | | | | | K27me3
mmu-mir-124-3 | chr2 180823445-180823520 (+) | 15 | chr2 180819000-180825000 | EWF | CpG | | | <5kb | Cons | Oct4 | | | | Suz12 | K27me3
mmu-mir-124-2 | chr3 17987829-17987903 (+) | 22 | chr3 17986635-17986835 | EW | CpG | | EST BB389896 | <5kb | Cons | Oct4 | Sox2 | Nanog | Tcf3 | Suz12 | K27me3
mmu-mir-15b | chr3 69097694-69097773 (+) | 25 | chr3 69092800-69093000 | FNEW | CpG | | GENIC | | Cons | | | | | |
mmu-mir-16-2 | chr3 69097837-69097921 (+) | 25 | chr3 69092800-69093000 | FNEW | CpG | | GENIC | | Cons | | | | | |
mmu-mir-9-1 | chr3 88301530-88301611 (+) | 14 | chr3 88293050-88293250 | N | | | EST BM950182, CJ154154, CJ15292 | | Cons | | | | | | K27me3
mmu-mir-9-1 | chr3 88301530-88301611 (+) | 25 | chr3 88300321-88300521 | NEFW | CpG | | EST CX221670 | <5kb | Cons | | | | | Suz12 | K27me3
mmu-mir-92b | chr3 89313043-89313125 (-) | 20 | chr3 89313125-89315675 | ENWF | | | EST CF182364 | <5kb | | Oct4 | Sox2 | | Tcf3 | |
mmu-mir-190b | chr3 90155947-90156026 (+) | -15 | chr3 90134875-90141050 | FNEW | CpG | | non-overlap gene Ubap2l | | | | | | | |
mmu-mir-137 | chr3 118425859-118425937 (+) | 25 | chr3 118425075-118425859 | NEW | | 0 | EST BE864749 | <5kb | Cons | | | | | Suz12 | K27me3
mmu-mir-760 | chr3 122285625-122285704 (-) | 22 | chr3 122285704-122287675 | ENFW | CpG | 0 | EST BY724070 | <5kb | Cons | | Sox2 | Nanog | | |
mmu-mir-302b | chr3 127537250-127537329 (+) | 10 | chr3 127537000-127537941 | (W) | | 0 | | <5kb | manual | Oct4 | Sox2 | Nanog | Tcf3 | |
mmu-mir-302c | chr3 127537377-127537459 (+) | 10 | chr3 127537000-127537941 | (W) | | 0 | | <5kb | manual | Oct4 | Sox2 | Nanog | Tcf3 | |
mmu-mir-302a | chr3 127537513-127537593 (+) | 10 | chr3 127537000-127537941 | (W) | | 0 | | <5kb | manual | Oct4 | Sox2 | Nanog | Tcf3 | |
mmu-mir-302d | chr3 127537641-127537719 (+) | 10 | chr3 127537000-127537941 | (W) | | 0 | | <5kb | manual | Oct4 | Sox2 | Nanog | Tcf3 | |
mmu-mir-367 | chr3 127537753-127537832 (+) | 10 | chr3 127537000-127537941 | (W) | | 0 | | <5kb | manual | Oct4 | Sox2 | Nanog | Tcf3 | |
mmu-mir-186 | chr3 157479568-157479648 (+) | 25 | chr3 157469579-157469779 | NFEW | CpG | 0 | GENIC | | Cons | | | | | |
mmu-mir-876 | chr4 36834044-36834124 (-) | 0 | chr4 37072975-37074125 | N | | 0 | | | | | | | | |
mmu-mir-873 | chr4 36857181-36857257 (-) | 0 | chr4 37072975-37074125 | N | | 0 | | | | | | | | |
mmu-mir-207 | chr4 40911586-40911671 (+) | 30 | chr4 40911103-40911303 | FENW | CpG | 0 | GENIC | <5kb | | | | | | |
mmu-mir-32 | chr4 56989276-56989357 (-) | 24 | chr4 57041383-57041583 | EFNW | CpG | 1 | GENIC | | Cons | | | | Tcf3 | |
mmu-mir-455 | chr4 62743218-62743295 (+) | 20 | chr4 62701672-62701872 | EFN | CpG | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-491 | chr4 87593278-87593353 (+) | 25 | chr4 87565760-87565960 | NFEW | CpG | 0 | GENIC | | Cons | | | | | |
mmu-mir-31 | chr4 88381798-88381876 (-) | 0 | chr4 88399300-88401975 | F | | 0 | | | | Oct4 | Sox2 | | | |
mmu-mir-872 | chr4 94157175-94157255 (+) | 20 | chr4 94106437-94106637 | FNEW | CpG | 0 | GENIC | | | | | | | |
mmu-mir-101a | chr4 100844879-100844955 (-) | 10 | chr4 100854060-100854260 | NEFW | CpG | 0 | EST AI853970 | | | | | | | |
mmu-mir-30c-1 | chr4 120267171-120267251 (-) | 14 | chr4 120323146-120323346 | EFNW | CpG | 1 | EST BI851768 | | Cons | | | | | |
mmu-mir-30c-1 | chr4 120267171-120267251 (-) | 18 | chr4 120329110-120329310 | known | | 2 | GENIC | | | | | | | |
mmu-mir-30e | chr4 120270243-120270326 (-) | 14 | chr4 120323146-120323346 | EFNW | CpG | 1 | EST BI851768 | | Cons | | | | | |
mmu-mir-30e | chr4 120270243-120270326 (-) | 18 | chr4 120329110-120329310 | known | | 2 | GENIC | | | | | | | |
mmu-mir-34a | chr4 148912263-148912348 (+) | 0 | chr4 148890450-148895975 | EFNW | CpG | 0 | | | | | | | | | K27me3
mmu-mir-429 | chr4 154897706-154897784 (-) | 15 | chr4 154903109-154903309 | EW | | 0 | EST BQ715864 | | Cons | | Sox2 | | | |
mmu-mir-200a | chr4 154898701-154898779 (-) | 15 | chr4 154903109-154903309 | EW | | 0 | EST BQ715864 | | Cons | | Sox2 | | | |
mmu-mir-200b | chr4 154899475-154899551 (-) | 15 | chr4 154903109-154903309 | EW | | 0 | EST BQ715864 | | Cons | | Sox2 | | | |
mmu-mir-879 | chr5 9381710-9381785 (+) | 19 | chr5 9272032-9272232 | EFNW | CpG | 1 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-671 | chr5 24102187-24102267 (+) | 19 | chr5 24099189-24099389 | FNEW | | 1 | EST BY764410 | <5kb | | | | | | |
mmu-mir-218-1 | chr5 48512204-48512287 (+) | 19 | chr5 48271302-48271502 | FEW | CpG | | GENIC | | | | | Nanog | | Suz12 | K27me3
mmu-mir-574 | chr5 65249453-65249530 (+) | 35 | chr5 65249108-65249308 | FENW | CpG | | GENIC | <5kb | Cons | Oct4 | Sox2 | | | |
mmu-mir-25 | chr5 138395110-138395189 (-) | 20 | chr5 138401550-138401750 | ENFW | CpG | | GENIC | | | | | | | |
mmu-mir-93 | chr5 138395314-138395394 (-) | 20 | chr5 138401550-138401750 | ENFW | CpG | | GENIC | | | | | | | |
mmu-mir-106b | chr5 138395525-138395605 (-) | 20 | chr5 138401550-138401750 | ENFW | CpG | | GENIC | | | | | | | |
mmu-mir-339 | chr5 139623305-139623382 (-) | 10 | chr5 139624300-139627075 | EFNW | CpG | | | <5kb | | | | | | |
mmu-mir-339 | chr5 139623305-139623382 (-) | 22 | chr5 139714076-139714276 | EFNW | CpG | | GENIC | | Cons | | | | | |
mmu-mir-653 | chr6 3671301-3671385 (-) | 20 | chr6 3714613-3714813 | EW | | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-489 | chr6 3671912-3671991 (-) | 20 | chr6 3714613-3714813 | EW | | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-592 | chr6 27886673-27886752 (-) | 20 | chr6 28084279-28084479 | EWNF | | 0 | GENIC | | | | | | | Suz12 | K27me3
mmu-mir-129-1 | chr6 28972624-28972707 (+) | 10 | chr6 28970400-28972325 | E | | 0 | | <5kb | | Oct4 | Sox2 | Nanog | | |
mmu-mir-182 | chr6 30115922-30116006 (-) | 10 | chr6 30114875-30130825 | EFW | CpG | 0 | | <5kb | | Oct4 | Sox2 | Nanog | Tcf3 | |
mmu-mir-96 | chr6 30119465-30119548 (-) | 10 | chr6 30114875-30130825 | EFW | CpG | 0 | | <5kb | | Oct4 | Sox2 | Nanog | Tcf3 | |
mmu-mir-183 | chr6 30119671-30119752 (-) | 10 | chr6 30114875-30130825 | EFW | CpG | 0 | | <5kb | | Oct4 | Sox2 | Nanog | Tcf3 | |
mmu-mir-335 | chr6 30691314-30691395 (+) | 9 | chr6 30683456-30683656 | EW | | 1 | EST CF750324 | | | | | | | | K27me3
mmu-mir-335 | chr6 30691314-30691395 (+) | 25 | chr6 30687962-30688162 | NEFW | CpG | 0 | GENIC | | Cons | | | | | |
mmu-mir-29a | chr6 30992826-30992905 (-) | 0 | chr6 31006975-31008175 | F | | 0 | | | | | Sox2 | | | |
mmu-mir-29b-1 | chr6 30993178-30993262 (-) | 0 | chr6 31006975-31008175 | F | | 0 | | | | | Sox2 | | | |
mmu-mir-490 | chr6 36351905-36351983 (+) | 15 | chr6 36318190-36318390 | FEW | | 0 | EST CB598620 | | Cons | | | | | Suz12 | K27me3
mmu-mir-148a | chr6 51199405-51199484 (-) | 15 | chr6 51198300-51201975 | FENW | CpG | 0 | | <5kb | Cons | | | | | |
mmu-mir-196b | chr6 52159666-52159743 (-) | 10 | chr6 52159743-52171275 | FENW | CpG | 0 | EST CJ054740 | | | | | | | Suz12 | K27me3
mmu-mir-141 | chr6 124683526-124683608 (-) | 20 | chr6 124683151-124685075 | EW | | 0 | EST BB612857, AV470166, CK330690, BM122485, BX633058 | <5kb | | Oct4 | | | | |
mmu-mir-200c | chr6 124683933-124684015 (-) | 20 | chr6 124683151-124685075 | EW | | | EST BB612857, AV470166, CK330690, BM122485, BX633058 | <5kb | | Oct4 | | | | |
mmu-mir-220 | chr6 136364621-136364706 (-) | | chr6 136427625-136429100 | | | | | | | Oct4 | Sox2 | | | |
mmu-mir-290 | chr7 3218627-3218709 (+) [mm9] | 10 | chr7 3218001-3219675 [mm9] | W | | | | <5kb | Cons | Oct4 | Sox2 | Nanog | | |
mmu-mir-291a | chr7 3218920-3219001 (+) [mm9] | 10 | chr7 3218001-3219675 [mm9] | W | | | | <5kb | Cons | Oct4 | Sox2 | Nanog | | |
mmu-mir-292 | chr7 3219190-3219271 (+) [mm9] | 10 | chr7 3218001-3219675 [mm9] | W | | | | <5kb | Cons | Oct4 | Sox2 | Nanog | | |
mmu-mir-291b | chr7 3219483-3219561 (+) [mm9] | 10 | chr7 3218001-3219675 [mm9] | W | | | | <5kb | Cons | Oct4 | Sox2 | Nanog | | |
mmu-mir-293 | chr7 3220344-3220423 (+) [mm9] | 10 | chr7 3218001-3219675 [mm9] | W | | 0 | | <5kb | Cons | Oct4 | Sox2 | Nanog | | |
mmu-mir-294 | chr7 3220642-3220725 (+) [mm9] | 10 | chr7 3218001-3219675 [mm9] | W | | 0 | | <5kb | Cons | Oct4 | Sox2 | Nanog | | |
mmu-mir-295 | chr7 3220774-3220842 (+) [mm9] | 10 | chr7 3218001-3219675 [mm9] | W | | 0 | | <5kb | Cons | Oct4 | Sox2 | Nanog | | |
mmu-mir-330 | chr7 18339998-18340079 (+) | | chr7 18335127-18335327 | EW | | | EST BI739116 | | | | | | | |
mmu-mir-330 | chr7 18339998-18340079 (+) | 25 | chr7 18339354-18339554 | EWF | | | EST AI643291, AA560284 | <5kb | Cons | | | | | |
mmu-mir-343 | chr7 18545165-18545239 (+) | 20 | chr7 18540460-18540660 | FNEW | CpG | | GENIC | | | Oct4 | Sox2 | | | |
mmu-mir-150 | chr7 44989794-44989870 (+) | 20 | chr7 44988600-44989794 | EFWN | CpG | 0 | EST EH110210 | <5kb | | Oct4 | Sox2 | | | |
mmu-mir-344-1 | chr7 61756723-61756802 (-) | 0 | chr7 61818450-61820325 | N | | 0 | | | | | | | | |
mmu-mir-344-2 | chr7 61818977-61819056 (-) | 10 | chr7 61818450-61820325 | N | | 0 | | <5kb | | | | | | |
mmu-mir-211 | chr7 64084771-64084848 (+) | 20 | chr7 64032777-64032977 | known | | | GENIC | | | | | Nanog | Tcf3 | |
mmu-mir-7a-2 | chr7 78761798-78761879 (+) | 0 | chr7 78755900-78757275 | EW | CpG | | | | | | | | | | K27me3
mmu-mir-9-3 | chr7 79378782-79378862 (+) | 9 | chr7 79373483-79373683 | EWN | CpG | 1 | EST BB622683, BM950091 | | | | | | | Suz12 | K27me3
mmu-mir-9-3 | chr7 79378782-79378862 (+) | 10 | chr7 79377250-79382900 | ENW | CpG | | | <5kb | | | | | | Suz12 | K27me3
mmu-mir-708 | chr7 96124632-96124719 (+) | 18 | chr7 96046335-96046535 | known | | | GENIC | | | Oct4 | Sox2 | Nanog | | |
mmu-mir-708 | chr7 96124632-96124719 (+) | 15 | chr7 96085979-96086179 | EFNW | CpG | | EST CJ101608, CJ099352, CJ098392, CJ098541, CJ098426 | | Cons | Oct4 | Sox2 | Nanog | | Suz12 | K27me3
mmu-mir-326 | chr7 99426360-99426441 (+) | 24 | chr7 99409468-99409668 | ENFW | CpG | | GENIC | | Cons | Oct4 | | | | | K27me3
mmu-mir-326 | chr7 99426360-99426441 (+) | 10 | chr7 99425150-99427125 | E | | | | <5kb | | | | | | |
mmu-mir-139 | chr7 101349193-101349271 (+) | 24 | chr7 101295418-101295618 | E | | | GENIC | | Cons | | | | Tcf3 | | K27me3
mmu-mir-139 | chr7 101349193-101349271 (+) | 10 | chr7 101325030-101325230 | NEFW | CpG | | EST BY282009 | | | | | | Tcf3 | | K27me3
mmu-mir-202 | chr7 139809005-139809082 (-) | 0 | chr7 139621975-139823300 | F | | | | | | | | | Tcf3 | |
mmu-mir-210 | chr7 141072717-141072796 (-) | 10 | chr7 141072476-141073750 | EFNW | | | EST AW456189 | | | | | | Tcf3 | |
mmu-mir-483 | chr7 142464310-142464389 (-) | 20 | chr7 142468779-142468979 | FEW | CpG | | GENIC | | | | | Nanog | Tcf3 | Suz12 | K27me3
mmu-mir-486 os | chr8 24608122-24608205 (+) | 15 | chr8 24440268-24440488 | EW | CpG | 5 | GENIC | | | Oct4 | | | Tcf3 | Suz12 | K27me3
mmu-mir-383 | chr8 39720645-39720726 (-) | 20 | chr8 40129925-40130125 | known | | 0 | GENIC | | | | | | Tcf3 | Suz12 | K27me3
mmu-mir-181d | chr8 87068814-87068896 (-) | 10 | chr8 87069525-87071900 | EW | | 0 | | <5kb | | Oct4 | Sox2 | Nanog | | |
mmu-mir-181d | chr8 87068814-87068896 (-) | 9 | chr8 87075222-87075422 | EFNW | | 1 | EST CJ191375 | | | Oct4 | Sox2 | Nanog | | |
mmu-mir-181c | chr8 87068980-87069060 (-) | 10 | chr8 87069525-87071900 | EW | | 0 | | <5kb | | Oct4 | Sox2 | Nanog | | |
mmu-mir-181c | chr8 87068980-87069060 (-) | 9 | chr8 87075222-87075422 | EFNW | | 1 | EST CJ191375 | | | Oct4 | Sox2 | Nanog | | |
mmu-mir-23a | chr8 87098624-87098698 (+) | 0 | chr8 87086300-87095525 | EFNW | CpG | 0 | | | | Oct4 | Sox2 | | | |
mmu-mir-27a | chr8 87098780-87098862 (+) | 0 | chr8 87086300-87095525 | EFNW | CpG | 0 | | | | Oct4 | Sox2 | | | |
mmu-mir-24-2 | chr8 87098934-87099011 (+) | 0 | chr8 87086300-87095525 | EFNW | CpG | 0 | | | | Oct4 | Sox2 | | | |
mmu-mir-138-2 | chr8 97213431-97213519 (+) | 10 | chr8 97212625-97214300 | | | | | <5kb | | | | | | |
mmu-mir-328 | chr8 108197498-108197580 (-) | -15 | chr8 108213550-108217600 | FNEW | CpG | 0 | non-overlap gene Lrrc29 | | | | | | | |
mmu-mir-140 | chr8 110440368-110440448 (+) | 23 | chr8 110325426-110325626 | EFNW | CpG | 2 | GENIC | | Cons | | | | | |
mmu-mir-140 | chr8 110440368-110440448 (+) | 10 | chr8 110434114-110434314 | FE | | 0 | EST BG919244, CN717091, BY021823, BY019354 | | | | | | | |
mmu-mir-199a-1 | chr9 21246890-21246971 (-) | 10 | chr9 21244900-21248875 | FE | | 0 | | <5kb | | | | | | |
mmu-mir-100 | chr9 41282496-41282572 (+) | 8 | chr9 41225884-41226084 | F | | 2 | EST BB657503 | | | | | | | |
mmu-let-7a-2 | chr9 41287791-41287877 (+) | 8 | chr9 41225884-41226084 | F | | 2 | EST BB657503 | | | | | | | |
mmu-mir-125b-1 | chr9 41332999-41333080 (+) | 7 | chr9 41225884-41226084 | F | | 3 | EST BB657503 | | | | | | Tcf3 | |
mmu-mir-125b-1 | chr9 41332999-41333080 (+) | 25 | chr9 41332269-41332469 | FNW | | 0 | EST BY711083, BB626203 | <5kb | Cons | | | | | |
mmu-mir-34c | chr9 50855264-50855338 (-) | 20 | chr9 50855701-50858400 | FEW | | 0 | EST BY100164 | <5kb | | Oct4 | | | | | K27me3
mmu-mir-34b | chr9 50855794-50855872 (-) | 20 | chr9 50855701-50858400 | FEW | | 0 | EST BY100164 | <5kb | | Oct4 | | | | | K27me3
mmu-mir-190 | chr9 67035798-67035875 (-) | 19 | chr9 67338659-67338859 | known | | 1 | GENIC | | | | | | | |
mmu-mir-184 | chr9 89600146-89600226 (-) | 0 | chr9 89676975-89678850 | EW | | 0 | | | | | | | | Suz12 | K27me3
mmu-mir-135a-1 | chr9 106012231-106012310 (+) | 10 | chr9 106006687-106006887 | EW | | 0 | EST BB655395 | | | | | | | | K27me3
mmu-let-7g | chr9 106036937-106037035 (+) | 25 | chr9 106028929-106029129 | EFNW | CpG | 0 | GENIC | | Cons | Oct4 | Sox2 | Nanog | | |
mmu-mir-191 | chr9 108426418-108426501 (+) | 20 | chr9 108424326-108426775 | EFNW | CpG | 0 | EST EH096264 | <5kb | | Oct4 | Sox2 | | | |
mmu-mir-425 | chr9 108426882-108426964 (+) | 20 | chr9 108424326-108426775 | EFNW | CpG | 0 | EST EH096264 | <5kb | | Oct4 | Sox2 | | | |
mmu-mir-128b | chr9 111963523-111963600 (-) | 20 | chr9 112079630-112079830 | ENW | | 0 | GENIC | | | | | | Tcf3 | Suz12 | K27me3
mmu-mir-26a-3 | chr9 118880499-118880579 (+) | 24 | chr9 118775069-118775269 | EFNW | CpG | 1 | GENIC | | Cons | | | | | | K27me3
mmu-mir-138-1 | chr9 122531586-122531667 (+) | -10 | chr9 122416550-122422625 | EWF | CpG | 0 | non-overlap EST | | | | | | | Suz12 | K27me3
mmu-mir-297a-7 | chr10 42823860-42823951 (+) | -10 | chr10 42708700-42711800 | EW | | 0 | non-overlap gene 5330439J01Rik | | | | | | | Suz12 | K27me3
mmu-mir-135a-2 | chr10 91501893-91501972 (-) | -10 | chr10 91591125-91593550 | FEW | | | non-overlap EST | | | Oct4 | Sox2 | Nanog | | |
mmu-mir-331 | chr10 93393572-93393647 (-) | 0 | chr10 93443775-93445175 | F | | | | | | | | | | |
mmu-let-7i | chr10 122388692-122388789 (-) | 10 | chr10 122385250-122391850 | FNEW | | | | <5kb | | | | | | |
mmu-mir-26a-2 | chr10 126398482-126398561 (+) | 24 | chr10 126381565-126381765 | EFNW | | | GENIC | | Cons | Oct4 | Sox2 | Nanog | | |
mmu-mir-216b | chr11 28646194-28646275 (+) | -10 | chr11 28482175-28485250 | EFNW | CpG | | non-overlap gene Ccdc85a | | | | Sox2 | | | | K27me3
mmu-mir-216a | chr11 28657008-28657089 (+) | -10 | chr11 28482175-28485250 | EFNW | CpG | | non-overlap gene Ccdc85a | | | | Sox2 | | | | K27me3
mmu-mir-217 | chr11 28663751-28663829 (+) | -10 | chr11 28482175-28485250 | EFNW | CpG | | non-overlap gene Ccdc85a | | | | Sox2 | | | | K27me3
mmu-mir-218-2 | chr11 35460253-35460336 (+) | 23 | chr11 34964780-34964980 | known | CpG | | GENIC | | Cons | | | | | Suz12 | K27me3
mmu-mir-103-1 chr11:35625821-35625902 (+) 25 chr11:35612797-35612997 EFNW CpG 0 GENIC Cons - - - - - -
mmu-mir-146a chr11:43217812-43217889 (-) chr11:43227200-43230425 NFEW Oct4 Nanog - - K27me3
mmu-mir-340 chr11:49913133-49913217 (+) 18 chr11:49868653-49868853 EFNW CpG GENIC Suz12 K27me3
mmu-mir-744 chr11:65550928-65551025 (-) 25 chr11:65604392-65604592 FNEW CpG GENIC Cons Oct4
mmu-mir-324 chr11:69828245-69828321 (+) 20 chr11:69816711-69816911 EFNW CpG GENIC Sox2
mmu-mir-497 chr11:70050915-70050996 (+) 15 chr11:70048125-70050450 FENW <5kb Cons
mmu-mir-195 chr11:70051247-70051327 (+) 15 chr11:70048125-70050450 FENW <5kb Cons
mmu-mir-212 chr11:74989588-74989669 (+) 25 chr11:74988654-74988854 EFWN CpG EST BY706166 <5kb Cons Oct4 Suz12 K27me3
mmu-mir-132 chr11:74989872-74989949 (+) 25 chr11:74988654-74988854 EFWN CpG EST BY706166 <5kb Cons Oct4 Suz12 K27me3
mmu-mir-22 chr11:75279919-75279998 (+) 20 chr11:75278368-75278568 FENW CpG EST BF786183 <5kb Oct4 Sox2 Tcf3
mmu-mir-423 chr11:76894271-76894349 (-) 35 chr11:76894530-76894730 FENW CpG GENIC <5kb Cons
mmu-mir-144 chr11:77889196-77889271 (+) 20 chr11:77888761-77888961 E EST AA050063 <5kb Nanog
mmu-mir-451 chr11:77889371-77889432 (+) 0 chr11:77887875-77889475 E non-overlap EST <5kb Nanog
mmu-mir-193 chr11:79528160-79528235 (+) 10 chr11:79526175-79531800 EFNW CpG <5kb Suz12 K27me3
mmu-mir-365-2 chr11:79542612-79542693 (+) 0 chr11:79526175-79531800 EFNW CpG Tcf3 Suz12 K27me3
mmu-mir-21 chr11:86400267-86400346 (-) 17 chr11:86499917-86500117 FNEW CpG GENIC
mmu-mir-301a chr11:86929204-86929281 (+) 25 chr11:86925352-86925552 FNEW CpG GENIC Cons Sox2 Nanog
mmu-mir-142 chr11:87573052-87573130 (+) 0 chr11:87571775-87574325 E non-overlap EST <5kb
mmu-mir-196a-1 chr11:96081267-96081346 (+) -10 chr11:96075625-96077750 EF non-overlap EST - Suz12 K27me3
mmu-mir-10a chr11:96133266-96133348 (+) 14 chr11:96130725-96131975 F CpG <5kb Cons Oct4 Suz12 K27me3
mmu-mir-10a chr11:96133266-96133348 (+) 20 chr11:96133225-96133266 FEW CpG EST BQ944353 <5kb Oct4 Suz12 K27me3
mmu-mir-152 chr11:96666481-96666559 (+) 35 chr11:96665865-96666065 FENW CpG 0 GENIC <5kb Cons Oct4 Sox2 Nanog
mmu-mir-338 chr11:119830861-119830936 (-) 20 chr11:119862940-119863140 ENW CpG GENIC K27me3
mmu-mir-342 chr12:109106437-109106519 (+) 20 chr12:109002426-109002626 EWFN CpG GENIC
mmu-mir-345 chr12:109284786-109284863 (+) 22 chr12:109284353-109284553 NEFW EST BX524158, AA119500, AA119501 <5kb Cons K27me3
mmu-mir-770 chr12:110011507-110011582 (+) -10 chr12:109987400-109991350 FEW non-overlap EST
mmu-mir-673 chr12:110019801-110019883 (+) -10 chr12:109987400-109991350 FEW non-overlap EST
mmu-mir-493 chr12:110028040-110028122 (+) -10 chr12:109987400-109991350 FEW non-overlap EST
mmu-mir-337 chr12:110033612-110033688 (+) -10 chr12:109987400-109991350 FEW non-overlap EST
mmu-mir-540 chr12:110033882-110033960 (+) -10 chr12:109987400-109991350 FEW non-overlap EST
mmu-mir-665 chr12:110034130-110034204 (+) -10 chr12:109987400-109991350 FEW non-overlap EST
mmu-mir-431 chr12:110038256-110038339 (+) -10 chr12:109987400-109991350 FEW non-overlap EST
mmu-mir-433 chr12:110039526-110039619 (+) -10 chr12:109987400-109991350 FEW non-overlap EST
mmu-mir-127 chr12:110040647-110040726 (+) -10 chr12:109987400-109991350 FEW non-overlap EST
mmu-mir-434 chr12:110042324-110042403 (+) -10 chr12:109987400-109991350 FEW non-overlap EST
mmu-mir-136 chr12:110043128-110043203 (+) -10 chr12:109987400-109991350 FEW non-overlap EST
mmu-mir-341 chr12:110059318-110059393 (+) 0 chr12:110043400-110045125 N
mmu-mir-370 chr12:110066067-110066143 (+) 0 chr12:110043400-110045125 N
mmu-mir-882 chr12:110130004-110130080 (+) 0 chr12:110043400-110045125 N
mmu-mir-379 chr12:110156862-110156939 (+) 0 chr12:110043400-110045125 N
mmu-mir-411 chr12:110157987-110158069 (+) 0 chr12:110043400-110045125 N
mmu-mir-299 chr12:110158441-110158514 (+) 0 chr12:110043400-110045125 N
mmu-mir-380 chr12:110159603-110159680 (+) 0 chr12:110043400-110045125 N
mmu-mir-323 chr12:110160320-110160395 (+) 0 chr12:110043400-110045125
mmu-mir-758 chr12:110160619-110160696 (+) 0 chr12:110043400-110045125
mmu-mir-329 chr12:110161300-110161379 (+) 0 chr12:110043400-110045125
mmu-mir-494 chr12:110163132-110163205 (+) 0 chr12:110043400-110045125 N
mmu-mir-666 chr12:110164902-110164978 (+) 0 chr12:110043400-110045125 N
mmu-mir-543 chr12:110165065-110165142 (+) 0 chr12:110043400-110045125
mmu-mir-495 chr12:110166554-110166633 (+) 0 chr12:110043400-110045125
mmu-mir-368 chr12:110170530-110170607 (+) 0 chr12:110043400-110045125
mmu-mir-654 chr12:110171025-110171108 (+) 0 chr12:110043400-110045125 N
mmu-mir-376b chr12:110171268-110171345 (+) 0 chr12:110043400-110045125
mmu-mir-376a chr12:110171584-110171661 (+) 0 chr12:110043400-110045125 N
mmu-mir-300 chr12:110172119-110172201 (+) 0 chr12:110043400-110045125 N
mmu-mir-381 chr12:110174626-110174708 (+) chr12:110043400-110045125 N
mmu-mir-487b chr12:110175143-110175220 (+) chr12:110043400-110045125
mmu-mir-539 chr12:110175933-110176014 (+) chr12:110043400-110045125 N
mmu-mir-544 chr12:110177132-110177209 (+) chr12:110043400-110045125
mmu-mir-382 chr12:110181578-110181656 (+) chr12:110043400-110045125 N
mmu-mir-134 chr12:110181942-110182022 (+) chr12:110043400-110045125 N
mmu-mir-485 chr12:110182707-110182783 (+) chr12:110043400-110045125 N
mmu-mir-453 chr12:110183426-110183507 (+) chr12:110043400-110045125 N
mmu-mir-154 chr12:110186235-110186312 (+) chr12:110043400-110045125 N
mmu-mir-496 chr12:110186929-110187004 (+) chr12:110043400-110045125 N
mmu-mir-377 chr12:110188312-110188391 (+) 0 chr12:110043400-110045125
mmu-mir-541 chr12:110190219-110190304 (+) 0 chr12:110043400-110045125
mmu-mir-409 chr12:110190969-110191043 (+) 0 chr12:110043400-110045125
mmu-mir-412 chr12:110191098-110191176 (+) 0 chr12:110043400-110045125
mmu-mir-369 chr12:110191228-110191303 (+) 0 chr12:110043400-110045125
mmu-mir-410 chr12:110191526-110191601 (+) 0 chr12:110043400-110045125 N
mmu-mir-203 chr12:112578687-112578766 (+) 15 chr12:112577075-112579650 FEW CpG <5kb Cons Suz12 K27me3
mmu-mir-153 chr12:117692821-117692900 16 chr12:116927628-116927828 known CpG GENIC K27me3
mmu-let-7d chr13:48547951-48548046 (-) 0 chr13:48551775-48553275
mmu-let-7f-1 chr13:48549760-48549857 (-) 10 chr13:48551775-48553275 <5kb
mmu-let-7a-1 chr13:48550115-48550207 (-) 10 chr13:48551775-48553275 <5kb
mmu-mir-874 chr13:58032748-58032823 (-) 0 chr13:58075200-58078100 E Oct4
mmu-mir-7a-1 chr13:58402413-58402496 (-) 20 chr13:58412866-58413066 FNE CpG GENIC Tcf3
mmu-mir-23b chr13:63309704-63309781 (+) 7 chr13:63249379-63249579 FENW CpG EST BI904238
mmu-mir-23b chr13:63309704-63309781 (+) 9 chr13:63284054-63284254 EWFN EST BF100375 Oct4 Sox2 Nanog
mmu-mir-27b chr13:63309930-63310012 (+) 7 chr13:63249379-63249579 FENW CpG EST BI904238
mmu-mir-27b chr13:63309930-63310012 (+) 9 chr13:63284054-63284254 EWFN EST BF100375 Oct4 Sox2 Nanog
mmu-mir-24-1 chr13:63310425-63310504 (+) 7 chr13:63249379-63249579 FENW CpG EST BI904238 Tcf3
mmu-mir-24-1 chr13:63310425-63310504 (+) 9 chr13:63284054-63284254 EWFN EST BF100375 Oct4 Sox2 Nanog
mmu-mir-9-2 chr13:84212013-84212092 (+) 35 chr13:84211981-84212181 NEWF 0 GENIC <5kb Cons Oct4 Sox2 Nanog Suz12 K27me3
mmu-mir-582 chr13:110445608-110445688 (+) 17 chr13:109774940-109775140 known GENIC
mmu-mir-449b chr13:114158297-114158376 (+) 15 chr13:114154825-114157525 EFW CpG <5kb Cons Suz12 K27me3
mmu-mir-449a chr13:114158417-114158498 (+) 15 chr13:114154825-114157525 EFW CpG 0 <5kb Cons Suz12 K27me3
mmu-mir-346 chr14:33723706-33723785 (+) 25 chr14:33649157-33649357 ENW CpG 0 GENIC Cons Suz12 K27me3
mmu-mir-327 chr14:43869320-43869388 (-) -10 chr14:43909450-43912875 EFW CpG 0 non-overlap gene Ptger2 Oct4 Sox2 Nanog Suz12 K27me3
mmu-mir-208 chr14:53903271-53903347 (-) 12 chr14:53920698-53920898 known 13 GENIC Cons
mmu-mir-208b chr14:53929909-53929985 (-) 12 chr14:53948520-53948720 known 13 GENIC Cons
mmu-mir-16-1 chr14:60585993-60586077 (-) 15 chr14:60602792-60602992 FNEW CpG 0 EST BI696529 Cons
mmu-mir-16-1 chr14:60585993-60586077 (-) 14 chr14:60636299-60636499 FNEW CpG 1 EST BY734056 Cons Nanog Suz12 K27me3
mmu-mir-15a chr14:60586138-60586216 (-) 15 chr14:60602792-60602992 FNEW CpG 0 EST BI696529 Cons
mmu-mir-15a chr14:60586138-60586216 (-) 14 chr14:60636299-60636499 FNEW CpG 1 EST BY734056 Cons Nanog Suz12 K27me3
mmu-mir-598 chr14:62681299-62681377 (+) chr14:62559175-62563900 NFEW CpG
mmu-mir-124-1 chr14:63544772-63544848 (+) chr14:63540450-63546275 EFW CpG 0 non-overlap EST <5kb Oct4 - Nanog - Suz12 K27me3
mmu-mir-320 chr14:69178595-69178668 (+) 2 chr14:69176250-69180100 FNEW CpG 0 non-overlap gene Polr3d <5kb Cons - - - - -
mmu-mir-17 chr14:113925511-113925589 (+) 10 chr14:113921300-113927025 EFWN CpG <5kb Oct4
mmu-mir-18a chr14:113925694-113925776 (+) 10 chr14:113921300-113927025 EFWN CpG <5kb Oct4
mmu-mir-19a chr14:113925840-113925917 (+) 10 chr14:113921300-113927025 EFWN CpG <5kb Oct4
mmu-mir-20a chr14:113926010-113926088 (+) 10 chr14:113921300-113927025 EFWN CpG <5kb Oct4
mmu-mir-875 chr15:35605559-35605636 (-) 0 chr15:35693875-35694875 N Sox2
mmu-mir-30b chr15:68167090-68167169 (-) -8 chr15:68190175-68194550 FNEW non-overlap EST Cons
mmu-mir-30d chr15:68170876-68170957 (-) 12 chr15:68192747-68192947 FNEW 0 EST BB645359 Cons
mmu-mir-151 chr15:73082066-73082141 (-) 21 chr15:73250368-73250568 NFEW CpG 4 GENIC Cons Tcf3
mmu-mir-33 chr15:82025372-82025452 (+) 17 chr15:81974358-81974558 ENFW CpG 3 GENIC Oct4 Sox2 Tcf3
mmu-let-7c-2 chr15:85534372-85534461 (+) 0 chr15:85515125-85516200 F 0
mmu-let-7b chr15:85535081-85535176 (+) 0 chr15:85515125-85516200 F 0
mmu-mir-196a-2 chr15:102801389-102801467 (+) 10 chr15:102799650-102802750 F 0 <5kb Suz12 K27me3
mmu-mir-615 chr15:102842949-102843034 (+) 35 chr15:102841941-102842141 NEFW CpG 0 GENIC <5kb Cons Oct4 Sox2 Suz12 K27me3
mmu-mir-148b chr15:103113172-103113250 (+) 25 chr15:103100832-103101032 EFNW 0 GENIC Cons
mmu-mir-193b chr16:13363101-13363179 (+) 15 chr16:13359750-13363900 EFW CpG 0 <5kb Cons Nanog - Suz12 K27me3
mmu-mir-365-1 chr16:13367422-13367504 (+) 6 chr16:13359750-13363900 EFW CpG 0 Cons Nanog Tcf3 Suz12 K27me3
mmu-mir-484 chr16:14073163-14073239 (+) 2 chr16:14070875-14074575 FNEW CpG 0 non-overlap gene 4921513D23Rik <5kb Cons Oct4
mmu-mir-130b chr16:17037624-17037703 (-) 30 chr16:17038569-17038769 EFNW 0 GENIC <5kb
mmu-mir-301b chr16:17037972-17038049 (-) 30 chr16:17038569-17038769 EFNW GENIC <5kb
mmu-mir-185 chr16:18240958-18241032 (-) 15 chr16:18257391-18257591 FENW CpG EST CJ146827, BY258010, BB842045, BY254622 Cons
mmu-mir-185 chr16:18240958-18241032 (-) 24 chr16:18261507-18261707 EFW CpG GENIC Cons
mmu-mir-28 chr16:24743207-24743288 (+) 20 chr16:24308738-24308938 known CpG 0 GENIC Sox2
mmu-mir-568 chr16:43559987-43560069 (+) -10 chr16:43528050-43530275 EW 0 non-overlap EST Oct4 Sox2 Nanog
mmu-mir-99a chr16:77481522-77481600 (+) 8 chr16:77382875-77383075 NF 2 EST BE654839
mmu-mir-99a chr16:77481522-77481600 (+) 10 chr16:77477393-77477593 N 0 EST BB649986
mmu-let-7c-1 chr16:77482254-77482340 (+) 8 chr16:77382875-77383075 NF 2 EST BE654839
mmu-let-7c-1 chr16:77482254-77482340 (+) 10 chr16:77477393-77477593 N 0 EST BB649986
mmu-mir-125b-2 chr16:77528861-77528941 (+) 7 chr16:77382875-77383075 NF EST BE654839
mmu-mir-125b-2 chr16:77528861-77528941 (+) 9 chr16:77477393-77477593 N EST BB649986
mmu-mir-125b-2 chr16:77528861-77528941 (+) 10 chr16:77525525-77528275 FE <5kb
mmu-mir-155 chr16:84596725-84596805 (+) 0 chr16:84584475-84588475 FENW CpG Nanog Suz12 K27me3
mmu-mir-802 chr16:93258267-93258347 (+) 10 chr16:93257500-93258650 N <5kb
mmu-mir-99b chr17:17534802-17534881 (+) 20 chr17:17530726-17533550 FENW EST BU938247 <5kb
mmu-let-7e chr17:17534974-17535060 (+) 20 chr17:17530726-17533550 FENW EST BU938247 <5kb
mmu-mir-125a chr17:17535425-17535504 (+) 20 chr17:17530726-17533550 FENW EST BU938247 <5kb
mmu-mir-219-1 chr17:33635454-33635536 (-) 20 chr17:33634526-33636000 EFNW CpG EST BY242659, BY296608 <5kb
mmu-mir-877-1 chr17:35568776-35568877 (-) 20 chr17:35577703-35577903 EFNW CpG GENIC Oct4
mmu-mir-7b chr17:55876716-55876795 (+) 6 chr17:55874475-55876575 EW 0 non-overlap EST <5kb Cons Oct4
mmu-mir-133a-1 chr18:10782901-10782978 (-) 0 chr18:10848725-10849950
mmu-mir-1-2 chr18:10785473-10785553 (-) 0 chr18:10848725-10849950
mmu-mir-187 chr18:24572110-24572187 (-) 2 chr18:24608400-24611775 EFNW CpG Cons Suz12 K27me3
mmu-mir-378 chr18:61523197-61523274 (-) 35 chr18:61525700-61525900 FENW CpG GENIC <5kb Cons Oct4 Tcf3
mmu-mir-145 chr18:61773187-61773267 (-) 20 chr18:61773267-61775675 EST AI429562 <5kb Oct4 Sox2
mmu-mir-143 chr18:61774557-61774633 (-) 20 chr18:61773267-61775675 EST AI429562 <5kb Oct4 Sox2 Tcf3
mmu-mir-122 chr18:65374225-65374302 (+) 0 chr18:65208050-65210300 CpG
mmu-mir-194-2 chr19:6264648-6264724 (+) -10 chr19:6247500-6249425 non-overlap EST Oct4
mmu-mir-192 chr19:6264847-6264931 (+) -10 chr19:6247500-6249425 non-overlap EST Oct4
mmu-mir-204 chr19:22817697-22817776 (+) 20 chr19:22206113-22206313 known CpG GENIC
mmu-mir-101b chr19:29201290-29201366 (+) 19 chr19:29167271-29167471 FNEW CpG 1 GENIC - Sox2 -
mmu-mir-107 chr19:34886687-34886768 (-) 20 chr19:34945352-34945552 NFEW 0 GENIC
mmu-mir-146b chr19:46396091-46396170 (+) 0 chr19:46390100-46393375 ENFW CpG 0 non-overlap gene Cuedc2 <5kb Nanog
mmu-mir-500 chrX:6394644-6394723 (-) 10 chrX:6476159-6476359 FNEW 0 EST BB655615
mmu-mir-501 chrX:6398219-6398296 (-) 10 chrX:6476159-6476359 FNEW 0 EST BB655615
mmu-mir-362 chrX:6398933-6399011 (-) 10 chrX:6476159-6476359 FNEW 0 EST BB655615
mmu-mir-188 chrX:6404941-6405020 (-) 10 chrX:6476159-6476359 FNEW 0 EST BB655615
mmu-mir-532 chrX:6405368-6405446 (-) 10 chrX:6476159-6476359 FNEW 0 EST BB655615
mmu-mir-221 chrX:18303256-18303338 (-) -10 chrX:18319475-18326525 FN 0 non-overlap EST
mmu-mir-222 chrX:18303852-18303929 (-) -10 chrX:18319475-18326525 FN 0 non-overlap EST
mmu-mir-363 chrX:48986314-48986398 (-) 0 chrX:48985925-48991150 EWF 0 non-overlap EST <5kb Oct4 Sox2 Nanog Tcf3
mmu-mir-92a chrX:48986471-48986549 (-) 0 chrX:48985925-48991150 EWF 0 non-overlap EST <5kb Oct4 Sox2 Nanog Tcf3
mmu-mir-19b chrX:48986608-48986693 (-) 0 chrX:48985925-48991150 EWF 0 non-overlap EST <5kb Oct4 Sox2 Nanog Tcf3
mmu-mir-20b chrX:48986739-48986818 (-) 0 chrX:48985925-48991150 EWF 0 non-overlap EST <5kb Oct4 Sox2 Nanog Tcf3
mmu-mir-18b chrX:48986958-48987040 (-) 0 chrX:48985925-48991150 EWF 0 non-overlap EST <5kb Oct4 Sox2 Nanog Tcf3
mmu-mir-106a chrX:48987123-48987200 (-) 0 chrX:48985925-48991150 EWF 0 non-overlap EST <5kb Oct4 Sox2 Nanog Tcf3
mmu-mir-450b chrX:49292625-49292702 (-) 0 chrX:49296350-49298075 E 0 - - - -
mmu-mir-450a-1 chrX:49292788-49292864 (-) 0 chrX:49296350-49298075 E 0 - - - -
mmu-mir-450a-2 chrX:49292921-49292997 (-) 0 chrX:49296350-49298075 E 0 - - -
mmu-mir-542 chrX:49294032-49294110 (-) 10 chrX:49296350-49298075 E 0 <5kb - - -
mmu-mir-351 chrX:49297893-49297975 (-) 10 chrX:49296350-49298075 E 0 <5kb - - - -
mmu-mir-503 chrX:49298604-49298686 (-) 0 chrX:49298875-49304275 FNEW 0 non-overlap EST <5kb - - - Suz12 K27me3
mmu-mir-322 chrX:49298887-49298964 (-) 0 chrX:49298875-49304275 FNEW 0 non-overlap EST <5kb Suz12 K27me3
mmu-mir-504 chrX:55444442-55444520 (-) 10 chrX:55479898-55480098 ENFW 0 EST BQ180728, CJ064508 Suz12
mmu-mir-504 chrX:55444442-55444520 (-) 20 chrX:55597651-55597851 known 0 GENIC
mmu-mir-505 chrX:56741076-56741152 (-) 20 chrX:56750496-56750696 FNEW 0 GENIC
mmu-mir-465d chrX:63083310-63083388 (-) 0 chrX:63325125-63326200 N 0
mmu-mir-465c-1 chrX:63086620-63086700 (-) 0 chrX:63325125-63326200 N 0
mmu-mir-465b-1 chrX:63089867-63089945 (-) 0 chrX:63325125-63326200 N 0
mmu-mir-465c-2 chrX:63093182-63093262 (-) 0 chrX:63325125-63326200 N 0
mmu-mir-465b-2 chrX:63096429-63096507 (-) 0 chrX:63325125-63326200 N 0
mmu-mir-465a chrX:63099718-63099796 (-) 0 chrX:63325125-63326200 N 0
mmu-mir-224 chrX:68513746-68513835 (-) 20 chrX:68527341-68527541 FEW 0 GENIC
mmu-mir-452 chrX:68514943-68515025 (-) 20 chrX:68527341-68527541 FEW 0 GENIC
mmu-mir-105 chrX:68844220-68844299 (-) 19 chrX:68908806-68909006 known 1 GENIC
mmu-mir-223 chrX:92445552-92445634 (+) -10 chrX:92297875-92303675 EFNW 0 non-overlap gene Msn
mmu-mir-421 chrX:99775640-99775715 (-) chrX:99822432-99826775 FNEW 0 non-overlap EST, overlap in Human manual Oct4 Nanog Tcf3
mmu-mir-374 chrX:99775791-99775862 (-) chrX:99817575-99820975 FNEW 0 non-overlap EST, overlap in Human manual Oct4 Sox2 Nanog Tcf3
mmu-mir-672 chrX:100318900-100318979 (-) 20 chrX:100403736-100403936 EFNW GENIC Oct4 Sox2 Nanog Tcf3
mmu-mir-384 chrX:101547001-101547083 (-) -10 chrX:101592850-101595950 N non-overlap EST
mmu-mir-325 chrX:101581814-101581893 (-) 10 chrX:101594379-101594579 N EST BY727371
mmu-mir-361 chrX:109191761-109191842 (-) 20 chrX:109302359-109302559 NFEW GENIC Tcf3
mmu-mir-652 chrX:137985378-137985458 (+) 19 chrX:137927685-137927885 FNEW GENIC
mmu-mir-448 chrX:142404578-142404680 (+) 20 chrX:142208780-142208980 E GENIC Suz12 K27me3
mmu-let-7f-2 chrX:147253059-147253150 (+) 20 chrX:147143897-147144097 FNEW GENIC Nanog
mmu-mir-98 chrX:147253935-147254034 (+) 20 chrX:147143897-147144097 FNEW 0 GENIC - - Nanog -
Table S7: Human miRNA promoters and associated proteins and genomic features
miRNA name, position, score, TSS position, H3K4me3, CpG Island, Intervening Sites, GENE/EST, Proximal, Conserved, Oct4, Suz12, H3K27me3
hsa-mir-200b chr1:1142418-1142494 (+) 10 chr1:1138325-1140200 EL CpG 0 <5kb - -
hsa-mir-200a chr1:1143172-1143250 (+) 0 chr1:1138325-1140200 EL CpG 0 - -
hsa-mir-429 chr1:1144313-1144389 (+) 0 chr1:1138325-1140200 EL CpG 0 - -
hsa-mir-551a chr1:3500421-3500500 (-) -4 chr1:3711643-3713559 ELTB CpG 4 - ND
hsa-mir-34a chr1:9146007-9146091 (-) 10 chr1:9176563-9176763 ETL CpG 0 EST DB286351 - ND
hsa-mir-552 chr1:34804298-34804374 (-) -3 chr1:35000217-35001666 ELBT CpG 3 - K27
hsa-mir-30e chr1:40889126-40889209 (+) 19 chr1:40826357-40826557 EBLT CpG 1 GENIC - -
hsa-mir-30e chr1:40889126-40889209 (+) 10 chr1:40844119-40844319 EBTL - 0 EST BQ954189 Oct4 ND
hsa-mir-30c-1 chr1:40892055-40892135 (+) 19 chr1:40826357-40826557 EBLT CpG 1 GENIC - -
hsa-mir-30c-1 chr1:40892055-40892135 (+) 10 chr1:40844119-40844319 EBTL - 0 EST BQ954189 Oct4 ND
hsa-mir-101-1 chr1:65236136-65236212 (-) 10 chr1:65245334-65245534 ELBT - 0 EST BP357933 - ND
hsa-mir-186 chr1:71245336-71245416 (-) 20 chr1:71258666-71258866 EBLT CpG 0 GENIC - -
hsa-mir-760 chr1:94024409-94024488 (+) 0 chr1:94022989-94026240 LTEB CpG 0 non-overlap gene BCAR3 <5kb - ND
hsa-mir-137 chr1:98223658-98223736 (-) 20 chr1:98223693-98223893 ELT CpG 0 EST DA655546 <5kb Oct4 Suz12 K27
hsa-mir-553 chr1:100458823-100458888 (+) 20 chr1:100443683-100443883 ELBT CpG 0 GENIC - -
hsa-mir-197 chr1:109853555-109853635 (+) -1 chr1:109787189-109787712 BT - 1 ND
hsa-mir-554 chr1:148331350-148331434 (+) 20 chr1:148325753-148325953 ELT CpG 0 GENIC K27
hsa-mir-190b chr1:150979214-150979292 (-) -1 chr1:151023522-151024306 L - 1 ND
hsa-mir-92b chr1:151978050-151978132 (+) 20 chr1:151978986-151979186 EBLT - 0 EST AW606143, BX349899 <5kb
hsa-mir-555 chr1:152129219-152129309 (-) 20 chr1:152345126-152345326 TELB CpG 0 GENIC
hsa-mir-9-1 chr1:153203208-153203289 (-) 20 chr1:153204230-153204430 ETL CpG 0 EST BM691385 <5kb Suz12 K27
hsa-mir-9-1 chr1:153203208-153203289 (-) 19 chr1:153212194-153212394 known - 1 GENIC Suz12 K27
hsa-mir-556 chr1:159043999-159044078 (+) 19 chr1:158771138-158771338 known CpG 1 GENIC K27
hsa-mir-557 chr1:165076440-165076512 (+) 0 chr1:165049428-165049627 T - 0 ND
hsa-mir-214 chr1:168839603-168839685 (-) 0 chr1:168881628-168882427 TB - 0 ND
hsa-mir-199a-2 chr1:168845341-168845421 (-) 0 chr1:168881628-168882427 TB - 0 ND
hsa-mir-488 chr1:173730157-173730235 (-) 20 chr1:173865581-173865781 E - 0 GENIC Suz12 K27
hsa-mir-181b-1 chr1:195559662-195559743 (-) 7 chr1:195638105-195638305 BTL - 3 EST DA528985 ND
hsa-mir-181a-2 chr1:195559845-195559926 (-) 7 chr1:195638105-195638305 BTL - 3 EST DA528985 ND
hsa-mir-135b chr1:202149098-202149178 (-) 20 chr1:202150605-202150805 E - 0 EST CD698254 <5kb Oct4
hsa-mir-29c chr1:204363595-204363672 (-) -1 chr1:204425566-204425765 T - 1 - - ND
hsa-mir-29b-2 chr1:204364180-204364265 (-) -1 chr1:204425566-204425765 T - 1 ND
hsa-mir-205 chr1:205993896-205993974 (+) 10 chr1:205990766-205990966 E - 0 EST W68815
hsa-mir-215 chr1:216679604-216679683 (-) -10 chr1:216832905-216835678 ELBT CpG 0 non-overlap gene RAB3-GAP150 K27
hsa-mir-194-1 chr1:216679897-216679974 (-) -10 chr1:216832905-216835678 ELBT CpG 0 non-overlap gene RAB3-GAP150 K27
hsa-mir-558 chr2:32668885-32668959 (+) 20 chr2:32493280-32493480 TELB CpG 0 GENIC
hsa-mir-559 chr2:47516470-47516552 (+) 20 chr2:47508001-47508201 EL CpG 0 GENIC
hsa-mir-217 chr2:56121760-56121838 (-) 0 chr2:56147588-56148556 B - 0 ND
hsa-mir-216a chr2:56127756-56127837 (-) 0 chr2:56147588-56148556 B - 0 ND
hsa-mir-216b chr2:56139500-56139581 (-) 0 chr2:56147588-56148556 B - 0 ND
hsa-mir-128a chr2:136256703-136256778 (+) 19 chr2:136122674-136122874 ELBT CpG 1 GENIC K27
hsa-mir-10b chr2:176840554-176840635 (+) 9 chr2:176827091-176827291 E CpG 1 EST BQ722165 Oct4 Suz12 ND
hsa-mir-561 chr2:188987740-188987816 (+) 20 chr2:188981802-188982002 ELB CpG 0 GENIC K27
hsa-mir-26b chr2:219092875-219092950 (+) 30 chr2:219089882-219090082 TLBE CpG 0 GENIC <5kb K27
miRNA H3K4 CpG Interven- Proxi- Con- H3K27 name position score TSS position me3 Island ing Sites GENE/EST mal served Oct4 Suz12 me3 chr2 219691865 ch r2 21969246 noπ-overiap hsa-mιr-375 219691941 (-) 7-219693466 TLE CpG 0 EST <5kb Suz12 K27 Ch r2 219984343 chr2 21998486 hsa-mιr-153-1 219984421 (-) 10 7-219985266 T CpG 0 <5kb Chr2 219984343 chr2 21999954 hsa-mir-153-1 219984421 (-) 18 8-219999748 E CpG 2 GENIC K27 chil? 232862875 chr2 23265169 hsa-mιr-562 232862957 (+) 20 6-232651896 ELBT _ 0 GENIC chr2 241115412 chr2 24109500 hsa-mir-149 241115493 (+) 18 4-241095204 EBTL GENIC K27 chr2 241115412 chr2 24111211 hsa-mιr-149 241115493 (+) 7-241112317 ELBT 1 EST AL518538
Chr2 241115412 chr2 24111508 hsa-mir-149 241115493 (+) 10 7-241115899 E - 0 <5kb K27 chr3 10411173- ch r3 10466446- hsa-mιr-885 10411246 (-) 19 10466646 known - 1 GENIC chr3 15890297- chr3 15876545- hsa-mιr-563 15890361 (+) 10 15876745 LEBT - 0 EST DA789490 ND Ch r3 35760976- chr3 35657414- hsa-mιr-128b 35761053 (+) 15 35657614 ELBT - 5 GENIC Suz12 K27 Ch r3 37985898- chr3 37878072- hsa-mιr-26a-1 37985978 (+) 24 37878272 TELB - 1 GENIC <5kb Cons chr3 37985898- ch r3 37980996- hsa-mιr-26a-1 37985978 (+) 10 37981196 L - 0 EST BQ650754 Chr344130720- chr344075053- hsa-mιr-138-1 44130801 (+) 0 44075252 T - 0 ND Ch r3 44878389- chr3 44878311- hsa-mιr-564 44878471 (+) 30 44878511 TELB CpG 0 GENIC <5kb chr349032586- ch r349030949- hsa-mιr-425 49032668 (-) 20 49031149 BELT CpG 0 EST DB044174 <5kb
Ch r3 49032586- chr3 49178672 hsa-mιr-425 49032668 (-) 15 49178872 ELBT CpG 5 GENIC <5kb Cons chr349033058- ch r349030949- hsa-mιr-191 49033141 (-) 20 49031149 BELT CpG 0 EST DBO. ch r3 49033058- ch r3 49178672- hsa-mir-191 49033141 (-) 15 49178872 ELBT CpG 5 GENIC <5kb Cons chr3 50185768- chr3 50167751 hsa-mir-566 50185839 (+) 19 50167951 TELB CpG 1 GENIC K27 chr352277325- chr352287341- hsa-let-7g 52277423 (-) 23 52287541 TELB CpG 2 GENIC <5kb Cons ND hsa-mιr-135a- Ch r3 52303279- chr3 52307870-
1 52303358 (-) 5 52308065 (E) - 0 AI936688 manual chr3 113314343 chr3 11328782 hsa-mιr-567 113314420 (+) 20 8-113288028 ELBT CpG 0 GENIC chr3 115518017 ch r3 11558547 hsa-mιr-568 115518101 (-) 0 4-115586734 L - 0 ND K> κ> chr3 121597199 chr3 12165243 hsa-mιr-198 121597271 (-) 19 3-121652633 ELT CpG 1 GENIC ND ch r3 161605087 chr3 16160002 hsa-mιr-15b 161605166 (+) 25 5-161600225 ELBT CpG 0 GENIC <5kb Cons ch r3 161605234 chr3 16160002 hsa-mιr-16-2 161605318 (+) 25 5-161600225 ELBT CpG 0 GENIC <5kb Cons chr3 169752355 chr3 16971792 hsa-mιr-551b 169752434 (+) 4-169718474 L - 0 ND ch r3 172307160 chr3 17266072 hsa-mιr-569 172307243 (-) 0-172660920 known CDG 2 GENIC Oct4 Suz12 K27 chr3 189889274 ch r3 18941332 hsa-mιr-28 189889355 (+) 21 2-189413522 known GENIC ND
miRNA H3K4 CpG Interven- Proxi- Con- H3K27 name position score TSS position me3 Isiand ing Sites GENE/EST mal served Oct4 Suz12 me3 chr3196915379 chr3 19676326 hsa-mir-570 196915455 (+) 1-196763460 T - 0 Oct4 - ND chr4333950- chr4 321523- hsa-mιr-571 334036 (+) 20 321723 ELBT - 0 GENIC K27 chr48125100- chr4 8278530- hsa-mir-95 8125176 (-) 18 8278730 ELBT - 2 GENIC K27 chr411046738- chr4 10867741- hsa-mιr-572 11046809 (+) 10868135 B - 0 ND chr420206181- ch r4 19931052- hsa-mιr-218-1 20206264 (+) 21 19931252 known - - GENIC Oct4 Suz12 K27 chr424198089- ch r424259881- non-overlap hsa-mir-573 24198177 (-) -10 24263285 EBLT - 0 gene DHX15 chr438692233- chr4 38691902- hsa-mιr-574 38692310 (+) 30 38692102 EBLT CpG 0 GENIC <5kb chr484031674- chr4 84077004- hsa-mιr-575 84031765 (-) 20 84077204 EBTL CpG 0 GENIC Oct4 Suz12 K27 K) chr4110767464 chr4 11071237 hsa-mir-576 110767543 (+) 20 4-110712574 EBLT CpG 0 GENIC chr4113926627 chr4 11392650 hsa-mιr-367 113926706 (-) 20 5-113926705 E - 0 EST CD175459 <5kb Oct4 chr4113926756 chr4 11392650 hsa-mir-302d 113926836 (-) 20 5-113926705 E - 0 EST CD175459 <5kb 0ct4 chr4113926936 chr4 11392650 hsa-mιr-302a 113927016 (-) 20 5-113926705 E - 0 EST CD175459 <5kb Oct4 chr4113927116 ch r4 11392650 hsa-mιr-302c 113927195 (-) 20 5-113926705 E - 0 EST CD 175459 <5kb Oct4 chr4113927239 chr4 11392650 hsa-mιr-302b 113927317 (-) 20 5-113926705 E - 0 EST CD175459 <5kb Oct4 chr4115935524 chr4 11587711 hsa-mιr-577 115935600 (+) 20 5-115877315 ET CDG 0 GENIC Oct4
miRNA H3K4 CpG Interven- Proxi- Con- H3K27 name position score TSS position me3 lsland ing Sites GENE/EST mal served Oct4 Suz12 me3 chr4 166665012 chr4 16665759 hsa-mir-578 166665089 (+) 20 8-166657798 ELT - 0 GENIC _ _ K27 chr5 15988291- chrδ 15553204- hsa-mιr-887 15988369 (+) 21 15553404 known CpG - GENIC K27 chr5 32430245- chr5 32480501- hsa-mιr-579 32430324 (-) 20 32480701 LBET CpG 0 GENIC - - K27 chrδ 36183766- ch r5 36187672- hsa-mιr-580 36183836 (-) 20 36187872 EBLT CpG 0 GENIC - - chr5 53283101- chrδ 53642060- hsa-mir-581 53283181 (-) 20 63642260 known CpG 0 GENIC Oct4 - K27 chrδ 54502121- chrδ 54504660- hsa-mιr-449a 54502202 (-) 35 54504860 TLBE CpG 0 GENIC <5kb Cons - _ K27 chr5 54502242- chrδ 54604660- hsa-mιr-449b 54502322 (-) 35 54504860 TLBE CpG 0 GENIC <5kb Cons - - K27 chr5 59035203- chr5 59100079- EST DB290681 hsa-mιr-582 59035280 (-) 12 59100279 LET - 0 , Cons <δkb Cons - - ND
Chrδ 59035203- chrδ 59819547- hsa-mιr-582 59035280 (-) 19 59819747 known - 1 GENIC _ _ - chr5 87998429- chr5 87997094- hsa-mιr-9-2 87998508 (-) 15 88000044 BETL - 0 <5kb Cons Oct4 K27
Ch r5 87998429- chr5 88016256- hsa-mιr-9-2 87998508 (-) 88016456 E - 3 EST DA495096 Oct4 Suz12 ND chrS 95440603- chrδ 95323360- EST DA871062 hsa-mιr-583 95440682 (+) 95323560 LEBT - 1 .DB197007 _ ND hsa-mir- ch r5 135444076 chr5 13544370
100871 135444196 (-) 10 2-135444301 T CpG 0 <δkb _ _ ND ch r5 137011160 chrδ 13709967 hsa-mιr-874 137011237 (-) 21 7-137099777 BTEL - 1 GENIC <5kb Cons - - _ chr5 148422079 chrδ 14842278 hsa-mιr-584 148422160 (-) 27 0-148422980 known - 3 GENIC <5kb - - - ch r5 148788690 chrδ 14871707 non-overlap hsa-mιr-143 148788764 (+) -10 4-148719170 EBTL CpG 0 gene MGC3265 - - K27
miRNA H3K4 CpG Interven- Proxi- Con- H3K27 name position score TSS position me3 lsland ing Sites GENE/EST mal served Oct4 Suz12 me3 chr5 148790407 chr5 14871707 non-overlap hsa-mιr-145 148790487 (+) -10 4-148719170 EBTL CpG 0 gene MGC3265 K27 chr5 149092576 ch r5 14908994 hsa-mir-378 149092653 (+) 35 7-149090147 TBEL CpG 0 GENIC <5kb Cons K27 chr5 159844947 chr5 15982635 hsa-mιr-146a 159845024 (+) 1 1-159827231 B - 1 <5kb Cons ND chr5 159844947 ch r5 15982773 EST BQ425371 hsa-mιr-146a 159845024 (+) 12 1-159827931 TB - 0 , Cons <5kb Cons ND chr5 167920477 chr5 16793906 hsa-mιr-103-1 167920558 (-) 20 6-167939266 LEBT CpG 0 GENIC chr5 168127741 chr5 16866045 hsa-mιr-218-2 168127824 (-) 24 4-168660654 known CpG 1 GENIC <5kb Cons K27 chr5 168623188 ch r5 16866045 'Jl hsa-mir-585 168623259 (-) 25 4-168660654 EB CpG 0 GENIC <5kb Cons K27
Ch r5 179374914 chr5 17943161 hsa-mιr-340 179374998 (-) 20 5-179431815 ELBT - 0 GENIC K27 hsa-mιr-548a- chrB 18680008- chr6 18611704-
1 18680085 (+) 0 18612591 L - 0 ND chr6 30660078- chr630631312- hsa-mιr-877 30660180 (+) 19 30631512 EBLT CpG 1 GENIC <5kb Cons K27 non-overlap chr6 30660078- ch r6 30646454- gene ABCF1 , hsa-mir-877 30660180 (+) 6 30647922 TELB CpG 0 Cons GENIC <5kb Cons chr6 33283600- ch r6 33284241- hsa-mιr-219-1 33283682 (+) 20 33284441 ELBT CpG 0 EST BF310850 <5kb K27 chrβ 45273404- chr645453548- hsa-mir-586 45273480 (-) 18 45453748 ELBT CpG 2 GENIC K27
Chr6 52117110- chr6 52097779- hsa-mιr-206 52117189 (+) 0 52098124 L - 0 ND chr6 52121698- chro 52097779- hsa-mιr-133b 52121776 (+) 0 52098124 L - 0 ND
miRNA H3K4 CpG Interven- Proxi- Con- H3K27 name position score TSS position me3 lsland ing Sites GENE/EST mal served Oct4 Suz12 me3 chr6 72143378- chr6 72262454- hsa-mιr-30c-2 72143459 (-) -1 72262653 T - 1 ND
Ch r6 72169968- ch r6 72262454- hsa-mir-30a 72170050 (-) -1 72262653 T - 1 ND chr6 107338698 chr6 10733685 hsa-mιr-587 107338787 (+) 10 1-107343373 L - 0 <5kb ND chr6 119431916 chrδ 11944117 EST CF993701 hsa-mιr-548b 119431993 (-) 6-119441376 ETLB CpG 1 .AA764733 K27 chr6 119431916 chr6 11951195 hsa-mιr-548b 119431993 (-) 18 1-119512151 known - 2 GENIC ND σ> chr6 126847475 ch r6 12670250 non-overlap hsa-mir-588 126847552 (+) -10 3-126704881 BELT CpG 0 gene C6orf 173 hsa-mιr-548a- chr6 135602004 ch r6 13558685
2 135602082 (+) 0 4-135587453 T - 0 ND
hsa-mir-339 chr7:835822-835899 (-) 10 chr7:837465-838561 B - 0 <5kb K27
hsa-mir-339 chr7:835822-835899 (-) 9 chr7:841139-841339 BETL CpG 1 EST DB484936,CN261680,BM847149,CA406979
hsa-mir-339 chr7:835822-835899 (-) 5 chr7:878643-878843 T CpG 5 EST H16643 ND
hsa-mir-339 chr7:835822-835899 (-) 9 chr7:951034-951234 TELB CpG 11 GENIC
hsa-mir-589 chr7:5308696-5308776 (-) 20 chr7:5326540-5326740 ELBT CpG 0 GENIC
hsa-mir-148a chr7:25762772-25762851 (-) 10 chr7:25761863-25766516 LETB CpG 0 <5kb
hsa-mir-196b chr7:26982341-26982418 (-) 30 chr7:26982409-26982609 BE CpG 0 GENIC <5kb Suz12 K27
hsa-mir-196b chr7:26982341-26982418 (-) 8 chr7:26992772-26992972 LBE CpG 2 EST BE54909S,BE866789,BG284372 Oct4 Suz12 K27
hsa-mir-550-1 chr7:30102661-30102742 (+) 20 chr7:30097062-30097262 TE CpG 0 GENIC
hsa-mir-550-2 chr7:32545844-32545925 (+) 19 chr7:32308273-32308473 EBLT CpG 1 GENIC K27
hsa-mir-550-2 chr7:32545844-32545925 (+) 10 chr7:32541624-32541824 ELBT CpG 0 EST DB021271 ND
hsa-mir-590 chr7:73050184-73050264 (+) 20 chr7:73033240-73033440 ELBT CpG 0 GENIC
hsa-mir-653 chr7:92756734-92756816 (-) 20 chr7:92848587-92848787 EL CpG 0 GENIC Suz12 K27
hsa-mir-489 chr7:92757900-92757979 (-) 20 chr7:92848587-92848787 EL CpG 0 GENIC Suz12 K27
hsa-mir-591 chr7:95493632-95493714 (-) 20 chr7:95595956-95596156 LEBT CpG 0 GENIC
hsa-mir-25 chr7:99335835-99335914 (-) 20 chr7:99343978-99344178 EBLT CpG 0 GENIC
hsa-mir-93 chr7:99336041-99336121 (-) 20 chr7:99343978-99344178 EBLT CpG 0 GENIC
hsa-mir-106b chr7:99336267-99336347 (-) 20 chr7:99343978-99344178 EBLT CpG 0 GENIC
hsa-mir-592 chr7:126292105-126292184 (-) 20 chr7:126477161-126477361 known - 0 GENIC
hsa-mir-593 chr7:127315869-127315960 (+) -1 chr7:127180396-127180760 B - 1 ND
hsa-mir-129-1 chr7:127441870-127441954 (+) 0 chr7:127435217-127435965 ET CpG 0 Suz12 K27
hsa-mir-182 chr7:129004187-129004271 (-) 0 chr7:129011263-129017013 ELBT CpG 0 Suz12 K27
hsa-mir-96 chr7:129008479-129008562 (-) 10 chr7:129011263-129017013 ELBT CpG 0 <5kb Suz12 K27
miRNA H3K4 CpG interven- Proxi- Con- H3K27 name position score TSS position me3 Island ing Sites GENE/EST mal served Oct4 Suz12 me3 chr7 129008708 chr7 12901126 hsa-mιr-183 129008789 (-) 10 3-129017013 ELBT CpG <5kb Suz12 K27 chr7 129729908 chr7 12971989 hsa-mir-335 129729985 (+) 19 6-129720096 LTEB CpG GENIC chr7 129729908 chr712972583 hsa-mιr-335 129729985 (+) 10 8-129726036 ETL CpG 0 EST DA660801 Suz12 K27 chr7 130018752 chr713004349 hsa-mιr-29a 130018831 (-) 0 4-130044679 L 0 - ND chr7 130019471 chr713004349 hsa-mιr-29b-1 130019555 (-) 0 4-130044679 L 0 - ND chr7 136045197 chr713601055 hsa-mιr-490 136045275 (+) 20 3-136010753 E GENIC Suz12 K27
Chr7 150373173 chr7 15036713 K> hsa-mir-671 150373253 (+) 20 2-150367332 TELB 0 GENIC Ch xl 156866507 chr7 15686883 EST AW69940 hsa-mir-153-2 156866586 (-) 20 3-156869033 T 0 5 <5kb ch r7 156866507 chr7 15787979 hsa-mιr-153-2 156866586 (-) 13 4-157879994 known - 7 GENIC K27 chr7 157824891 ch Xl 15787979 hsa-mir-595 157824977 (-) 18 4-157879994 ELB - 2 GENIC K27 ch r8 1752809- chrδ 1752393- hsa-mιr-596 1752883 (+) 10 1753192 TE CpG 0 <5kb ND chrδ 9636597- chrδ 9450754- hsa-mιr-597 9636678 (+) 20 9450954 ELBT CpG 0 GENIC non-overlap chrδ 9798311- chrδ 9797569- EST, Cons non- hsa-mιr-124-1 9798387 (-) 2 9803095 ET CpG 0 overlap EST <5kb Cons Suz12 K27 chr8 10930131- chrδ 11096165- hsa-mιr-598 10930209 (-) 19 11096385 ETL CpG 3 GENIC <5kb Cons K27 chrδ 14755313- chrδ 14851771- hsa-mιr-383 14755394 (-) 0 14852225 B 0 ND
miRNA H3K4 CpG intervenProxiConH3K27 name position score TSS position me3 Island ing Sites GENE/EST mal served Oct4 Suz12 me3 non-overlap gene POLR3D,
Cons non-
Ch r8 22158423- chrδ 22157378- overlap hsa-mιr-320 22158496 (-) 2 22160357 ELBT CpG 0 gene Polr3d <5kb Cons - hsa-mir- chrδ 41637107- chrδ 41774197-
486 os 41637190 (-) 16 41774397 T CpG 4 GENIC Suz12 K27
Ch r8 65454276- chrδ 65443464- hsa-mιr-124-2 65454350 (+) 0 65446855 E CpG 2 <5kb Cons Oct4 Suz12 ND chrδ 100618190 chrδ 1007δ39δ hsa-mιr-875 100618265 (-) 0 9-100764453 BT _ 0 ND hsa-mιr-548a- chr8 105565778 chrδ 10566862 non-overlap
3 105565855 (-) -10 1-105672400 EBLT CpG 0 gene LRP12 - K27 hsa-mir-548d- chrδ 124429460 chrδ 12449767
1 124429537 (-) 18 1-124497871 LEBT CpG 2 GENIC K27 chrδ 135881947 chrδ 13591362 hsa-mιr-30b 135882026 (-) 1 4-135913616 (ELB) CpG 0 manual _ ND K> chrδ 135886294 chrδ 13591362 hsa-mιr-30d 135886375 (-) 1 4-135913616 (ELB) CpG 0 manual ND chrδ 141811858 chr8 14208041 hsa-rnιr-151 141811934 (-) 19 4-142080614 known CpG 1 GENIC _
EST BP359593 chrδ 145091352 chrδ 14509989 ,BP321327,BP3 hsa-mιr-661 145091422 (-) 10 0-145100090 ELBT CpG 0 15878 Oct4 - chrδ 145091352 chrδ 14512143 hsa-mιr-661 145091422 (-) 17 1-145121631 LTEB CpG 3 GENIC Suz12 K27 chr94840299- chr94782836- hsa-mιr-101-2 4840375 (+) 24 4783036 ELBT - 1 GENIC <5kb Cons _ chr9 4840299- Ch r9 4830304- EST BE718847 hsa-mιr-101-2 4840375 (+) 10 4630504 L - 0 N94092 ND ch r9 20706109- chr9 2064δ20δ- hsa-mιr-491 20706184 (+) 18 20648408 known - 2 GENIC ND chr9 20706109- ch r9 20648460- hsa-mir-491 20706184 (+) 9 20648660 T _ 1 EST DA511016 - ND
miRNA H3K4 CpG Interven- Proxi- Con- H3K27 name position score TSS position me3 Island ing Sites GENE/EST mal served Oct4 Suz12 me3 chr9 20706109- chrθ 20674113- EST BP231045 hsa-mιr-491 20706184 (+) 15 20674313 ELT CpG 0 , Cons GENIC <5kb Cons ND chrθ 21502109- chrθ 21549673- hsa-mιr-31 21502187 (-) 10 21549873 ETL CpG 0 EST DA246725 ND
hsa-mir-876 chr9:28853624-28853704 (-)
hsa-mir-873 chr9:28878877-28878953 (-)
hsa-mir-204 chr9:70654453-70654532 (-) 20 chr9:71291274-71291474 known CpG 0 GENIC ND
hsa-mir-7-1 chr9:83814230-83814313 (-) 20 chr9:83824973-83825173 EBLT CpG 0 GENIC
hsa-let-7a-1 chr9:94017789-94017881 (+) 10 chr9:94008025-94008225 LTBE CpG 0 EST BG326593 ND
hsa-let-7f-1 chr9:94018180-94018277 (+) 10 chr9:94008025-94008225 LTBE CpG 0 EST BG326593 ND
hsa-let-7d chr9:94020668-94020763 (+) 10 chr9:94008025-94008225 LTBE CpG 0 EST BG326593 ND
hsa-mir-23b chr9:94927055-94927132 (+) 12 chr9:94568437-94568637 known CpG 8 GENIC
hsa-mir-23b chr9:94927055-94927132 (+) chr9:94846550-94846750 LEBT CpG 4 EST BF681561 ND
hsa-mir-27b chr9:94927290-94927372 (+) 12 chr9:94568437-94568637 known CpG 8 GENIC
hsa-mir-27b chr9:94927290-94927372 (+) chr9:94846550-94846750 LEBT CpG 4 EST BF681561 ND
hsa-mir-24-1 chr9:94927853-94927932 (+) 12 chr9:94568437-94568637 known CpG 8 GENIC
hsa-mir-24-1 chr9:94927853-94927932 (+) chr9:94846550-94846750 LEBT CpG 4 EST BF681561 ND
hsa-mir-32 chr9:108888057-108888138 (-) 19 chr9:108961680-108961880 LBTE CpG 1 GENIC
miRNA H3K4 CpG Interven- Proxi- Con- H3K27 name position score TSS position me3 Island ing Sites GENE/EST mal served Oct4 Suz12 me3 chr9 114051273 chr9 11399768 hsa-mιr-455 114051351 (+) 20 4-113997884 ELTB CpG 0 GENIC Suz12 K27 ch r9 120086805 chr9 12022813 hsa-mir-147a 120086887 (-) 0 1-120230421 L - 0 ND chr9 122953384 chr9 12295721 hsa-mιr-600 122953457 (-) 15 0-122957410 known - 5 GENIC chr9 123244355 chr9 12377187 hsa-mιr-601 123244431 (-) 19 1-123772071 known - 1 GENIC hsa-mιr-181a- chr9 124534303 chr9 12450019
1 124534382 (+) 10 9-124500399 BELT - 0 EST BM716898 ND hsa-mir-181b- chr9 124535548 chr9 12450019
2 124535627 (+) 10 9-124500399 BELT 0 EST BM716898 ND chr9128086568 ch r9 12808634 hsa-mιr-199b 12S086648 (-) 10 8-128086876 B - 0 <5kb W chr9 128234455 chr9 12823377 hsa-mιr-219-2 128234539 (-) 10 3-128235536 ELB CpG 0 <5kb Suz12 K27 chr9 136840895 chr9 13682904 hsa-mιr-126 136840973 (+) 18 5-136829245 B CpG GENIC chr9 136840895 chr9 13683596 hsa-mιr-126 136840973 (+) 10 8-136836168 LBE CpG 0 EST BQ962580 ch r9 138008713 chr9 13778918 hsa-mιr-602 138008791 (+) 17 0-137789380 ELBT - J 3 CN GENIC ND chii 0 17927118 chiiO 1789126 hsa-mιr-511-1 17927196 (+) 25 7-17891467 known - 0 GENIC <5kb Cons ND chr10 18174047 chr10 1813825 hsa-mιr-511-2 18174125 (+) 26 7-18138457 known - - GENIC <5kb Cons ND chr10 24604633 chr10 2402358 hsa-mιr-603 24604711 (+) 19 0-24023780 known CpG 1 GENIC Suz12 ND
miRNA H3K4 CpG IntervenProxi- Con- H3K27 name position score TSS position me3 Island ing Sites GENE/EST mal served Oct4 Suz12 me3 chr 1024604633 chMO 2453799 hsa-mιr-603 24604711 (+) 9 6-24538196 LEBT - 1 EST DB096290 _ K27 chr10 29873944 ChMO 3006463 hsa-mir-604 29874031 (-) 17 6-30064836 ELBT CpG 3 GENiC _ K27 chr10 52729344 chMO 5242102 hsa-mιr-605 52729420 (+) 20 3-52421223 known CpG 0 GENIC Suz12 ND chr 10 52729344 chMO 5250422 hsa-mιr-605 52729420 (+) 10 1-52504421 ELT CpG 0 EST CB958569 Suz12 K27 chr10 76982217 ChMO 7686130 hsa-mιr-606 76982312 (+) 9 9-76861509 EL CpG 1 EST BM720401 Suz12 ND chii 0 88014437 chMO 8811372 EST DA723974 hsa-mir-346 88014516 {-) 9 6-88113926 EBL CpG 1 , B 1546010 Suz12 K27 chr10 88014437 chMO 8811613 hsa-mιr-346 88014516 (-) 22 0-88116330 EN CpG 3 GENIC <5kb Cons Suz12 K27 chr10 88014437 chMO 8811682 hsa-mιr-346 88014516 (-) 7 9-88117029 E CpG 3 EST DR001156 Suz12 K27 ch r10 91342483 chMO 9139520 K> hsa-mir-107 91342564 (-) 19 9-91395409 LBET - 1 GENlC - - chr10 98578421 chMO 9858063 non-overlap hsa-mir-607 98578501 (-) 0 3-98584579 BELT CpG 0 gene MLR2 <5kb _ _ chr10 10272473
7-102724826 chMO 1027191 hsa-mιr-608 (+) 20 69-102719369 LEBT CpG 0 GENIC ch r10 10418625
7-104186335 chMO 1041847 hsa-mιr-146b (+) 10 97-104186757 BTL - 0 <5kb - K27 chiiO 10596854 chMO 1059820 hsa-mιr-609 3-105968626 (-) 20 10-105982210 ELTB CpG 0 GENIC - - chii 0 13494991 chMO 1349599 hsa-mιr-202 4-134949987 (-) 0 21-134960314 B - 0 ND chii 1 558102- ChM 1 558302- hsa-mir-210 558181 (-) 20 558502 BLTE CpG 0 EST BU855637 <5kb _
EST AL521178, BF308314.BF3 chr11 2111936- chr11 2116071- 13025.B119698 hsa-mir-483 2112018 (-) 10 2116271 E CpG 0 0 K27 chri i 2111936- chii 1 2118637- hsa-mιr-483 2112018 (-) 9 2118837 ELB CpG 1 EST T48617 Suz12 K27 chri i 2111936- chr11 2125516- hsa-mιr-483 2112018 (-) 18 2125716 knowr 2 GENIC ND ch r11 28034943 hsa-mιr-610 28035023 (+) - chr11 43559524 chii 1 4355706 EST BI964058, hsa-rnιr-129-2 43559606 (+) 25 7-43557267 ETL Cons <5kb Cons Suz12 K27
EST BM803887 chr11 57165251 chr11 5716234 , Cons non- hsa-mιr-130a 57165332 (+) 22 7-57162547 EBTL overlap EST <5kb Cons chrH 61316538 ChM 1 6131656 hsa-mιr-611 61316611 (-) 30 1-61316761 ELBT CpG GENIC <5kb chr11 61316538 ChM 1 6149157 hsa-mιr-611 61316611 (-) 1 9-61491779 LEBT CpG EST DC348087 chr11 64415197 ChM 1 6441464 non-overlap hsa-mir-192 64415281 (-) 0 6-64418145 L EST <5kb chrH 64415407 ChM 1 6441464 non-overlap hsa-mιr-194-2 64415483 {-) 0 6-64418145 L EST <5kb chr11 64968510 ChM 1 6474962 hsa-mir-612 64968592 (+) -8 1-64750963 TELB CpG ND chrH 72003748 ChM 1 7206301 hsa-mιr-139 72003826 (-) 24 3-72063213 E 1 GENIC <5kb Cons K27 chr11 74723790 ChM 1 7474042 hsa-mιr-326 74723869 (-) 19 1-74740621 EBTL CpG 1 GENIC
EST CF126780 chrH 78790714 ChM 1 7882885 Cons EST CJ 1 hsa-rnιr-708 78790801 (-) 15 8-78829058 BE CpG 0 01608 <5kb Cons Oct4 Suz12 ND
hsa-mir-34b chr11:110888876-110888954 (+) 10 chr11:110888303-110889419 EBT CpG <5kb Suz12 K27
hsa-mir-34c chr11:110889376-110889450 (+) 10 chr11:110888303-110889419 EBT CpG <5kb Suz12 K27
hsa-mir-125b-1 chr11:121475677-121475758 (-) 20 chr11:121476269-121476469 E - 0 EST DA405903 <5kb - - K27
hsa-mir-125b-1 chr11:121475677-121475758 (-) 9 chr11:121685269-121685469 L - 1 EST DA808613, Cons non-overlap EST <5kb Cons - - ND
hsa-let-7a-2 chr11:121522431-121522517 (-) 9 chr11:121685269-121685469 L - 1 EST DA808613, Cons non-overlap EST <5kb Cons - - ND
hsa-mir-100 chr11:121528148-121528224 (-) 9 chr11:121685269-121685469 L - 1 EST DA808613, Cons non-overlap EST <5kb Cons - - ND
hsa-mir-200c chr12:6943117-6943198 (+) 20 chr12:6942568-6942768 E - 0 EST BU860272,BX094730,BM855863 <5kb -
hsa-mir-141 chr12:6943528-6943610 (+) 20 chr12:6942568-6942768 E - 0 EST BU860272,BX094730,BM855863 <5kb - - -
hsa-mir-613 chr12:12808850-12808939 (+) 19 chr12:12770030-12770230 TBEL CpG 1 GENIC - - ND
hsa-mir-614 chr12:12960045-12960115 (+) -2 chr12:12852730-12853136 B - 2 - - -
hsa-mir-196a-2 chr12:52671803-52671881 (+) 10 chr12:52668565-52670193 B - 0 <5kb Oct4 Suz12 K27
hsa-mir-615 chr12:52714007-52714092 (+) 6 chr12:52697079-52697279 ELBT - 4 EST BE394964, Cons non-overlap EST <5kb Cons Oct4 Suz12 K27
hsa-mir-615 chr12:52714007-52714092 (+) 9 chr12:52711321-52711758 E CpG 1 <5kb Suz12 K27
miRNA H3K4 CpG IntervenProxiConH3K27 name position score TSS position me3 Island ing Sites GENE/EST mal served Oct4 Suz12 me3 chr 12 52714007 chM2 5271299 hsa-mιr-615 52714092 (+) 35 8-52713198 LET - 0 GENIC <5kb Cons - Suz12 K27 chM2 53017280 ChM 2 5298115 EST DA590530 hsa-mιr-148b 53017360 (+) 2-52981352 EBL - 2 ,BM 147393 Oct4 chr12 53017280 chr12 5298446 hsa-mιr-148b 53017360 (+) 4-52985083 B - 1 <5kb Cons Oct4 chr 1253017280 ChM 2 5300507 hsa-mιr-148b 53017360 (+) 25 7-53005277 LTEB - 0 GENIC <5kb Cons - K27 chr 12 56199223 ch M 2 5620046 hsa-mir-616 56199304 (-) 30 7-56200667 TELB CpG 0 GENIC <5kb - chr 12 56504660 ChM 2 5652668 hsa-mιr-26a-2 56504739 (-) 24 9-56526889 LEBT CpG 1 GENIC <5kb Cons - chr12 61283728 chM2 6128269 hsa-let-7ι 61283825 (+) 20 6-61282896 ELBT CpG 0 EST DA092355 <5kb - chr12 63302570 ChM 2 6329045 hsa-mιr-548c 63302647 (+) 20 9-63290659 EBTL CpG 0 GENIC - chr1279728798 ch M 2 7983406 Ul hsa-mir-617 79728871 (-) 20 2-79834262 LE - 0 GENIC - ChM 2 79831990 ch M 2 7983406 hsa-mιr-618 79832075 (-) 30 2-79834262 LE - 0 GENIC <5kb - chr12 93730661 ChM 2 9354006 hsa-mιr-492 93730746 (+) -1 2-93541288 L - 1 - ND chr12 94204679 ChM 2 9410027 hsa-mιr-331 94204754 (+) -1 3-94101201 E - 1 Oct4 ND hsa-mιr-135a- ChM 2 96460070 ChM 2 9637086
2 96460149 (+) 8-96371668 LB _ 0 - ND
ChM 2 10773315 ChM 2 1077537 hsa-mιr-619 5-107733233 (-) 20 32-107753932 ELBT CpG 0 GENIC ch r12 11504909 chM 2 1151777 hsa-mιr-620 0-115049183 (-) 19 63-115177963 BELT 1 GENIC ChM 340282915 chM 3 4026153 hsa-mιr-621 40282992 (+) 16 2-40261732 TELB CpG 4 GENIC
miRNA H3K4 CpG intβrven- Proxi- Con- H3K27 name position score TSS position rne3 lsland ing Sites GENE/EST mal served Oct4 Suz12 me3 chr13 49521111 chr134955404 hsa-mιr-16-1 49521195 (-) 25 0-49554240 BTEL CpG 0 GENIC <5kb Cons - K27
EST BI907938, chr1349521111 chri 3 4959755 Cons EST BY7 hsa-mιr-16-1 49521195 (-) 13 6-49597756 BELT CpG 2 34056 <5kb Cons Oct4 Suz12 ND chr13 49521257 ch r134955404 hsa-mιr-15a 49521335 (-) 25 0-49554240 BTEL CpG 0 GENIC <5kb Cons K27
EST BI907938, chr1349521257 chii 34959755 Cons EST BY7 hsa-mιr-15a 49521335 {-) 13 6-49597756 BELT CDG 2 34056 <5kb Cons Oct4 Suz12 ND chii 3 89681444 hsa-mιr-622 89681527 (+) chii 3 90800863 chri 3 9079797 hsa-mir-17 90800941 (+) 35 4-90798174 EBLT CpG 0 GENIC <5kb Cons chii 390801001 chri 3 9079797 hsa-mιr-18a 90801083 (+) 35 4-90798174 EBLT CpG 0 GENIC <5kb Cons chii 3 90801149 chri 3 9079797 hsa-mιr-19a 90801226 (+) 25 4-90798174 EBLT CpG 0 GENIC <5kb Cons chii 3 90801317 ch ii 3 9079797 hsa-mir-20a 90801395 (+) 25 4-90798174 EBLT CpG 0 GENIC <5kb Cons chti 3 90801452 chii 3 9079797 hsa-mιr-19b-1 90801532 (+) 25 4-90798174 EBLT CpG 0 GENIC <5kb Cons chii 3 90801569 chri 3 9079797 hsa-mιr-92a-1 90801647 (+) 25 4-90798174 EBLT CpG 0 GENIC <5kb Cons ChM 3 98806391 chii 3 9865098 hsa-mιr-623 98806479 (+) 1-98651181 ELBT CpG 16 GENIC chii 3 98806391 chri 3 9868863 hsa-mιr-623 98806479 (+) 1-98688831 T - 9 EST CF131708 ND chii 3 98806391 chii 3 9876486 hsa-mιr-623 98806479 (+) 2-98765062 T - 1 EST BM922876 ND chrt 3 98806391 chri 3 9880627 hsa-mιr-623 98806479 (+) 10 4-98807081 TB - 0 <5kb ND chr14 22927641 ch ii 42294722 hsa-mιr-208 22927717 (-) 17 4-22947424 known - 8 GENIC <5kb Cons
hsa-mir-208 chr14:22927641-22927717 (-) 1 chr14:23051233-23051632 T <5kb Cons ND
hsa-mir-208b chr14:22957036-22957112 (-) 12 chr14:22974590-22974790 known GENIC
hsa-mir-208b chr14:22957036-22957112 (-) 1 chr14:23051233-23051632 T - 1 <5kb Cons ND
hsa-mir-624 chr14:30553616-30553694 (-) 19 chr14:30565240-30565440 LEBT CpG 1 GENIC K27
hsa-mir-625 chr14:65007577-65007655 (+) 19 chr14:64946963-64947163 known - 1 GENIC
hsa-mir-625 chr14:65007577-65007655 (+) 10 chr14:64948920-64949120 LBTE CpG 0 EST BP234683 K27
hsa-mir-342 chr14:99645755-99645837 (+) 17 chr14:99507803-99508003 TELB CpG 8 GENIC <5kb Cons ND
hsa-mir-342 chr14:99645755-99645837 (+) 4 chr14:99601429-99601629 BT 6 EST DB113798
hsa-mir-345 chr14:99843956-99844033 (+) 0 chr14:99840674-99844301 ETBL CpG 0 non-overlap gene SLC25A29 <5kb
hsa-mir-770 chr14:100388492-100388567 (+) 10 chr14:100385433-100386041 TB <5kb ND
hsa-mir-493 chr14:100405155-100405237 (+) 0 chr14:100385433-100386041 TB ND
hsa-mir-337 chr14:100410595-100410670 (+) 0 chr14:100385433-100386041 TB ND
hsa-mir-665 chr14:100411123-100411194 (+) 0 chr14:100385433-100386041 TB ND
hsa-mir-431 chr14:100417106-100417189 0 chr14:100385433-100386041 TB ND
hsa-mir-433 chr14:100417977-100418070 (+) 0 chr14:100385433-100386041 TB - 0 ND
hsa-mir-127 chr14:100419077-100419156 (+) 0 chr14:100385433-100386041 TB - 0 ND
hsa-mir-432 chr14:100420576-100420664 (+) 0 chr14:100385433-100386041 TB - 0 ND
hsa-mir-136 chr14:100420796-100420871 (+) 0 chr14:100385433-100386041 TB - 0 ND
hsa-mir-370 chr14:100447231-100447307 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-379 chr14:100558151-100558229 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-411 chr14:100559420-100559495 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-299 chr14:100559880-100559953 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-380 chr14:100561100-100561177 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-323 chr14:100561827-100561902 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-758 chr14:100562115-100562192 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-329-1 chr14:100562876-100562955 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-329-2 chr14:100563193-100563272 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-494 chr14:100565729-100565802 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-543 chr14:100568077-100568154 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-495 chr14:100569848-100569925 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-368 chr14:100575775-100575852 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-376a-2 chr14:100576161-100576238 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-654 chr14:100576312-100576390 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-376b chr14:100576539-100576618 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-376a-1 chr14:100576868-100576945 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-300 chr14:100577453-100577535 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-381 chr14:100582007-100582089 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-487b chr14:100582549-100582626 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-539 chr14:100583409-100583492 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-889 chr14:100583991-100584069 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-544 chr14:100584756-100584833 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-655 chr14:100585652-100585731 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-487a chr14:100588538-100588615 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-382 chr14:100590396-100590474 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-134 chr14:100590774-100590854 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-485 chr14:100591507-100591585 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-453 chr14:100592273-100592353 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-154 chr14:100595849-100595926 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-496 chr14:100596676-100596749 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-377 chr14:100598136-100598215 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-541 chr14:100600585-100600668 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-409 chr14:100601394-100601467 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-412 chr14:100601543-100601621 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-369 chr14:100601686-100601761 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-410 chr14:100602005-100602081 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-656 chr14:100602811-100602886 (+) 0 chr14:100440972-100441559 E - 0
hsa-mir-203 chr14:103653511-103653590 (+) 10 chr14:103651982-103653842 LTEB CpG 0 <5kb Suz12 K27
hsa-mir-211 chr15:29144544-29144621 (-) 19 chr15:29181121-29181321 known - 1 GENIC
hsa-mir-211 chr15:29144544-29144621 (-) 1 chr15:29294362-29296632 LBE CpG 1 <5kb Cons ND
miRNA H3K4 CpG IntervenProxi- Con- H3K27 name position score TSS position me3 Island ing Sites GENE/EST mal serv< Suz12 me3
EST BP872676 ch r15 39771096 chr15 3970061 ,DA245206,DA hsa-mιr-626 39771163 (+) 3-39700813 ELT CpG 1 218982 ND chr 15 39771096 chr15 3973983 hsa-mιr-626 39771163 (+) 20 6-39740036 ELBT CpG 0 GENIC ND chr 15 40279072 chr15 4028769 hsa-mιr-627 40279151 (-) 20 4-40287894 ELBT CpG 0 GENIC K27 chr 1543512540 ch Ii 54348176 hsa-mir-147b 43512619 {+) 19 4-43481964 LBET CpG 1 GENIC chr 1543512540 chr154351000 EST BI560356, hsa-mιr-147b 43512619 (+) 20 0-43510200 T CpG 0 BM313088 <5kb K27 chr 15 53452434 chr15 5348769 hsa-mιr-628 53452510 (-) 20 6-53487896 LBTE CpG 0 GENIC chr15 60903213 ch r15 6072670 hsa-mir-190 60903290 (+) 20 1-60726901 known - 0 GENIC chr15 61950185 chr15 6212488 non-overlap hsa-mιr-422a 61950272 (-) -10 5-62126146 BELT CpG 0 gene DAPK2 K27 chr 15 68158770 chr156817721 hsa-mιr-629 68158850 (-) 20 0-68177410 ETBL CpG 0 GENIC Oct4 Suz12 K27 ch r15 70666620 chr15 7054111 hsa-mιr-630 70666703 (+) -1 6-70541715 T - 1 ND chr15 73433003 ch ii 5 7361050 hsa-mιr-631 73433074 (-) -2 6-73611464 L 2 ND chr15 77289187 chr15 7728903 hsa-mιr-184 77289268 (+) 10 1-77289639 E - 0 <5kb hsa-mir- chr15 78921381 chr15 7885866
549 os 78921454 (+) 20 6-78858866 EL CpG 0 GENIC Suz12 K27 chr15 86956081 chr15 8674539 hsa-mir-7-2 86956162 (+) _3 9-86748492 B - 3 ND chrt 5 87712257 chr15 8771223 EST DA059714 hsa-mιr-9-3 87712337 (+) 25 3-87712433 E CpG 0 , Cons <5kb Cons Suz12 K27 chr16 760191- chrt 6 726429- hsa-mιr-662 760274 (+) -1 728379 B CpG 1 ND
miRNA H3K4 CpG IntervenProxiConH3K27 name position score TSS position me3 Island ing Sites GENE/EST mal sent me3
EST CD386249
ChM 6 14305328 chr16 1430360 ,BX108536,BF5 hsa-mιr-193b 14305406 (+) 25 0-14303800 LBET CpG 0 75040, Cons <5kb Cons K27
EST CD386249 chr16 14310647 chr16 1430360 ,BX108536, BF5 hsa-mιr-365-1 14310729 (+) 13 0-14303800 LBET CpG 2 75040, Cons <5kb Cons K27 chr16 14310647 chr16 1430918 hsa-mιr-365-1 14310729 (+) 9 2-14309534 L - 1 <5kb chr 16 14310647 chr16 1430974 hsa-mιr-365-1 14310729 (+) 10 8-14312532 BT 0 <5kb non-overlap gene LKAP, Cons non- overlap chr16 15644614 chri 6 1564309 gene 4921513D hsa-mιr-484 15644690 (+) 2 2-15645825 BTEL 0 23Rιk <5kb Cons chr 16 55449930 chri 6 5530009 hsa-mιr-138-2 55450018 (+) -1 2-55300691 T 1 ND non-overlap gene LRRC29, Cons non- chr16 65793721 chri 6 6581737 overlap hsa-mιr-328 65793803 (-) -15 7-65819769 BLTE CpG 0 gene Lrrc29 <5kb Cons chr16 68524497 chri 6 6835364 hsa-mιr-140 68524577 (+) 25 9-68353849 TELB - 0 GENIC <5kb Cons chr17 1563948- chri 7 1566155- hsa-mir-22 1564027 (-) 30 1566355 LETB CpG 0 GENIC <5kb - K27 chii 7 1899963- chii 7 1899412- hsa-mιr-132 1900040 (-) 10 1901670 LTEB CpG 0 <5kb Suz12 K27 ch Ii 7 1900323- chri 7 1899412- hsa-mir-212 1900404 (-) 10 1901670 LTEB CpG 0 <5kb Suz12 K27 chri 7 6861660- ch r17 6863599- EST DB266639 hsa-mir-195 6861740 (-) 25 6863799 L 0 , Cons <5kb Cons
miRNA H3K4 CpG IntervenProxiConH3K27 name position score TSS position me3 Island ing Sites GENE/EST mal served Oct4 Suz12 me3 chM76861966- ChM 7 6863599- EST DB266639 hsa-mιr-497 6862052 (-) 25 6863799 L - 0 , Cons <5kb Cons _ _ ChM77067341- ChM 7 7076508- non-overlap hsa-mir-324 7067417 (-) -10 7079338 EBLT CpG 0 gene DVL2 Oct4 - - ChM711925941 Ch M 7 1186475 K4 TB hsa-mir-744 11926038 (+) 25 9-11864959 EL CpG 0 GENIC <5kb Cons K27 chr1717657886 ChM 7 1766752 EST CD672966 hsa-mιr-33b 17657965 (-) 10 2-17667722 LBT CpG 0 .BQ640065 - - ND ChM717657886 ChM 7 1768095 hsa-mιr-33b 17657965 (-) 18 0-17681150 LEBT CpG 2 GENIC _ _ _ chrl724212517 ChM 7 2424431 hsa-mιr-451 24212578 (-) 0-24244509 T - 0 - ND chr 1724212682 ChM 7 2424431 hsa-mιr-144 24212757 (-) 0-24244509 T - 0 - ND chr 1725468229 ChM 7 2546785 hsa-mιr-423 25468307 (+) 35 9-25468059 TELB CpG 0 GENIC <5kb Cons _ K27 ChM726911138 ChM 7 2690991 hsa-mιr-193a 26911213 (+) 14 0-26910109 T CpG 1 <5kb Cons - - K27 chr1726911138 ChM 7 2691018 hsa-mιr-193a 26911213 (+) 15 6-26913533 EL CpG 0 <5kb Cons - Suz12 «27 ChM726926560 ChM 72690991 hsa-mιr-365-2 26926641 (+) 0-26910109 T CpG 2 <5kb Cons - K27 chr1726926560 chr17 2691018 hsa-mir-365-2 26926641 (+) 6-26913533 EL CpG 1 <5kb Cons Suz12 K27 chM727701248 ChM 7 2767546 hsa-mιr-632 27701329 (+) 2-27675662 L - 2 EST DA702523 _ ND ChM727701248 ChM 7 2770116 hsa-mιr-632 27701329 (+) 30 4-27701364 ELBT CpG 0 GENIC <5kb - K27 Chr1743469529 ch M 7 4347005 hsa-mir-152 43469607 (-) 35 1-43470251 ELT CpG 0 GENIC <5kb Cons _ _ K27 chM744012215 chr17 4401431 hsa-mιr~10a 44012297 (-) 15 0-44014709 T CpG 0 <5kb Cons Oct4 Suz12 K27 hsa-mιr-196a- ChM 744064845 ChM 7 4406631
1 44064924 (-) 10 0-44066509 T CpG 0 <5kb Oct4 Suz12 K27
miRNA H3K4 CpG IntervenProxiConH3K2 name position score TSS position me3 Island ing Sites GENE/EST mal served Oct4 Suz12 me3 chr17 53763595 ChM 7 5375878 hsa-mιr-142 53763673 (-) 10 4-53758984 BELT - 0 EST BP251669 - - - chii 7 54569920 ChM 7 5458724 hsa-mιr-454 54570000 (-) 25 3-54587443 EBLT CpG 0 GENIC <5kb Cons Oct4 - - chr 17 54583282 Ch M 7 5458724 hsa-mir-301a 54583359 (-) 25 3-54587443 EBLT CpG 0 GENiC <5kb Cons Oct4 - - non-overlap chr 17 55273406 ChM 7 5513828 gene BIT1 , hsa-mιr-21 55273485 (+) 6 3-55141202 ELBT CpG 4 Cons GENIC <5kb Cons - - K27 chr 17 58375322 ChM 7 5822571 hsa-mir-633 58375400 (+) -1 0-58225909 T - 1 - - ND chr17 62213669 ChM 7 6172928 hsa-mιr-634 62213743 (+) 21 7-61729487 known CpG - GENIC Oct4 Suz12 K27 hsa-mιr-548d- chr17 62898072 ChM 7 6290151
2 62898149 (-) 0 0-62901909 T - 0 - - ND chr17 63932186 ch M 7 6396511 hsa-mιr-635 63932279 (-) 20 0-63965310 ELBT CpG 0 GENIC - - 4- Ul chr17 72244133 ChM 7 7224490 hsa-mιr-636 72244213 (-) 30 8-72245108 BETL CpG 0 GENIC <5kb - - K27 non-overlap chr17 76713676 ChM 7 7675331 gene FLJ44861 hsa-mιr-657 76713750 (-) 6 1-76755359 BEL CpG 1 , Cons GENIC <5kb Cons - - K27 non-overlap
ChM 7 76714272 ChM 7 7675331 gene FLJ44861 hsa-mιr-338 76714348 (-) 6 1-76755359 BEL CpG 1 , Cons GENIC <5kb Cons - - K27 hsa-mιr-133a- ChM 8 17659661
1 17659738 (-) _
ChM 8 17662964 hsa-mιr-1-2 17663044 (-) -
ChM 8 31738786 chr18 3178314 hsa-mιr-187 31738863 (-) 0 4-31784776 ELT CpG 0 - - ND
Ch M 8 54269290 ChM 85426407 non-overlap hsa-mιr-122 54269367 (+) 0 9-54269401 L - 0 EST <5kb - - K27 chM9 3912417- ChM 9 3921938- hsa-mιr-637 3912510 (-) 20 3922138 EBLT CpG 0 GENIC _ -
miRNA H3K4 CpG intervenProxiConH3K27 name position score TSS position me3 Island ing Sites GENE/EST mal served me3 chii 9 4721702- chii 9 4720051- hsa-mιr-7-3 4721781 (+) 32 4720251 E - 0 GENIC <5kb Cons chri 9 6446959- ch ii 9 6440459- hsa-mιr-220b 6447045 (+) 0 6441187 TB - 0 ND chr19 10690085 chri 9 1068965 hsa-mιr-638 10690183 (+) 30 4-10689854 LTEB CpG 0 GENIC <5kb hsa-mir-199a- chr19 10789095 chii 9 1078933
1 10789177 (-) 6 4-10789682 (ELB) - 0 <5kb manual ND chii 9 13808093 chri 9 1380776 hsa-mιr-24-2 13808179 (-) 10 2-13808928 BT - 0 <5kb ch ii 9 13808093 chri 9 1381842 hsa-mir-24-2 13808179 (-) 1 7-13819944 BTEL - 1 <5kb Cons ND chii 9 13808251 chri 9 1380776 hsa-mir-27a 13808332 (-) 10 2-13808928 BT - 0 <5kb chr19 13808251 chii 9 1381842 hsa-mιr-27a 13808332 (-) 1 7-13819944 BTEL - 1 <5kb Cons ND chii 9 13808399 chri 9 1380776 4- hsa-mιr-23a 13808473 (-) 10 2-13808928 BT - 0 <5kb chri 9 13808399 chri 9 1381842 hsa-mιr-23a 13808473 (-) 1 7-13819944 BTEL - 1 <5kb Cons ND chii 9 13846529 chri 9 1384358 hsa-m!r-181c 13846609 (+) 10 3-13846410 BTE CpG 0 <5kb K27 chr19 13846714 chri 9 1384358 hsa-mιr-181d 13846797 (+) 10 3-13846410 BTE CpG 0 <5kb K27 chii 9 14501355 chri 9 1448879 hsa-mir-639 14501447 (+) 9 6-14488996 LTEB - 1 EST BF727112 chr19 14501355 chr19 1450128 hsa-mir-639 14501447 (+) 30 1-14501481 TLBE CpG 0 GENIC <5kb chr19 19406870 chii 9 1935769 hsa-mιr-640 19406962 (+) 8 3-19357893 ELBT CpG 2 EST BF436051 ND chii 9 19406870 chii 9 1937754 EST BU 149378 hsa-mir-640 19406962 (+) 9 3-19377743 EBLT CpG 1 .BX474864 ND chr 1945480294 chii 94548293 hsa-mir-641 45480383 (-) 30 6-45483136 LBET CpG 0 GENIC <5kb
miRNA H3K4 CpG Interven- Proxi- Con- H3K27 name position score TSS position me3 lsland ing Sites GENE/EST mal served Oct4 Suz12 me3 chri 950834097 chr19 5083440 hsa-mιr-330 50834178 (-) 30 9-50834609 BELT 0 GENIC <5kb Cons
EST BX352958,
DC389279.DC4 chr 1950834097 chr19 5083870 20439,BP3418 hsa-mιr-330 50834178 (-) 0-50838900 E 1 30 chN950870031 chr19 5086324 hsa-mιr-642 50870107 (+) 20 1-50863441 BTEL 0 GENIC Suz12 K27 chr 1953755341 ch r19 5378425 hsa-mιr-220c 53755423 (-) 9-53784458 T 0 ND chr 1954695857 chr19 5469463 EST BE673493 hsa-mιr-150 54695932 (-) 20 4-54694834 TELB 0 .AA689263 <5kb chr 1956887673 chr19 5688371 hsa-mιr-99b 56887752 (+) 10 7-56886813 EBLT 0 <5kb chr1956887848 chr19 5688371 hsa-let-7e 56887934 (+) 10 7-56886813 EBLT 0 <5kb chr1956888323 chr19 5688371 hsa-mir-125a 56888402 (+) 10 7-56886813 EBLT 0 <5kb chr1957476873 chr19 5746453 hsa-mιr-643 57476953 (+) 20 5-57464735 LBET CpG 0 GENIC chr1958861748 chr195877185 hsa-mιr-512-2 58861826 (+) 7-58773878 LE 0 ND chr1958864232 chr19 5877185 hsa-mir-512-1 58864310 (+) 7-58773878 LE 0 ND chr1958869286 chrt 9 5877185 hsa-mιr-498 58869365 (+) 7-58773878 LE 0 ND chr1958870781 chr19 5877185 hsa-mir-520e 58870861 (+) 7-58773878 LE 0 ND chr1958874072 ChM 9 5877185 hsa-mιr-515-1 58874150 (+) 7-58773878 LE 0 ND chr1958875009 ChN 9 5877185 hsa-mιr-519e 58875088 (+) 7-58773878 LE 0 ND chr1958877229 chr19 5877185 hsa-mιr-520f 58877309 (+) 7-58773878 LE 0 ND
hsa-mir-515-2 chr19:58880078-58880156 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-519c chr19:58881540-58881618 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-520a chr19:58885950-58886030 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-526b chr19:58889462-58889540 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-519b chr19:58890281-58890360 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-525 chr19:58892603-58892681 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-523 chr19:58893456-58893535 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-518f chr19:58895086-58895163 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-520c-2 chr19:58896284-58896364 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-518b chr19:58897806-58897884 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-526a-1 chr19:58901322-58901400 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-520c-1 chr19:58902524-58902603 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-518c chr19:58903814-58903894 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-524 chr19:58906074-58906150 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-517a-1 chr19:58907338-58907418 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-519d chr19:58908418-58908497 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-521-2 chr19:58911664-58911744 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-520d chr19:58915166-58915246 (+) 0 chr19:58771857-58773878 LE 0 ND
hsa-mir-517a-2 chr19:58916137-58916215 (+) 0 chr19:58771857-58773878 LE 0 ND
Ch r19 58917236 chrl 9 5877185 hsa-mιr-520g 58917319 (+) 0 7-58773878 LE 0 ND hsa-mir-516b- chrl 9 58920512 ch Ii 9 5877185
1 58920590 (+) 0 7-58773878 LE 0 ND hsa-mιr-526a- chr! 9 58921984 chrt 9 5877185
2 58922058 (+) 0 7-58773878 LE 0 ND ch r19 58924909 chrl 9 5877185 hsa-mιr-518e 58924987 (+) 0 7-58773878 LE 0 ND hsa-mιr-518a- chrl 9 58926074 chrl 9 5877185
1 58926153 (+) 0 7-58773878 LE 0 ND chr 19 58929948 ch ii 9 5877185 hsa-mιr-518d 58930026 (+) 0 7-58773878 LE 0 ND hsa-mιr-516b- chrl 9 58931916 chrl 9 5877185
2 58931998 (+) 0 7-58773878 LE 0 ND hsa-mir-518a- chr 19 58934403 chrl 9 5877185
2 58934482 (+) 0 7-58773878 LE 0 ND chr 19 58936388 ch ii 9 5877185 hsa-mιr-517c 58936466 (+) 0 7-58773878 LE 0 ND chr 19 58937580 chrl 95877185 hsa-mιr-520h 58937665 (+) 0 7-58773878 LE 0 ND chrl 9 58943706 chrl 9 5877185 hsa-mιr-521-1 58943786 (+) 0 7-58773878 LE 0 ND chr 19 58946282 chrl 9 5877185 hsa-mir-522 58946361 (+) 0 7-58773878 LE 0 ND hsa-mιr-519a- chrl 9 58947467 chrl 9 5877185
1 58947546 (+) 0 7-58773878 LE 0 ND ch r19 58949086 chrl 9 5877185 hsa-mιr-527 58949168 (+) 0 7-58773878 LE 0 ND hsa-mir-516a- chrl 9 58951811 chrl 9 5877185
2 58951894 (+) 0 7-58773878 LE 0 ND
miRNA H3K4 CpG IntervenProxi- Con- H3K27 name position score TSS position me3 Island ing Sites GENE/EST mal served Oct4 Suz12 me3 hsa-mιr-516a- chri 9 58956203 ChM 9 5877185
1 58956286 (+) 0 7-58773878 LE - 0 ND hsa-mιr-519a- chii 9 58957414 ChM 9 5877185
2 58957494 (+) 0 7-58773878 LE - 0 ND chii 9 58982736 chM 9 5898034 hsa-mιr-371 58982814 (+) 9 8-58981893 E - 1 <5kb Oct4 chii 9 58982736 ChM 9 5898189 hsa-mιr-371 58982814 (+) 10 3-58984412 E - 0 <5kb Oct4
Ch r 19 58982951 chM 9 5898034 hsa-mir-372 58983029 (+) 9 8-58981893 E - 1 <5kb Oct4 chr 19 58982951 chM 9 5898189 hsa-mιr-372 58983029 (+) 10 3-58984412 E - 0 <5kb Oct4 chr 19 58983766 chM 9 5898034 hsa-mιr-373 58983846 (+) 9 8-58981893 E - 1 <5kb Oct4 chr 19 58983766 chM 9 5898189 hsa-mιr-373 58983846 (+) 10 3-58984412 E - 0 <5kb Oct4 ch r20 3846141- chr20 3817385- Ul
O hsa-mιr-103-2 3846220 (+) 20 3817585 EBLT CpG 0 GENIC ch r20 32517809 chr20 3241464 hsa-mιr-644 32517879 (+) 19 4-32414844 TELB CpG 1 GENIC chr20 33041862 chr20 3302676 hsa-mιr-499 33041940 (+) 20 6-33026966 known - 0 GENIC chr2048635738 chr204859661 hsa-mιr-645 48635818 (+) 0 6-48596815 T - 0 - ND chr20 56826066 chr20 5686288 hsa-mιr-296 56826141 (-) -2 1-56864918 ET CpG 2 Suz12 K27 chr20 56826676 chr20 5686288 hsa-mir-298 56826763 (-) -2 1-56864918 ET CpG 2 Suz12 K27
EST DC342929 ch r20 58316931 ch r20 5814683 ,DA922484,DA hsa-mιr-646 58317015 (+) 10 1-58147031 TL - 0 504866 ND chr20 60561954 chr20 6055800 hsa-mιr-1-1 60562034 (+) 18 4-60558204 known CpG 2 GENIC hsa-mir-133a- chr20 60572576 chr20 6055800
2 60572653 (+) 18 4-60558204 known CDG 2 GENIC
miRNA H3K4 CpG IntervenProxi an- H3K27 name position score TSS position me3 Island ing Sites GENE/EST mmaall served Oct4 Suz12 me3 chr2061280303 ch r206127670 hsa-mιr-124-3 61280378 (+) 9 3-61277978 E CpG 1 <5kb Suz12 K27 chr20 61280303 chr20 6127841 hsa-mιr-124-3 61280378 {+) 10 8-61280978 ELB CpG 0 <5kb Suz12 K27 chr20 62044425 chr20 6205811 hsa-mιr-647 62C44518 (-) 20 2-62058312 BELT CpG 0 GENIC chr21 16833282 chr21 1636461 hsa-mιr-99a 16833360 (+) 17 2-16364812 known - 3 GENlC chr21 16833282 chr21 1671355 hsa-mιr-99a 16833360 (+) 8 9-16713759 L - 2 EST AA093075 ND chr21 16833282 chr21 1683140 hsa-mιr-99a 16833360 (+) 20 2-16831602 L - 0 EST DB256041 <5kb chr21 16834019 chr21 1636461 hsa-let-7c 16834105 (+) 17 2-16364812 known - 3 GENIC chr21 16834019 chr21 1671355 hsa-let-7c 16834105 (+) 8 9-16713759 L - 2 EST AA093075 ND chr21 16834019 chr21 1683140 'Jy hsa-iet-7c 16834105 (+) 20 2-16831602 L - 0 EST DB256041 <5kb hsa-mιr-125b- chr21 16884434 chr21 1636461
2 16884514 (+) 17 2-16364812 known - 3 GENIC hsa-mιr-125b- chr21 16884434 chr21 1671355
2 16884514 (+) 7 9-16713759 L 3 EST AA093075 ND hsa-mιr-125b- chr21 16884434 chr21 1683140
2 16884514 (+) 9 2-16831602 L - 1 EST DB256041
EST DB502959 hsa-mιr-125b- chr21 16884434 chr21 1688277 .DB485387.DB
2 16884514 (+) 20 8-16882978 LE - 0 504610 <5kb chr21 25868156 chr21 2585618 EST CR998853 hsa-mιr-155 25868236 (+) 9 6-25856386 TEB CpG 1 .CD638999 ND chr21 25868156 chr21 2586762 hsa-mir-155 25868236 (+) 10 0-25868098 B 0 <5kb chr21 36014889 chr21 3582255 hsa-mιr-802 36014969 (+) 0 1-35823995 BT - 0 Oct4 ND ch r22 16838193 chr22 1685855 hsa-mιr-648 16838276 (-) 10 0-16858750 ELBT 0 EST DB077645 Oct4 ND
mi RNA H3K4 CpG Interven- Proxi- Con- H3K27 name position score TSS position me3 lsland ing Sites GENE/EST mal served Oct4 Suz12 me3 chr22 16838193 chr22 1688176 hsa-rnιr-648 16838276 (-) 0-16881960 EBLT 2 EST DA531335 Suz12 ND Ch r22 18395220 chr22 1834809 hsa-mιr-185 18395295 (+) 1-18348291 TBEL 2 EST DA523208 ND EST BX345246, BX376796.DC3 chr22 18395220 chr22 1837899 06770, DC3033 hsa-mιr-185 18395295 (+) 0-18379190 ETLB 1 00.DC325386 K27 ch r22 18395220 chr22 1838308 hsa-mιr-185 18395295 (+) 20 4-18383284 ELBT 0 GENIC K27 Ch r22 19713024 chr22 1972226 hsa-mιr-649 19713106 (-) 10 0-19722460 T 0 EST AL701929 ND chr22 20331824 chr22 2033208 hsa-mιr-301b 20331901 (+) 20 4-20332284 EBT 0 EST DV080645 <5kb chr22 20332151 chr22 2033208 hsa-mιr-130b 20332226 (+) 20 4-20332284 EBT 0 EST DV080645 <5kb EST CD691322 Ul K> ,CD694912,BG chr22 21489829 chr22 2148603 398521 , BG755 hsa-mιr-650 21489903 (+) 10 5-21486235 0 301 ND chr2236564784 chr22 3656472 hsa-mιr-658 36564873 (-) 30 0-36564920 TELB 0 GENIC <5kb Ch r22 36568190 chr22 3656834 non-overlap hsa-mιr-659 36568270 (-) 6-36571735 ELBT 0 gene EIF3S6IP <5kb chr2240621443 chr224055350 hsa-mιr-33a 40621523 (+) 19 5-40553705 ELBT 1 GENIC K27 chr22 44829141 chr224480453 hsa-let-7a-3 44829230 (+) 2-44805753 0 chr2244830080 chr22 4480453 hsa-let-7b 44830175 (+) 2-44805753 0 chrX 7904747- chrX 7704222- non-overlap hsa-mιr-651 7904826 (+) -10 7706312 LE 0 gene PNPLA4 Oct4 K27 chrX 45361852- chrX 45369307 hsa-mιr-221 45361934 (-) 45369795 ND
miRNA H3K4 CpG Interven- Proxi- Con- H3K27 name position score TSS position me3 lsiand ing Sites GENE/EST mal served Oct4 Suz12 me3 chrX 45362686- chrX 45369307 hsa-mir-222 45362763 (-) 0 45369795 B 0 ND chrX 49470799- chrX 49390161 hsa-mιr-532 49470877 (+) 20 49390361 ELB 0 GENIC - - ND chrX 49471149- chrX 49390161 hsa-mir-188 49471228 (+) 20 49390361 ELB 0 GENIC - - ND chrX 49476077- chrX 49390161 hsa-mιr-500 49476157 (+) 20 49390361 ELB 0 GENIC - - ND chrX 49476602- chrX 49390161 hsa-mιr-362 49476680 (+) 20 49390361 ELB 0 GENIC - - ND chrX 49477369- chrX 49390161 hsa-mιr-501 49477447 (+) 20 49390361 ELB 0 GENIC - - ND chrX 49480890- chrX 49390161 hsa-mir-660 49480967 (+) 20 49390361 ELB 0 GENIC - - ND chrX 49482247- chrX 49390161 hsa-mιr-502 49482324 (+) 20 49390361 ELB 0 GENIC - - ND chrX 53466213- chrX 53563972 Ul hsa-mir-98 53466312 (-) 18 53564172 known 2 GENIC - - ND chrX 53467168- chrX 53563972 hsa-let-7f-2 53467259 (-) 18 53564172 known 2 GENIC - - ND chrX 65021749- chrX 64825044 hsa-mir-223 65021831 (+) 0 64826223 ELB 0 ND chrX 73221239- chrX 73292570 hsa-mir-421 73221314 (-) 0 73297231 EBL 0 K27 chrX 73221403- chrX 73292570 hsa-mιr-374b 73221474 (-) 0 73297231 EBL 0 K27 chrX 73289972- chrX 73292570 hsa-mir-545 73290053 (-) 10 73297231 EBL 0 <5kb - - K27 chrX 73290141- chrX 73292570 hsa-mιr-374a 73290212 (-) 10 73297231 EBL 0 <5kb - - K27 chrX 75922388- hsa-mιr-384 75922470 (-) - chrX 76008529- hsa-mιr-325 76008608 (-) -
mi RNA H3K4 CpG Interven- Proxi- Con- H3K27 name position score TSS position me3 lsland ing Sites GENE/EST mal served Oct4 Suz12 me3 chrX 84964779- chrX 85108611 hsa-mιr-361 84964862 (-) 20 85108811 ELB 0 GENIC chrX 10910471
2-109104792 chrX 10905238 hsa-mιr-652 (+) 20 7-109052587 BEL 0 GENIC K27 chrX 11388099
7-113881098 chrX 11364143 hsa-mιr-448 (+> 17 0-113641630 E 3 GENlC Suz12 K27 chrX 12242149 chrX 12245056 hsa-mir-220 4-122421578 (-) 0 2-122451250 L 0 - ND chrX 13302892 chrX 13302938 hsa-mιr-363 2-133029006 (-) 10 7-133034128 ELB 0 <5kb Oct4 K27 chrX 13302908 chrX 13302938 hsa-mιr-92a-2 4-133029164 (-) 10 7-133034128 ELB 0 <5kb Oct4 K27 chrX 13302922 chrX 13302938 hsa-mιr-19b-2 3-133029308 (-) 10 7-133034128 ELB 0 <5kb Oct4 K27 chrX 13302935 chrX 13302938 Ul
4- hsa-mιr-20b 3-133029432 (-) 10 7-133034128 ELB 0 <5kb Oct4 K27 chrX 13302958 chrX 13302938 hsa-mιr-18b 2-133029666 (-) 10 7-133034128 ELB 0 <5kb Oct4 K27 chrX 13302974 chrX 13302938 hsa-mir-106a 8-133029826 (-) 10 7-133034128 ELB 0 <5kb Oct4 K27 chrX 13339973 chrX 13340369 hsa-mιr-450b 5-133399812 (-) 0 0-133406617 EL 0 - K27 hsa-mιr-450a- chrX 13339989 chrX 13340369
1 8-133399974 (-) 0 0-133406617 EL 0 - K27 hsa-mιr-450a- chrX 13340006 chrX 13340369
2 9-133400145 (-) 0 0-133406617 EL 0 - K27 chrX 13340090 chrX 13340369 hsa-mir-542 4-133400982 (-) 10 0-133406617 EL 0 <5kb - K27 chrX 13340587 chrX 13340369 hsa-mιr-503 1-133405953 (-) 10 0-133406617 EL 0 <5kb - K27 hsa-mir- chrX 13340618 chrX 13340369
424/322 3-133406261 (-) 10 0-133406617 EL 0 <5kb K27
miRNA H3K4 CpG IntervenProxiConH3K27 name position score TSS position me3 Island ing Sites GENE/EST mal served Oct4 Suz12 me3 chrX 13747539 chrX 13754693 hsa-mιr-504 4-137475472 (-) 20 5-137547135 known - 0 GENIC K27 chrX 13873182 chrX 13873786 hsa-mιr-505 9-138731905 (-) 8-138742220 LEB 0 chrX 14478133 hsa-mιr-890 9-144781415 (-) - chrX 14478184 hsa-mιr-888 8-144781924 (-) - chrX 14478373 hsa-mιr-892a 3-144783807 (-) - chrX 14478426 hsa-mιr-892b 2-144784338 (-) - chrX 14478811 hsa-mιr-891b 7-144788195 (-) - chrX 14481485 hsa-mιr-891a 8-144814936 (-) - chrX 14600055 'Jl 'Jl hsa-mιr-513-2 2-146000629 (-) - chrX 14601291 hsa-mir-513-1 4-146012991 (-) - chrX 14601780 hsa-mir-506 7-146017883 (-) - chrX 14601805 hsa-mir-507 6-146018133 (-) - chrX 14602399 hsa-mιr-508 9-146024076 (-) - chrX 14604582 hsa-mir-509-1 9-146045904 (-) - chrX 14604671 hsa-mιr-509-2 6-146046790 (-) - chrX 14604760 hsa-mir-509-3 4-146047679 (-) - chrX 14605939 hsa-mιr-510 6-146059473 (-) -
miRNA H3K4 CpG Interven- Proxi- Con- H3K27 name position score TSS position me3 Islaπd ing Sites GENE/EST mal served Oct4 Suz12 me3 chrX 14606631 hsa-mιr-514-1 9-146066398 {-) - chrX 14606901 hsa-mιr-514-2 0-146069089 (-) - chrX 14607170 hsa-mir-514-3 8-146071787 (-) - chrX 15079761 chrX 15081361 hsa-mιr-224 3-150797701 (-) 20 9-150813819 EL 0 GENIC 2159 chrX 15079866 chrX 15081361 hsa-mir-452 7-150798749 (-) 20 9-150813819 EL 0 GENIC 2159 chrX 15123125 chrX 15129029 hsa-mir-105-1 8-151231337 (-) 20 8-151290498 BE 0 GENIC 1249 chrX 15123345 chrX 15129029 hsa-mιr-105-2 1-151233530 (-) 20 8-151290498 BE 0 GENIC 1249
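For readers working with the coordinates in the table above, the following minimal Python sketch shows one way to measure the gap between a miRNA and a candidate TSS region. It assumes, without confirmation from this section, that a `<5kb` entry in the Proximal column means the gap is under 5,000 bp. The function name `gap_bp` is hypothetical, and the hsa-mir-371 coordinates are re-joined from the extraction-damaged rows, so treat them as illustrative.

```python
def gap_bp(mirna, tss):
    """Return the gap in bp between two (start, end) intervals, or 0 if they overlap."""
    m_start, m_end = mirna
    t_start, t_end = tss
    if t_end < m_start:   # TSS region lies entirely before the miRNA
        return m_start - t_end
    if m_end < t_start:   # TSS region lies entirely after the miRNA
        return t_start - m_end
    return 0

# Coordinates re-joined from the hsa-mir-371 rows (assumed reading of the table).
mir371 = (58982736, 58982814)
tss_upstream = (58980348, 58981893)     # row with score 9
tss_overlapping = (58981893, 58984412)  # row with score 10

gap_upstream = gap_bp(mir371, tss_upstream)
gap_overlapping = gap_bp(mir371, tss_overlapping)
is_proximal = gap_upstream < 5000       # assumed meaning of "<5kb"
```

Under this reading, both hsa-mir-371 candidate regions fall within 5 kb of the miRNA, which matches the `<5kb` entries shown for those rows.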
Claims
1. A method of identifying a genomic region containing a high probability transcriptional start site for a microRNA (miRNA) gene, the method comprising:
(a) identifying a genomic region comprising a candidate transcriptional start site for an miRNA gene based at least in part on enrichment for histone H3 trimethylated at its lysine 4 residue (H3K4me3) within such region; and
(b) assigning a score to said region based at least in part on (i) its proximity to one or more annotated mature miRNA sequences, (ii) expressed sequence tag (EST) data, and/or (iii) conservation of the region between multiple species, wherein the following factors, if present, contribute positively to the score: (I) proximity of the region to one or more annotated mature miRNA sequences, (II) identification of the region as containing the start site of a known transcript that spans a miRNA or of an EST that spans a miRNA, and (III) conservation of the region between multiple mammalian species; and the following factors, if present, contribute negatively to the score: (IV) if the H3K4me3 enriched region is assignable instead to a transcript or EST that does not overlap the miRNA; (V) intervening H3K4me3 sites between the region and the miRNA; and
(c) identifying the genomic region as containing a high probability transcriptional start site for a microRNA (miRNA) gene based at least in part on the score.
2. The method of claim 1, wherein the region is between 100 base pairs (bp) and 10 kilobases (10 kB) in length.
3. The method of claim 1, wherein the length of the region is between 100 base pairs (bp) and 5 kilobases (5 kB) in length.
4. The method of claim 1, wherein the length of the region is between 100 base pairs (bp) and 1 kilobase (1 kB) in length.
5. The method of claim 1, wherein the method comprises: (i) identifying a plurality of genomic regions containing candidate transcriptional start sites for miRNA genes from the genomes of at least two cell types of different cell lineages; and (ii) identifying genomic regions that are conserved between the at least two cell types, wherein such conservation indicates an increased likelihood that the genomic region comprises a transcriptional start site.
6. The method of claim 1, wherein the method comprises: (i) identifying a plurality of genomic regions containing candidate transcriptional start sites for miRNA genes from the genomes of at least two different differentiated cell types; and (ii) identifying genomic regions that are conserved between the at least two cell types, wherein such conservation indicates an increased likelihood that the genomic region comprises a transcriptional start site.
7. The method of claim 1, wherein the method comprises: (i) identifying a plurality of genomic regions containing candidate transcriptional start sites for miRNA genes from the genomes of cells derived from each of at least two different mammalian species; and (ii) identifying genomic regions that are conserved between the cells derived from each of the at least two different mammalian species, wherein such conservation indicates an increased likelihood that the genomic region comprises a transcriptional start site.
8. The method of claim 7, wherein the cells from the at least two different mammalian organisms are of the same cell type or lineage.
9. The method of claim 7, wherein the cells are from mouse and human.
10. A computer-readable medium having instructions stored thereon for performing at least step (b) or step (c) of the method of claim 1 when provided with suitable data.
11. A computer-readable medium having information stored thereon, wherein the information describes a plurality of regions comprising high probability miRNA gene transcriptional start sites, wherein said information describes regions comprising high probability mammalian miRNA gene transcriptional start sites for at least 100 miRNA genes or at least 75% of the miRNA genes in a selected mammalian species.
12. The computer-readable medium of claim 11, wherein the high probability miRNA gene transcriptional start sites are identified by a method comprising steps of:
(a) identifying a genomic region comprising a candidate transcriptional start site for an miRNA gene based at least in part on enrichment for histone H3 trimethylated at its lysine 4 residue (H3K4me3) within such region; and
(b) assigning a score to said region based at least in part on (i) its proximity to one or more annotated mature miRNA sequences, (ii) expressed sequence tag (EST) data, and/or (iii) conservation of the region between multiple species, wherein the following factors, if present, contribute positively to the score: (I) proximity of the region to one or more annotated mature miRNA sequences, (II) identification of the region as containing the start site of a known transcript that spans a miRNA or of an EST that spans a miRNA, and (III) conservation of the region between multiple mammalian species; and the following factors, if present, contribute negatively to the score: (IV) if the H3K4me3 enriched region is assignable instead to a transcript or EST that does not overlap the miRNA; (V) intervening H3K4me3 sites between the region and the miRNA;
(c) identifying the region as containing a high probability transcriptional start site for a microRNA (miRNA) gene based at least in part on the score.
13. The computer-readable medium of claim 11, wherein the miRNA transcriptional start sites are high probability mammalian miRNA gene transcriptional start sites.
14. A method comprising steps of: (i) electronically accessing the computer-readable medium of claim 11; and (ii) extracting or analyzing information therefrom.
15. A computer-readable medium having information stored thereon, wherein the information describes a regulatory network comprising relationships between one or more key ES cell transcription factors, at least 20 ES cell transcription factor target genes, and at least some targets of the ES cell transcription factor target genes, wherein the ES cell transcription factor target genes include at least some genes that encode proteins and at least some genes that encode miRNAs.
16. The computer-readable medium of claim 15, wherein the key ES cell transcription factors are selected from: Oct4, Nanog, Sox2, and Tcf3.
17. The computer-readable medium of claim 15, wherein the key ES cell transcription factors are Oct4, Nanog, Sox2, and Tcf3.
18. The computer-readable medium of claim 15, further comprising information describing relationships between Polycomb group proteins and at least some of the key ES cell transcription factor target genes.
19. A method comprising steps of: (i) electronically accessing the computer-readable medium of claim 15; and (ii) extracting or analyzing information therefrom.
20. An isolated nucleic acid comprising a region comprising a high probability transcriptional start site for a mammalian miRNA gene.
21. The isolated nucleic acid of claim 20, wherein the region is identified according to a method comprising steps of:
(a) identifying a genomic region comprising a candidate transcriptional start site for an miRNA gene based at least in part on enrichment for histone H3 trimethylated at its lysine 4 residue (H3K4me3) within such region; and
(b) assigning a score to said region based at least in part on (i) its proximity to one or more annotated mature miRNA sequences, (ii) expressed sequence tag (EST) data, and/or (iii) conservation of the region between multiple species, wherein the following factors, if present, contribute positively to the score: (I) proximity of the region to one or more annotated mature miRNA sequences, (II) identification of the region as containing the start site of a known transcript that spans a miRNA or of an EST that spans a miRNA, and (III) conservation of the region between multiple mammalian species; and the following factors, if present, contribute negatively to the score: (IV) if the H3K4me3 enriched region is assignable instead to a transcript or EST that does not overlap the miRNA; (V) intervening H3K4me3 sites between the region and the miRNA;
(c) identifying the region as containing a high probability transcriptional start site for a microRNA (miRNA) gene based at least in part on the score.
22. The isolated nucleic acid of claim 20, wherein the region comprises or consists of a transcription start site (TSS) listed in Table S6 or S7 and wherein, optionally, the isolated nucleic acid comprises no more than 1 kB, 2 kB, 5 kB, 8 kB, or 10 kB of genomic sequence on the 5' side, the 3' side, or both sides of the TSS.
23. The isolated nucleic acid of claim 20, wherein the region comprises at least 50 continuous nucleic acids of a transcription start site (TSS) listed in Table S6 or S7 and wherein, optionally, the isolated nucleic acid comprises no more than 1 kB, 2 kB, 5 kB, 8 kB, or 10 kB of genomic sequence on the 5' side, the 3' side, or both sides of the TSS.
24. The isolated nucleic acid of claim 20, further comprising a miRNA sequence.
25. A composition comprising the isolated nucleic acid of claim 20 and a transcription factor, wherein the transcription factor is one that binds to the region in at least some cell types.
26. A nucleic acid construct comprising the isolated nucleic acid of claim 20.
27. The nucleic acid construct of claim 26, wherein the isolated nucleic acid comprises a promoter and the construct comprises a heterologous nucleic acid operably linked to the promoter.
28. The nucleic acid construct of claim 26, wherein the isolated nucleic acid comprises a promoter and the construct comprises a sequence encoding a polypeptide or microRNA.
29. The nucleic acid construct of claim 28, wherein the polypeptide is a reporter polypeptide of use to detect and/or quantify expression from the promoter.
30. The nucleic acid construct of claim 29, wherein the reporter polypeptide comprises a fluorescent protein.
31. A host cell or transgenic non-human mammal containing the nucleic acid construct of claim 26.
32. A method of identifying an agent with potential to modulate expression of a miRNA, the method comprising: (i) providing a nucleic acid construct comprising a miRNA promoter operably linked to a heterologous nucleic acid; and (ii) determining whether a test agent affects expression of the heterologous nucleic acid, wherein if the test agent affects expression of the heterologous nucleic acid, the test agent is identified as an agent with potential to modulate expression of the miRNA.
33. The method of claim 32, wherein the heterologous nucleic acid encodes a reporter protein.
34. The method of claim 32, wherein the nucleic acid construct is in a cell and the method comprises contacting the cell with the test agent.
35. The method of claim 32, wherein the miRNA is listed in Table S6 or Table S7.
36. The method of claim 32, further comprising: (iii) contacting cells with the agent; (iv) measuring expression of the miRNA or of a target gene of the miRNA; and (v) determining whether contacting the cells with the agent alters expression of the miRNA or miRNA target gene relative to expression that would be expected in the absence of the agent.
37. A method of identifying a miRNA that acts as a determinant of cell fate decisions, wherein the miRNA is one that is selectively expressed in cells of one or more differentiated cell types or lineages, the method comprising determining whether the promoter of the miRNA is repressed by a Polycomb group protein in ES and/or iPS cells, wherein if the promoter of the miRNA is repressed by a Polycomb group protein in ES and/or iPS cells, the miRNA is identified as a determinant of cell fate decisions.
38. The method of claim 37, wherein determining whether the promoter is repressed by a Polycomb group protein comprises determining whether the promoter is bound by a Polycomb group protein.
39. The method of claim 37, wherein the miRNA is listed in Table S6 or S7.
40. A method of identifying a polymorphism or mutation in a mammalian species, the method comprising: (i) obtaining the sequence of a genomic region containing a miRNA promoter in a plurality of individuals of the species; and (ii) determining whether the sequence of the region varies between the individuals within the region, wherein variations within the sequence define polymorphisms or mutations.
41. The method of claim 40, wherein the miRNA is listed in Table S6 or S7.
42. A method of identifying a polymorphism or mutation associated with increased or decreased risk of developing a disease, the method comprising: (i) analyzing the sequence of a genomic region containing a miRNA promoter in a plurality of individuals with the disease; and (ii) determining whether a correlation exists between the presence of particular polymorphic variant(s) or mutation(s) within the region in individuals and presence of the disease.
43. The method of claim 42, wherein the disease is associated with aberrant (e.g., increased or decreased) miRNA expression.
44. The method of claim 42, wherein the disease is cancer.
45. The method of claim 42, wherein the miRNA is listed in Table S6 or Table S7.
46. The method of claim 42, wherein the miRNA promoter is one that is bound by a Polycomb group protein in ES and/or iPS cells.
47. A method of modulating the differentiation of a pluripotent mammalian stem cell, the method comprising: modulating the level or activity of a miRNA in the pluripotent stem cell, wherein the miRNA is encoded by a gene whose promoter is bound by a key embryonic stem (ES) cell transcription factor in ES and/or iPS cells.
48. The method of claim 47, wherein the pluripotent stem cell is an induced pluripotent stem (iPS) cell.
49. The method of claim 47, wherein the method comprises decreasing the level or activity of a miRNA in the cell.
50. The method of claim 49, wherein the method comprises contacting the cell with an oligonucleotide complementary to the miRNA.
51. The method of claim 47, wherein the method comprises increasing the level or activity of a miRNA in the cell.
52. The method of claim 51, wherein the method comprises introducing the miRNA or a miRNA precursor containing the miRNA into the cell, or expressing the miRNA or a miRNA precursor in the cell.
53. The method of claim 47, wherein the method comprises modulating the binding of a transcription factor to the promoter of the gene that encodes the miRNA.
54. The method of claim 47, wherein the miRNA is one whose promoter is bound by a Polycomb group protein in ES and/or iPS cells.
55. The method of claim 47, further comprising administering the cell to an individual.
56. The method of claim 47, wherein the pluripotent stem cell is a human cell.
57. A mammalian cell, wherein the differentiation state of the cell has been modulated according to the method of claim 47.
58. A method of treating an individual comprising: administering the cell of claim 57 to the individual.
59. The method of claim 58, wherein the method comprises (i) obtaining a cell from an individual; (ii) reprogramming the cell in vitro; and (iii) administering the cell to the individual.
60. A method of modulating the in vitro reprogramming of a differentiated mammalian somatic cell, the method comprising: modulating the level or activity of a miRNA in the differentiated mammalian somatic cell, wherein the miRNA is encoded by a gene whose promoter is bound by a key embryonic stem (ES) cell transcription factor in ES and/or iPS cells.
61. The method of claim 60, wherein the method comprises reprogramming the somatic cell to a pluripotent state.
62. The method of claim 60, wherein the method comprises reprogramming the somatic cell to a pluripotent state and then differentiating the reprogrammed pluripotent cell to a desired cell type or lineage.
63. The method of claim 60, wherein the method comprises reprogramming the somatic cell from a first at least partially differentiated state to a second at least partially differentiated state.
64. The method of claim 60, wherein the method comprises reprogramming the somatic cell from a first cell type to a second cell type, wherein the first and second cell types are in different cell lineages.
65. The method of claim 60, wherein the method comprises decreasing the level or activity of a miRNA in the cell.
66. The method of claim 60, wherein the method comprises contacting the cell with an oligonucleotide complementary to the miRNA.
67. The method of claim 60, wherein the method comprises increasing the level or activity of a miRNA in the cell.
68. The method of claim 60, wherein the method comprises introducing the miRNA or a miRNA precursor containing the miRNA into the cell, or expressing the miRNA or a miRNA precursor in the cell.
69. The method of claim 60, wherein the method comprises modulating the binding of a transcription factor to the promoter of the gene that encodes the miRNA.
70. The method of claim 60, wherein the miRNA is one whose promoter is bound by a Polycomb group protein in ES and/or iPS cells.
71. The method of claim 60, wherein the somatic cell is a human cell.
72. A reprogrammed mammalian somatic cell, wherein the in vitro reprogramming of the cell has been modulated according to the method of claim 60.
73. A method of treating an individual comprising: administering the cell of claim 72 to the individual.
74. The method of claim 73, wherein the method comprises (i) obtaining a cell from an individual; (ii) reprogramming the cell in vitro; and (iii) administering the cell to the individual.
75. A method of modulating the differentiation state of a mammalian somatic cell, the method comprising: modulating the level or activity of a miRNA in the mammalian somatic cell, wherein the miRNA is one that is expressed in a cell type or cell lineage specific manner and is encoded by a gene whose promoter is bound by a Polycomb group protein in ES and/or iPS cells.
76. The method of claim 75, wherein the somatic cell is a human cell.
77. The method of claim 75, wherein the method comprises decreasing the level or activity of a miRNA in the cell.
78. The method of claim 75, wherein the method comprises contacting the cell with an oligonucleotide complementary to the miRNA.
79. The method of claim 75, wherein the method comprises increasing the level or activity of a miRNA in the cell.
80. The method of claim 75, wherein the method comprises introducing the miRNA or a miRNA precursor containing the miRNA into the cell, or expressing the miRNA or a miRNA precursor in the cell.
81. The method of claim 75, wherein the method comprises modulating the binding of a transcription factor to the promoter of the gene that encodes the miRNA.
82. A mammalian somatic cell, wherein the differentiation state of the cell has been modulated according to the method of claim 75.
83. A method of treating an individual comprising: administering the cell of claim 82 to the individual.
84. The method of claim 83, wherein the method comprises (i) obtaining a cell from an individual; (ii) modulating the differentiation state of the cell in vitro; and (iii) administering the cell to the individual.
85. The method of claim 83, wherein the modulating promotes differentiation of the cell to a desired cell type or lineage.
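The scoring described in step (b) of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: the class and function names are hypothetical, and the weights are inferred from a few legible rows of the table preceding the claims (for example, a proximal region with one intervening H3K4me3 site is listed with score 9, and a genic, proximal region with none is listed with score 30). The conservation weight is a pure placeholder, since no legible row isolates it.

```python
from dataclasses import dataclass

# Assumed weights, inferred from legible table rows; not stated in the claims.
W_PROXIMAL = 10        # (I) region lies close to an annotated miRNA
W_SPANNING_GENE = 20   # (II) start of a known transcript spanning the miRNA
W_SPANNING_EST = 10    # (II) start of an EST spanning the miRNA
W_CONSERVED = 10       # (III) placeholder value only
W_NON_OVERLAP = -10    # (IV) region belongs to a transcript that misses the miRNA
W_INTERVENING = -1     # (V) per intervening H3K4me3 site


@dataclass
class CandidateRegion:
    proximal: bool = False
    spans_gene: bool = False
    spans_est: bool = False
    conserved: bool = False
    non_overlapping_transcript: bool = False
    intervening_sites: int = 0


def score_region(r: CandidateRegion) -> int:
    """Sum positive factors (I)-(III) and negative factors (IV)-(V)."""
    score = 0
    if r.proximal:
        score += W_PROXIMAL
    if r.spans_gene:
        score += W_SPANNING_GENE
    if r.spans_est:
        score += W_SPANNING_EST
    if r.conserved:
        score += W_CONSERVED
    if r.non_overlapping_transcript:
        score += W_NON_OVERLAP
    score += W_INTERVENING * r.intervening_sites
    return score


def is_high_probability(r: CandidateRegion, threshold: int = 0) -> bool:
    """Step (c): accept the region when its score reaches a chosen threshold."""
    return score_region(r) >= threshold


# Hypothetical inputs modeled on three table rows.
proximal_one_site = CandidateRegion(proximal=True, intervening_sites=1)
genic_proximal = CandidateRegion(proximal=True, spans_gene=True)
wrong_gene = CandidateRegion(non_overlapping_transcript=True)
```

The threshold in `is_high_probability` is likewise an assumption; the claims only require that the identification be based at least in part on the score.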
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18821108P | 2008-08-07 | 2008-08-07 | |
| US61/188,211 | 2008-08-07 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2010017518A2 (en) | 2010-02-11 |
| WO2010017518A3 (en) | 2010-06-03 |
Family
ID=41664224
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2009/053214 Ceased WO2010017518A2 (en) | 2008-08-07 | 2009-08-07 | Connecting microrna genes to the core transcriptional regulatory circuitry of embryonic stem cells |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2010017518A2 (en) |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2007030678A2 (en) * | 2005-09-07 | 2007-03-15 | Whitehead Institute For Biomedical Research | Methods of genome-wide location analysis in stem cells |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103484544A (en) * | 2013-09-17 | 2014-01-01 | 遵义医学院 | Method for detecting miRNA promoter activity |
| CN107358062A (en) * | 2017-06-02 | 2017-11-17 | 西安电子科技大学 | A kind of construction method of double-deck gene regulatory network |
| CN107358062B (en) * | 2017-06-02 | 2020-05-22 | 西安电子科技大学 | A method for constructing a two-layer gene regulatory network |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2010017518A3 (en) | 2010-06-03 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Guo et al. | Distinct processing of lncRNAs contributes to non-conserved functions in stem cells | |
| Cesarini et al. | ADAR2/miR-589-3p axis controls glioblastoma cell migration/invasion | |
| Bar et al. | MicroRNA discovery and profiling in human embryonic stem cells by deep sequencing of small RNA libraries | |
| Caron et al. | A human pluripotent stem cell model of facioscapulohumeral muscular dystrophy-affected skeletal muscles | |
| Marson et al. | Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells | |
| Dykes et al. | Transcriptional and post-transcriptional gene regulation by long non-coding RNA | |
| Aprea et al. | Long non-coding RNAs in corticogenesis: Deciphering the non-coding code of the brain | |
| Stadler et al. | Characterization of microRNAs involved in embryonic stem cell states | |
| Hammoud et al. | Chromatin and transcription transitions of mammalian adult germline stem cells and spermatogenesis | |
| Wu et al. | Integrative transcriptome sequencing identifies trans-splicing events with important roles in human embryonic stem cell pluripotency | |
| Flynn et al. | Long noncoding RNAs in cell-fate programming and reprogramming | |
| CN103459611B (en) | The functional genomics research that effectiveness and the safety of pluripotent stem cell are characterized | |
| Salomonis et al. | Alternative splicing regulates mouse embryonic stem cell pluripotency and differentiation | |
| Lim et al. | Epigenetic modulation of the miR-200 family is associated with transition to a breast cancer stem-cell-like state | |
| Creighton et al. | Discovery of novel microRNAs in female reproductive tract using next generation sequencing | |
| Liao et al. | Matched miRNA and mRNA signatures from an hESC-based in vitro model of pancreatic differentiation reveal novel regulatory interactions | |
| Ikeda et al. | Srf destabilizes cellular identity by suppressing cell-type-specific gene expression programs | |
| Hinton et al. | sRNA-seq analysis of human embryonic stem cells and definitive endoderm reveals differentially expressed microRNAs and novel IsomiRs with distinct targets | |
| Zhang et al. | LINE1 and PRC2 control nucleolar organization and repression of the 8C state in human ESCs | |
| Chen et al. | Altered microRNA and Piwi-interacting RNA profiles in cumulus cells from patients with diminished ovarian reserve | |
| Rajan et al. | Analysis of early C2C12 myogenesis identifies stably and differentially expressed transcriptional regulators whose knock-down inhibits myoblast differentiation | |
| Torre et al. | Nuclear RNA catabolism controls endogenous retroviruses, gene expression asymmetry, and dedifferentiation | |
| Yoshida et al. | MicroRNA-140 mediates RB tumor suppressor function to control stem cell-like activity through interleukin-6 | |
| Yu et al. | Long non-coding RNA LHX1-DT regulates cardiomyocyte differentiation through H2A.Z-mediated LHX1 transcriptional activation | |
| WO2010017518A2 (en) | Connecting microrna genes to the core transcriptional regulatory circuitry of embryonic stem cells |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 09805631; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 09805631; Country of ref document: EP; Kind code of ref document: A2 |