WO2025066245A1 - Method for detecting chromatin accessibility or dna-binding protein footprints in cells - Google Patents
Method for detecting chromatin accessibility or DNA-binding protein footprints in cells
- Publication number
- WO2025066245A1 (PCT/CN2024/096659)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dna
- cell
- deaminase
- cells
- genomic dna
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6804—Nucleic acid analysis using immunogens
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/68—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
Definitions
- the present invention belongs to the field of biotechnology. Specifically, the present invention relates to a method for detecting chromatin accessibility or DNA binding protein footprints in cells. More specifically, the method uses the deamination of DNA in open chromatin regions by a DNA deaminase to convert chromatin accessibility information into DNA sequence mutation information, thereby quantitatively measuring, at the whole-genome level, the chromatin accessibility of single DNA molecules and the binding footprints of DNA binding proteins such as transcription factors.
- the present invention also relates to a method for detecting DNA binding protein footprints in the open region of the cell genome based on deaminase and transposase, which can enrich the chromatin open region of the cell genome and detect the DNA binding protein footprints in the chromatin open region.
- the DNA of eukaryotic organisms forms a complex with histones, and about 147bp of DNA is coiled around the histone core particle 1.65 times to form a nucleosome structure.
- the nucleosome and the connecting region together form a beaded structure, which is further folded and supercoiled to form a dense chromatin structure, which is wrapped by the nuclear membrane and separated from the cytoplasm.
- Chromatin accessibility reflects the degree to which various regulatory factors can physically contact genomic DNA, which is affected by the nucleosome arrangement, the spatial structure of the surrounding chromatin, and the promotion or inhibition of chromatin structure by different chromatin binding factors.
- Transcription factors dynamically compete with histones or other chromatin binding proteins to regulate physiological processes such as gene transcription, thereby giving different cells their characteristics. Therefore, studying chromatin accessibility, clarifying the state of chromatin in different cells, and mapping the dynamic transitions between different cells in terms of transcription factor binding and nucleosome rearrangement are very important for further clarifying the gene regulation process of cells.
- the methylase used in NOMe-seq can only methylate GpC sequences, so the resolution is poor.
- Fiber-seq libraries cannot be amplified directly, so the method still requires an enrichment step or a very large number of cells to compensate for the lack of amplification.
- the openness information of chromatin can be converted into mutation information of DNA sequence, so that the chromatin accessibility and DNA binding protein binding footprints such as transcription factors at the level of single DNA molecules can be quantitatively measured ( Figure 1).
- the qDeAC-ATAC-seq method formed by combining qDeAC-seq and ATAC-seq can enrich the open chromatin regions of the cell genome and detect the footprints of DNA binding proteins in the open chromatin regions ( Figure 21).
- Figure 1 Schematic diagram of the qDeAC-seq workflow.
- Figure 19 Effect of adding UGI on sequencing depth at the ATAC center.
- FIG. 21 Schematic diagram of the qDeAC-ATAC-seq process.
- Strategy 1 is to perform the Tn5 treatment first and then add SsdAtox for deamination.
- Strategy 2 is to add SsdAtox for deamination first and then add Tn5 to act on the open regions.
- Fig. 22 Schematic diagram of the qDeAC-ATAC-seq library construction process.
- FIG. 24 Effects of different Tn5 reaction buffers on the efficiency of enriching open regions.
- FIG. 25 Effects of different Tn5 reaction buffers on the footprint depths of different transcription factors in the R1 cell line.
- Fig. 26 qDeAC-ATAC-seq library fragment size distribution.
- Figure 27 Comparison of different TF depth values under different deaminase treatment conditions in R1 cell line.
- FIG. 28 Comparison of the open-region enrichment effects of strategy 2 (the first group, and the fourth to eighth groups in the figure) and strategy 1 (the second and third groups in the figure).
- the term “and/or” encompasses all combinations of items connected by the term, and each combination should be considered to have been listed separately herein.
- “A and/or B” encompasses “A,” “A and B,” and “B.”
- “A, B, and/or C” encompasses “A,” “B,” “C,” “A and B,” “A and C,” “B and C,” and “A and B and C.”
- the protein or nucleic acid may consist of the sequence, or may have additional amino acids or nucleotides at one or both ends of the protein or nucleic acid, but still have the activity described in the present invention.
- the methionine encoded by the start codon at the N-terminus of the polypeptide may be retained in certain practical situations (for example, when expressed in a specific expression system), but it does not substantially affect the function of the polypeptide.
- Sequence "identity" has a recognized meaning in the art, and the percentage of sequence identity between two nucleic acid or polypeptide molecules or regions can be calculated using published techniques. Sequence identity can be measured along the full length of a polynucleotide or polypeptide or along a region of the molecule.
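As an illustration of the calculation, the following minimal sketch computes percent identity for two sequences that are assumed to be already aligned to the same length, with gaps written as `-`. The function name `percent_identity` and the choice to skip gap columns are assumptions made for this example; published tools differ in how they treat gaps and which region they compare.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped. This is an
    assumption of the sketch; other gap conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical 10-residue peptides differing at the last position.
    print(percent_identity("MKTAYIAKQR", "MKTAYIAKQK"))
```

A threshold such as "at least 90% sequence identity" would then be a simple comparison of the returned value against 90.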
- "Genome" as used herein encompasses not only the chromosomal DNA present in the cell nucleus, but also the organellar DNA present in subcellular components of the cell (e.g., mitochondria, plastids).
- the present invention provides a method for detecting chromatin accessibility and/or DNA binding protein footprints of a cell genome, the method comprising:
- step b) treating the permeabilized cells or isolated cell nuclei obtained in step a) with a DNA deaminase, thereby allowing the DNA deaminase to fully react with the open regions of the genomic DNA in the cells;
- the permeabilization buffer should be gentle: it should disrupt or partially disrupt the cell wall/cell membrane so that the DNA deaminase can enter the cells, without significantly disrupting the chromatin structure of the cells.
- the nuclear isolation should also be gentle enough not to significantly disrupt the chromatin structure.
- the permeabilization buffer comprises NP40.
- the permeabilization buffer comprises NP40, Tween-20, and Digitonin. In some preferred embodiments, the permeabilization buffer comprises approximately 0.1% NP40, approximately 0.1% Tween-20, and approximately 0.01% Digitonin.
- the permeabilization buffer further comprises Tris-HCl and/or PIC (protease inhibitor cocktail).
- Deaminase refers to an enzyme that catalyzes a deamination reaction.
- DNA deaminase refers to a deaminase that can accept DNA (single-stranded or double-stranded, especially double-stranded) as a substrate and can catalyze the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively.
- the DNA deaminase can deaminate the C in the open region of genomic DNA to U, so that in the subsequent amplification reaction for sequencing, the C-G of the original genomic DNA is converted to T-A. By detecting the conversion of C-G to T-A, the deamination site can be detected.
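The read-out described above can be sketched as a comparison between a reference sequence and a sequenced read. The sketch below assumes a gapless alignment of equal length; the function name `call_conversions` is hypothetical. A C>T mismatch reports deamination on the sequenced strand, and a G>A mismatch reports deamination of the C on the opposite strand.

```python
def call_conversions(reference: str, read: str) -> list[tuple[int, str]]:
    """Return (position, kind) for every C>T or G>A mismatch.

    `reference` and `read` are assumed to be aligned without gaps and to
    have the same length. Positions are 0-based.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    calls = []
    for pos, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base == "C" and read_base == "T":
            calls.append((pos, "C>T"))  # C deaminated on the sequenced strand
        elif ref_base == "G" and read_base == "A":
            calls.append((pos, "G>A"))  # C deaminated on the opposite strand
    return calls


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(call_conversions("ACGTCCGA", "ATGTCTAA"))
```

Other mismatch types (for example A>G) are ignored here because they are not the product of C-to-U deamination.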
- the DNA deaminase described in the present invention can be single-stranded DNA deaminase A (SsdA) or double-stranded DNA deaminase A (DddA) or a functional fragment thereof.
- the DNA deaminase described in the present invention can be from different species, for example, from Burkholderia cenocepacia or Pseudomonas syringae.
- "functional fragment" refers to a fragment of a DNA deaminase that substantially retains its deamination activity, such as DddA tox or SsdA tox known in the art.
- the SsdA comprises the amino acid sequence shown in SEQ ID NO: 1 or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9% sequence identity with SEQ ID NO: 1.
- the functional fragment of SsdA such as SsdA tox, comprises the amino acid sequence shown in SEQ ID NO: 5 or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9% sequence identity with SEQ ID NO: 5.
- the DddA comprises the amino acid sequence shown in SEQ ID NO:2 or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or at least 99.9% sequence identity with SEQ ID NO:2.
- the DNA deaminase is preferably SsdA or a functional fragment thereof.
- the DNA deaminase is recombinantly produced.
- the DNA deaminase also contains a fusion tag, such as a tag for separation and/or purification of the DNA deaminase.
- Methods for recombinant protein production are known in the art.
- tags that can be used to separate and/or purify proteins are known in the art, including but not limited to His tags, smt3 tags, GST tags, etc. Generally speaking, these tags do not change the activity of the target protein.
- the DNA deaminase comprises a His tag and a smt3 tag.
- the SsdA or its functional fragment comprises an amino acid sequence shown in any one of SEQ ID NOs: 1, 3-5.
- step b) is performed in a reaction system of about 10 μl to about 100 μl, preferably 50 μl.
- the reaction system can be adjusted according to the number of cells or cell nuclei.
- the reaction system of step b) comprises a reaction buffer.
- the reaction buffer can be a Tris-HCl buffer or a HEPES buffer.
- the pH of the reaction buffer is, for example, 7.4-8.0. Preferably, the pH of the reaction buffer is about 7.4.
- the reaction buffer comprises approximately 10 mM to approximately 20 mM, preferably approximately 10 mM Tris-HCl.
- the reaction buffer comprises about 10 mM to about 20 mM, preferably about 10 mM Tris-HCl, and about 0.1% NP40, about 0.1% Tween-20, about 0.01% Digitonin and/or about 0.01% Triton X-100. In some embodiments, the reaction buffer comprises about 10 mM to about 20 mM, preferably about 10 mM Tris-HCl, and about 0.01% Digitonin and about 0.1% Tween-20. In some embodiments, the reaction buffer does not comprise additional metal salts, such as NaCl or MgCl2.
- the reaction buffer contains no additional reagents other than Tris-HCl. In some embodiments, the reaction buffer consists of about 10 mM to about 20 mM, preferably about 10 mM Tris-HCl and water.
- step b) further comprises adding a uracil glycosylase inhibitor (UGI) to the reaction.
- Adding UGI can inhibit the activity of the enzyme present in the reaction system that may remove U, thereby avoiding the effect caused by the removal of U produced by the deamination reaction.
- Adding UGI allows the amount of enzyme used in the reaction to be increased, which shortens the reaction time while maintaining the signal-to-noise ratio.
- UGI that can be used in the method of the present invention can be commercially available.
- the at least one cell is from 1 cell to about 50,000 cells or more, such as about 100, about 1,000, about 10,000, about 50,000 cells or more.
- the cells are treated with about 0.5 U/μl to about 50 U/μl, preferably about 7.5 U/μl, of DNA deaminase.
- about 25 U to about 2500 U, preferably about 375 U, of the DNA deaminase is used for about every 50,000 cells or nuclei.
- about 0.5 U/μl to about 50 U/μl of enzyme is used for about 1000 cells/μl.
- the enzyme activity unit U of the DNA deaminase used herein can be determined as shown in the examples herein.
- the enzyme activity unit U is defined as the amount of enzyme required to digest 0.1 μM double-stranded oligo DNA substrate in a total reaction system of 10 μl at 37°C within 1 hour.
- the cells or cell nuclei are treated with the DNA deaminase in step b) for about 5 minutes to about 60 minutes, preferably about 10 minutes to about 30 minutes, more preferably about 10 minutes.
- step b) is performed at a temperature of about 4°C to about 40°C, such as about 30°C to about 40°C, preferably about 37°C.
- in step b), about 50,000 cells, or nuclei from about 50,000 cells, are treated with 7.5 U/μl of DNA deaminase in a 50 μl reaction system at about 37°C for about 30 minutes.
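The dosing figures quoted above are mutually consistent, which a short calculation makes explicit. The helper names `total_units` and `cell_density` are hypothetical; the numbers are those stated in the text for a 50 μl reaction with about 50,000 cells.

```python
def total_units(conc_u_per_ul: float, volume_ul: float) -> float:
    """Total enzyme units in a reaction: concentration (U/ul) x volume (ul)."""
    return conc_u_per_ul * volume_ul


def cell_density(cells: int, volume_ul: float) -> float:
    """Cells per microlitre of reaction."""
    return cells / volume_ul


if __name__ == "__main__":
    volume_ul = 50.0   # preferred reaction volume
    cells = 50_000     # cells or nuclei per reaction

    print(total_units(0.5, volume_ul))     # lower bound of the stated range
    print(total_units(7.5, volume_ul))     # preferred concentration
    print(total_units(50.0, volume_ul))    # upper bound of the stated range
    print(cell_density(cells, volume_ul))  # cells per ul
```

With these inputs the concentration range of 0.5 to 50 U/μl corresponds to 25 to 2500 U per reaction, the preferred 7.5 U/μl corresponds to 375 U, and the cell density is 1000 cells/μl.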
- the separation of genomic DNA in step c) can be carried out by conventional methods known in the art, for example, by extracting the genomic DNA using a commercially available kit.
- the method of the present invention can be used to detect the chromatin accessibility and/or DNA binding protein footprints of certain specific regions of the cell genome. In this case, only these specific regions need to be sequenced.
- step d) comprises amplifying and sequencing a specific portion of the genomic DNA to determine the chromatin accessibility and/or DNA binding protein footprint of the specific portion of the genome of the cell.
- the amplification is PCR amplification, such as PCR amplification performed using a U-tolerant DNA polymerase, for example, a U-tolerant DNA polymerase selected from: HiFi V3 (a component of the Vazyme single-stranded library construction kit, NE103), Phusion U (Thermo Scientific, F555L), Phanta Uc (Vazyme, P507-01), Q5U (NEB, M0515L) and Q6U (Beyotime, D7239M).
- the method of the present invention can also be used to detect the chromatin accessibility and/or DNA binding protein footprints of the cell genome at the whole genome level.
- step d) comprises establishing a whole genome DNA library from the isolated genomic DNA, and performing sequencing based on the whole genome library, thereby determining the chromatin accessibility and/or DNA binding protein footprint of the cell genome at the whole genome level.
- Whole genome DNA libraries can be established using commercial kits: for single-stranded library construction, for example, Vazyme NE103 or Swift Biosciences Cat. No. 33096; for double-stranded library construction, Tn5 library construction can be used, for example, with the Vazyme TD202 kit; and WGBS library construction can use the Zymo Research D5455 and D5456 kits.
- the present inventors surprisingly found that the method of constructing a library using single-stranded DNA can achieve a significantly higher library construction depth than the method of constructing a library using double-stranded DNA.
- a whole genomic DNA library is established from the isolated genomic DNA by single-stranded DNA library construction.
- the single-stranded DNA library is constructed using the DNA Methylation Library Kit for Illumina V3 (NE103).
- a U-tolerant DNA polymerase is used for DNA amplification in the library establishment, for example, the U-tolerant DNA polymerase is selected from: HiFi V3, Phusion U, Phanta Uc, Q5U and Q6U.
- Sequencing described herein can be first generation sequencing such as Sanger sequencing, second generation sequencing (NGS) or other high throughput sequencing.
- Illumina second-generation sequencing can be used.
- C in DNA in the open regions of chromatin is more easily converted to U by the DNA deaminase, whereas DNA bound by nucleosomes or DNA binding proteins is protected and cannot be deaminated. Therefore, by analyzing the distribution of deamination sites, the openness of chromatin, the arrangement of nucleosomes, and the position information and binding dynamics of DNA binding proteins can be determined.
- the chromatin accessibility and/or DNA binding protein footprint of the genomic DNA or a portion thereof of the cell is determined by analyzing the presence of CG to TA transitions (i.e., C to T and G to A) in the genomic DNA or a portion thereof of the cell relative to a control genomic DNA or a portion thereof.
- the chromatin accessibility and/or DNA binding protein footprint of the genomic DNA or a portion thereof of the cell is determined by analyzing the position and/or density and/or conversion rate of C-G to T-A transitions in the genomic DNA or a portion thereof of the cell relative to a control genomic DNA or a portion thereof.
- the location and/or density of C-G to T-A transitions in the genome indicates the location and/or degree of openness of chromatin regions.
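A minimal sketch of this kind of analysis is given below, assuming a set of reads that are gaplessly aligned to the same reference window. It computes the conversion rate at each reference C or G and lists positions whose rate falls below a cutoff as candidate protected (footprint) positions. The function names and the cutoff of 0.2 are illustrative assumptions, not values from the source.

```python
def conversion_rates(reference: str, reads: list[str]) -> dict[int, float]:
    """Fraction of reads showing C>T or G>A at each reference C or G position."""
    for read in reads:
        if len(read) != len(reference):
            raise ValueError("every read must span the whole reference window")
    rates = {}
    for pos, ref_base in enumerate(reference.upper()):
        if ref_base not in "CG":
            continue  # only C and G positions can report deamination
        converted_base = "T" if ref_base == "C" else "A"
        covered = 0
        converted = 0
        for read in reads:
            base = read[pos].upper()
            if base == "N":
                continue  # uncalled base: no information
            covered += 1
            if base == converted_base:
                converted += 1
        if covered:
            rates[pos] = converted / covered
    return rates


def protected_positions(rates: dict[int, float], cutoff: float = 0.2) -> list[int]:
    """Positions with a conversion rate below `cutoff` (hypothetical threshold)."""
    return sorted(pos for pos, rate in rates.items() if rate < cutoff)


if __name__ == "__main__":
    # Toy data: positions 1 and 2 are never converted in any read.
    reference = "CCGCC"
    reads = ["TCGTT", "TCGCT", "TCGTT"]
    rates = conversion_rates(reference, reads)
    print(rates)
    print(protected_positions(rates))
```

In real data a run of protected positions flanked by highly converted positions would be the pattern of interest; calling such runs robustly needs coverage-aware statistics that this sketch omits.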
- the sequence of the control genomic DNA or its part can be the sequence of the genomic DNA or its part from a public database. Alternatively, the sequence of the control genomic DNA or its part can also be the sequence of the genomic DNA or its part of the same type of cell that has not been treated with the DNA deaminase.
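The role of the untreated control can be sketched as a filter: a C>T or G>A difference from the reference is only attributed to the deaminase if the untreated control agrees with the reference at that position. The function name `deamination_sites` and the toy sequences are hypothetical, and all three sequences are assumed to be gaplessly aligned.

```python
def deamination_sites(reference: str, control: str, treated: str) -> list[int]:
    """0-based positions where `treated` shows C>T or G>A and `control` does not.

    Positions where the untreated control already differs from the
    reference (for example a pre-existing variant) are excluded.
    """
    if not (len(reference) == len(control) == len(treated)):
        raise ValueError("all sequences must have the same length")
    sites = []
    triples = zip(reference.upper(), control.upper(), treated.upper())
    for pos, (ref, ctl, trt) in enumerate(triples):
        is_conversion = (ref == "C" and trt == "T") or (ref == "G" and trt == "A")
        if is_conversion and ctl == ref:
            sites.append(pos)
    return sites


if __name__ == "__main__":
    # Position 1 is a C>T difference already present in the control.
    print(deamination_sites("ACGTC", "ATGTC", "ATGTT"))
```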
- the DNA binding protein described herein can be a transcription factor, histone, etc., preferably a transcription factor, such as Klf4, Thap11, SP1, Nrf1, etc.
- the cells described herein can be animal cells, plant cells, or microbial cells.
- the cells are mammalian cells, including but not limited to cells of humans, mice, rats, cats, dogs, pigs, and cattle; or, the cells are monocotyledonous or dicotyledonous plant cells, such as cells of rice, corn, wheat, sorghum, soybean, potato, tomato, etc.; or, the cells are fungi such as yeast cells.
- the cell may be a cell line cell.
- the cell may be a primary cell from a different organ or tissue.
- the cell may be from blood, cerebrospinal fluid, or a biopsy.
- the cell may be a hepatocyte, a cardiomyocyte, a neuron, a fibroblast, an epithelial cell, or the like.
- the cell may also be a tumor cell, a stem cell such as an embryonic stem cell, or an induced pluripotent stem cell.
- the cell is a cell treated with a specific substance (compound) or condition or a cell at a specific developmental stage.
- the method of the present invention can measure the effect of the specific substance (compound) or condition or specific developmental stage on the chromatin accessibility of the cell genome and/or the footprint of DNA binding proteins.
- the specific substance can be a drug.
- the present invention provides a method for detecting binding between a DNA molecule and a DNA binding protein, comprising:
- the DNA deaminase is as defined above.
- the step 2) is performed in a reaction buffer as defined above.
- the binding of the DNA binding protein to the DNA molecule is determined by analyzing the presence of a C-G to T-A transition in the DNA molecule or a portion thereof relative to a control DNA molecule (e.g., a DNA molecule that has not been treated with a DNA deaminase).
- the position or sequence of the DNA molecule to which the DNA binding protein binds is determined by analyzing the position of the C-G to T-A transition in the DNA molecule or a portion thereof relative to a control DNA molecule (e.g., a DNA molecule that has not been treated with a DNA deaminase).
- the present invention provides a method for detecting DNA binding protein footprints in open regions in a cell genome, the method comprising:
- step b) i) treating the permeabilized cells or isolated cell nuclei obtained in step a) with Tn5 transposase, thereby allowing the Tn5 transposase to fully react with the open regions of the genomic DNA in the cells; treating the permeabilized cells or isolated cell nuclei treated with Tn5 transposase with DNA deaminase, thereby allowing the DNA deaminase to fully react with the open regions of the genomic DNA in the cells; or
- step b) ii) treating the permeabilized cells or isolated cell nuclei obtained in step a) with DNA deaminase, thereby allowing the DNA deaminase to fully react with the open regions of the genomic DNA in the cells; treating the permeabilized cells or isolated cell nuclei treated with DNA deaminase with Tn5 transposase, thereby allowing the Tn5 transposase to fully react with the open regions of the genomic DNA in the cells;
- the open region is an open chromatin region.
- the open chromatin region herein has a meaning known in the art, and refers to a region where the compact structure of chromatin is opened. These regions allow trans-acting factors to bind to cis-regulatory elements such as promoters, enhancers, insulators, and silencers. This property of allowing trans-acting factors to bind is called the openness/accessibility of chromatin.
- the permeabilization buffer should be mild, which can destroy or partially destroy the cell wall/cell membrane, allowing DNA deaminase and/or Tn5 transposase to enter the cell, but does not significantly destroy the chromatin structure of the cell.
- Cell nuclear isolation should also be mild and not significantly destroy the chromatin structure.
- the permeabilization buffer comprises NP40. In some preferred embodiments, the permeabilization buffer comprises NP40, Tween-20 and Digitonin. In some preferred embodiments, the permeabilization buffer comprises approximately 0.1% NP40, approximately 0.1% Tween-20 and approximately 0.01% Digitonin. In some embodiments, the permeabilization buffer further comprises Tris-HCl and/or PIC (protease inhibitor cocktail).
- the cell nucleus separation is performed using methods known in the art.
- the cell nucleus is separated using a buffer comprising 10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 1% BSA, 0.1% Tween-20, 0.1% NP40, 0.01% Digitonin, 0.1 mM EDTA, 3 mM MgCl2, and 1× PIC.
- Deaminase refers to an enzyme that catalyzes a deamination reaction.
- DNA deaminase refers to a deaminase that can accept DNA (single-stranded or double-stranded, especially double-stranded) as a substrate and can catalyze the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively.
- the DNA deaminase can deaminate the C in the open region of genomic DNA to U, so that in the subsequent amplification reaction for sequencing, the C-G of the original genomic DNA is converted to T-A. By detecting the conversion of C-G to T-A, the deamination site can be detected.
- the DNA deaminase described in the present invention can be single-stranded DNA deaminase A (SsdA) or double-stranded DNA deaminase A (DddA) or a functional fragment thereof.
- the DNA deaminase described in the present invention can be from different species, for example, from Burkholderia cenocepacia or Pseudomonas syringae.
- "functional fragment" refers to a fragment of a DNA deaminase that substantially retains its deamination activity, such as DddA tox or SsdA tox known in the art.
- the SsdA comprises the amino acid sequence shown in SEQ ID NO: 1 or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9% sequence identity with SEQ ID NO: 1.
- the functional fragment of SsdA such as SsdA tox, comprises the amino acid sequence shown in SEQ ID NO: 5 or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9% sequence identity with SEQ ID NO: 5.
- the DddA comprises the amino acid sequence shown in SEQ ID NO:2 or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or at least 99.9% sequence identity with SEQ ID NO:2.
- the DNA deaminase is preferably SsdA or a functional fragment thereof.
- the DNA deaminase is recombinantly produced.
- the DNA deaminase also contains a fusion tag, such as a tag for separation and/or purification of the DNA deaminase.
- Methods for recombinant protein production are known in the art.
- tags that can be used to separate and/or purify proteins are known in the art, including but not limited to His tags, smt3 tags, GST tags, etc. Generally speaking, these tags do not change the activity of the target protein.
- the DNA deaminase comprises a His tag and a smt3 tag.
- the SsdA or its functional fragment comprises the amino acid sequence shown in any one of SEQ ID NO:1, 3-5.
- step b) is performed in a reaction system of about 10 μl to about 100 μl, preferably 50 μl.
- the reaction system can be adjusted according to the number of cells or cell nuclei.
- the DNA deaminase reacts sufficiently with the open regions of the genomic DNA in the cell to result in the deamination of C to U in the open regions of the genomic DNA in the cell.
- the DNA deaminase treatment is performed in a deamination reaction buffer.
- the deamination reaction buffer can be a Tris-HCl buffer or a HEPES buffer.
- the pH of the deamination reaction buffer is, for example, 7.4-8.0.
- the pH of the deamination reaction buffer is about 7.4.
- the deamination reaction buffer comprises about 10 mM to about 20 mM, preferably about 10 mM Tris-HCl.
- the deamination reaction buffer comprises about 10 mM to about 20 mM, preferably about 10 mM Tris-HCl, and about 0.1% NP40, about 0.1% Tween-20, about 0.01% Digitonin and/or about 0.01% Triton X-100.
- the deamination reaction buffer contains about 10 mM to about 20 mM, preferably about 10 mM Tris-HCl, and about 0.01% Digitonin and about 0.1% Tween-20. In some embodiments, the deamination reaction buffer does not contain additional metal salts, such as NaCl or MgCl2. In some embodiments, the deamination reaction buffer does not contain additional reagents except Tris-HCl. In some embodiments, the deamination reaction buffer consists of about 10 mM to about 20 mM, preferably about 10 mM Tris-HCl and water. In some embodiments, the deamination reaction buffer further comprises a uracil glycosylase inhibitor (UGI).
- Adding UGI can inhibit the activity of enzymes present in the reaction system that may remove U, and avoid the effects caused by the removal of U produced by the deamination reaction. Adding UGI also allows the amount of enzyme in the reaction to be increased, which shortens the reaction time while maintaining the signal-to-noise ratio.
- UGI that can be used in the method of the present invention can be purchased commercially.
- the cells or cell nuclei are treated with about 0.5 U/μl to about 50 U/μl, preferably about 0.5-7.5 U/μl of DNA deaminase. In some embodiments, about 25 U to about 2500 U, preferably about 375 U of DNA deaminase is used for every 50,000 cells or cell nuclei. In some embodiments, about 1000 cells/μl are treated with about 0.5 U/μl to about 50 U/μl of DNA deaminase.
- the cells or cell nuclei are treated with about 5 U/μl of DNA deaminase in step b)i). In some preferred embodiments, the cells or cell nuclei are treated with about 2 U/μl of DNA deaminase in step b)ii).
- the cells or cell nuclei are treated with the DNA deaminase for about 5 minutes to about 60 minutes, preferably about 10 minutes to about 30 minutes, and more preferably about 10 minutes.
- the cells or cell nuclei are treated with a DNA deaminase at a temperature of about 4°C to about 40°C, such as about 30°C to about 40°C, preferably about 37°C.
- the Tn5 transposase comprises the amino acid sequence of SEQ ID NO: 6 or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9% sequence identity to SEQ ID NO: 6.
- the Tn5 transposase is capable of fragmenting substrate DNA and adding adapters to the fragmented DNA.
- the Tn5 transposase has been complexed with an adaptor containing a Tn5ME sequence to form a transposome.
- Tn5 transposases for nucleic acid sequencing library construction, methods for constructing transposomes with adapters, and corresponding commercial kits are known in the art and can all be applied to the present invention.
- the adapter added by the Tn5 transposase is compatible with subsequent library construction and sequencing steps.
- the adapter added by the Tn5 transposase does not contain C, and in particular, the sequence in the adapter other than the Tn5ME sequence does not contain C.
- the adapter added by the Tn5 transposase comprises the nucleotide sequence 5'-TGGTAGAGAGGGTG AGATGTGTATAAGAGACAG-3' (SEQ ID NO:7), in which the 3' portion (AGATGTGTATAAGAGACAG) is the double-stranded Tn5ME sequence and the 5' portion (TGGTAGAGAGGGTG) is single-stranded. The adapter is formed by annealing the single-stranded Tn5adapter-F of the sequence shown in SEQ ID NO:7 with the single-stranded Tn5adapter-R of the sequence shown in SEQ ID NO:8 (5'-CTGTCTCTTATACACATCT-3').
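Two properties of this adapter can be checked directly from the sequences: Tn5adapter-R should be the reverse complement of the Tn5ME portion of Tn5adapter-F, and the single-stranded portion outside the Tn5ME sequence should contain no C. The following is a minimal sketch of that check; the helper function and variable names are illustrative and not part of the original disclosure.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


ADAPTER_F = "TGGTAGAGAGGGTGAGATGTGTATAAGAGACAG"  # SEQ ID NO:7
ADAPTER_R = "CTGTCTCTTATACACATCT"                # SEQ ID NO:8

# The Tn5ME portion is the 3' end of ADAPTER_F that pairs with ADAPTER_R.
me_part = ADAPTER_F[-len(ADAPTER_R):]
overhang = ADAPTER_F[:-len(ADAPTER_R)]

print("ME portion pairs with adapter-R:", reverse_complement(ADAPTER_R) == me_part)
print("Single-stranded portion:", overhang)
print("Single-stranded portion is C-free:", "C" not in overhang)
```

A C-free single-stranded portion matters because it cannot be altered by cytosine deamination, so a primer matching it still anneals after deaminase treatment.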
- the Tn5 transposase treatment is performed in a Tagmentation buffer.
- the Tagmentation buffer may be a commercially available Tn5 transposase Tagmentation buffer.
- the Tagmentation buffer comprises DMF (dimethylformamide).
- the Tagmentation buffer does not comprise a detergent.
- the Tagmentation buffer comprises 10 mM Tris-HCl (pH 7.6), 5 mM MgCl2, 10% DMF, 33% PBS.
- the composition of PBS is: 155 mM NaCl, 3 mM Na2HPO4, and 1 mM KH2PO4.
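Because PBS makes up only part of the Tagmentation buffer, its salts are diluted in the final mixture. The sketch below estimates the final concentration each PBS salt contributes. It assumes that "33% PBS" means 33% by volume, which the text does not state explicitly; the variable names are illustrative.

```python
# PBS composition as stated above, in mM.
PBS_MM = {"NaCl": 155.0, "Na2HPO4": 3.0, "KH2PO4": 1.0}

# Assumption: "33% PBS" is a volume fraction of the final buffer.
PBS_FRACTION = 0.33

# Final concentration contributed by PBS = stock concentration x volume fraction.
final_mM = {salt: round(conc * PBS_FRACTION, 2) for salt, conc in PBS_MM.items()}

for salt, conc in final_mM.items():
    print(f"{salt}: {conc} mM")
```

Under this assumption, PBS contributes roughly 51 mM NaCl to the final buffer, in addition to the 10 mM Tris-HCl and 5 mM MgCl2 listed separately.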
- the Tagmentation buffer further comprises UGI, for example, the Tagmentation buffer used in step b) ii) comprises UGI.
- the concentration of the Tn5 transposase is about 50 nM to about 150 nM, such as 100 nM.
- the cells or cell nuclei are treated with Tn5 transposase for about 5 minutes to about 60 minutes, preferably about 10 minutes to about 30 minutes, and more preferably about 15 minutes.
- the cells or cell nuclei are treated with Tn5 transposase at a temperature of about 4°C to about 40°C, such as about 30°C to about 40°C, preferably about 37°C.
- step b)i) can be used, which can lead to higher enrichment of open regions of the cell genome.
- step b)ii) can be used for the detection of DNA binding proteins with weak DNA binding ability to avoid the effect of Tn5 transposase treatment causing DNA binding proteins to fall off the DNA.
- step b) i) after the Tn5 transposase treatment, the cells or cell nuclei are separated by centrifugation and then subjected to the DNA deaminase treatment.
- step b) ii) after the DNA deaminase treatment, the cells or cell nuclei are separated by centrifugation and then subjected to the Tn5 transposase treatment.
- step c) of the method of the present invention can be carried out by conventional methods known in the art, for example, by using a commercial genomic DNA extraction kit.
- step d) comprises establishing a genomic DNA library from the isolated genomic DNA, and performing sequencing based on the genomic library, thereby determining the DNA binding protein footprints of the open regions of the cell genome at the genome level.
- Genomic DNA libraries can be established using commercial kits: for single-stranded library construction, for example Vazyme NE103 or Swift Biosciences Cat. No. 33096; for double-stranded library construction, Tn5 library construction can be used, for example with the Vazyme TD202 kit; for WGBS library construction, the Zymo Research D5455 and D5456 kits can be used.
- a genomic DNA library is established from the isolated genomic DNA by single-stranded DNA library construction.
- the single-stranded DNA library is constructed using the DNA Methylation Library Kit for Illumina V3 (NE103).
- a U-tolerant DNA polymerase is used for DNA amplification in the library establishment, for example, the U-tolerant DNA polymerase is selected from: HiFi V3, Phusion U, Phanta Uc, Q5U and Q6U.
- Sequencing described herein can be first generation sequencing such as Sanger sequencing, second generation sequencing (NGS) or other high throughput sequencing.
- second generation sequencing of Illumina can be used.
- DNA in the open region of genomic chromatin is more susceptible to DNA deaminase reaction, resulting in C conversion to U, and DNA bound by DNA binding proteins is protected and cannot be deaminated
- the positional information and binding dynamics of DNA binding proteins can be determined by analyzing the distribution of deamination sites.
- the DNA binding protein footprint of the cell's genomic DNA or a portion thereof is determined by analyzing the presence of C-G to T-A transitions (i.e., C to T and G to A) in the cell's genomic DNA or a portion thereof relative to a control genomic DNA or a portion thereof.
- the presence of C-G to T-A transitions in a specific region of the genome indicates that the genomic DNA in that region is not bound by DNA binding proteins.
- the DNA binding protein footprint of the genomic DNA or a portion thereof is determined by analyzing the position and/or density and/or conversion rate of C-G to T-A transitions in the genomic DNA or a portion thereof of the cell relative to a control genomic DNA or a portion thereof.
- the sequence of the control genomic DNA or its part can be the sequence of the genomic DNA or its part from a public database. Alternatively, the sequence of the control genomic DNA or its part can also be the sequence of the genomic DNA or its part of the same type of cell that has not been treated with the DNA deaminase.
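The comparison against a control sequence can be illustrated with a toy example: for a read that is already aligned to the control without gaps, record each position where the control has C and the read has T, or the control has G and the read has A. The sketch below is a simplified illustration, not the analysis pipeline used in the invention; the function names and sequences are invented for the example, and real data would require an aligner that tolerates conversions.

```python
from typing import List, Tuple


def call_conversions(reference: str, read: str) -> List[Tuple[int, str]]:
    """List C>T and G>A events in a read aligned (ungapped) to a reference."""
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    events = []
    for pos, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base == "C" and read_base == "T":
            events.append((pos, "C>T"))
        elif ref_base == "G" and read_base == "A":
            events.append((pos, "G>A"))
    return events


def conversion_rate(reference: str, read: str, kind: str = "C>T") -> float:
    """Fraction of reference C (or G) positions converted in the read."""
    target = kind[0]
    total = reference.upper().count(target)
    converted = sum(1 for _, k in call_conversions(reference, read) if k == kind)
    return converted / total if total else 0.0


# Toy sequences (invented for illustration).
control = "ACGTCCGA"
treated = "ATGTCTGA"

print(call_conversions(control, treated))  # [(1, 'C>T'), (5, 'C>T')]
print(conversion_rate(control, treated))   # 2 of 3 reference C positions converted
```

Positions that stay unconverted within an otherwise converted region are candidates for protein-bound DNA, which is the basis of the footprint analysis.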
- the DNA binding protein described herein can be a transcription factor, a histone, etc., preferably a transcription factor.
- the transcription factor is selected from NRF1, MYBL2, NFYA, RFX1, etc.
- the cells described herein can be animal cells, plant cells, or microbial cells.
- the cells are mammalian cells, including but not limited to cells of humans, mice, rats, cats, dogs, pigs, and cattle; or, the cells are monocotyledonous or dicotyledonous plant cells, such as cells of rice, corn, wheat, sorghum, soybean, potato, tomato, etc.; or, the cells are fungi such as yeast cells.
- the cell may be a cell line cell.
- the cell may be a primary cell from a different organ or tissue.
- the cell may be from blood, cerebrospinal fluid, or a biopsy.
- the cell may be a hepatocyte, a cardiomyocyte, a neuron, a fibroblast, an epithelial cell, or the like.
- the cell may also be a tumor cell, a stem cell such as an embryonic stem cell, or an induced pluripotent stem cell.
- the cell is a cell treated with a specific substance (compound) or condition or a cell at a specific developmental stage.
- the method of the present invention can measure the effect of the specific substance (compound) or condition or specific developmental stage on the chromatin accessibility of the cell genome and/or the footprint of DNA binding proteins.
- the specific substance can be a drug.
- the present invention provides a kit for detecting chromatin accessibility and/or DNA binding protein footprints of a cell genome or for detecting the binding between a DNA molecule and a DNA binding protein, which comprises at least the DNA deaminase described herein above and an optional uracil glycosylase inhibitor (UGI).
- the kit is used to detect chromatin accessibility and/or DNA binding protein footprints of a cell genome by the method of the present invention, or is used to detect the binding between a DNA molecule and a DNA binding protein by the method of the present invention.
- the kit further comprises a permeabilization buffer comprising a detergent and/or a reagent for isolating cell nuclei.
- the permeabilization buffer is as defined above.
- the kit further comprises a reaction buffer that is conducive to the DNA deaminase reaction.
- the reaction buffer is as defined above.
- the kit further includes reagents for amplifying the genomic DNA portion of interest, such as specific primers.
- the kit further comprises reagents for establishing a genomic library.
- the kit further comprises reagents for sequencing, such as Sanger sequencing or next generation sequencing.
- the present invention provides a kit for detecting DNA binding protein footprints in open regions of a cell genome, which comprises at least the DNA deaminase and Tn5 transposase described herein above.
- the kit is used to detect DNA binding protein footprints in open regions of a cell genome by the methods of the present invention.
- the kit further comprises a permeabilization buffer comprising a detergent and/or a reagent for isolating cell nuclei.
- the permeabilization buffer and the reagent for isolating cell nuclei are as defined above.
- the kit further comprises a deamination reaction buffer that is conducive to the DNA deaminase reaction.
- the deamination reaction buffer is as defined above.
- the kit further comprises an adaptor comprising a Tn5ME sequence, and a reagent for complexing the adaptor with Tn5 transposase to form a transposome.
- the adaptor comprising a Tn5ME sequence is as defined above.
- the kit further comprises a Tagmentation reaction buffer that facilitates the Tn5 transposase reaction.
- the Tagmentation reaction buffer is as defined above.
- the kit further comprises reagents for establishing a genomic library.
- the kit further comprises reagents for sequencing, such as Sanger sequencing or next generation sequencing.
- the kit generally also includes a label indicating the intended use and/or method of use of the contents of the kit.
- the term label includes any written or recorded material provided on or with the kit or otherwise provided with the kit.
- Example 1 Detecting chromatin accessibility or DNA binding protein footprints in cells at the whole genome level
- the pETDuet vector was used for protein expression and purification.
- the vector contains an ampicillin resistance gene and two multiple cloning sites, each of which contains a T7 promoter, a Lac operator and a ribosome binding site.
- the DddA/SsdA sequence or its tag sequence or its truncated sequence and its corresponding Inhibitor sequence were constructed into the two sites of the pETDuet vector respectively.
- the vector was transformed into protein expression strains such as Rosetta (DE3), Rosetta Gami, and BL21 (DE3) using the heat shock method, and cultured under the conditions of ampicillin and chloramphenicol double resistance.
- Culture the strain in double-resistance medium at a suitable speed, measure the OD value of the bacterial culture at different times, and induce protein expression with IPTG once the culture reaches a certain OD value. After a certain period of induction, collect the bacterial pellet by centrifugation and disrupt the cells under high pressure.
- (1) SsdA: centrifuge the lysate and collect the inclusion bodies. Add a buffer containing 8 M urea to dissolve the inclusion bodies, centrifuge to remove impurities, add Ni-NTA beads, and affinity-purify the protein.
- (2) DddA: centrifuge the lysate, collect the supernatant, add Ni-NTA beads and incubate, and then add a buffer containing 8 M urea to denature the protein. Treatment with 8 M urea separates DddA/SsdA from its corresponding inhibitor (see Figure 2), after which DddA/SsdA is renatured by reducing the urea concentration.
- Synthesize the oligonucleotide (5'-AATATAATATAATAACTCGCCATAATTTTAATTAAT-3', labeled at the 5' end with a 6-FAM fluorescent group) and its complementary sequence, and anneal them to generate a double-stranded oligonucleotide.
- different concentrations of deaminase were mixed with double-stranded oligonucleotide substrates with a final concentration of 1 nM and incubated at 37°C for 1 hour.
- cytosine on the substrate is deaminated to form uracil.
- Uracil DNA glycosylase was added and incubated at 37°C for 30 minutes to excise uracil and form an apurinic/apyrimidinic (AP) site.
- the double-stranded oligonucleotide was denatured and cleaved at the AP site by treatment with 100 mM NaOH and formamide at 95°C. Finally, the fragments were separated by electrophoresis on a 15% acrylamide gel containing 8 M urea, and the deamination effect was judged by the fragment size. See Figure 3 for details.
- the enzyme activity unit U is defined as the amount of enzyme required to digest 0.1 µM substrate in a total reaction system of 10 µl at 37°C in 1 hour.
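Read literally, this unit definition fixes an absolute amount of substrate, since a concentration multiplied by a volume gives a quantity. The sketch below performs that conversion. The function name is hypothetical, and the reading of the definition as "one unit processes all substrate in the 10 µl reaction" is an assumption.

```python
def substrate_pmol(conc_um: float, volume_ul: float) -> float:
    """Convert a concentration in µM and a volume in µl to an amount in pmol.

    1 µM = 1 µmol/L and 1 µl = 1e-6 L, so µM x µl = 1e-6 µmol = 1 pmol.
    """
    return conc_um * volume_ul


# Values from the unit definition: 0.1 µM substrate in a 10 µl reaction.
amount = substrate_pmol(0.1, 10)
print(f"{amount:.1f} pmol of substrate per unit-defining reaction")
```

On this reading, one unit corresponds to processing about 1 pmol of substrate in 1 hour at 37°C.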
- the reaction system is:
- 10x annealing buffer: 100 mM Tris pH 7.5, 500 mM NaCl, 10 mM EDTA
- Lysis: resuspend the cells by flicking the bottom of the tube, then add 50 µl Lysis Buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 1% BSA, 0.1% Tween-20, 0.1% NP40, 0.01% Digitonin, 0.1 mM EDTA, 3 mM MgCl2, 1x PIC) to 50,000 cells, gently pipette 3 times, and place on ice for 3 minutes.
- Terminate the reaction: add 1 ml Final buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 1% BSA, 0.1 mM EDTA, 1x PIC), gently invert 3 times to mix, centrifuge at 4°C and 700 × g for 5 min, centrifuge for a further 1 min, remove the supernatant down to about 100 µl, invert, centrifuge again for 1 min, and completely discard the supernatant.
- the total amount of starting DNA was selected based on the measured DNA concentration and experimental requirements. Taking 5 ng as an example, the genome was fragmented by ultrasound. The DNA Methylation Library Kit for Illumina V3 (NE103) was used for single-stranded DNA library construction (Figure 5).
- the 3'Adapter and 5'Adapter were connected respectively, and finally primer amplification was performed to obtain a complete library.
- DNA fragments were analyzed using the Agilent 2100 biochip analysis system.
- the constructed library was sent to the company for high-throughput sequencing.
- the sequencing machine used was the Illumina X Ten, and the sequencing mode was paired-end 150 bp.
- deaminases with different purification tags and auxiliary tags were purified from multiple strains, mouse embryonic fibroblast stem cells were treated with the different deaminases, and the data were analyzed as follows: the selected sites were amplified by PCR, all C sites were extracted, and the conversion rates of C in the AC, TC, GC and CC contexts were calculated to determine the substrate sequence preference of SsdA.
- the analysis results showed that DddA had a certain TC preference, while SsdA had basically no sequence preference (Figure 6).
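The sequence-preference analysis amounts to grouping every C by its 5' neighbor and computing the converted fraction in each group. The following sketch shows that calculation on a single aligned sequence pair. The sequences are invented for illustration and are not data from the experiments; the function name is hypothetical, and only the top strand is considered.

```python
from collections import defaultdict
from typing import Dict


def context_conversion_rates(reference: str, read: str) -> Dict[str, float]:
    """Conversion rate of C grouped by its 5' neighbor (AC, TC, GC, CC)."""
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    totals = defaultdict(int)
    converted = defaultdict(int)
    # Start at 1 because the first base has no 5' neighbor.
    for i in range(1, len(reference)):
        if reference[i] != "C":
            continue
        context = reference[i - 1] + "C"
        totals[context] += 1
        if read[i] == "T":  # C read as T means the C was deaminated
            converted[context] += 1
    return {ctx: converted[ctx] / totals[ctx] for ctx in totals}


# Toy sequences (invented for illustration).
reference = "ACTCGCCCTC"
read = "ACTTGCCTTT"

rates = context_conversion_rates(reference, read)
print(rates)  # {'AC': 0.0, 'TC': 1.0, 'GC': 0.0, 'CC': 0.5}
```

A deaminase with a TC preference would show a markedly higher rate for the TC group than for the others, while an enzyme without sequence preference would show similar rates across all four contexts.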
- the qDeAC-seq single-strand library construction method follows the instructions of the EpiArt DNA Methylation Library Kit for Illumina V3.
- the whole genome DNA of the cells after deamination treatment by deaminase SsdA was collected and ultrasonically treated to obtain fragmented double-stranded DNA.
- a certain amount of DNA was taken as the starting library, and the double-stranded DNA was denatured at 95°C and immediately placed on ice to obtain fragmented single-stranded DNA.
- an extension system was added for reaction, and then 1.2x DNA purification magnetic beads (Vazyme, N411, VAHTS DNA Clean Beads) were used to obtain a double-stranded DNA product with a 3' end adapter connected.
- the 5' adapter ligase was used to connect the adapter to the 5' end of the double-stranded DNA, and then purified using 1x DNA magnetic beads to obtain a double-stranded DNA product with a double-end adapter connected.
- Vazyme amplification primers, including but not limited to N321/N322, VAHTS Multiplex Oligos Set
- the library was amplified by PCR for an appropriate number of rounds and purified using 0.85x DNA purification magnetic beads to obtain the final library.
- the qDeAC-seq double-strand library construction method complies with the instructions of the TruePrep DNA Library Prep Kit V2 for Illumina.
- the whole genome DNA of cells was collected after deamination treatment by deaminase SsdA, and 5 ng was treated with transposase Tn5.
- the DNA fragments to which adapters had been attached by transposase Tn5 were purified using 2x AMPure XP magnetic beads.
- Vazyme amplification primers, including but not limited to TD204, TruePrep Index Kit V4 for Illumina
- a uracil (U)-tolerant DNA polymerase including but not limited to Phusion U
- the two library construction methods differ as follows:
- the DNA state of the starting library is different.
- the starting library of single-stranded library construction is single-stranded DNA
- the starting library of double-stranded library construction is double-stranded DNA.
- the library is fragmented in different ways.
- Single-stranded library construction uses the physical effect of ultrasound to fragment intact DNA
- double-stranded library construction uses the DNA cutting activity of the transposase Tn5 itself to fragment DNA.
- the methods of connecting adapters are different.
- the single-stranded library construction uses a ligase-based method to connect the 3' and 5' adapters in two rounds of reactions.
- the double-stranded library construction uses the adapter-connecting activity of the transposase Tn5 itself to connect the adapters during the fragmentation of DNA.
- the obtained pre-amplification DNA contains a large amount of uracil (U) bases, so a U-tolerant DNA polymerase is required when the library is finally amplified by PCR.
- the enzymes tested include: HiFi V3, Phusion U, Phanta Uc, Q5U, and Q6U.
- the total volume of the basic enzyme reaction system is 50 µl and 50,000 cells are used for the reaction.
- the components in the reaction were reduced in equal proportions to a total volume of 10 µl, and 10,000 cells were treated for the reaction.
- the overall conversion rate of the 50 µl basic reaction system was significantly higher than that of the 10 µl system group. This shows that the reduced system can also react, but the basic reaction system is more stable.
- UGI uracil glycosylase inhibitor
- the inventors also tested the activities of a truncated variant of SsdA deaminase, SsdA-tox, and the full-length SsdA deaminase in a variety of different buffer systems and reaction temperatures.
- the present invention provides a method and system for determining the chromatin accessibility state and the position information and binding dynamics of DNA binding proteins such as transcription factors at the whole genome level.
- This technology creates a new qDeAC-seq (Quantitative DNA Deaminase-Accessible Chromatin assay using sequencing) method, which treats permeabilized cells or cell nuclei with single-stranded DNA deaminase A (SsdA) or double-stranded DNA deaminase A (DddA).
- After deaminase contact, the DNA deaminase deaminates the exposed cytosine bases on the DNA sequence into uracil; in the subsequent PCR amplification, uracil pairs with adenine, so the original cytosine is replaced by thymine. Sanger sequencing, next-generation sequencing (NGS) or single-molecule long-read sequencing is then used to detect the locations of base changes in genomic DNA. DNA in open chromatin regions reacts more readily with the deaminase, whereas DNA bound by nucleosomes or DNA-binding proteins is protected. Therefore, by analyzing the distribution of deamination sites, the openness of chromatin, the arrangement of nucleosomes, and the positional information and binding dynamics of DNA-binding proteins can be determined at the whole genome level.
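The principle can be illustrated with an idealized simulation: every C outside a protected interval is read as T after amplification, and runs of unconverted C positions in the read then mark the protected DNA. The sketch below makes several simplifying assumptions that are not claims about the real assay: deamination is 100% efficient, only the top strand is modeled (so G-to-A changes from the opposite strand are ignored), and the sequence and protected interval are invented.

```python
from typing import List, Tuple


def simulate_deamination(seq: str, protected: List[Tuple[int, int]]) -> str:
    """Read every unprotected C as T; protected intervals are half-open [start, end)."""
    def is_protected(i: int) -> bool:
        return any(start <= i < end for start, end in protected)

    return "".join(
        "T" if base == "C" and not is_protected(i) else base
        for i, base in enumerate(seq)
    )


def unconverted_runs(reference: str, read: str) -> List[Tuple[int, int]]:
    """Return (first, last) positions of each run of consecutive unconverted Cs."""
    runs, start, last = [], None, None
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue  # only reference C positions are informative
        if read_base == "C":
            if start is None:
                start = i
            last = i
        elif start is not None:
            runs.append((start, last))
            start = None
    if start is not None:
        runs.append((start, last))
    return runs


genome = "ACCTGCACGTCCAGC"  # toy sequence
footprint = [(5, 9)]        # hypothetical protein-bound interval

read = simulate_deamination(genome, footprint)
print(read)                            # ATTTGCACGTTTAGT
print(unconverted_runs(genome, read))  # [(5, 7)]
```

The recovered run (5, 7) lies inside the protected interval. Its boundaries are only as precise as the spacing of C positions allows, which is why resolution depends on the local density of convertible bases.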
- In Example 1, the inventors realized the detection of transcription factor footprints at the whole genome level using single-stranded DNA cytosine deaminase (SsdA), in a method called qDeAC-seq.
- Because qDeAC-seq is a whole-genome-level assay, it requires a large amount of sequencing. Transcription factors are found more frequently in open chromatin regions, so by enriching open regions, more transcription factor footprint information can be obtained at lower sequencing depth.
- the present invention combines a method for detecting transcription factor footprints at the whole genome level based on deaminase (qDeAC-seq) and a sequencing analysis method for transposase-accessible chromatin (Assay for Transposase-Accessible Chromatin using sequencing, ATAC-seq) to create a new method for specifically analyzing transcription factor footprints in open chromatin regions, called qDeAC-ATAC-seq.
- This method treats permeabilized cells or cell nuclei with Tn5 and SsdA in sequence. The transposase activity of Tn5 fragments the chromatin DNA in the more open regions and, at the same time, adds a special Tn5 adapter to the 5' end of the fragmented DNA. Considering the deamination effect of SsdA on cytosine bases, a Tn5 adapter sequence without cytosine is used, so that DNA fragments located in open chromatin regions can be specifically amplified with a Tn5 adapter sequence primer in subsequent steps, enriching chromatin from open regions.
- cytosine bases on the DNA sequence are deaminated and converted into uracil bases.
- in subsequent amplification, uracil is read as thymine and pairs with adenine, so the cytosine bases deaminated by the deaminase are eventually replaced by thymine.
- This mutation can be detected by first-generation Sanger sequencing, second-generation sequencing (Next Generation Sequencing, NGS) or single-molecule long-read sequencing.
- the DNA fragments in the open region are enriched by Tn5, and combined with the transcription factor footprints produced by the deamination reaction, the binding site analysis of transcription factors can be achieved under low sequencing cost conditions.
- the qDeAC-ATAC-seq technology of the present invention can include two experimental strategies ( FIG. 21 ), and the basic steps thereof are as follows:
- The cells or nuclear samples collected in step (A) are incubated with Tn5 preloaded with the adapter sequence, so that the transposase can fully react with the open chromatin DNA.
- the Tn5 reaction system is removed by centrifugation, and the precipitated cells/nuclei are collected.
- the deaminase reaction system is then added to allow the deaminase to fully react with the chromatin DNA in the cells, and then the genomic DNA is extracted using phenol chloroform.
- the DNA Methylation Library Kit for Illumina V3 (NE103) is used to construct a single-stranded DNA library. No 5' ligation is required, and the sample can be directly amplified after extension.
- the first step of amplification is performed using primers containing a Tn5 adapter homologous sequence and an i7 primer. After adding a fragment with a homologous sequence to the i5 primer at the 5' end of the library by overlap PCR, PCR amplification uses i5 and i7 primers for the second step of amplification.
- the schematic diagram of the library construction process is shown in Figure 22.
- The cells or nuclear samples collected in step (A) are incubated with SsdAtox to allow the deaminase to fully react with the open chromatin DNA.
- the SsdAtox reaction system was removed by centrifugation, and the precipitated cells/nuclei were collected. Then, the Tn5 reaction system was added to allow the Tn5 transposase to fully react with the DNA in the open chromatin region of the cells, and then the genomic DNA was extracted using phenol chloroform.
- the DNA Methylation Library Kit for Illumina V3 (NE103) is used to construct a single-stranded DNA library. No 5' ligation is required, and the sample can be directly amplified after extension.
- the first step of amplification is performed using primers containing a Tn5 adapter homologous sequence and an i7 primer. After adding a fragment with a homologous sequence to the i5 primer at the 5' end of the library by overlap PCR, PCR amplification uses i5 and i7 primers for the second step of amplification.
- the schematic diagram of the library construction process is shown in Figure 22.
- Add Nuclei Extraction Buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 1% BSA, 0.1% Tween-20, 0.1% NP40, 0.01% Digitonin, 0.1 mM EDTA, 3 mM MgCl2, 1x PIC) at 50 µl per 50,000 cells, resuspend, gently pipette 3 times, and place on ice for 3 min.
- Terminate the reaction: add 1 ml Final buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 1% BSA, 0.1 mM EDTA, 3 mM MgCl2, 1x PIC), gently invert 3 times to mix, centrifuge at 4°C and 700 × g for 5 min, centrifuge for a further 1 min, remove the supernatant down to about 100 µl, invert, centrifuge again for 1 min, and completely discard the supernatant.
- Add Nuclei Extraction Buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 1% BSA, 0.1% Tween-20, 0.1% NP40, 0.01% Digitonin, 0.1 mM EDTA, 3 mM MgCl2, 1x PIC) at 50 µl per 50,000 cells, resuspend, gently pipette 3 times, and place on ice for 3 min.
- the total amount of starting DNA was selected based on the measured DNA concentration and experimental requirements, and the Vazyme DNA Methylation Library Kit for Illumina V3 (NE103) was used for single-stranded DNA library construction.
- the 3'Adapter was connected and Extension was performed.
- the first step of amplification could be performed.
- the program was set to amplify 4 rounds, followed by a 1x beads purification step, and the second step of amplification. Taking 50ng as an example, the second step of amplification was performed for 8 rounds, for a total of 12 rounds.
- nucleic acid agarose gel electrophoresis was performed, and then fragments of 300-500bp were selected for gel cutting and recovery to obtain the final library with appropriate concentration (see Figure 22 for details).
- strategy 1 was used to process the R1 cell line and construct a library for sequencing.
- K562 cells were processed in two ways: Tn5 treatment was performed first, followed by deaminase treatment (strategy 1), or the deamination reaction was performed first, followed by Tn5 treatment of the chromatin (strategy 2). Since deaminase, like Tn5, preferentially attacks chromatin DNA in open regions, the deamination rate is highest for DNA in open regions; however, the base changes caused by deamination disrupt the intact double-stranded structure of DNA, so the two effects must be balanced and the experimental strategy selected according to the experimental purpose.
- Strategy 1 has a higher degree of enrichment for open areas and can better show the open areas in cells ( Figure 28, Groups 2-3), but due to the previous treatment of Tn5, transcription factors with weak DNA binding ability fall off the DNA, resulting in low TF depth values of these transcription factors (such as NFYA, RFX1, etc. in Figure 29).
- Strategy 2 is more accurate and realistic in capturing transcription factor footprints, but due to the changes in DNA bases caused by deamination, the subsequent ability of Tn5 to attack open areas is reduced, resulting in poor enrichment of open areas in cells. It is obvious that the enrichment efficiency of Tn5 for open areas will decrease with the increase in the total amount of deaminase added ( Figure 28, Groups 1 and 4-7).
- the qDeAC-ATAC-seq experimental method of the present invention can simultaneously capture the open chromatin area of the cell genome and detect the binding footprint of transcription factors in the open chromatin area, and the experimental data obtained have high repeatability.
- the sequencing cost is reduced, because the binding information of transcription factors can still be captured at lower sequencing depth.
- By combining qDeAC-seq with ATAC-seq technology, it is expected that the unique advantage of qDeAC-seq in capturing transcription factor footprints can be brought into play at the single cell level, thereby achieving the capture of multi-omics information such as chromatin accessibility, transcription factor footprints, and transcriptomes at a higher throughput level, which is of great significance for constructing gene regulatory networks in dynamic processes such as cell fate transformation, development, and carcinogenesis.
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Organic Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Zoology (AREA)
- Immunology (AREA)
- Analytical Chemistry (AREA)
- Wood Science & Technology (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biotechnology (AREA)
- Microbiology (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Genetics & Genomics (AREA)
- Hematology (AREA)
- Biomedical Technology (AREA)
- Pathology (AREA)
- Urology & Nephrology (AREA)
- Cell Biology (AREA)
- Food Science & Technology (AREA)
- Medicinal Chemistry (AREA)
- General Physics & Mathematics (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Description
The present invention belongs to the field of biotechnology. Specifically, the present invention relates to a method for detecting chromatin accessibility or DNA-binding protein footprints in cells. More specifically, the present invention relates to a method that uses the deamination of DNA in open chromatin regions by a DNA deaminase to convert chromatin accessibility information into mutation information in the DNA sequence, thereby quantitatively measuring, at the whole-genome level, the chromatin accessibility of single DNA molecules and the binding footprints of DNA-binding proteins such as transcription factors. In addition, the present invention relates to a method, based on a deaminase and a transposase, for detecting DNA-binding protein footprints in open regions of a cell genome, which can enrich the open chromatin regions of the cell genome and detect the DNA-binding protein footprints within them.
Background of the Invention
In eukaryotes, DNA forms a complex with histones: about 147 bp of DNA wraps 1.65 turns around a histone core particle to form a nucleosome. Nucleosomes and linker regions together form a beads-on-a-string structure, which is further folded and supercoiled into compact chromatin that is enclosed by the nuclear envelope and separated from the cytoplasm. Chromatin accessibility reflects the degree to which regulatory factors can physically contact genomic DNA; it is affected by nucleosome positioning, the spatial structure of the surrounding chromatin, and the promoting or inhibiting effects of different chromatin-binding factors on chromatin structure. Transcription factors dynamically compete with histones or other chromatin-binding proteins to regulate physiological processes such as gene transcription, giving rise to the characteristics of different cells. Therefore, studying chromatin accessibility to define the chromatin state of different cells, and building dynamic maps of the transitions between cells in terms of transcription factor binding and nucleosome rearrangement, is important for further clarifying gene regulation in cells.
Most current methods for studying chromatin accessibility rely on the fact that open chromatin is more sensitive to enzymatic or physical fragmentation. Commonly used methods include MNase-seq, DNase-seq and ATAC-seq, which rely on cutting DNA into fragments. These methods all infer chromatin openness from the degree of enrichment of DNA sequences, so they can only make relative judgments about openness. In addition, the enzymes used in these methods cut DNA with sequence preferences, which may bias the experimental results. NOMe-seq and Fiber-seq, which measure accessibility through enzymatic modification activity, rely on a methylase and an m6A adenine methyltransferase, respectively. The methylase used in NOMe-seq can methylate only GpC sequences, so its resolution is poor. Fiber-seq libraries cannot be amplified directly, so an enrichment step is still required, or a very large starting number of cells must be used to compensate for the lack of amplification.
There is an urgent need for a method that can detect chromatin openness and/or transcription factor binding footprints in cells quantitatively and at high resolution.
Brief Description of the Invention
In the qDeAC-seq (Quantitative DNA Deaminase-Accessible Chromatin assay using sequencing) technology provided by the present invention, the deamination of DNA in open chromatin regions by a DNA deaminase converts chromatin openness information into mutation information in the DNA sequence, so that chromatin accessibility and the binding footprints of DNA-binding proteins such as transcription factors can be measured quantitatively at the level of single DNA molecules (Figure 1). In addition, the qDeAC-ATAC-seq method, formed by combining qDeAC-seq with ATAC-seq, can enrich the open chromatin regions of a cell genome and detect DNA-binding protein footprints within those regions (Figure 21).
Brief Description of the Drawings
Figure 1. Schematic diagram of the qDeAC-seq workflow.
Figure 2. Separation of DddA/SsdA from their corresponding inhibitors by denaturation with 8 M urea.
Figure 3. Deaminase activity assay.
Figure 4. qDeAC-seq cell processing workflow.
Figure 5. Principle and workflow of single-stranded library construction with the DNA Methylation Library Kit for Illumina V3.
Figure 6. SsdA has no significant sequence preference at its reaction sites.
Figure 7. Comparison of Sanger sequencing results with ATAC-seq results.
Figure 8. TSS region analysis and correlation analysis of qDeAC-seq next-generation sequencing data from MEF cells.
Figure 9. TSS region and CTCF region data from R1 cells after qDeAC-seq treatment.
Figure 10. Conversion rates differ markedly inside and outside Klf4 and Thap11 binding motif sites.
Figure 11. Conversion rates at SP1 and Nrf1 binding motif sites differ markedly between MEF and R1 cells.
Figure 12. Effect of different library construction methods on library coverage at ATAC-seq centers.
Figure 13. Effect of different DNA polymerases on library PCR amplification.
Figure 14. Effect of cell permeabilization conditions on sample conversion rate.
Figure 15. Effect of deaminase reaction incubation time on deamination.
Figure 16. Effect of different deaminase concentrations on deamination.
Figure 17. Effect of differences in the deaminase reaction system on deamination.
Figure 18. Optimization of the deaminase reaction buffer.
Figure 19. Effect of adding UGI on sequencing depth at ATAC centers.
Figure 20. Effect of adding UGI on conversion rate.
Figure 21. Schematic diagram of the qDeAC-ATAC-seq workflow. In strategy 1, cells are first treated with Tn5 and then deaminated with SsdAtox; in strategy 2, cells are first deaminated with SsdAtox and then treated with Tn5, which targets open regions.
Figure 22. Schematic diagram of the qDeAC-ATAC-seq library construction workflow.
Figure 23. Tn5 adapter sequences.
Figure 24. Effect of different Tn5 reaction buffers on the efficiency of open-region enrichment.
Figure 25. Effect of different Tn5 reaction buffers on the footprint depth of different transcription factors in the R1 cell line.
Figure 26. Fragment size distribution of qDeAC-ATAC-seq libraries.
Figure 27. Comparison of TF depth values for different transcription factors in the R1 cell line under treatment with different amounts of deaminase.
Figure 28. Comparison of open-region enrichment between strategy 2 (groups 1 and 4 to 8 in the figure) and strategy 1 (groups 2 and 3 in the figure).
Figure 29. Effect of different experimental strategies on the TF depth values of different transcription factors in K562 cells (including data from qDeAC-seq alone).
Detailed Description of the Invention
I. Definitions
In the present invention, unless otherwise specified, the scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. To aid understanding of the present invention, definitions and explanations of relevant terms are provided below.
As used herein, the term "and/or" encompasses all combinations of the items connected by the term, and each combination should be regarded as having been individually listed herein. For example, "A and/or B" encompasses "A", "A and B", and "B". For example, "A, B and/or C" encompasses "A", "B", "C", "A and B", "A and C", "B and C", and "A and B and C".
When the term "comprising" is used herein to describe the sequence of a protein or nucleic acid, the protein or nucleic acid may consist of that sequence, or may have additional amino acids or nucleotides at one or both ends while still having the activity described in the present invention. In addition, those skilled in the art know that the methionine encoded by the start codon at the N-terminus of a polypeptide may be retained in certain practical situations (for example, when expressed in a particular expression system) without substantially affecting the function of the polypeptide. Therefore, when a specific polypeptide amino acid sequence is described in the specification and claims of this application, although it may not include the N-terminal methionine encoded by the start codon, a sequence including that methionine is also covered, and accordingly its encoding nucleotide sequence may also include a start codon; and vice versa.
Sequence "identity" has its art-recognized meaning, and the percentage of sequence identity between two nucleic acid or polypeptide molecules or regions can be calculated using published techniques. Sequence identity can be measured along the full length of a polynucleotide or polypeptide or along a region of the molecule.
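As a rough illustration of how a percentage of sequence identity might be computed, the following Python sketch counts matching columns in two sequences that have already been aligned. The function name, the use of "-" for gaps, and the choice of denominator (all columns that are not gap-versus-gap) are assumptions made for this example; the specification does not prescribe a particular algorithm or tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of an existing pairwise alignment.

    Assumes both strings come from the same alignment (equal length,
    '-' marks a gap). Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values, so the convention should be stated whenever a threshold such as "at least 80%" is applied.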
"Genome" as used herein encompasses not only the chromosomal DNA present in the cell nucleus, but also the organellar DNA present in subcellular components of the cell (e.g., mitochondria, plastids).
II. Method for detecting chromatin accessibility and/or DNA-binding protein footprints of a cell genome
In one aspect, the present invention provides a method for detecting chromatin accessibility and/or DNA-binding protein footprints of a cell genome, the method comprising:
a) treating at least one of the cells with a permeabilization buffer comprising a detergent, or isolating cell nuclei from at least one of the cells;
b) treating the permeabilized cells or isolated cell nuclei obtained in step a) with a DNA deaminase, thereby allowing the DNA deaminase to react sufficiently with the open regions of the genomic DNA in the cells;
c) isolating genomic DNA from the product of step b); and
d) sequencing the isolated genomic DNA or a portion thereof, and determining the chromatin accessibility and/or DNA-binding protein footprints of the cell genome based on the sequencing results.
To preserve the chromatin accessibility and/or DNA-binding protein footprint information of the cell genome as fully as possible, the permeabilization buffer should be mild: it should disrupt or partially disrupt the cell wall/cell membrane so that the DNA deaminase can enter the cell, without significantly disrupting the chromatin structure of the cell. Nuclear isolation should likewise be mild and should not significantly disrupt chromatin structure.
In some embodiments, the permeabilization buffer comprises NP40.
In some preferred embodiments, the permeabilization buffer comprises NP40, Tween-20 and Digitonin. In some preferred embodiments, the permeabilization buffer comprises about 0.1% NP40, about 0.1% Tween-20 and about 0.01% Digitonin.
In some embodiments, the permeabilization buffer further comprises Tris-HCl and/or PIC (protease inhibitor cocktail).
"Deaminase" refers to an enzyme that catalyzes a deamination reaction. As used herein, "DNA deaminase" refers to a deaminase that can accept DNA (single-stranded or double-stranded, in particular double-stranded) as a substrate and can catalyze the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. The DNA deaminase can deaminate C in the open regions of genomic DNA to U, so that in the subsequent amplification reaction for sequencing, a C-G base pair of the original genomic DNA is converted to T-A. Deamination sites can therefore be identified by detecting C-G to T-A conversions.
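The C-G to T-A readout described above can be illustrated with a small in-silico sketch. It assumes a simplified model in which each sequenced molecule derives from one strand of the original duplex: a deaminated C on the top strand is read as T, while a deaminated C on the bottom strand appears as G to A when the read is compared with the top-strand reference. The function names and the top-strand coordinate convention are assumptions made for this example.

```python
from typing import Iterable, List


def read_after_deamination(reference: str, positions: Iterable[int],
                           strand: str = "top") -> str:
    """Sequence read (in top-strand coordinates) after deamination and amplification.

    strand="top":    deaminated C is read as T.
    strand="bottom": deaminated C on the complementary strand shows up as G -> A.
    """
    expected, replacement = {"top": ("C", "T"), "bottom": ("G", "A")}[strand]
    bases = list(reference.upper())
    for pos in set(positions):
        if bases[pos] != expected:
            raise ValueError(f"position {pos} is {bases[pos]}, expected {expected}")
        bases[pos] = replacement
    return "".join(bases)


def deamination_sites(reference: str, read: str) -> List[int]:
    """Positions where the read shows a C->T or G->A change relative to the reference."""
    changes = {("C", "T"), ("G", "A")}
    return [i for i, pair in enumerate(zip(reference.upper(), read.upper()))
            if pair in changes]
```

For instance, `read_after_deamination("ACGTCC", {1, 4})` models two deaminated cytosines on the top strand, and `deamination_sites` recovers their positions from the resulting read.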
The DNA deaminase of the present invention may be single-stranded DNA deaminase A (SsdA) or double-stranded DNA deaminase A (DddA), or a functional fragment thereof. The DNA deaminase of the present invention may be derived from different species, for example from Burkholderia cenocepacia or Phytophthora syringae.
As used herein, "functional fragment" refers to a fragment of a DNA deaminase that substantially retains its deamination activity, such as DddAtox or SsdAtox known in the art.
In some embodiments, the SsdA comprises the amino acid sequence shown in SEQ ID NO: 1, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or at least 99.9% sequence identity with SEQ ID NO: 1. In some embodiments, the functional fragment of SsdA, such as SsdAtox, comprises the amino acid sequence shown in SEQ ID NO: 5, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or at least 99.9% sequence identity with SEQ ID NO: 5.
In some embodiments, the DddA comprises the amino acid sequence shown in SEQ ID NO: 2, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or at least 99.9% sequence identity with SEQ ID NO: 2.
Because it has essentially no sequence preference, the DNA deaminase is preferably SsdA or a functional fragment thereof.
In some embodiments, the DNA deaminase is recombinantly produced. In some embodiments, the DNA deaminase further contains a fusion tag, such as a tag for isolation and/or purification of the DNA deaminase. Methods for recombinant protein production are known in the art, as are a variety of tags that can be used to isolate and/or purify proteins, including but not limited to His tags, smt3 tags and GST tags. Generally, these tags do not alter the activity of the protein of interest. In some preferred embodiments, the DNA deaminase comprises a His tag and an smt3 tag.
In some specific embodiments, the SsdA or functional fragment thereof comprises the amino acid sequence shown in any one of SEQ ID NOs: 1 and 3-5.
In some embodiments, step b) is performed in a reaction volume of about 10 μl to about 100 μl, preferably 50 μl. The reaction volume can be adjusted according to the number of cells or cell nuclei.
In some embodiments, the reaction system of step b) comprises a reaction buffer. For example, the reaction buffer may be a Tris-HCl buffer or a HEPES buffer. The pH of the reaction buffer is, for example, 7.4-8.0. Preferably, the pH of the reaction buffer is about 7.4.
In some embodiments, the reaction buffer comprises about 10 mM to about 20 mM, preferably about 10 mM, Tris-HCl.
In some embodiments, the reaction buffer comprises about 10 mM to about 20 mM, preferably about 10 mM, Tris-HCl, and about 0.1% NP40, about 0.1% Tween-20, about 0.01% Digitonin and/or about 0.01% Triton-100. In some embodiments, the reaction buffer comprises about 10 mM to about 20 mM, preferably about 10 mM, Tris-HCl, and about 0.01% Digitonin and about 0.1% Tween-20. In some embodiments, the reaction buffer does not comprise additional metal salts, such as NaCl or MgCl2.
In some embodiments, the reaction buffer comprises no additional reagents other than Tris-HCl. In some embodiments, the reaction buffer consists of about 10 mM to about 20 mM, preferably about 10 mM, Tris-HCl and water.
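For orientation, the final concentrations above can be converted into pipetting volumes with the standard dilution relation C1·V1 = C2·V2. The Python sketch below is only an illustration: the stock concentrations used in the usage note (1 M Tris-HCl, 10% Tween-20) are hypothetical and are not taken from the specification.

```python
def stock_volume_ul(final_conc: float, stock_conc: float,
                    final_volume_ul: float) -> float:
    """Volume of stock (in microlitres) needed to reach final_conc in final_volume_ul.

    final_conc and stock_conc must use the same unit (e.g. both mM or both %).
    Uses C1 * V1 = C2 * V2.
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * final_volume_ul / stock_conc
```

With the assumed stocks, a 50 μl reaction at 10 mM Tris-HCl would need `stock_volume_ul(10, 1000, 50)` μl of 1 M stock, and 0.1% Tween-20 would need `stock_volume_ul(0.1, 10, 50)` μl of 10% stock. In practice such small volumes are usually avoided by preparing a larger batch of buffer.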
In some embodiments, step b) further comprises adding a uracil glycosylase inhibitor (UGI) to the reaction. Adding UGI can inhibit any uracil-excising activity of enzymes present in the reaction system, preventing the U produced by the deamination reaction from being excised. Adding UGI also allows the amount of enzyme in the reaction to be increased, shortening the reaction time while maintaining the signal-to-noise ratio. UGI suitable for the method of the present invention is commercially available.
In some embodiments, the at least one cell is from 1 cell to about 50,000 cells or more, for example about 100, about 1,000, about 10,000, or about 50,000 cells or more.
In some embodiments, the cells are treated with about 0.5 U/μl to about 50 U/μl, preferably about 7.5 U/μl, of DNA deaminase.
In some embodiments, about every 50,000 cells or cell nuclei are treated with about 25 U to about 2500 U, preferably about 375 U, of DNA deaminase.
In some embodiments, about 0.5 U/μl to about 50 U/μl of enzyme is used at about 1000 cells/μl.
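The enzyme amounts quoted above are mutually consistent for a 50 μl reaction containing 50,000 cells, as the following arithmetic sketch shows. The function names are illustrative, and the linear scaling of enzyme with cell number in `units_for_cells` is an assumption made for this example rather than a rule stated in the specification.

```python
def deaminase_units(concentration_u_per_ul: float, reaction_volume_ul: float) -> float:
    """Total enzyme units in a reaction of the given volume."""
    return concentration_u_per_ul * reaction_volume_ul


def cells_per_ul(cells: int, reaction_volume_ul: float) -> float:
    """Cell density in the reaction."""
    return cells / reaction_volume_ul


def units_for_cells(cells: int, units_per_50000_cells: float = 375.0) -> float:
    """Enzyme units for a different cell number, ASSUMING linear scaling."""
    return units_per_50000_cells * cells / 50_000
```

Here 7.5 U/μl in 50 μl corresponds to the preferred 375 U, the 0.5 to 50 U/μl range corresponds to 25 to 2500 U, and 50,000 cells in 50 μl corresponds to 1000 cells/μl.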
The enzyme activity unit (U) of the DNA deaminase used herein can be determined by the method shown in the Examples herein. For example, one unit (U) is defined as the amount of enzyme required to digest 0.1 μM of a double-stranded oligonucleotide DNA substrate within 1 hour at 37°C in a total reaction volume of 10 μl.
In some embodiments, in step b) the cells or cell nuclei are treated with the DNA deaminase for about 5 minutes to about 60 minutes, preferably about 10 minutes to about 30 minutes, more preferably about 10 minutes.
In some embodiments, step b) is performed at a temperature of about 4°C to about 40°C, for example about 30°C to about 40°C, preferably about 37°C.
In some preferred embodiments, in step b), about 50,000 cells, or cell nuclei from about 50,000 cells, are treated with 7.5 U/μl DNA deaminase in a 50 μl reaction at about 37°C for about 30 minutes.
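The step b) parameters described above can be collected into a small configuration object, which makes it easy to check a planned experiment against the stated ranges. This is a sketch only: the class and field names are hypothetical, the defaults mirror the preferred embodiment in the preceding paragraph, and because the specification qualifies every range with "about", the hard range limits used here are a simplification.

```python
from dataclasses import dataclass
from typing import List

# Ranges quoted in the text for step b); names are illustrative only.
RANGES = {
    "volume_ul": (10.0, 100.0),
    "enzyme_u_per_ul": (0.5, 50.0),
    "temperature_c": (4.0, 40.0),
    "minutes": (5.0, 60.0),
    "ph": (7.4, 8.0),
}


@dataclass(frozen=True)
class DeaminationConditions:
    volume_ul: float = 50.0
    enzyme_u_per_ul: float = 7.5
    temperature_c: float = 37.0
    minutes: float = 30.0
    ph: float = 7.4
    cells: int = 50_000

    def out_of_range(self) -> List[str]:
        """Names of parameters that fall outside the ranges listed in RANGES."""
        problems = []
        for name, (low, high) in RANGES.items():
            if not low <= getattr(self, name) <= high:
                problems.append(name)
        return problems
```

Calling `DeaminationConditions().out_of_range()` on the defaults returns an empty list, while a plan with a 120-minute incubation would be flagged under `minutes`.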
The isolation of genomic DNA in step c) can be performed by conventional methods known in the art, for example with a commercially available genomic DNA extraction kit.
The method of the present invention can be used to detect the chromatin accessibility and/or DNA-binding protein footprints of certain specific regions of the cell genome. In this case, only these specific regions need to be sequenced.
Therefore, in some embodiments, step d) comprises amplifying and sequencing a specific portion of the genomic DNA, thereby determining the chromatin accessibility and/or DNA-binding protein footprints of that specific portion of the cell genome.
In some embodiments, the amplification is PCR amplification, for example PCR amplification using a U-tolerant DNA polymerase. For example, the U-tolerant DNA polymerase is selected from: HiFi V3 (a component of the Vazyme single-stranded library construction kit, NE103), Phusion U (Thermo Scientific, F555L), Phanta Uc (Vazyme, P507-01), Q5U (NEB, M0515L) and Q6U (Beyotime, D7239M).
The method of the present invention can also be used to detect the chromatin accessibility and/or DNA-binding protein footprints of the cell genome at the whole-genome level.
Therefore, in some embodiments, step d) comprises constructing a whole-genome DNA library from the isolated genomic DNA and sequencing based on the whole-genome library, thereby determining the chromatin accessibility and/or DNA-binding protein footprints of the cell genome at the whole-genome level.
Various methods known in the art for constructing whole-genome sequencing libraries from isolated genomic DNA can be used, including single-stranded DNA library construction and double-stranded DNA library construction. Whole-genome DNA libraries can be constructed with commercial kits: for example, single-stranded library construction can use Vazyme NE103 or Swift Biosciences Cat. No. 33096; double-stranded library construction can use Tn5-based library construction, such as the Vazyme TD202 kit; and WGBS library construction can use the Zymo Research D5455 and D5456 kits.
However, the present inventors surprisingly found that single-stranded DNA library construction achieves a significantly higher library depth than double-stranded DNA library construction.
Therefore, in some preferred embodiments, the whole-genome DNA library is constructed from the isolated genomic DNA by single-stranded DNA library construction.
In some specific embodiments, the single-stranded DNA library construction is performed with the DNA Methylation Library Kit for Illumina V3 (NE103).
In some embodiments, a U-tolerant DNA polymerase is used for DNA amplification during library construction. For example, the U-tolerant DNA polymerase is selected from: HiFi V3, Phusion U, Phanta Uc, Q5U and Q6U.
The sequencing described herein may be first-generation sequencing such as Sanger sequencing, next-generation sequencing (NGS), or other high-throughput sequencing. For example, Illumina next-generation sequencing can be used.
DNA in open chromatin regions is more readily deaminated by the DNA deaminase, converting C to U, whereas DNA bound by nucleosomes or DNA-binding proteins is protected from deamination. Therefore, by analyzing the distribution of deamination sites, the openness of chromatin, the arrangement of nucleosomes, and the positions and binding dynamics of DNA-binding proteins can be determined.
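The protection principle described above suggests one simple way to look for candidate footprints: scan the per-site conversion rates along a region and report runs of consecutive C/G sites whose conversion stays low. The sketch below is a minimal illustration under stated assumptions; the function name, the 0.1 threshold and the minimum run length of three sites are hypothetical values chosen for the example, not parameters from the specification.

```python
from typing import List, Sequence, Tuple


def protected_runs(rates: Sequence[float], threshold: float = 0.1,
                   min_sites: int = 3) -> List[Tuple[int, int]]:
    """Find runs of consecutive sites with conversion rate below threshold.

    rates holds one conversion rate per C/G site, ordered along the region.
    Returns (start, end) index pairs with end exclusive; runs shorter than
    min_sites are discarded.
    """
    runs = []
    start = None
    for i, rate in enumerate(rates):
        if rate < threshold:
            if start is None:
                start = i  # a new low-conversion run begins here
        else:
            if start is not None and i - start >= min_sites:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the region.
    if start is not None and len(rates) - start >= min_sites:
        runs.append((start, len(rates)))
    return runs
```

A low-conversion run flanked by highly converted sites is consistent with a bound protein or nucleosome, but on its own it cannot distinguish protection from low coverage, so real analyses would also need coverage and control data.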
In some embodiments, the chromatin accessibility and/or DNA-binding protein footprints of the genomic DNA of the cell, or a portion thereof, are determined by analyzing the presence of C-G to T-A conversions (i.e., C to T and G to A) in that genomic DNA or portion relative to a control genomic DNA or portion thereof.
The presence of C-G to T-A conversions in a specific region of the genome indicates that the chromatin in that region is accessible and/or not bound by DNA-binding proteins.
In some embodiments, the chromatin accessibility and/or DNA-binding protein footprints of the genomic DNA of the cell, or a portion thereof, are determined by analyzing the position and/or density and/or conversion rate of C-G to T-A conversions in that genomic DNA or portion relative to a control genomic DNA or portion thereof.
The position and/or density of C-G to T-A conversions in the genome indicates the position and/or degree of openness of open chromatin regions.
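To make the notion of a per-position conversion rate concrete, the following sketch counts C to T and G to A changes in reads that are assumed to be already aligned, without gaps, to a top-strand reference. The input format (a start offset plus the read sequence) and the function name are assumptions for this example; a real pipeline would work from aligner output and handle indels, base quality and strand assignment.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def conversion_rates(reference: str,
                     reads: Iterable[Tuple[int, str]]) -> Dict[int, float]:
    """Fraction of covering reads showing C->T (at C) or G->A (at G) per position.

    reads: (start, sequence) pairs, ungapped, in top-strand reference coordinates.
    Only reference C and G positions with at least one informative read are returned.
    """
    reference = reference.upper()
    covered: Counter = Counter()
    converted: Counter = Counter()
    for start, sequence in reads:
        for offset, observed in enumerate(sequence.upper()):
            pos = start + offset
            if pos < 0 or pos >= len(reference):
                continue  # ignore bases that fall outside the reference
            ref_base = reference[pos]
            if ref_base == "C" and observed in ("C", "T"):
                covered[pos] += 1
                converted[pos] += observed == "T"
            elif ref_base == "G" and observed in ("G", "A"):
                covered[pos] += 1
                converted[pos] += observed == "A"
    return {pos: converted[pos] / covered[pos] for pos in sorted(covered)}
```

The density of conversions over a window can then be summarised from these per-position values, for example by averaging them across the C/G sites in the window.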
The sequence of the control genomic DNA or portion thereof may be the sequence of genomic DNA or a portion thereof from a public database. Alternatively, it may be the sequence of genomic DNA or a portion thereof from cells of the same type that have not been treated with the DNA deaminase.
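When the control comes from untreated cells of the same type, one possible use of it (an assumption for illustration, not a procedure described in the specification) is to subtract the apparent C to T and G to A rate seen without deaminase, which would reflect sequence variants and sequencing errors, from the rate seen in the treated sample. The function name and the clamping at zero are choices made for this sketch.

```python
from typing import Dict


def background_corrected(treated: Dict[int, float],
                         control: Dict[int, float]) -> Dict[int, float]:
    """Per-position conversion rate after subtracting the untreated-control rate.

    Positions missing from the control are treated as having zero background;
    negative differences are clamped to 0.0.
    """
    return {pos: max(0.0, rate - control.get(pos, 0.0))
            for pos, rate in treated.items()}
```

A position that shows a high apparent conversion in both samples, such as a C/T polymorphism, would then contribute little or nothing to the accessibility signal.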
The DNA-binding protein described herein may be a transcription factor, a histone, etc., and is preferably a transcription factor, such as Klf4, Thap11, SP1 or Nrf1.
The cells described herein may be animal cells, plant cells or microbial cells. For example, the cells are mammalian cells, including but not limited to cells of humans, mice, rats, cats, dogs, pigs and cattle; or the cells are monocotyledonous or dicotyledonous plant cells, such as cells of rice, corn, wheat, sorghum, soybean, potato or tomato; or the cells are fungal cells such as yeast cells.
The cells may be cells of a cell line. Alternatively, the cells may be primary cells from different organs or tissues. For example, the cells may be from blood, cerebrospinal fluid or biopsy tissue. The cells may be hepatocytes, cardiomyocytes, neurons, fibroblasts, epithelial cells, and so on. The cells may also be tumor cells, or stem cells such as embryonic stem cells or induced pluripotent stem cells.
In some embodiments, the cells are cells treated with a specific substance (compound) or condition, or cells at a specific developmental stage. The method of the present invention can thus be used to determine the effect of that substance (compound), condition or developmental stage on the chromatin accessibility and/or DNA-binding protein footprints of the cell genome. For example, the specific substance may be a drug.
In another aspect, the present invention provides a method for detecting binding between a DNA molecule and a DNA-binding protein, comprising:
1) forming a reaction mixture of the DNA molecule and the DNA-binding protein and allowing them to contact each other sufficiently;
2) adding a DNA deaminase and, optionally, a uracil glycosylase inhibitor (UGI) to the reaction mixture, and allowing the DNA deaminase to react sufficiently with the DNA molecule;
3) isolating DNA from the product of step 2); and
4) sequencing the isolated DNA or a portion thereof, and determining the binding of the DNA-binding protein to the DNA molecule based on the sequencing results.
The DNA deaminase is as defined above. In some embodiments, step 2) is performed in the reaction buffer defined above.
In some embodiments, the binding of the DNA-binding protein to the DNA molecule is determined by analyzing the presence of C-G to T-A conversions in the DNA molecule or a portion thereof relative to a control DNA molecule (e.g., a DNA molecule not treated with the DNA deaminase).
In some embodiments, the position or sequence of the DNA molecule bound by the DNA-binding protein is determined by analyzing the positions of C-G to T-A conversions in the DNA molecule or a portion thereof relative to a control DNA molecule (e.g., a DNA molecule not treated with the DNA deaminase).
III. Method for detecting DNA-binding protein footprints in open chromatin regions of a cell genome
In one aspect, the present invention provides a method for detecting DNA-binding protein footprints in open regions of a cell genome, the method comprising:
a) treating at least one of the cells with a permeabilization buffer comprising a detergent, or isolating cell nuclei from at least one of the cells;
b) i) treating the permeabilized cells or isolated cell nuclei obtained in step a) with Tn5 transposase, thereby allowing the Tn5 transposase to react sufficiently with the open regions of the genomic DNA in the cells; and then treating the Tn5 transposase-treated permeabilized cells or isolated cell nuclei with a DNA deaminase, thereby allowing the DNA deaminase to react sufficiently with the open regions of the genomic DNA in the cells; or
ii) treating the permeabilized cells or isolated cell nuclei obtained in step a) with a DNA deaminase, thereby allowing the DNA deaminase to react sufficiently with the open regions of the genomic DNA in the cells; and then treating the DNA deaminase-treated permeabilized cells or isolated cell nuclei with Tn5 transposase, thereby allowing the Tn5 transposase to react sufficiently with the open regions of the genomic DNA in the cells;
c) isolating genomic DNA from the product of step b); and
d) sequencing the isolated genomic DNA or a portion thereof, and determining the DNA-binding protein footprints in the open regions of the cell genome based on the sequencing results.
In some embodiments, the open region is an open chromatin region. An open chromatin region herein has the meaning known in the art and refers to a region in which the compact structure of chromatin has been opened. These regions allow trans-acting factors to bind to cis-regulatory elements such as promoters, enhancers, insulators, and silencers. This property of permitting trans-acting factor binding is called chromatin openness or accessibility.
To retain as much of the DNA-binding protein footprint information of the cell genome as possible, the permeabilization buffer should be mild: it should disrupt or partially disrupt the cell wall/cell membrane so that the DNA deaminase and/or Tn5 transposase can enter the cell, without significantly disrupting the chromatin structure of the cell. Isolation of cell nuclei should likewise be gentle and should not significantly disrupt the chromatin structure.
In some embodiments, the permeabilization buffer comprises NP40. In some preferred embodiments, the permeabilization buffer comprises NP40, Tween-20, and Digitonin. In some preferred embodiments, the permeabilization buffer comprises about 0.1% NP40, about 0.1% Tween-20, and about 0.01% Digitonin. In some embodiments, the permeabilization buffer further comprises Tris-HCl and/or PIC (protease inhibitor cocktail).
In some embodiments, the cell nuclei are isolated using methods known in the art. In some embodiments, the cell nuclei are isolated using a buffer comprising 10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 1% BSA, 0.1% Tween-20, 0.1% NP40, 0.01% Digitonin, 0.1 mM EDTA, 3 mM MgCl2, and 1x PIC.
"Deaminase" refers to an enzyme that catalyzes a deamination reaction. As used herein, "DNA deaminase" refers to a deaminase that accepts DNA (single-stranded or double-stranded, in particular double-stranded) as a substrate and catalyzes the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. The DNA deaminase deaminates C in the open regions of genomic DNA to U, so that in the subsequent amplification reaction for sequencing, a C-G base pair of the original genomic DNA is converted to T-A. Deamination sites can therefore be identified by detecting C-G to T-A conversions.
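The read-out described above can be illustrated with a minimal Python sketch. It assumes a read that is already aligned to the reference without gaps, so that positions can be compared one by one; the function name `call_deamination_sites` and the toy sequences are hypothetical and not part of the described method. C-to-T mismatches reflect deamination on the sequenced strand, and G-to-A mismatches reflect deamination on the complementary strand.

```python
from typing import List, Tuple

# Mismatch types expected from C-to-U deamination after amplification.
CONVERSIONS = {("C", "T"), ("G", "A")}


def call_deamination_sites(reference: str, read: str, offset: int = 0) -> List[Tuple[int, str, str]]:
    """Return (position, reference base, read base) for every C>T or G>A mismatch.

    Assumes `read` is a gap-free alignment to `reference` of equal length.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    return [
        (offset + i, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper()))
        if (ref_base, read_base) in CONVERSIONS
    ]


reference = "ACGTCCGATCGA"
read = "ATGTTCGATCAA"
sites = call_deamination_sites(reference, read)
print(sites)
```

In practice the positions would come from an aligner that tolerates C-to-T and G-to-A mismatches; this sketch only shows the comparison step.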
The DNA deaminase of the present invention can be single-stranded DNA deaminase A (SsdA) or double-stranded DNA deaminase A (DddA), or a functional fragment thereof. The DNA deaminase of the present invention can be derived from different species, for example from Burkholderia cenocepacia or Phytophthora syringae.
As used herein, "functional fragment" refers to a fragment of a DNA deaminase that substantially retains its deamination activity, such as DddA tox or SsdA tox known in the art.
In some embodiments, the SsdA comprises the amino acid sequence shown in SEQ ID NO: 1, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or at least 99.9% sequence identity to SEQ ID NO: 1. In some embodiments, the functional fragment of SsdA, such as SsdA tox, comprises the amino acid sequence shown in SEQ ID NO: 5, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or at least 99.9% sequence identity to SEQ ID NO: 5.
In some embodiments, the DddA comprises the amino acid sequence shown in SEQ ID NO: 2, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or at least 99.9% sequence identity to SEQ ID NO: 2.
Because it has essentially no sequence preference, the DNA deaminase is preferably SsdA or a functional fragment thereof.
In some embodiments, the DNA deaminase is recombinantly produced. In some embodiments, the DNA deaminase further contains a fusion tag, for example a tag for isolation and/or purification of the DNA deaminase. Methods for producing proteins recombinantly are known in the art, as are a variety of tags that can be used to isolate and/or purify proteins, including but not limited to His tags, smt3 tags, and GST tags. In general, these tags do not alter the activity of the target protein. In some preferred embodiments, the DNA deaminase comprises a His tag and an smt3 tag.
In some specific embodiments, the SsdA or functional fragment thereof comprises the amino acid sequence shown in any one of SEQ ID NOs: 1 and 3-5.
In some embodiments, step b) is performed in a reaction volume of about 10 μl to about 100 μl, preferably 50 μl. The reaction volume can be adjusted according to the number of cells or cell nuclei.
In some embodiments, allowing the DNA deaminase to react fully with the open regions of the genomic DNA in the cell results in the deamination of C to U in those open regions.
In some embodiments, the DNA deaminase treatment is performed in a deamination reaction buffer.
For example, the deamination reaction buffer can be a Tris-HCl buffer or a HEPES buffer. The pH of the deamination reaction buffer is, for example, 7.4-8.0, preferably about 7.4. In some embodiments, the deamination reaction buffer comprises about 10 mM to about 20 mM, preferably about 10 mM, Tris-HCl. In some embodiments, the deamination reaction buffer comprises about 10 mM to about 20 mM, preferably about 10 mM, Tris-HCl, together with about 0.1% NP40, about 0.1% Tween-20, about 0.01% Digitonin, and/or about 0.01% Triton X-100. In some embodiments, the deamination reaction buffer comprises about 10 mM to about 20 mM, preferably about 10 mM, Tris-HCl, together with about 0.01% Digitonin and about 0.1% Tween-20. In some embodiments, the deamination reaction buffer does not contain additional metal salts such as NaCl or MgCl2. In some embodiments, the deamination reaction buffer contains no reagents other than Tris-HCl. In some embodiments, the deamination reaction buffer consists of about 10 mM to about 20 mM, preferably about 10 mM, Tris-HCl and water. In some embodiments, the deamination reaction buffer further comprises a uracil glycosylase inhibitor (UGI). Adding UGI inhibits any U-excising activity of enzymes present in the reaction system, so that the U produced by the deamination reaction is not excised. Adding UGI also allows a larger amount of enzyme to be used, which shortens the reaction time while preserving the signal-to-noise ratio. UGI suitable for use in the method of the present invention is commercially available.
In some embodiments, the cells or cell nuclei are treated with about 0.5 U/μl to about 50 U/μl, preferably about 0.5-7.5 U/μl, of DNA deaminase. In some embodiments, about 25 U to about 2500 U, preferably about 375 U, of DNA deaminase is used per about 50,000 cells or cell nuclei. In some embodiments, cells at a density of about 1000 cells/μl are treated with about 0.5 U/μl to about 50 U/μl of DNA deaminase.
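The dosing figures above are mutually consistent if one assumes the preferred 50 μl reaction volume, as the following worked arithmetic shows. This is only an illustrative Python sketch; the helper names `deaminase_units` and `cell_density` are hypothetical.

```python
REACTION_VOLUME_UL = 50.0  # preferred reaction volume for step b), taken as an assumption here


def deaminase_units(conc_u_per_ul: float, volume_ul: float = REACTION_VOLUME_UL) -> float:
    """Total enzyme units present in the reaction at a given concentration."""
    return conc_u_per_ul * volume_ul


def cell_density(n_cells: int, volume_ul: float = REACTION_VOLUME_UL) -> float:
    """Cells (or nuclei) per microlitre of reaction."""
    return n_cells / volume_ul


low_units = deaminase_units(0.5)                 # lower bound of the stated range
high_units = deaminase_units(50.0)               # upper bound of the stated range
preferred_conc = 375.0 / REACTION_VOLUME_UL      # concentration implied by 375 U
density = cell_density(50_000)

print(low_units, high_units, preferred_conc, density)
```

Under this assumption, 0.5-50 U/μl corresponds to 25-2500 U per reaction, 375 U corresponds to 7.5 U/μl, and 50,000 cells correspond to 1000 cells/μl.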
In some preferred embodiments, the cells or cell nuclei are treated with about 5 U/μl of DNA deaminase in step b) i). In some preferred embodiments, the cells or cell nuclei are treated with about 2 U/μl of DNA deaminase in step b) ii).
In some embodiments, the cells or cell nuclei are treated with the DNA deaminase for about 5 minutes to about 60 minutes, preferably about 10 minutes to about 30 minutes, and more preferably about 10 minutes.
In some embodiments, the cells or cell nuclei are treated with the DNA deaminase at a temperature of about 4°C to about 40°C, for example about 30°C to about 40°C, preferably about 37°C.
Herein, allowing the Tn5 transposase to react fully with the open regions of the genomic DNA in the cell results in fragmentation of the genomic DNA in those open regions and the addition of an adapter containing the Tn5ME sequence to the 5' ends of the resulting DNA fragments (tagmentation).
In some embodiments, the Tn5 transposase comprises the amino acid sequence shown in SEQ ID NO: 6, or an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or at least 99.9% sequence identity to SEQ ID NO: 6. In some embodiments, the Tn5 transposase is capable of fragmenting substrate DNA and adding adapters to the fragmented DNA.
In some embodiments, the Tn5 transposase has already been complexed with an adapter containing the Tn5ME sequence to form a transposome.
Suitable Tn5 transposases for nucleic acid sequencing library construction, methods for assembling adapter-loaded transposomes, and corresponding commercial kits are available in the art. All of these can be applied in the present invention.
In some embodiments, the adapter added by the Tn5 transposase is compatible with the subsequent library construction and sequencing steps. In some preferred embodiments, the adapter added by the Tn5 transposase does not contain C; in particular, the portion of the adapter other than the Tn5ME sequence does not contain C. In some embodiments, the adapter added by the Tn5 transposase comprises the following nucleotide sequence: 5'-TGGTAGAGAGGGTGAGATGTGTATAAGAGACAG-3' (SEQ ID NO: 7), in which the 3'-terminal AGATGTGTATAAGAGACAG is the double-stranded Tn5ME sequence (underlined in the original) and the 5'-terminal TGGTAGAGAGGGTG is the single-stranded portion. The adapter is formed by annealing the single-stranded Tn5adapter-F having the sequence of SEQ ID NO: 7 with the single-stranded Tn5adapter-R having the sequence of SEQ ID NO: 8 (5'-CTGTCTCTTATACACATCT-3').
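The adapter design stated above can be checked directly from the two listed sequences. The following Python sketch is illustrative only (the variable and function names are hypothetical): it confirms that Tn5adapter-R is the reverse complement of the 3' portion of Tn5adapter-F, which is the part that anneals into the double-stranded Tn5ME, and that the remaining single-stranded portion contains no C.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


ADAPTER_F = "TGGTAGAGAGGGTGAGATGTGTATAAGAGACAG"  # SEQ ID NO: 7
ADAPTER_R = "CTGTCTCTTATACACATCT"                # SEQ ID NO: 8

# The region of adapter-F that pairs with adapter-R is the double-stranded Tn5ME.
me_on_f = reverse_complement(ADAPTER_R)
pairs_with_3prime_end = ADAPTER_F.endswith(me_on_f)

# Whatever precedes the ME on adapter-F stays single-stranded.
overhang = ADAPTER_F[: len(ADAPTER_F) - len(me_on_f)]

print("double-stranded ME:", me_on_f)
print("single-stranded part:", overhang)
print("C count in single-stranded part:", overhang.count("C"))
```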
In some embodiments, the Tn5 transposase treatment is performed in a tagmentation buffer. The tagmentation buffer may be a commercially available Tn5 transposase tagmentation buffer. In some embodiments, the tagmentation buffer comprises DMF (dimethylformamide). In some embodiments, the tagmentation buffer does not contain detergent. In some preferred embodiments, the tagmentation buffer comprises 10 mM Tris-HCl (pH 7.6), 5 mM MgCl2, 10% DMF, and 33% PBS. The composition of the PBS is 155 mM NaCl, 3 mM Na2HPO4, and 1 mM KH2PO4.
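As a worked example, the salt contributed by the PBS component can be estimated as follows. The sketch assumes that "33% PBS" means 33% (v/v) of 1x PBS with the composition given above; the text does not state this explicitly, and the helper name `dilute` is hypothetical.

```python
# 1x PBS composition in mM, as stated in the text.
PBS_1X_MM = {"NaCl": 155.0, "Na2HPO4": 3.0, "KH2PO4": 1.0}


def dilute(stock_mm: dict, volume_fraction: float) -> dict:
    """Final concentrations (mM) after diluting a stock to the given volume fraction."""
    return {name: round(conc * volume_fraction, 2) for name, conc in stock_mm.items()}


# Assumption: 33% PBS is interpreted as a 0.33 volume fraction of 1x PBS.
pbs_in_buffer = dilute(PBS_1X_MM, 0.33)
print(pbs_in_buffer)
```

Under that reading, the PBS contributes roughly 51 mM NaCl, 1 mM Na2HPO4, and 0.3 mM KH2PO4 to the tagmentation buffer.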
In some embodiments, the tagmentation buffer further comprises UGI; for example, the tagmentation buffer used in step b) ii) comprises UGI.
In some embodiments, the concentration of the Tn5 transposase is about 50 nM to about 150 nM, for example 100 nM.
In some embodiments, the cells or cell nuclei are treated with Tn5 transposase for about 5 minutes to about 60 minutes, preferably about 10 minutes to about 30 minutes, and more preferably about 15 minutes.
In some embodiments, the cells or cell nuclei are treated with Tn5 transposase at a temperature of about 4°C to about 40°C, for example about 30°C to about 40°C, preferably about 37°C.
In some embodiments, when the number of cells is small, step b) i) can be used, as it can give higher enrichment of the open regions of the cell genome. In some embodiments, for detecting DNA-binding proteins that bind DNA weakly, step b) ii) can be used, which avoids the Tn5 transposase treatment dislodging the DNA-binding proteins from the DNA before deamination.
In some embodiments, in step b) i), the cells or cell nuclei are collected by centrifugation after the Tn5 transposase treatment and are then subjected to the DNA deaminase treatment.
In some embodiments, in step b) ii), the cells or cell nuclei are collected by centrifugation after the DNA deaminase treatment and are then subjected to the Tn5 transposase treatment.
The isolation of genomic DNA in step c) of the method of the present invention can be carried out by conventional methods known in the art, for example using a commercial genomic DNA extraction kit.
In some embodiments, step d) comprises constructing a genomic DNA library from the isolated genomic DNA and sequencing that library, thereby determining the DNA-binding protein footprints in the open regions of the cell genome at the genome-wide level.
Any of the methods known in the art for constructing a genomic sequencing library from isolated genomic DNA can be used, including single-stranded DNA library construction and double-stranded DNA library construction. Genomic DNA libraries can be constructed with commercial kits: for example, Vazyme NE103 or Swift Biosciences Cat. No. 33096 for single-stranded library construction; Tn5-based kits such as the Vazyme TD202 kit for double-stranded library construction; and the Zymo Research D5455 and D5456 kits for WGBS library construction.
The inventors surprisingly found that single-stranded DNA library construction yields a significantly greater library depth than double-stranded DNA library construction. Therefore, in some preferred embodiments, the genomic DNA library is constructed from the isolated genomic DNA by single-stranded DNA library construction.
In some specific embodiments, the single-stranded DNA library is constructed using the DNA Methylation Library Kit for Illumina V3 (NE103).
In some embodiments, a U-tolerant DNA polymerase is used for DNA amplification during library construction; for example, the U-tolerant DNA polymerase is selected from HiFi V3, Phusion U, Phanta Uc, Q5U, and Q6U.
The sequencing described herein can be first-generation sequencing such as Sanger sequencing, next-generation sequencing (NGS), or other high-throughput sequencing. For example, Illumina next-generation sequencing can be used.
DNA in open regions of genomic chromatin is more readily acted on by the DNA deaminase, which converts C to U, whereas DNA bound by a DNA-binding protein is protected and cannot be deaminated. The positions and binding dynamics of DNA-binding proteins can therefore be inferred by analyzing the distribution of deamination sites.
In some embodiments, the DNA-binding protein footprint of the genomic DNA of the cell, or of a portion thereof, is determined by analyzing the presence of C-G to T-A conversions (i.e., C to T and G to A) in that genomic DNA or portion relative to a control genomic DNA or portion thereof.
The presence of C-G to T-A conversions in a particular region of the genome indicates that the genomic DNA in that region is not bound by a DNA-binding protein.
In some embodiments, the DNA-binding protein footprint of the genomic DNA of the cell, or of a portion thereof, is determined by analyzing the position and/or density and/or conversion rate of C-G to T-A conversions in that genomic DNA or portion relative to a control genomic DNA or portion thereof.
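The conversion-rate analysis can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the analysis pipeline of the invention: the inputs are hypothetical per-site counts at consecutive C/G positions within one open region, the threshold and minimum run length are arbitrary illustrative values, and comparison with the control sample is omitted.

```python
from typing import List, Tuple


def conversion_rates(converted: List[int], covered: List[int]) -> List[float]:
    """Fraction of reads showing C>T or G>A at each C/G site (NaN if uncovered)."""
    return [c / n if n else float("nan") for c, n in zip(converted, covered)]


def protected_runs(rates: List[float], threshold: float = 0.2, min_sites: int = 3) -> List[Tuple[int, int]]:
    """Half-open index ranges of consecutive sites whose conversion rate is below threshold.

    Such runs are candidate footprints; uncovered (NaN) sites end a run.
    """
    runs, start = [], None
    for i, rate in enumerate(rates):
        if rate < threshold:          # NaN compares False, so it ends the run
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_sites:
                runs.append((start, i))
            start = None
    if start is not None and len(rates) - start >= min_sites:
        runs.append((start, len(rates)))
    return runs


converted_reads = [8, 7, 0, 1, 0, 0, 9, 6]   # reads with a conversion at each site
covering_reads = [10] * 8                    # reads covering each site
rates = conversion_rates(converted_reads, covering_reads)
footprints = protected_runs(rates)
print(rates)
print(footprints)
```

Here sites 2 to 5 show little or no conversion while their neighbours are heavily converted, so the sketch reports them as one candidate protected stretch.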
The sequence of the control genomic DNA or portion thereof can be a genomic DNA sequence, or portion thereof, from a public database. Alternatively, it can be the sequence of genomic DNA, or a portion thereof, from cells of the same type that have not been treated with the DNA deaminase.
The DNA-binding protein described herein can be a transcription factor, a histone, or the like, and is preferably a transcription factor. For example, the transcription factor is selected from NRF1, MYBL2, NFYA, RFX1, and the like.
The cells described herein can be animal cells, plant cells, or microbial cells. For example, the cells are mammalian cells, including but not limited to cells of humans, mice, rats, cats, dogs, pigs, and cattle; or the cells are monocotyledonous or dicotyledonous plant cells, such as cells of rice, corn, wheat, sorghum, soybean, potato, or tomato; or the cells are fungal cells such as yeast cells.
The cells can be cells of a cell line. Alternatively, the cells can be primary cells from different organs or tissues. For example, the cells can be obtained from blood, cerebrospinal fluid, or biopsy tissue. The cells can be hepatocytes, cardiomyocytes, neurons, fibroblasts, epithelial cells, and so on. The cells can also be tumor cells or stem cells such as embryonic stem cells or induced pluripotent stem cells.
In some embodiments, the cells have been treated with a specific substance (compound) or condition, or are at a specific developmental stage. The method of the present invention can thus be used to determine the effect of that substance (compound), condition, or developmental stage on the chromatin accessibility and/or DNA-binding protein footprints of the cell genome. For example, the specific substance can be a drug.
IV. Kits
In another aspect, the present invention provides a kit for detecting chromatin accessibility and/or DNA-binding protein footprints of a cell genome, or for detecting binding between a DNA molecule and a DNA-binding protein, which comprises at least the DNA deaminase described above and, optionally, a uracil glycosylase inhibitor (UGI).
In some embodiments, the kit is for detecting chromatin accessibility and/or DNA-binding protein footprints of a cell genome by the method of the present invention, or for detecting binding between a DNA molecule and a DNA-binding protein by the method of the present invention.
In some embodiments, the kit further comprises a detergent-containing permeabilization buffer and/or reagents for isolating cell nuclei. The permeabilization buffer is as defined above.
In some embodiments, the kit further comprises a reaction buffer that supports the DNA deaminase reaction. The reaction buffer is as defined above.
In some embodiments, the kit further comprises reagents for amplifying a genomic DNA portion of interest, such as specific primers.
In some embodiments, the kit further comprises reagents for constructing a genomic library.
In some embodiments, the kit further comprises reagents for sequencing, such as Sanger sequencing or next-generation sequencing.
In another aspect, the present invention provides a kit for detecting DNA-binding protein footprints in open regions of a cell genome, which comprises at least the DNA deaminase and the Tn5 transposase described above.
In some embodiments, the kit is for detecting DNA-binding protein footprints in open regions of a cell genome by the method of the present invention.
In some embodiments, the kit further comprises a detergent-containing permeabilization buffer and/or reagents for isolating cell nuclei. The permeabilization buffer and the reagents for isolating cell nuclei are as defined above.
In some embodiments, the kit further comprises a deamination reaction buffer that supports the DNA deaminase reaction. The deamination reaction buffer is as defined above.
In some embodiments, the kit further comprises an adapter containing the Tn5ME sequence, and reagents for complexing the adapter with the Tn5 transposase to form a transposome. The adapter containing the Tn5ME sequence is as defined above.
In some embodiments, the kit further comprises a tagmentation reaction buffer that supports the Tn5 transposase reaction. The tagmentation reaction buffer is as defined above.
In some embodiments, the kit further comprises reagents for constructing a genomic library.
In some embodiments, the kit further comprises reagents for sequencing, such as Sanger sequencing or next-generation sequencing.
A kit generally also includes a label indicating the intended use of and/or instructions for using the kit contents. The term "label" includes any written or recorded material provided on or with the kit, or otherwise accompanying the kit.
Unless otherwise specified, the techniques used in the following examples, including sequencing library construction and high-throughput sequencing, molecular biology techniques such as cell fixation and DNA purification, and cell culture and detection techniques, are conventional techniques known to those skilled in the art. Unless otherwise noted in this specification, the instruments, equipment, reagents, and cell lines used are available to researchers and technicians in the field through public channels. In the examples given in the detailed description, the protocols for cell collection, processing, and single-cell isolation, and some of the buffer systems, represent only a few of many feasible options.
Example 1: Detecting chromatin accessibility or DNA-binding protein footprints in cells at the genome-wide level
General method description
1). Construction of a plasmid expressing the deaminase protein
The pETDuet vector was used for protein expression and purification. This vector carries an ampicillin resistance gene and two multiple cloning sites, each of which has a T7 promoter, a lac operator, and a ribosome binding site. The DddA/SsdA sequence (or its tagged or truncated version) and the corresponding inhibitor sequence were cloned into the two sites of the pETDuet vector, respectively.
2). Purification of the deaminase protein
The vector was transformed by heat shock into protein expression strains such as Rosetta (DE3), Rosetta Gami, and BL21 (DE3), which were cultured under dual selection with ampicillin and chloramphenicol. A portion of the cells was collected, resuspended in freezing medium (50% glycerol : 2YT containing both antibiotics = 1:1), snap-frozen in liquid nitrogen, and stored at -80°C. The expression levels of the different strains were tested, and a suitable strain was selected for subsequent experiments. The strain was cultured in medium containing both antibiotics at an appropriate shaking speed, and the OD of the culture was measured at intervals. Once the culture reached the desired OD, protein expression was induced with IPTG. After induction for a set period, the bacterial pellet was collected by centrifugation and lysed by high-pressure homogenization.
Affinity purification and denaturation/renaturation were carried out as follows. (1) SsdA: the lysate was centrifuged and the inclusion bodies were collected. The inclusion bodies were dissolved in a buffer containing 8 M urea, impurities were removed by centrifugation, and Ni-NTA beads were added to affinity-purify the protein. (2) DddA: the lysate was centrifuged, the supernatant was collected and incubated with Ni-NTA beads, and a buffer containing 8 M urea was then added to denature the protein. Treatment with 8 M urea separates DddA/SsdA from its corresponding inhibitor (see Figure 2); DddA/SsdA was then renatured by lowering the urea concentration. The protein was eluted with imidazole, the imidazole was removed by dialysis, and the protein was quantified by Coomassie Brilliant Blue staining. Finally, the DddA/SsdA protein was separated and purified on an AKTA protein purification system by ion exchange chromatography, size-exclusion chromatography, and the like, and the relevant fractions were concentrated to obtain the target protein sample.
3). Verification of deaminase activity in vitro
An oligonucleotide (5'-AATATAATATAATAACTCGCCATAATTTTAATTAAT-3', carrying a 6-FAM fluorophore at its 5' end) and its complementary sequence were designed and annealed to produce a double-stranded oligonucleotide. Following the reaction system below, different concentrations of deaminase were mixed with the double-stranded oligonucleotide substrate at a final concentration of 1 nM and incubated at 37°C for 1 hour to deaminate cytosines in the substrate to uracil. Uracil DNA glycosylase was then added and the mixture was incubated at 37°C for 30 minutes to excise the uracils, leaving apurinic/apyrimidinic (AP) sites. The mixture was then treated with 100 mM NaOH and formamide at 95°C to denature the double-stranded oligonucleotide and cleave it at the AP sites. Finally, the fragments were separated by electrophoresis on a 15% polyacrylamide gel containing 8 M urea, and the extent of deamination was judged from the fragment sizes. See Figure 3 for details.
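To make the gel read-out concrete, the following Python sketch lists the sizes of the 5'-FAM-labelled fragments that would be expected if a given cytosine on the labelled strand were deaminated, excised, and cleaved. It assumes that cleavage at the abasic site leaves the labelled fragment with exactly the nucleotides upstream of that cytosine; end-group chemistry and its effect on gel mobility are ignored, and the function name is hypothetical.

```python
SUBSTRATE = "AATATAATATAATAACTCGCCATAATTTTAATTAAT"  # 5'-FAM-labelled strand


def labelled_fragment_lengths(seq: str) -> list:
    """Length (nt) of the 5'-labelled fragment for cleavage at each C in seq.

    A cytosine at 0-based index i leaves i nucleotides upstream of the cut.
    """
    return [i for i, base in enumerate(seq) if base == "C"]


full_length = len(SUBSTRATE)
fragments = labelled_fragment_lengths(SUBSTRATE)
print("full-length substrate:", full_length, "nt")
print("possible labelled fragments:", fragments, "nt")
```

Because only the 5' fragment carries the fluorophore, a read cleaved at several cytosines would appear at the size set by the cytosine closest to the 5' end.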
One unit (U) of enzyme activity is defined as the amount of enzyme required to digest 0.1 μM substrate in a total reaction volume of 10 μl at 37°C within 1 hour.
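As a worked example of the unit definition, the amount of substrate implied by 0.1 μM in 10 μl can be computed as follows. This is plain unit arithmetic rather than a statement from the source, and the helper name is hypothetical.

```python
def substrate_pmol(conc_um: float, volume_ul: float) -> float:
    """Amount of substrate in pmol: (umol/L) x (uL) = pmol."""
    return conc_um * volume_ul


pmol_per_unit = substrate_pmol(0.1, 10.0)
print(pmol_per_unit, "pmol of substrate processed by 1 U in 1 h at 37 C")
```

On this basis, one unit corresponds to processing about 1 pmol of the double-stranded oligonucleotide substrate under the assay conditions.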
The reaction system is:
10x annealing buffer (100 mM Tris pH 7.5, 500 mM NaCl, 10 mM EDTA)
4). Cell processing for qDeAC-seq (Figure 4)
(1) Cell collection and nuclei extraction:
Add an appropriate amount of trypsin to digest the cells, neutralize with culture medium, and count the cells. Transfer them to a 1.5 mL low-binding EP tube, centrifuge at room temperature, 300×g for 4 min, and discard the supernatant. Wash twice with 1 mL cold PBS, centrifuging at 4°C, 300×g for 4 min and discarding the supernatant each time. Resuspend in 1 mL cold Wash buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 1% BSA, 0.1 mM EDTA, 3 mM MgCl2, 1x PIC) and divide into N aliquots of 50,000 cells each. Centrifuge at 4°C, 300×g for 4 min and remove as much supernatant as possible (one wash in total).
Lysis: flick the bottom of the tube to resuspend the cells, then add 50 μl Lysis Buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 1% BSA, 0.1% Tween-20, 0.1% NP-40, 0.01% digitonin, 0.1 mM EDTA, 3 mM MgCl2, 1x PIC) per 50,000 cells, pipette gently 3 times, and leave on ice for 3 min. Stop the reaction: add 1 ml Final buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 1% BSA, 0.1 mM EDTA, 1x PIC), gently invert 3 times to mix, and centrifuge at 4°C, 700×g for 5 min. Reverse the tube orientation and centrifuge for 1 min, remove the supernatant down to about 100 μl, reverse the tube orientation and centrifuge again for 1 min, then remove the remaining supernatant completely.
(2) Deamination reaction:
Following the 50 μl reaction system (10 mM Tris-HCl pH 7.4, 1 mM DTT, 1x PIC), add the enzyme, buffer, and an appropriate amount of lambda DNA to the permeabilized cell sample, and gently invert to mix and resuspend the cells. Incubate at 37°C for 0.5 h on a thermomixer set to 600 rpm, cycling between 3 min pause and 30 s shaking.
(3) DNA extraction
Next, perform a heated incubation: add 200 μl gDNA Hot Elution Buffer to the 50 μl reaction and incubate at 65°C with shaking at 1350 rpm for 1.5 h. Add an equal volume (250 μl) of phenol:chloroform:isoamyl alcohol DNA extraction reagent to the gDNA sample and shake thoroughly to mix. Centrifuge at room temperature, 13,000×g for 5 min. Recover the gDNA by isopropanol precipitation and measure its concentration with Qubit.
5). Library construction
The total amount of input DNA is chosen according to the measured DNA concentration and the experimental requirements. Taking 5 ng of input as an example, the genomic DNA is fragmented by sonication. Single-stranded DNA library construction is performed with the Vazyme DNA Methylation Library Kit for Illumina V3 (NE103) (Figure 5).
Following the kit protocol, the 3' adapter and the 5' adapter are ligated in turn, and the library is finally amplified with primers to obtain the complete library. The DNA fragments are analyzed on an Agilent 2100 Bioanalyzer. The constructed library is sent to a sequencing company for high-throughput sequencing on an Illumina X Ten instrument in paired-end 150 bp mode.
6). Identification of base conversions in qDeAC-seq
The sequencing data are analyzed to determine the positions and numbers of base conversions, from which the accessibility of the genome and the protein occupancy at specific sites are inferred.
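The counting step described above can be sketched as follows. This is a minimal illustration rather than the analysis pipeline used in the source: it assumes gap-free plus-strand alignments supplied as `(start, sequence)` pairs, ignores base qualities and SNPs, and all names are hypothetical. For reads derived from the minus strand, the same logic would apply to G-to-A changes.

```python
from collections import defaultdict


def conversion_rates(reference: str, reads: list[tuple[int, str]]) -> dict[int, float]:
    """Per-cytosine C-to-T conversion rate from gap-free plus-strand reads.

    reference: reference sequence (upper case).
    reads: (0-based start position, read sequence) pairs aligned without gaps.
    Returns {position of reference C: converted reads / informative reads}.
    """
    converted = defaultdict(int)  # reads showing T at a reference C
    covered = defaultdict(int)    # reads showing C or T at a reference C
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue
            if base == "T":
                converted[pos] += 1
                covered[pos] += 1
            elif base == "C":
                covered[pos] += 1
            # Any other base is treated as uninformative and skipped.
    return {pos: converted[pos] / covered[pos] for pos in covered}
```

For example, `conversion_rates("ACGTC", [(0, "ATGTC"), (0, "ACGTT"), (1, "CGTC")])` reports a rate of one third at positions 1 and 4.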
1.1. Determining genomic DNA accessibility in MEFs by qDeAC-seq
To clarify the selectivity of the deaminases for genomic DNA, deaminases carrying different purification tags and auxiliary tags were purified from several bacterial strains and used separately to treat mouse embryonic fibroblasts (MEFs), followed by data analysis: selected loci were amplified by PCR, all C positions were extracted, and the conversion rate of the C in AC, TC, GC, and CC contexts was calculated to assess the substrate preference of SsdA. The analysis showed that DddA has a certain preference for TC, whereas SsdA has essentially no sequence preference (Figure 6).
By analyzing ATAC-seq data from public databases, a highly accessible region was selected. Primers were designed for PCR in this region, the product was ligated into a T vector, and Sanger sequencing was performed. The trend of the sequencing results was consistent with the ATAC-seq results, and a footprint resembling binding of the transcription factor SP1 could be seen in the sequence (Figure 7).
After treating mouse embryonic fibroblasts with qDeAC-seq, the genomic DNA was used for next-generation sequencing library construction. The sequencing results clearly reflected the high accessibility of the TSS (transcription start site) region, and nucleosome positions could be resolved up to the third nucleosome downstream of the TSS. Correlation analysis against ATAC-seq showed that the qDeAC-seq signal correlates with the ATAC-seq signal (Figure 8).
1.2. Determining genomic DNA accessibility and binding at specific TF sites in R1 cells by qDeAC-seq
Genomic DNA treated by qDeAC-seq was used for next-generation sequencing library construction. The results clearly reflected the high accessibility of the TSS region, and nucleosome positions could be resolved up to the third nucleosome downstream of the TSS. Clear fluctuations in accessibility were also observed on both sides of CTCF (CCCTC-binding factor) sites (Figure 9).
In addition, the known binding regions of specific TFs (Klf4, Thap11) were aligned with the binding motif at the center and sorted by degree of accessibility. It could be clearly observed that protein binding interferes with the deaminase reaction and lowers the conversion rate at the central site. This shows that qDeAC-seq reveals transcription factor binding footprints directly and at high resolution. For specific transcription factors, qDeAC-seq also reveals differences between the plus and minus strands (Figure 10).
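The motif-centered alignment and sorting described above can be sketched as follows. This is an illustrative outline only: it assumes a per-base conversion-rate track for one chromosome held in a Python list with no missing values, and the function names are hypothetical.

```python
def motif_centered_matrix(rates: list[float], centers: list[int],
                          flank: int) -> list[list[float]]:
    """Extract windows of +/- flank bases around each motif center.

    Windows that would run off either end of the track are dropped.
    Rows are sorted from the most to the least accessible window,
    using the mean conversion rate of the window as the accessibility score.
    """
    rows = []
    for center in centers:
        if center - flank < 0 or center + flank >= len(rates):
            continue
        rows.append(rates[center - flank:center + flank + 1])
    rows.sort(key=lambda row: sum(row) / len(row), reverse=True)
    return rows


def column_means(rows: list[list[float]]) -> list[float]:
    """Average profile across all windows (one value per position)."""
    return [sum(col) / len(col) for col in zip(*rows)]
```

A footprint appears in the averaged profile as a dip at the central column (index `flank`), flanked by higher conversion rates. Computing the matrix separately for plus-strand and minus-strand conversions would expose the strand differences mentioned above.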
1.3. Comparing TF binding in the R1 and MEF genomes by qDeAC-seq
The genomes of the two cell types were treated by qDeAC-seq and used for next-generation sequencing library construction. In the analysis of the sequencing results, the known binding regions of specific TFs were aligned with the binding motif at the center and sorted by the accessibility of the center in the R1 cell line; the corresponding MEF data were arranged in the same order. Large differences between the two cell types were observed. Figure 11 compares the two transcription factors SP1 and Nrf1, for which clear differences can be seen.
Therefore, the experimental results of Examples 1-3 above show that using the qDeAC-seq of the present invention to detect genome accessibility and transcription factor binding footprints in cells yields highly reproducible data, that the effect of nucleosome occupancy on the conversion rate can be seen directly, and that occupancy at specific transcription factor binding sites can be observed. This indicates that the qDeAC-seq method of the present invention can directly reveal changes in chromatin accessibility and occupancy at specific transcription factor binding sites. Compared with other methods, it greatly improves resolution and makes the data directly observable and more reliable. It can also reveal differences between the plus and minus strands at specific transcription factor binding sites, from which the orientation of binding can be inferred. This cannot currently be achieved by other methods.
1.4. Comparing the effects of different library construction methods on qDeAC-seq results
In this example, the effects of single-stranded and double-stranded library construction on qDeAC-seq results were compared. The library construction workflows are as follows. Single-stranded library construction workflow:
The qDeAC-seq single-stranded library construction method follows the instructions of the Vazyme kit NE103 (EpiArt DNA Methylation Library Kit for Illumina V3).
Whole-genome DNA from cells deaminated with the deaminase SsdA was collected and sonicated to obtain fragmented double-stranded DNA. A certain amount of this DNA was taken as the starting library, denatured at 95°C, and immediately placed on ice to obtain fragmented single-stranded DNA. A 3' adapter was ligated to the 3' end of the single-stranded DNA with the 3' adapter ligase, the extension mix was added for reaction, and the product was purified with 1.2x DNA purification magnetic beads (Vazyme, N411, VAHTS DNA Clean Beads) to obtain double-stranded DNA carrying the 3' adapter. An adapter was then ligated to the 5' end of the double-stranded DNA with the 5' adapter ligase, followed by purification with 1x DNA magnetic beads to obtain double-stranded DNA carrying adapters at both ends. Finally, the library was amplified by an appropriate number of PCR cycles with Vazyme amplification primers (including but not limited to N321/N322, VAHTS Multiplex Oligos Set 4/5 for Illumina) and purified with 0.85x DNA purification magnetic beads to obtain the final library.
Double-stranded library construction workflow:
The qDeAC-seq double-stranded library construction method follows the instructions of the Vazyme kit TD502 (TruePrep DNA Library Prep Kit V2 for Illumina).
Whole-genome DNA from cells deaminated with the deaminase SsdA was collected, and 5 ng was treated with the transposase Tn5. The product was then purified with 2x AMPure XP magnetic beads to obtain DNA fragments to which Tn5 had attached adapters.
Finally, the library was amplified by an appropriate number of PCR cycles with Vazyme amplification primers (including but not limited to TD204, TruePrep Index Kit V4 for Illumina) and a uracil (U)-tolerant DNA polymerase (including but not limited to Phusion U). Fragments of 300-500 bp were recovered by magnetic bead purification or gel extraction to obtain the final purified library.
The two library construction methods differ as follows:
1) The state of the DNA in the starting library differs: the starting library for single-stranded library construction is single-stranded DNA, whereas that for double-stranded library construction is double-stranded DNA.
2) The fragmentation method differs: single-stranded library construction fragments intact DNA by the physical action of sonication, whereas double-stranded library construction fragments DNA through the intrinsic DNA-cutting activity of the transposase Tn5.
3) The adapter attachment method differs: single-stranded library construction uses a ligase-based approach, ligating the 3' and 5' adapters in two separate reactions, whereas double-stranded library construction relies on the intrinsic adapter-attaching activity of the transposase Tn5, which adds adapters while fragmenting the DNA.
In sequences that have already been converted by the deaminase, the double-stranded DNA contains many U-G mismatch regions. Such regions reduce the efficiency with which Tn5 attacks the DNA, so heavily edited regions are harder for Tn5 to attack and are covered at lower depth. Single-stranded library construction markedly improves this.
DNA samples from MEF cells treated with 2.8 U SsdA for 30 minutes were used for double-stranded and single-stranded DNA library construction by the methods above and sequenced. The analysis showed that, at the centers of regions where peaks were called in published ATAC data, the coverage of the library obtained by single-stranded construction was markedly improved and clearly higher than that of the double-stranded library (Figure 12).
1.5. Optimization of qDeAC-seq
Choice of DNA polymerase
Because of the deamination treatment, the DNA obtained before amplification contains a large number of uracil (U) bases, so a U-tolerant DNA polymerase is required for the final library PCR amplification. The inventors tested different U-tolerant DNA polymerases and compared their amplification performance, finding that the different enzymes performed similarly (Figure 13). The enzymes tested were HiFi V3, Phusion U, Phanta Uc, Q5U, and Q6U.
Optimization of cell permeabilization conditions
The permeabilization conditions commonly used in various other omics methods were surveyed. The two most commonly used detergent conditions were found to be 0.5% NP-40 alone, and a combination of 0.1% NP-40, 0.1% Tween-20, and 0.01% digitonin (referred to as Mix in the figure). Cells were treated with each of these two permeabilization buffers, and both were found to permeabilize the cell membrane. However, with the same amount of enzyme, the conversion rate of the Mix group was clearly higher than that of the 0.5% NP-40 group, with a higher base conversion rate in accessible regions (Figure 14).
Deaminase incubation time
Deaminase incubation times of 10 minutes and 30 minutes were tested, and both yielded samples with clear conversion. The 30-minute reaction gave a higher genomic conversion rate (Figure 15).
Deaminase concentration
Genomic DNA was treated with different concentrations of deaminase. The overall genomic base conversion rate rose as the enzyme concentration increased. The analysis also showed that, as the enzyme concentration increased, the signal-to-noise ratio between accessible regions and neighboring regions first rose and then fell. For this batch of enzyme, an enzyme concentration of 7.5 U/μl gave a relatively high conversion rate and signal-to-noise ratio (Figure 16). Data for the 15 U and 30 U groups are not shown, but the signal-to-noise ratio of the 30 U group was lower than that of the 15 U group.
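The source does not give a formula for the signal-to-noise ratio between accessible and neighboring regions. One plausible reading, shown here purely as an assumption, is the mean conversion rate inside accessible intervals divided by the mean rate in the flanking intervals. The interval format (0-based, half-open) and function names are hypothetical.

```python
def mean_rate(rates: list[float], intervals: list[tuple[int, int]]) -> float:
    """Mean conversion rate over half-open [start, end) intervals."""
    values = [r for start, end in intervals for r in rates[start:end]]
    return sum(values) / len(values) if values else float("nan")


def signal_to_noise(rates: list[float],
                    open_intervals: list[tuple[int, int]],
                    flank_intervals: list[tuple[int, int]]) -> float:
    """Assumed definition: mean rate in open regions / mean rate in flanks."""
    signal = mean_rate(rates, open_intervals)
    noise = mean_rate(rates, flank_intervals)
    if noise == 0:
        return float("inf")
    return signal / noise
```

Under this definition, raising the enzyme dose increases both numerator and denominator; once conversion in accessible regions approaches saturation while background conversion keeps growing, the ratio falls, which is consistent with the rise-then-fall trend reported above.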
Deaminase reaction system
The basic enzyme reaction system has a total volume of 50 μl and uses 50,000 cells. In a test, every component of the reaction was reduced proportionally to a total volume of 10 μl, with 10,000 cells per reaction. The analysis showed that the overall conversion rate of the 50 μl basic system was clearly higher than that of the 10 μl system. This indicates that the reduced system also works, but that the basic reaction system is more stable (Figure 17).
Optimization of the reaction buffer
First, the effect of additional components in the reaction buffer on SsdA activity was tested (see "In vitro deaminase activity verification" above). The experimental groups and conditions are listed in the following table:
The results are shown in Figure 18A. Group 4 had the best deamination effect.
In addition, the effects of NaCl and MgCl2 on SsdA activity were tested. The experimental groups and conditions are listed in the following table:
The results are shown in Figure 18B.
Optimization by adding UGI
The inventors surprisingly found that the prepared deaminase exhibits an activity that excises the U produced by deamination, which compromises the reliability of subsequent reactions and detection. The inventors therefore tested the effect of adding a uracil glycosylase inhibitor (UGI) to the reaction system.
The effect of adding or omitting UGI was compared under three conditions: 30 U of enzyme for 30 minutes, 15 U of enzyme for 30 minutes, and 30 U of enzyme for 10 minutes. The results are shown in Figures 19 and 20. Figure 19 shows that adding UGI markedly increased the sequencing depth at the centers of ATAC peaks. Figure 20 shows that adding UGI significantly increased the conversion rate. Adding UGI therefore permits a higher enzyme dose and a longer reaction time, yielding better results.
Truncated deaminase variant
The inventors also tested the activity of SsdA-tox, a truncated variant of the SsdA deaminase, and of the full-length SsdA deaminase in a variety of buffer systems and at different reaction temperatures.
The results showed that the SsdA deaminase and its truncated variant SsdA-tox both have good deamination activity in a variety of commonly used buffer systems and at temperatures as low as 4°C, which greatly broadens their range of application.
In summary, the present invention provides a method and system for determining, at the whole-genome level, the state of chromatin accessibility and the positions and binding dynamics of DNA-binding proteins such as transcription factors. The technology establishes a new method, qDeAC-seq (Quantitative DNA Deaminase-Accessible Chromatin assay using sequencing), in which permeabilized cells or nuclei are treated with single-stranded DNA deaminase A (SsdA) or double-stranded DNA deaminase A (DddA). On contact with the DNA, the deaminase deaminates exposed cytosine bases to uracil; in the subsequent PCR amplification, uracil pairs with adenine, so the original cytosine is replaced by thymine. Sanger sequencing, next-generation sequencing (NGS), or single-molecule long-read sequencing is then used to detect the positions of base conversions in the genomic DNA. Because DNA in accessible chromatin regions reacts more readily with the deaminase, whereas DNA bound by nucleosomes or DNA-binding proteins is protected, analyzing the distribution of deaminated sites makes it possible to determine chromatin accessibility, nucleosome arrangement, and the positions and binding dynamics of DNA-binding proteins at the whole-genome level.
By combining the method with technologies such as ATAC-seq, ChIP-seq, or CUT&Tag, accessible chromatin regions or specific protein-binding regions can be further enriched, and the protein binding footprints at these sites can then be analyzed. Integrating the protein footprint information with transcription factor databases and transcriptome data allows the transcriptional regulatory network of a cell to be constructed more accurately. In combination with CRISPR technology, the method is expected to allow genome-wide analysis of the effect of any protein, including transcription factors, on the transcriptional regulatory network.
Example 2. Detection of DNA-binding protein footprints in accessible regions of the cell genome
As shown in Example 1, the inventors have achieved detection of transcription factor footprints at the whole-genome level using the single-stranded DNA cytosine deaminase SsdA, a method called qDeAC-seq. However, because qDeAC-seq assays the whole genome, it requires a large amount of sequencing. Transcription factors occur more frequently in accessible chromatin regions, so enriching these regions yields more transcription factor footprint information from less sequencing. By combining the deaminase-based method for detecting transcription factor footprints at the whole-genome level (qDeAC-seq) with the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq), the present invention establishes a new method for specifically analyzing transcription factor footprints within accessible chromatin regions, called qDeAC-ATAC-seq.
In this method, permeabilized cells or nuclei are treated sequentially with Tn5 and SsdA. The transposase activity of Tn5 cuts chromatin DNA in the more accessible regions and simultaneously attaches a special Tn5 adapter to the 5' end of the chromatin DNA. Because SsdA deaminates cytosine bases, a cytosine-free Tn5 adapter sequence is used here, so that primers matching the Tn5 adapter sequence can later specifically amplify DNA fragments from accessible chromatin regions and thereby enrich accessible chromatin. The method also exploits the ability of single-stranded DNA deaminase A (SsdA) to deaminate cytosine in double-stranded DNA: on contact with genomic DNA, SsdA deaminates exposed cytosine bases to uracil. During the subsequent PCR amplification, uracil is read as thymine and pairs with adenine, so each cytosine deaminated by the enzyme is ultimately replaced by thymine. These conversions can be detected by Sanger sequencing, next-generation sequencing (NGS), or single-molecule long-read sequencing. Because DNA in accessible chromatin regions is regulated by more transcription factors, enriching DNA fragments from accessible regions with Tn5 and combining them with the transcription factor footprints produced by the deamination reaction allows transcription factor binding sites to be analyzed at low sequencing cost.
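The readout principle described above (exposed C deaminated to U, then read as T after PCR, with protein-bound bases protected) can be illustrated with a toy simulation. This sketch idealizes the chemistry by assuming complete conversion of every unprotected cytosine on the plus strand; real conversion is partial and is therefore measured as a rate across many reads. The function name and interval format are hypothetical.

```python
def simulate_deamination(sequence: str, protected: list[tuple[int, int]]) -> str:
    """Return the plus-strand sequence as read after deamination and PCR.

    sequence: original plus-strand sequence (upper case).
    protected: half-open [start, end) intervals shielded by nucleosomes or
        DNA-binding proteins (hypothetical input for illustration).
    Every unprotected C is reported as T (C -> U, with U read as T after PCR).
    """
    out = []
    for i, base in enumerate(sequence):
        shielded = any(start <= i < end for start, end in protected)
        out.append("T" if base == "C" and not shielded else base)
    return "".join(out)
```

For instance, `simulate_deamination("ACGCCTAC", [(3, 5)])` gives `"ATGCCTAT"`: the two cytosines inside the protected interval survive as C and mark the footprint, while the exposed cytosines appear as T.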
General method description
The qDeAC-ATAC-seq technology of the present invention can follow either of two experimental strategies (Figure 21), whose basic steps are as follows:
Strategy 1:
(A) Permeabilization or nuclei extraction
Treat the cells with a permeabilization buffer containing a suitable concentration of detergent, either to perforate the cell membrane and increase its permeability, or to remove the cytoplasm and thereby extract and permeabilize the nuclei.
(B) Tn5 attack on accessible regions
Incubate the cell or nuclei sample collected in step (A) with Tn5 pre-loaded with the adapter sequence, so that the transposase reacts fully with the accessible chromatin DNA.
(C) Deaminase reaction
Remove the Tn5 reaction mixture by centrifugation, collect the pelleted cells/nuclei, and add the deaminase reaction mixture. After the deaminase has reacted fully with the chromatin DNA in the cells, extract the genomic DNA with phenol-chloroform.
(D) DNA extraction and single-stranded DNA library construction
After the reaction is stopped, extract the genomic DNA. Without sonication, construct a single-stranded DNA library with the Vazyme DNA Methylation Library Kit for Illumina V3 (NE103). No 5' ligation is needed, and the sample can be amplified directly after the extension step. A first round of amplification is performed with a primer containing a sequence homologous to the Tn5 adapter together with the i7 primer, which adds, by overlap PCR, a fragment homologous to the i5 primer to the 5' end of the library. A second round of PCR amplification is then performed with the i5 and i7 primers. A schematic of the library construction workflow is shown in Figure 22.
Strategy 2:
(A) Permeabilization or nuclei extraction
Treat the cells with a permeabilization buffer containing a suitable concentration of detergent, either to perforate the cell membrane and increase its permeability, or to remove the cytoplasm and thereby extract and permeabilize the nuclei.
(B) SsdAtox deamination reaction
Incubate the cell or nuclei sample collected in step (A) with SsdAtox, so that the deaminase reacts fully with the accessible chromatin DNA.
(C) Tn5 attack on accessible regions
Remove the SsdAtox reaction mixture by centrifugation, collect the pelleted cells/nuclei, and add the Tn5 reaction mixture. After the Tn5 transposase has reacted fully with the DNA in accessible chromatin regions of the cells, extract the genomic DNA with phenol-chloroform.
(D) DNA extraction and single-stranded DNA library construction
After the reaction is stopped, extract the genomic DNA. Without sonication, construct a single-stranded DNA library with the Vazyme DNA Methylation Library Kit for Illumina V3 (NE103). No 5' ligation is needed, and the sample can be amplified directly after the extension step. A first round of amplification is performed with a primer containing a sequence homologous to the Tn5 adapter together with the i7 primer, which adds, by overlap PCR, a fragment homologous to the i5 primer to the 5' end of the library. A second round of PCR amplification is then performed with the i5 and i7 primers. A schematic of the library construction workflow is shown in Figure 22.
Specific experimental methods
1). Design of the adapter sequence loaded onto Tn5
To prevent the adapter from being deaminated by SsdAtox in the subsequent step, the cytosine-free adapter sequence used for Tn5 in sci-MET was adopted (Figure 23). It was annealed with the Tn5 ME sequence and then incubated with Tn5 to load the adapter onto Tn5.
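The design constraint above (an adapter strand with no cytosine, so that it cannot be altered by the deaminase) is easy to check programmatically. The sketch below is illustrative; the example sequence is a made-up placeholder, not the adapter shown in Figure 23, and the function names are hypothetical.

```python
def cytosine_positions(sequence: str) -> list[int]:
    """0-based positions of cytosines in an oligonucleotide (case-insensitive)."""
    return [i for i, base in enumerate(sequence.upper()) if base == "C"]


def is_deamination_resistant(sequence: str) -> bool:
    """True if the strand contains no cytosine and so offers no C-to-U target."""
    return not cytosine_positions(sequence)


# Hypothetical placeholder sequence, used only to demonstrate the check.
example_adapter = "GATGATGGTTAGTGAGATTG"
print(is_deamination_resistant(example_adapter))  # True
```

Note that this checks one strand only. Any complementary strand annealed to the adapter would need to be checked separately, since a cytosine-free strand necessarily pairs with a guanine-free one, not a cytosine-free one.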
2).qDeAC-ATAC-seq流程2).qDeAC-ATAC-seq process
策略一:先进行Tn5反应再进行SsdAtox脱氨反应Strategy 1: Tn5 reaction first and then SsdAtox deamination reaction
(1)细胞收集与抽核处理:(1) Cell collection and nuclear extraction:
加入适量胰酶消化细胞,培养基中和后计数,从细胞间中转移至测序间,转移到1.5mL低吸附EP管中,常温,300×g离心4min,弃上清。加入1mL冷PBS+0.04%BSA洗涤1次,4℃,300×g离心4min,弃上清,尽量吸弃完全。按照50μl/50000细胞加入对应体积的Nuclei Extraction Buffer(10mM Tris-HCl 7.4,10mM NaCl,1% BSA,0.1%Tween-20,0.1% NP40,0.01%Digitonin,0.1mM EDTA,3mM MgCl2,1xPIC)重悬,轻柔吹打3次,冰上放置3min。终止反应:+1ml Final buffer(10mM Tris-HCl 7.4,10mM NaCl,1% BSA,0.1mM EDTA,3mM MgCl2,1xPIC),轻柔颠倒混匀3次,4℃,700×g,5min,颠倒离心1min,去除上清至约100ul,再次颠倒离心1min,彻底吸弃上清。Add appropriate amount of trypsin to digest cells, count after neutralization with culture medium, transfer from cell room to sequencing room, transfer to 1.5mL low adsorption EP tube, centrifuge at room temperature, 300×g for 4min, and discard supernatant. Add 1mL cold PBS + 0.04% BSA to wash once, centrifuge at 4℃, 300×g for 4min, discard supernatant, and try to discard completely. Add corresponding volume of Nuclei Extraction Buffer (10mM Tris-HCl 7.4, 10mM NaCl, 1% BSA, 0.1% Tween-20, 0.1% NP40, 0.01% Digitonin, 0.1mM EDTA, 3mM MgCl2, 1xPIC) according to 50μl/50000 cells, resuspend, gently blow 3 times, and place on ice for 3min. Terminate the reaction: add 1 ml Final buffer (10 mM Tris-HCl 7.4, 10 mM NaCl, 1% BSA, 0.1 mM EDTA, 3 mM MgCl2, 1xPIC), gently invert for 3 times to mix, 4 ° C, 700 × g, 5 min, centrifuge for 1 min, remove the supernatant to about 100 ul, invert and centrifuge again for 1 min, and completely discard the supernatant.
(2) Tn5 fragmentation reaction:
Prepare a 50 μl reaction mix (10 mM Tris-HCl pH 7.6, 5 mM MgCl2, 10% dimethylformamide, 33% DPBS, 100 mM Tn5), resuspend the nuclei in it, and incubate at 37℃ for 15 min on a thermomixer shaking at 1000 rpm.
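Preparing the 50 μl mix from stocks is a C1V1 = C2V2 calculation. The sketch below works it through for the buffer components only. The stock concentrations (1 M Tris-HCl, 1 M MgCl2, neat DMF, undiluted DPBS) are assumptions chosen for illustration, since the text gives final concentrations only; Tn5 is left in the remainder because its stock concentration is not stated.

```python
def component_volumes_ul(total_ul: float, recipe: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Compute the stock volume of each component for a mix of total_ul.

    recipe maps a component name to (final_concentration, stock_concentration),
    both in the same unit. The leftover volume is reported under 'remainder'.
    """
    volumes = {}
    for name, (final, stock) in recipe.items():
        if stock <= 0 or final > stock:
            raise ValueError(f"invalid concentrations for {name}")
        volumes[name] = final * total_ul / stock  # C1V1 = C2V2 solved for V1
    remainder = total_ul - sum(volumes.values())
    if remainder < 0:
        raise ValueError("components exceed the total volume")
    volumes["remainder"] = remainder  # water plus Tn5 in this sketch
    return volumes


# Final concentrations are from the protocol; stock concentrations are assumed.
tn5_buffer = {
    "Tris-HCl pH 7.6 (mM)": (10, 1000),
    "MgCl2 (mM)": (5, 1000),
    "DMF (%)": (10, 100),
    "DPBS (%)": (33, 100),
}

if __name__ == "__main__":
    for name, vol in component_volumes_ul(50, tn5_buffer).items():
        print(f"{name}: {vol:.2f} ul")
```

With these assumed stocks the buffer components take 22.25 μl, leaving 27.75 μl for water and enzyme.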
(3) SsdAtox deamination reaction:
Dilute the Tn5 reaction with 500 μl 1x SsdA Reaction Buffer (10 mM Tris-HCl pH 7.4, 1 mM DTT, 1x PIC), then centrifuge at 700×g for 5 min at 4℃. Invert the tube and centrifuge for 1 min, remove the supernatant down to about 100 μl, invert and centrifuge again for 1 min, and discard the supernatant completely. Add 50 μl deaminase reaction mix (10 mM Tris-HCl pH 7.4, 1 mM DTT, 1x PIC, 0.1 U/μl UGI, 5 U/μl SsdAtox) and incubate at 37℃ for 10 min on a thermomixer shaking at 1000 rpm.
(4) Genomic DNA extraction
Add 200 μl gDNA Hot Elution Buffer to the 50 μl reaction and incubate at 65℃ for 1.5 h with shaking at 1350 rpm. Add an equal volume (250 μl) of phenol:chloroform:isoamyl alcohol to the gDNA sample and mix thoroughly by vortexing. Centrifuge at 13,000×g for 5 min at room temperature. Recover the gDNA by isopropanol precipitation and measure its concentration with Qubit.
Strategy 2: SsdAtox deamination reaction first, followed by the Tn5 reaction
(1) Cell collection and nuclei extraction:
Digest the cells with an appropriate amount of trypsin, neutralize with culture medium, and count. Transfer the cells from the cell culture room to the sequencing room and into a 1.5 mL low-binding EP tube, centrifuge at 300×g for 4 min at room temperature, and discard the supernatant. Wash once with 1 mL cold PBS + 0.04% BSA, centrifuge at 300×g for 4 min at 4℃, and discard the supernatant as completely as possible. Resuspend in Nuclei Extraction Buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 1% BSA, 0.1% Tween-20, 0.1% NP40, 0.01% Digitonin, 0.1 mM EDTA, 3 mM MgCl2, 1x PIC) at 50 μl per 50,000 cells, pipette gently 3 times, and place on ice for 3 min. To stop the lysis, add 1 mL Final buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 1% BSA, 0.1 mM EDTA, 1x PIC), mix by gently inverting 3 times, and centrifuge at 700×g for 5 min at 4℃. Invert the tube and centrifuge for 1 min, remove the supernatant down to about 100 μl, invert and centrifuge again for 1 min, and discard the supernatant completely.
(2) SsdAtox deamination reaction:
Prepare a 50 μl reaction mix (10 mM Tris-HCl pH 7.4, 1 mM DTT, 1x PIC, 0.1 U/μl UGI, 2 U/μl SsdAtox), resuspend the nuclei in it, and incubate at 37℃ for 10 min on a thermomixer shaking at 1000 rpm.
(3) Tn5 fragmentation reaction:
Dilute the deamination reaction with 500 μl 1x SsdA Reaction Stop Buffer (10 mM Tris-HCl pH 7.4, 1 mM DTT, 1x PIC, 5 mM MgCl2), then centrifuge at 700×g for 5 min at 4℃. Invert the tube and centrifuge for 1 min, remove the supernatant down to about 100 μl, invert and centrifuge again for 1 min, and discard the supernatant completely. Add 50 μl Tn5 reaction mix (10 mM Tris-HCl pH 7.6, 5 mM MgCl2, 10% dimethylformamide, 33% DPBS, 100 mM Tn5, 0.1 U/μl UGI) and incubate at 37℃ for 15 min on a thermomixer shaking at 1000 rpm.
(4) Genomic DNA extraction
Add 200 μl gDNA Hot Elution Buffer to the 50 μl reaction and incubate at 65℃ for 1.5 h with shaking at 1350 rpm. Add an equal volume (250 μl) of phenol:chloroform:isoamyl alcohol to the gDNA sample and mix thoroughly by vortexing. Centrifuge at 13,000×g for 5 min at room temperature. Recover the gDNA by isopropanol precipitation and measure its concentration with Qubit.
3). Library construction
Select the amount of input DNA according to the measured concentration and the experimental requirements, and construct a single-stranded DNA library with the Vazyme DNA Methylation Library Kit for Illumina V3 (NE103).
Following the kit protocol, ligate the 3' Adapter, perform Extension, and purify. Then run the first amplification for 4 cycles, purify once with 1x beads, and run the second amplification. Taking a 50 ng input as an example, the second amplification is run for 8 cycles, giving 12 cycles in total. Run the complete library on an agarose gel, excise and recover fragments of 300-500 bp, and obtain a final library of suitable concentration (see Figure 22).
4). Analysis of qDeAC-ATAC-seq data to assess transcription factor binding within open regions
Bioinformatic analysis is used to measure how strongly the library is enriched for open chromatin regions in the cells and, at the same time, to determine the positions and number of base conversions, from which protein binding at specific sites is inferred.
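Counting base conversions can be illustrated on a single aligned read. The sketch below assumes a gap-free alignment (reference and read strings of equal length) and counts C-to-T changes, plus G-to-A changes, which is how a C-to-T conversion on the opposite strand appears. It is a simplified illustration and not the analysis pipeline used in this example; a real pipeline would work from aligned BAM records and handle indels, base quality, and strand assignment.

```python
def count_conversions(ref: str, read: str) -> dict[str, int]:
    """Count C>T and G>A mismatches between a reference segment and an
    aligned read of equal length (no indels)."""
    if len(ref) != len(read):
        raise ValueError("ref and read must be the same length")
    counts = {"C>T": 0, "G>A": 0, "C_total": 0, "G_total": 0}
    for ref_base, read_base in zip(ref.upper(), read.upper()):
        if ref_base == "C":
            counts["C_total"] += 1
            if read_base == "T":
                counts["C>T"] += 1
        elif ref_base == "G":
            counts["G_total"] += 1
            if read_base == "A":
                counts["G>A"] += 1
    return counts


def conversion_rate(counts: dict[str, int]) -> float:
    """Fraction of reference C/G positions read as converted."""
    total = counts["C_total"] + counts["G_total"]
    return (counts["C>T"] + counts["G>A"]) / total if total else 0.0


if __name__ == "__main__":
    c = count_conversions("ACGTCCGA", "ATGTCTAA")  # toy sequences
    print(c, conversion_rate(c))
```

Positions that stay unconverted inside an otherwise converted open region are the candidates for a protein footprint, since bound protein shields the DNA from the deaminase.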
2.1. qDeAC-ATAC-seq with Strategy 1 enriches open regions while detecting transcription factor binding within them
In this example, the R1 cell line was processed with Strategy 1, and libraries were constructed and sequenced.
First, to study how the different components of the Tn5 reaction buffer affect the binding of different transcription factors in open regions, the inventors ran the Tn5 reaction in several different Tn5 reaction buffers and then treated the cells with deaminase. The experimental groups are shown in Table 1 below.
Table 1. Grouping conditions for Tn5 reaction buffers
The results are shown in Figures 24 and 25. Tn5 enriched open regions more efficiently only in the presence of DMF. The condition 10 mM Tris-HCl pH 7.6, 5 mM MgCl2, 10% DMF, 33% PBS had the least effect on TF depth. Adding detergent to the reaction buffer reduced the enrichment efficiency.
The sequencing results of the qDeAC-ATAC-seq libraries clearly show the distribution pattern of nucleosomes (Figure 26), and library depth rises at ATAC peak centers, indicating enrichment for open regions (Figure 24).
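The rise in depth at ATAC peak centers can be summarized as a single number by comparing the mean depth in a central window with the mean depth in the flanks. The sketch below shows one plausible way to do this; the metric, the window sizes, and the toy depth profile are assumptions for illustration and are not the quantification used for Figure 24.

```python
def center_enrichment(depth: list[float], center_halfwidth: int, flank_width: int) -> float:
    """Ratio of mean depth around the midpoint of a profile to mean depth in
    the two outer flanks. The profile is assumed to be centered on the peak."""
    n = len(depth)
    mid = n // 2
    if center_halfwidth < 0 or flank_width <= 0:
        raise ValueError("window sizes must be positive")
    if mid - center_halfwidth < flank_width or mid + center_halfwidth >= n - flank_width:
        raise ValueError("center window overlaps the flanks")
    center = depth[mid - center_halfwidth : mid + center_halfwidth + 1]
    flanks = depth[:flank_width] + depth[n - flank_width :]
    flank_mean = sum(flanks) / len(flanks)
    if flank_mean == 0:
        raise ValueError("flank depth is zero; ratio undefined")
    return (sum(center) / len(center)) / flank_mean


if __name__ == "__main__":
    toy_profile = [1, 1, 1, 2, 4, 6, 4, 2, 1, 1, 1]  # synthetic values, not data
    print(center_enrichment(toy_profile, center_halfwidth=1, flank_width=3))
```

A ratio near 1 would indicate no enrichment at peak centers, and larger values indicate stronger enrichment, which makes it straightforward to compare buffer conditions or enzyme amounts on the same scale.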
In addition, different amounts of deaminase were tested (5 U/μl, 7.5 U/μl, 15 U/μl, 30 U/μl), and the groups were compared for overall deamination efficiency and for capture of transcription factor footprints. For the mouse embryonic stem cell line R1, a deaminase concentration of 5 U/μl gave the best capture of transcription factor binding footprints (Figure 27). The optimal enzyme amount needs to be determined separately for other cell lines.
2.2. Test of the reaction order in qDeAC-ATAC-seq
In this example, K562 cells were used to test Tn5 treatment followed by deaminase treatment (Strategy 1) and deamination followed by Tn5 treatment of chromatin (Strategy 2). Like Tn5, the deaminase preferentially attacks chromatin DNA in open regions, so the deamination rate is highest in open-region DNA, and the base changes caused by deamination disrupt the intact double-stranded structure of DNA. The two reactions therefore have to be balanced, and the strategy chosen according to the purpose of the experiment.
Each strategy has advantages and disadvantages. Strategy 1 gives higher enrichment of open regions and shows the open regions of the cell more clearly (Figure 28, Groups 2-3). However, the preceding Tn5 treatment causes transcription factors that bind DNA weakly to dissociate from the DNA, which lowers the TF depth values of these factors (for example NFYA and RFX1 in Figure 29). Strategy 2 captures transcription factor footprints more accurately and faithfully, but the base changes caused by deamination reduce the subsequent ability of Tn5 to attack open regions, so enrichment of open regions is poorer. The enrichment efficiency of Tn5 for open regions clearly decreases as the total amount of deaminase increases (Figure 28, Groups 1 and 4-7).
The inventors also tested different amounts of deaminase under Strategy 2 (Figure 28, Groups 1 and 4-7). Comparison with the amount of deaminase used in Strategy 1 (Figure 28, Groups 2-3) shows that, to enrich open regions as much as possible and preserve the cutting activity of Tn5, Strategy 2 should use less enzyme than Strategy 1 (Figures 28 and 29).
These results show that the qDeAC-ATAC-seq method of the present invention simultaneously captures the open chromatin regions of the cell genome and detects transcription factor binding footprints within them, and that the resulting data are highly reproducible. Compared with deaminase-based methods that detect chromatin accessibility and transcription factor footprints genome-wide, it captures transcription factor binding information at lower sequencing depth and therefore reduces sequencing cost. Moreover, combining qDeAC-seq with ATAC-seq is expected to extend the distinctive ability of qDeAC-seq to capture transcription factor footprints to the single-cell level. This would allow multi-omics information such as chromatin accessibility, transcription factor footprints, and the transcriptome to be captured at higher throughput, which is of great significance for constructing gene regulatory networks in dynamic processes such as cell fate transitions, development, and carcinogenesis.
Sequence information referred to herein:
>SEQ ID NO:1 SsdA
>SEQ ID NO:2 DddAtox
>SEQ ID NO:3 His-SsdA
>SEQ ID NO:4 His-smt3-SsdA
>SEQ ID NO:5 SsdA tox
>SEQ ID NO:6 Tn5 transposase
>SEQ ID NO:7 Tn5 adapter-F
>SEQ ID NO:8 Tn5 adapter-R
Claims (57)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2023122645 | 2023-09-28 | ||
| CNPCT/CN2023/122645 | 2023-09-28 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025066245A1 true WO2025066245A1 (en) | 2025-04-03 |
Family
ID=95077990
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2024/096659 Pending WO2025066245A1 (en) | 2023-09-28 | 2024-05-31 | Method for detecting chromatin accessibility or dna-binding protein footprints in cells |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN119709945A (en) |
| WO (1) | WO2025066245A1 (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190211404A1 (en) * | 2016-09-02 | 2019-07-11 | New England Biolabs, Inc. | Analysis of Chromatin Using a Nicking Enzyme |
| CN110372799A (en) * | 2019-08-01 | 2019-10-25 | 北京大学 | A kind of fusion protein and its application for the preparation of the unicellular library ChIP-seq |
| WO2022072393A1 (en) * | 2020-09-29 | 2022-04-07 | University Of Washington | Use of a double-stranded dna cytosine deaminase for mapping dna-protein interactions |
| WO2022212584A1 (en) * | 2021-04-01 | 2022-10-06 | University Of Washington | Bacterial dna cytosine deaminases for mapping dna methylation sites |
| WO2024065721A1 (en) * | 2022-09-30 | 2024-04-04 | Peking University | Methods of determining genome-wide dna binding protein binding sites by footprinting with double stranded dna deaminase |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11733248B2 (en) * | 2017-09-25 | 2023-08-22 | Fred Hutchinson Cancer Center | High efficiency targeted in situ genome-wide profiling |
| US12297426B2 (en) * | 2019-10-01 | 2025-05-13 | The Broad Institute, Inc. | DNA damage response signature guided rational design of CRISPR-based systems and therapies |
| CN115279917B (en) * | 2020-09-16 | 2025-10-10 | 深圳华大生命科学研究院 | Methods for multi-dimensional analysis of cellular epigenomics |
-
2024
- 2024-05-31 WO PCT/CN2024/096659 patent/WO2025066245A1/en active Pending
- 2024-06-19 CN CN202410799872.2A patent/CN119709945A/en active Pending
Non-Patent Citations (2)
| Title |
|---|
| GALLAGHER LARRY A.; VELAZQUEZ ELENA; PETERSON S. BROOK; CHARITY JAMES C.; RADEY MATTHEW C.; GEBHARDT MICHAEL J.; HSU FOSHENG; SHUL: "Genome-wide protein–DNA interaction site mapping in bacteria using a double-stranded DNA-specific cytosine deaminase", NATURE MICROBIOLOGY, NATURE PUBLISHING GROUP UK, LONDON, vol. 7, no. 6, 1 June 2022 (2022-06-01), London, pages 844 - 855, XP037908818, DOI: 10.1038/s41564-022-01133-9 * |
| XU LAN, REN LI-CHENG: "Research Progress of Chromatin Accessibility Analysis", SHENGWU HUAXUE YU SHENGWU WULI JINZHAN - BIOCHEMISTRY AND BIOPHYSICS, KEXUE CHUBANSHE, BEIJING, CN, vol. 49, no. 8, 31 December 2022 (2022-12-31), CN , pages 1462 - 1470, XP093296617, ISSN: 1000-3282, DOI: 10.16476/j.pibb.2021.0313 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119709945A (en) | 2025-03-28 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 24869807 Country of ref document: EP Kind code of ref document: A1 |