WO2023225352A2 - Methods for assessing the effects of gene editing - Google Patents
Methods for assessing the effects of gene editing
- Publication number
- WO2023225352A2 (application PCT/US2023/022982)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cell
- genome
- seq
- analysis
- cancer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
Definitions
- the invention relates to methods for assessing the effects of in vivo gene editing technologies, particularly in relation to therapeutic CRISPR/Cas gene editing.
- RNA-targeted endonucleases such as the clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated system (Cas) gene-editing system represent a promising tool for therapeutic genome manipulation.
- CRISPR clustered regularly interspaced short palindromic repeat
- Cas CRISPR-associated system
- AAV adeno-associated viral
- Whether delivery relies on viral vectors such as AAVs, non-viral vectors such as LNPs, or peptide-based nanoparticles, the industry has not yet achieved 100% targeted delivery.
- AAVs have better tissue-targeting ability; however, several natural AAVs are able to cross physiological barriers and transduce non-target tissues.
- Non-viral vectors such as lipidoid nanoparticles (LNPs) are able to carry larger payloads, but are non-specifically targeted. Under in vivo conditions, they show wide biodistribution and tend to congregate in the liver.
- LNPs lipidoid nanoparticles
- Ligands, aptamers, etc. can be used to make LNPs somewhat more specific but can increase immunogenicity. Whilst AAVs offer more tissue-specific targeting, they are severely restricted by payload size, with a typical AAV payload limited to less than 5 kb of DNA.
- the invention provides a method for determining the effect of a gene editing event upon the genome of a cell that has been modified by the gene editing event, the method comprising: i. on a sample comprising the modified cell, undertaking a first genomic analysis; ii. on a sample comprising a reference cell that has not been subjected to a gene editing event, undertaking a second genomic analysis; iii. comparing the first genomic analysis with the second genomic analysis; and iv. determining whether there has been a change in the genome of the modified cell.
- the invention further comprises a step (v) in which if a change in the genome of the modified cell is identified, correlating the change with a clinical outcome.
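Steps (iii) and (iv) of the method can be illustrated with a minimal sketch. It assumes, purely for illustration, that each genomic analysis has already been reduced to a set of variant calls; the `Variant` tuple, the function name and the example calls are hypothetical and are not taken from the patent.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A simplified variant call (hypothetical representation)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def edit_associated_changes(modified: set[Variant],
                            reference: set[Variant]) -> set[Variant]:
    """Return variants present in the modified sample but absent from the reference."""
    return modified - reference


# Illustrative data only: one shared variant and one variant unique to the edited cell.
modified_calls = {Variant("chr1", 1000, "A", "G"), Variant("chr2", 500, "C", "CT")}
reference_calls = {Variant("chr1", 1000, "A", "G")}

changes = edit_associated_changes(modified_calls, reference_calls)
genome_changed = bool(changes)  # step (iv): has the genome of the modified cell changed?
```

A real comparison would also need to account for sequencing depth and caller noise, which this sketch ignores.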
- a second aspect of the invention provides for a method for determining the effect of a gene editing event upon the genome of a cell that has been modified by a gene editing event, the method comprising: a. on a sample comprising the modified cell, undertaking a first analysis that comprises identifying whether an epigenetic modification has occurred within the genome of the modified cell when compared to a reference cell that has not been subjected to a gene editing event; and b. determining whether there has been an epigenetic change in the genome of the modified cell.
- the invention provides a method for determining the effect of a gene editing event upon the epigenomic environment in the cell.
- Figure 1 shows a pipeline for an embodiment of the invention that can be followed in order to estimate the long-term impact of a gene editing event in a cell or cell line obtained from a subject.
- Figure 2 shows a schematic of the transcriptomic RNA-seq pipeline
- Figure 3 shows a schematic of the bioinformatic whole exome sequencing (WES) pipeline
- Figure 4 shows a diagram depicting the global impact of a single gene editing event
- Figure 5 shows a graph in which intra-chromosomal contact frequencies between edited sites are compared to the average. Dashed line, smoothed average intra-chromosomal contact frequency; black points, intra-chromosomal contact frequency between CRISPR/Cas edited sites; black line, linear regression line of the intra-chromosomal contact frequency between CRISPR/Cas edited sites, with the 95% confidence interval shown as translucent bands.
- Figure 8 shows a graph depicting the contact frequency of an off-target, RO11-469H8.6, in hippocampus and astrocytes.
- The arrow indicates the RO11-469H8.6 gene location. Peaks represent bias-removed contact frequency (primary y-axis). Points represent distance-normalized contact frequency (secondary y-axis). Grey bands represent regions that are highly expressed in brain and have high distance-normalized contact frequency to the RO11-469H8.6 gene. Genes highly expressed in brain are circled.
- Figure 9 shows a graph depicting the relationship between intra-chromosomal contact frequency and transcription changes post CRISPR/Cas assay.
- Figure 9A depicts the intra-chromosomal contact frequency of each gene promoter compared to the average. The dashed line represents the smoothed average intra-chromosomal contact frequency. Darker shading of a point represents the log2 fold change of the RNA-seq data between control and CRISPR assay.
- Figure 9B depicts the transcriptional change (log2 fold change) of each gene between control and CRISPR assay compared to the intra-chromosomal contact frequency between the gene’s promoter region and the CRISPR/Cas target.
- Figure 10 is a graph depicting indel frequency in CRISPR/Cas edits.
- Figure 10(A) depicts a potential false negative off-target with six mismatches and high chromatin accessibility.
- Figure 10(B) depicts a potential false positive off-target with two mismatches and low chromatin accessibility.
- the term ‘comprising’ means any of the recited elements are necessarily included and other elements may optionally be included as well. ‘Consisting essentially of’ means any recited elements are necessarily included, elements that would materially affect the basic and novel characteristics of the listed elements are excluded, and other elements may optionally be included. ‘Consisting of’ means that all elements other than those listed are excluded. Embodiments defined by each of these terms are within the scope of this invention.
- operably linked refers to the joining of distinct DNA molecules, or DNA sequences, to produce a functional transcriptional unit.
- When applied to DNA sequences, for example in an expression vector or a recombinantly modified gene construct, the term indicates that the sequences are arranged, or juxtaposed, so that they function cooperatively in order to achieve their intended purposes, e.g. a promoter sequence allows for initiation of transcription that proceeds through a linked coding sequence as far as a termination sequence.
- a ‘polynucleotide’ is a single or double stranded covalently-linked sequence of nucleotides in which the 3' and 5' ends on each nucleotide are joined by phosphodiester bonds.
- the polynucleotide may be made up of deoxyribonucleotide bases or ribonucleotide bases.
- Polynucleotides include DNA and RNA and may be manufactured synthetically in vitro or isolated from natural sources. Sizes of polynucleotides are typically expressed as the number of base pairs (bp) for double stranded polynucleotides, or in the case of single stranded polynucleotides as the number of nucleotides (nt).
- One thousand bp or nt equal a kilobase (kb). Polynucleotides of less than around 40 nucleotides in length are typically called “oligonucleotides”. The term further includes known types of chemical modifications, for example, labels which are known in the art, methylation, caps, substitution of one or more of the naturally occurring nucleotides with nucleotide modifications such as pseudouridine, or those with uncharged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), those containing nucleotide analogs (e.g., peptide nucleic acids and locked nucleic acids), as well as unmodified forms of the polynucleotide.
- a sequence referred to as upstream of a given reference point in a gene, such as the start codon of an open reading frame (ORF), is sequence that is 5’ to the reference point.
- a sequence denoted as downstream is 3’ to the reference point.
- the term ‘gene expression control sequence’ comprises regulatory sequences, sometimes referred to as cis-regulatory elements (CREs), and includes promoters, ribosome binding sites, enhancers, silencers, insulators and other control elements which regulate transcription of a gene or translation of a resultant mRNA.
- the gene expression control sequences confer tissue or cell-type specificity that assist in determining the phenotype of the cell.
- Gene expression control sequences may also contribute to regulation of gene expression levels.
- the expression level of a particular gene can be considered as the amount of mRNA and/or polypeptide produced from that particular gene.
- Gene expression levels can refer to an absolute (e.g., molar or gram-quantity) abundance of mRNA or polypeptide, or to a relative abundance (e.g., the amount relative to a standard, reference, calibration, or to another gene expression level).
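A relative expression level is often reported as a log2 fold change, the measure used in the descriptions of Figures 9A and 9B. The sketch below is illustrative only: the function name and the pseudocount of 1.0 (added to avoid division by zero for unexpressed genes) are assumptions, not values stated in the patent.

```python
import math


def log2_fold_change(edited: float, control: float, pseudocount: float = 1.0) -> float:
    """Relative expression of a gene in an edited sample versus a control.

    `edited` and `control` are normalized expression values (for example
    RNA-seq counts); `pseudocount` is an assumed stabilizing constant.
    """
    return math.log2((edited + pseudocount) / (control + pseudocount))


# Illustrative values only: (7 + 1) / (1 + 1) = 4, and log2(4) = 2.
example_lfc = log2_fold_change(edited=7.0, control=1.0)
```

Positive values indicate higher expression in the edited sample, negative values lower expression, and zero no change.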
- Cell-type specificity refers to the observable characteristics or traits of a particular cell, such as its morphology, development, biochemical or physiological properties, phenology, or behaviour.
- the cell-type may refer to the ‘phenotype’ of the cell and results primarily from the expression of the genes within the cell as well as any influence from external/environmental factors, such as disease pathogens or physical stresses (e.g. hypoxia, hypo- or hyperthermia and/or dehydration).
- a genetic regulatory element that confers cell or tissue type specificity may be defined as a tissue-specific regulatory element.
- tissue-specific regulators may include promoter sequences that direct gene expression primarily in a desired tissue of interest.
- Tissues or cells may be comprised within organ systems within the body, such as but not limited to those selected from the group consisting of: muscle; liver; central nervous system (CNS); brain; breast; endothelium; pancreas; oesophagus; colon; gastrointestinal tract; kidney; lung; spleen; skin; heart; thyroid; lymphatic tissue; cardiovascular; eye; bone marrow; blood; connective tissue; bladder; reproductive organs; and placenta.
- diseased indicates tissues and organs (or parts thereof) and cells which exhibit an aberrant, non-healthy or disease pathology.
- diseased cells may be infected with a virus, bacterium, prion or eukaryotic parasite; may comprise deleterious mutations; and/or may be cancerous, precancerous, tumoral or neoplastic.
- diseased cells may be pathologically normal but comprise an altered intra-cellular miRNA environment that represents a precursor state to disease.
- Diseased tissues may comprise healthy tissues that have been infiltrated by diseased cells from another organ or organ system.
- many inflammatory diseases comprise pathologies where otherwise healthy organs are subjected to infiltration with immune cells such as T cells and neutrophils.
- organs and tissues subjected to stenotic or cirrhotic lesions may comprise both healthy and diseased cells in close proximity.
- cancer refers to neoplasms in tissue, including malignant tumours which may be primary cancer starting in a particular tissue, or secondary cancer having spread by metastasis from elsewhere.
- the terms cancer, neoplasm and malignant tumours are used interchangeably herein.
- Cancer may denote a tissue, or a cell located within a neoplasm or with properties associated with a neoplasm.
- Neoplasms typically possess characteristics that differentiate them from normal tissue and normal cells. Among such characteristics are included, but not limited to: a degree of anaplasia, changes in morphology, irregularity of shape, reduced cell adhesiveness, the ability to metastasize, and increased cell proliferation.
- terms pertaining to, and often synonymous with, ‘cancer’ include sarcoma, carcinoma, malignant tumour, epithelioma, leukaemia, lymphoma, transformation, neoplasm and the like.
- cancer includes premalignant, and/or precancerous tumours, as well as malignant cancers.
- healthy indicates tissues and organs (or parts thereof) and cells which are not themselves diseased and approximate to a typically normal functioning phenotype. It can be appreciated that in the context of the invention the term ‘healthy’ is relative, as, for example, non-neoplastic cells in a tissue affected by tumours may well not be entirely healthy in an absolute sense. Therefore ‘non-healthy cells’ is used to mean cells which are not themselves neoplastic, cancerous or pre-cancerous but which may be cirrhotic, inflamed, infected, or otherwise diseased, for example.
- promoter denotes a genetic regulatory element in a DNA sequence to which an RNA polymerase will bind and initiate transcription of the DNA. Promoters are commonly, but not always, located in the 5’ non-coding regions of genes.
- enhancer denotes a genetic regulatory element in a DNA sequence that, when bound by one or more transcription factors, enhances the transcription of an associated gene. Enhancer sequences may be located upstream or downstream of an associated gene, and can be positioned in either forward or reverse orientation and still affect gene transcription.
- silencer denotes a genetic regulatory element in a DNA sequence that reduces transcription from an associated promoter; typically they are the repressive counterparts of an enhancer.
- a silencer may be a bifunctional regulatory element that can also act as an enhancer, depending upon cellular context.
- insulator is used to refer to genetic regulatory elements that have evolved as a complementary mechanism for structurally and functionally distinguishing regions of euchromatin from heterochromatin.
- insulator elements are positioned peripherally with respect to a given transcriptional unit - e.g. a gene. Insulators function by establishing boundaries between neighbouring transcriptional units to prevent encroachment by adjacent regions of heterochromatin. Insulators may also function as gatekeepers in permitting or preventing access to a transcription unit by transcriptional regulatory proteins.
- Insulators may serve at least two functions that contribute to cell-type specificity: (1) providing a protective shield against deleterious effects of neighbouring enhancer regions on the transcriptional activity of a gene, and (2) facilitating or amplifying the activity of distantly positioned, multi-element enhancer complexes or locus control regions within a given transcriptional unit.
- Gene editing refers to a type of genetic engineering in which the nucleotide sequence of a target polynucleotide is changed through introduction of deletions, insertions, or base substitutions to the polynucleotide sequence.
- Examples of gene editing technologies include nuclease-based and non-nuclease-based methods. Nuclease-based methods include zinc finger nucleases (ZFNs), TAL effector nucleases (TALENs), meganucleases or meganuclease-TALEN fusions (MegaTALEs), or clustered regularly interspaced short palindromic repeats (CRISPR) delivered with a Cas nuclease.
- ZFNs zinc finger nucleases
- TALENs TAL effector nucleases
- MegaTALEs meganucleases or meganuclease-TALEN fusions
- CRISPR clustered regularly interspaced short palindromic repeats
- non-nuclease-mediated gene editing examples include chimeric transcription factors, chimeric chromatin modifiers, forced DNA looping or biodegradable nanoparticles.
- Genome editing may include correcting or restoring a mutant gene.
- Genome editing may include knocking out a gene, such as a mutant gene or a normal gene.
- Genome editing may be used to treat disease or enhance tissue repair by changing the gene of interest.
- the methods detailed herein are for use in somatic cells and not germ line cells.
- the ‘accessibility’ of a DNA-comprising region of chromatin, also referred to interchangeably as ‘accessible DNA’, refers to the ability of a particular locus within a chromosome of a cell to be contacted and modified by a particular DNA cleaving or modifying agent, such as an RNA-guided endonuclease complex.
- chromatin structure comprised within a given DNA region will affect the efficiency of genetic modification, such as through gene editing, for that particular DNA region.
- the DNA region may be comprised within condensed heterochromatin that prevents or reduces access of the gene editing agent to the DNA in the region of interest.
- Accessibility can therefore be considered as a function of the quantity or efficiency of DNA cleavage or modification, such as via the action of a DNA endonuclease.
- Relative accessibility between two DNA regions can be determined by comparing (e.g., generating a ratio of) the amount of cleavage or modification between the two regions, or loci.
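The ratio described above can be sketched as follows. This is a minimal illustration that assumes cleavage at each locus is summarized as the number of cleaved (or modified) reads out of the total reads covering that locus; the function name, parameters and counts are hypothetical.

```python
def relative_accessibility(cleaved_a: int, total_a: int,
                           cleaved_b: int, total_b: int) -> float:
    """Ratio of the cleaved fraction at locus A to the cleaved fraction at locus B."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("total read counts must be positive")
    fraction_a = cleaved_a / total_a
    fraction_b = cleaved_b / total_b
    if fraction_b == 0:
        raise ValueError("no cleavage observed at locus B; ratio is undefined")
    return fraction_a / fraction_b


# Illustrative counts only: 50% cleavage at locus A versus 12.5% at locus B.
example_ratio = relative_accessibility(cleaved_a=50, total_a=100,
                                       cleaved_b=25, total_b=200)
```

A ratio above 1 suggests that locus A is more accessible to the agent than locus B under the assayed conditions.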
- chromatin refers to the condensation of genomic DNA into an organized complex of chromosomal DNA associated with histone proteins found in eukaryotic cells.
- Heterochromatin refers to a condensed and tightly packed form of chromatin and is characterized by its transcriptionally repressive state, which prevents the expression of genes in these regions. It is typically located near the centromeres and telomeres of chromosomes, and plays important roles in chromosome organization, DNA replication, and overall genome stability. Heterochromatin can be distinguished from its less condensed counterpart, euchromatin, by its dark staining properties in microscopy and its relative inaccessibility to enzymes involved in DNA transcription and repair.
- heterochromatin may refer to transcriptionally inactive regions of a chromosomal DNA consisting of highly condensed DNA/Histone complexes, called nucleosomes, that are insensitive to endonuclease treatment, e.g. with DNAse I. Heterochromatin can be characterized by detecting the deacetylation states of Histone 3 and Histone 4 and the methylation state of Histone 3 at lysine 9 (i.e. H3K9 methylation). In contrast, ‘euchromatin’ refers to a more accessible genomic region enriched with less condensed chromatin.
- a euchromatic region is a genomic region that is hypersensitive to nuclease digestion, e.g., by DNAse I or micrococcal nuclease.
- genomic regions may be identified using DNase-Seq (DNase I hypersensitive sites sequencing), which is based on sequencing of regions sensitive to cleavage by DNase I.
- DNase-Seq DNase I hypersensitive sites sequencing
- a euchromatic region is a genomic region or locus that is relatively depleted of nucleosomes.
- euchromatic regions may be identified using FAIRE-Seq (Formaldehyde-Assisted Isolation of Regulatory Elements), which is based on an observation that formaldehyde cross-linking is more efficient in nucleosome-bound DNA than it is in nucleosome-depleted regions of the genome.
- FAIRE-Seq Formaldehyde-Assisted Isolation of Regulatory Elements
- This method segregates the non-cross-linked DNA that is usually found in open chromatin, which is then sequenced.
- the protocol typically involves cross linking, phenol extraction and sequencing DNA in aqueous phase.
- a euchromatic region is comprised of a genomic region that is enriched in methylated histones (e.g., methylated Histone H1, H2A, H2B, H3 or H4) compared to an appropriate control.
- an appropriate control is a corresponding genomic region in a reference cell type or tissue, e.g. an undifferentiated or less differentiated cell, or terminally differentiated cell.
- aberrant chromatin formation may include the formation of heterochromatin in regions of the genome where it is not normally found.
- heterochromatin is typically associated with transcriptional repression, and aberrant heterochromatin formation can lead to the silencing of genes that are necessary for normal cellular function. This can contribute to a variety of disease states, including cancer, developmental disorders, and neurological disorders.
- another form of aberrant chromatin formation is the formation of aberrant histone modifications, which can result in altered gene expression and contribute to disease. For example, abnormal histone modifications have been implicated in the development of cancer, as they can lead to the activation of oncogenes and the silencing of tumour suppressor genes.
- the terms ‘guide molecule’ and ‘guide RNA’ (gRNA) refer to RNA-based molecules that are capable of forming a complex with an RNA-guided endonuclease, such as a CRISPR-Cas protein.
- a gRNA typically comprises a guide sequence having sufficient complementarity with a target nucleic acid sequence to hybridize with the target nucleic acid sequence and direct sequence-specific binding of the complex to the target nucleic acid sequence.
- the guide molecule or guide RNA may encompass RNA-based molecules having one or more chemical modifications (e.g., incorporation of synthetic bases, chemical linking of two ribonucleotides, or replacement of one or more ribonucleotides with one or more deoxyribonucleotides).
- the disclosure provides a guide nucleic acid suitable for use in a CRISPR/Cas system.
- a gRNA binds to a Cas protein and targets the Cas protein to a specific location within a target nucleic acid.
- a gRNA comprises a nucleic acid-targeting segment and a Cas protein binding segment.
- a guide nucleic acid comprises a single nucleic acid molecule, referred to as a single guide nucleic acid (sgRNA).
- a guide nucleic acid comprises two separate nucleic acid molecules, referred to as a double guide nucleic acid.
- homology to any of the nucleic acid sequences is not limited simply to 100%, 99%, 98%, 97%, 95%, 90%, 85% or even 80% sequence identity.
- Optimal alignments may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
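As an illustration of how one of the listed algorithms yields a sequence identity figure, the following is a minimal sketch of Needleman-Wunsch global alignment followed by a percent-identity calculation. The scoring scheme (match +1, mismatch -1, gap -1) and the function names are assumptions chosen for the example; production work would normally use an established aligner.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> tuple[str, str]:
    """Globally align two sequences; return the gapped alignment strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding identical, non-gap characters."""
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Illustrative sequences only: a single substitution in seven positions.
aln_a, aln_b = needleman_wunsch("GATTACA", "GATCACA")
identity = percent_identity(aln_a, aln_b)
```

Percent identity depends on the scoring scheme and on whether gap columns are counted in the denominator, so figures from different tools are not always directly comparable.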
- nucleic acid sequences can demonstrate biochemical equivalence to each other despite having apparently low sequence identity.
- homologous nucleic acid sequences are considered to be those that will hybridise to common target sequence under conditions of low stringency (Sambrook J. et al, Molecular Cloning: a Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, NY). However, it may be desired in some cases to distinguish between two sequences which can hybridise to common target sequence but contain some mismatches - an “inexact match”, “imperfect match”, or “inexact complementarity” - and two sequences which can hybridise to the target with no mismatches - an “exact match”, “perfect match”, or “exact complementarity”.
- a sequence capable of hybridizing with a given target sequence is referred to as the “complement” of the given sequence.
- thymine (T) and uracil (U) may be considered equivalent.
- target sequence, in the context of formation of an RNA-guided endonuclease complex, refers to a sequence that a guide sequence is configured to target, e.g. to have complementarity with, where hybridization between the target sequence and the guide sequence promotes the formation of an endonuclease complex, such as a CRISPR/Cas complex.
- a target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides.
- a target sequence is located in the nucleus or cytoplasm of a cell, and may include nucleic acids in or from mitochondria, organelles, vesicles, liposomes or particles present within the cell.
- the target sequence will be comprised within a tissue specific region of a chromosome within a cell.
- the target sequence will be comprised within an accessible chromatin region, such as within a locus that is active within a specific cell type, and that is uniquely accessible within the cell-type or tissue type, thereby conferring a level of phenotypic specificity to a gRNA that binds to the target sequence.
- the target sequence may be comprised within candidate nucleic acid sequences and/or tissue specific candidate sequences identified via the methods of the present invention.
- RNA guided endonucleases are consistent with the gene editing performed and analysed by the methods of the present disclosure. Typically these sequence guided endonucleases fall within the general disclosure of a CRISPR/Cas endonuclease system.
- CRISPR/Cas endonuclease system refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g. tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus.
- one or more elements of a CRISPR system is derived from a type I, type II, or type III CRISPR system.
- one or more elements of a CRISPR system is derived from a particular organism comprising an endogenous CRISPR system, such as those described in more detail below.
- a CRISPR/Cas endonuclease system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence as defined herein.
- the target sequence may be associated with a PAM (protospacer adjacent motif); that is, a short sequence recognized by the CRISPR complex as the site for cleavage of the DNA.
- PAM protospacer adjacent motif
- the precise sequence and length requirements for the PAM differ depending on the CRISPR enzyme used, but PAMs are typically 2-5 base pair sequences located adjacent to a protospacer - i.e. the target sequence.
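The relationship between a protospacer and its adjacent PAM can be sketched with a simple scan. The example assumes the 5'-NGG-3' PAM commonly associated with SpCas9 and a 20-nt protospacer; neither value is stated in the text above, and other enzymes use different motifs. The function name is hypothetical, and only the forward strand is searched.

```python
import re


def find_protospacers(seq: str, protospacer_len: int = 20,
                      pam_regex: str = "[ACGT]GG") -> list[tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for each site on the forward strand.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Illustrative sequence only: 20 adenines followed by a TGG PAM.
example_sites = find_protospacers("A" * 20 + "TGG")
```

To cover both strands, the same scan could be run on the reverse complement of the input sequence.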
- the endonuclease is selected from Cas9, Cpf1, C2c1, C2c2, Cas13, C2c3, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8, Cas8a, Cas8a1, Cas8a2, Cas8b, Cas8c, Csn1, Csx12, Cas9, Cas10, Cas10d, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas13a, Cas13b, Cas13c, Cas13d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC
- RNA-guided endonucleases are modified versions of the wildtype form, for example, comprising an amino acid change such as a deletion, insertion, substitution, variant, mutation, fusion, chimera, or any combination thereof, relative to a wild-type version of the protein.
- the endonuclease comprises a region exhibiting at least 70% identity over at least 70% of its residues to a Cas9 domain or a Cpf1 domain.
- the Cas9 is selected from the group consisting of SpCas9, SaCas9, StCas9, NmCas9, FnCas9, and CjCas9.
- the region is a Cpf1 domain, optionally a MAD-7.
- RNA-guided nucleases of the types disclosed herein are derived either directly or modified from a number of possible sources. Such endonucleases may be eubacterial, archaeal, or thermostable in origin. In specific embodiments, the programmable endonuclease is derived from a species selected from the group consisting of Streptococcus pyogenes (S.
- Streptococcus thermophilus, Streptococcus sp., Staphylococcus aureus, Nocardiopsis dassonvillei, Streptomyces pristinaespiralis, Streptomyces viridochromogenes, Streptosporangium roseum, Alicyclobacillus acidocaldarius, Bacillus pseudomycoides, Bacillus selenitireducens, Exiguobacterium sibiricum, Lactobacillus delbrueckii, Lactobacillus salivarius, Microscilla marina, Burkholderiales bacterium, Polaromonas naphthalenivorans, Polaromonas sp., Crocosphaera watsonii, Cyanothece sp., Microcystis aeruginosa, Pseudomonas aeruginosa,
- Subject may mean either a human or non-human animal.
- the term includes, but is not limited to, mammals (e.g., humans, other primates, pigs, rodents (e.g., mice and rats or hamsters), rabbits, guinea pigs, cows, horses, cats, dogs, sheep, and goats).
- the subject is a human.
- the subject is agricultural livestock or poultry.
- the subject is a fish, including farmed fish stocks.
- Chromosomal contact frequency refers to the frequency with which different regions of a chromosome come into physical proximity with one another.
- Intra-chromosomal contact frequency suitably describes how frequently different regions of the same chromosome interact and come into close physical proximity with each other. This can occur, for example, when two distal regions of a chromosome fold and loop together, bringing different parts of the chromosome into contact with one another.
- inter-chromosomal contact frequency describes how frequently different regions of different chromosomes come into physical proximity with one another. This can occur when two chromosomes are physically close to one another in the cellular nucleus and are brought into contact as a result.
- Chromosomal contact frequency can be studied using a variety of techniques as described herein, including Hi-C and 3C, which allow mapping of the interactions between different regions of the genome and identification of regions of the genome that are in close physical proximity with one another. Understanding chromosomal contact frequency is important for understanding the three-dimensional organization of the genome and for studying the regulation of gene expression.
- chromosomal contact frequency can have significant effects on genome organization and function, which in turn may contribute to the development of disease.
- changes in chromosomal contact frequency have been linked to several diseases, including cancer.
- In cancer, alterations in chromosomal contact frequency are commonly observed in a range of different types of tumours, including those linked to MYC binding sites in promoters and enhancers (See et al. Genome Res. (2022) Apr;32(4):629-642). These alterations can affect the expression of key oncogenes and tumour suppressor genes, as well as contribute to genomic instability.
- the term "on-target" editing event refers to a gene edit that occurs at a location of a target gene, sequence or nucleic acid to which a target-specific gene editor complementarily binds.
- the term "off-target" as used herein refers to a sequence or position of a target or non-target gene or nucleic acid to which a target-specific gene editor fully or partially binds, but where undesired editing activity occurs. Consequently, off-target effects are defined as undesired editing outcomes outside the intended target scope, i.e., unintentional cleavage and/or mutations at non-directed genomic therapeutic sites.
- the non-directed genomic site often has a similar, but not an identical, sequence to the directed target genomic site.
- the non-directed genomic site is also known as an off-target site, even though the target sequence may be similar.
- off-target sites may be identified for example by determining the number of base mismatches between the guide RNA and the off-target site.
- High mismatches refer to a mismatch of three or more, for example three, four, five, six, etc. nucleotides.
- target sites may have no mismatch or a very low mismatch, for example a maximum of two mismatches, suitably only a single mismatch.
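The mismatch criterion above can be illustrated with a short sketch. It assumes the guide and the candidate site are given as equal-length strings over the same alphabet, and it uses the thresholds stated in the text (two or fewer mismatches is low, three or more is high). The function names and the example sequences are hypothetical and are not taken from the specification.

```python
def count_mismatches(guide: str, site: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)


def classify_site(guide: str, site: str, low_max: int = 2) -> str:
    """Label a site by mismatch count: <= low_max is 'low', otherwise 'high'."""
    if count_mismatches(guide, site) <= low_max:
        return "low mismatch"
    return "high mismatch"


# Hypothetical 20-nt sequences, for illustration only.
GUIDE = "GACGTTACCGGATCAAGCTT"
SITE_A = "GACGTTACCGGATCAAGCTA"  # differs from GUIDE at 1 position
SITE_B = "GTCGTAACCGCATCAAGGTT"  # differs from GUIDE at 4 positions

for label, site in (("SITE_A", SITE_A), ("SITE_B", SITE_B)):
    print(label, count_mismatches(GUIDE, site), classify_site(GUIDE, site))
```

A position-by-position comparison of this kind ignores insertions and deletions, so sites that differ from the target by added or deleted nucleotides would need an alignment-based comparison instead.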
- an off-target editing event corresponds to a gene editing occurring at a sequence or location of a gene or nucleic acid that is not targeted by a target specific base editor, or a nucleic acid sequence that has less than 100% sequence homology with the nucleic acid sequence of the on-target.
- the off target site has sequence homology with the target site of less than 99%, less than 98%, less than 95%, less than 90%, less than 85%, or even less than 80%.
- the off-target nucleic acid sequence having less than 100% sequence homology with the on-target nucleic acid sequence is typically a nucleic acid sequence similar to the on-target nucleic acid sequence, but may include one or more additional nucleotides and/or has one or more nucleotides deleted.
- a “secondary effect” may refer to any unintended effect of gene editing, including but not limited to off-target effects, changes to the 3D organization of the genome, chromatin conformation, transcriptome, proteome, associated pathway changes, changes in clinical outcome, etc.
- ‘Short-term’ as used herein may mean hours or days after the gene editing event has taken place. ‘Long-term’ may mean days, weeks, months or years after the gene editing event has taken place. In some embodiments, long-term may mean for the remaining life of the subject.
- a methodology, or 'platform', is built to assess the impact of a given genomic editing event, the event not being limited to a coding region, intronic region, splice site junction, or intergenic region.
- the methodology allows for assessment of the effects of such an edit upon deeper transcriptional and post transcriptional changes within the genome of the individual subject.
- the platform combines inferences drawn from a range of specific sequencing methodologies that may be conducted upon a sample of tissue obtained from the individual subject post-edit.
- the sequencing methodologies may include:
- ATAC-seq is a method to investigate chromatin accessibility in a sample.
- the genome is treated with a transposase (enzyme) called Tn5.
- Tn5 marks open chromatin regions by cutting and inserting adapter sequences which can then be detected by later sequencing.
- ATAC-seq shows utility in assessing changes in genome-wide chromatin accessibility post editing event (Buenrostro et al. Curr Protoc Mol Biol. (2015); 2015:21.29.1-21.29.9).
- Chromosome conformation capture (3C) methodologies such as Hi-C analysis may be used to assess chromatin accessibility, 3D organization of the genome and interconnectivity to identify any changes to chromatin accessibility and 3D architecture of the genome including the local chromosome neighbourhood and/or transcription factories (Lieberman-Aiden et al. Science. 2009 Oct 9; 326(5950): 289-293).
- ChIP-seq: chromatin immunoprecipitation followed by sequencing.
- Genome-wide analysis of histone modifications such as enhancer analysis and genome-wide chromatin state annotation, enables systematic analysis of how the epigenomic landscape contributes to cell identity, development, lineage specification, and disease.
- post edit impact on histone modifications should be assessed, including but not limited to marks of regulatory elements (H3K27Ac, H3K4Me1), promoter accessibility (H3K4Me3), formation of heterochromatin (H3K9Me3) and gene bodies (H3K36Me3, H3K27Me3); see Furey (2012) Nat. Rev. Genet., 13(12), pp. 840-852.
- RNA-sequencing data: assesses the impact of an edit on the transcriptome, i.e. the regulation/expression profiles of genes, including genes coding for long non-coding RNAs, miRNAs and other regulatory elements.
- a change in the transcriptome post edit may be indicative of a therapeutic effect or also of wider long-term change.
- Methyl-sequencing (Methyl-seq): This approach assesses the impact of an edit on DNA methylation profiles within the genome, thereby estimating changes in chromatin accessibility. Methyl-seq can be carried out using chemical (bisulphite sequencing) or enzymatic approaches (EM-Seq); see Vaisvila et al. (2021) Genome Res. Jul; 31(7): 1280-1289.
- DNase accessibility data: A deoxyribonuclease (DNase, for short) is an enzyme that catalyses the hydrolytic cleavage of phosphodiester linkages in the DNA backbone, thus degrading DNA.
- Deoxyribonucleases are one type of nuclease, a generic term for enzymes capable of hydrolysing phosphodiester bonds that link nucleotides.
- DNase activity is one way to assess chromatin accessibility and to define the importance of a tissue/cell-specific region within the tissue/cell of interest and in potential off-target tissues/cells. The impact of an edit on chromatin accessibility is assessed by studying fluctuations at DNase I hypersensitive sites.
- the analysis of the data obtained from one, or more than one, of the above sequencing techniques after a gene edit event allows for the assessment of changes at the epigenetic and transcriptional level.
- This assessment comprises determination of the activity and chromosomal accessibility of, for example, transcription factor binding sites, transcription factor coding regions, genetic regulatory elements (such as enhancers, silencers, and repressors), DNA methylation, histone modifications, and transcription factories, thereby providing quantifiable insights regarding the consequences of an edit in the targeted gene.
- the analysis may be performed upon an appropriate sample of tissue obtained from a subject, or upon an in vitro test sample of tissue.
- an integrative analysis of more than one or even all the above-mentioned sequencing data-sets is performed in order to determine the relationship between gene expression, chromatin accessibility, histone modification and long-range contact dynamics.
- the integrative analysis provides for the assessment of an edit event and its potential for an impact on certain off targets.
- the order of edit impact from high to low is ranked based on an identified genomic region (e.g. a locus) which is (i) Highly transcribed + Highly accessible (ii) Non transcribed + highly accessible (iii) Highly transcribed + least accessible (iv) non-transcribed + Non accessible.
- Off target gene editing events that are identified as being of high impact (e.g.
- Off target gene editing events identified as falling within category (iii) may require ongoing monitoring but may be considered as of lower risk to the subject, contingent upon the clinical relevance of the locus. Any off target gene editing events identified as falling within category (iv) may be considered as being of the lowest risk to the individual subject, again contingent upon the clinical relevance of the edited locus. It will be appreciated that clinical relevance of a genetic region or locus may depend upon whether it shows linkage or proximity to a known oncogene, or other region associated with clinical pathology that may result from aberrant gene expression or genomic instability.
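The four-category ranking described above can be sketched as a simple rule-based classifier. This is a minimal illustration only: the numeric cut-offs, the field names and the example loci are assumptions introduced here, and the specification does not state how "highly transcribed" or "highly accessible" are thresholded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    name: str
    expression: float     # assumed normalised expression value (hypothetical units)
    accessibility: float  # assumed normalised accessibility signal (hypothetical units)


# Hypothetical cut-offs; real values would have to be chosen per data set.
EXPR_CUTOFF = 1.0
ACC_CUTOFF = 1.0


def impact_category(locus: Locus) -> int:
    """Return 1 (highest expected impact) to 4 (lowest), following categories (i) to (iv)."""
    transcribed = locus.expression >= EXPR_CUTOFF
    accessible = locus.accessibility >= ACC_CUTOFF
    if transcribed and accessible:
        return 1  # (i) highly transcribed + highly accessible
    if accessible:
        return 2  # (ii) non-transcribed + highly accessible
    if transcribed:
        return 3  # (iii) highly transcribed + least accessible
    return 4      # (iv) non-transcribed + non-accessible


def rank_by_impact(loci):
    """Order loci from highest to lowest expected edit impact."""
    return sorted(loci, key=impact_category)


loci = [
    Locus("locus_D", 0.0, 0.1),
    Locus("locus_B", 0.2, 3.5),
    Locus("locus_A", 8.0, 4.0),
    Locus("locus_C", 6.0, 0.3),
]
for locus in rank_by_impact(loci):
    print(locus.name, impact_category(locus))
```

The category number alone does not capture clinical relevance, which the text treats as a separate consideration for each locus.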
- a differential enrichment analysis of multiple -omics and/or sequencing data may be performed in vitro or in vivo that compares between cellular states of a test cell/tissue that has been subjected to an editing event and a reference cell (e.g. a wild type cell) that has not.
- a comparison is made between wild type cells without any gene editing and cells that have been subjected to gene-editing.
- the wild type cells without any gene editing may be selected from biopsy samples taken prior to gene editing, from a test subject or intended recipient of therapy, and kept in storage for later validation.
- the wild type cells may be obtained from a reference source, such as another subject (e.g. a close genetic relative, or from a tissue bank).
- a further option is for reference cells to be taken from tissue within the individual subject that is adjacent or proximate to a graft of modified tissue - such as ex vivo modified cells - that has been subjected to a gene editing event. This data is further used to derive differentially expressed genes, or to identify different areas of epigenetic modification between edited and comparator non-edited cells.
- an analysis pipeline is provided that further categorises the inferred changes at the transcriptional, post-transcriptional, and epigenetic level on the genome to known or postulated clinical consequences.
- the method of the invention helps provide an estimate of the safety both immediate and long term.
- a bioinformatics pipeline is integrated into the platform to provide the insights regarding the pathogenicity of a change in the genome due to a given editing event both at the on target and any off-target locations.
- the bioinformatics pipeline is configured to analyse and quantify the level of chromatin remodelling, chromatin accessibility, regulation of crucial genetic regulatory elements and their cascading effects which overall determine the clinical significance of a particular edit on the entire genome of a modified cell or tissue in the form of an output profile.
- Such information may be useful as part of an iterative design process for improving the design of CRISPR/Cas components.
- the methods of the invention may be utilised in in vitro, ex vivo or in vivo validation assays to improve gRNA or sgRNA design, such as to improve editing stability, reduce off-target effects or improve tissue specificity.
- the output profile can be used to build a virtual system to provide an in silico model for an individual subject or if combined with a plurality of other individuals to provide a virtual population, or sub-population.
- Such models can be tested to predict the individual's or a population’s ongoing risk profile in response to a gene editing event.
- the risk may be quantified in various ways including presenting a probability of developing side effects, further complications or even acute or chronic disease as a result of treatment.
- the virtual system can be further refined by the addition of information derived from biomarkers found within the same or a different sample, and/or with other physiological and/or epidemiological information, which may be gathered by questionnaire, interview, health professional analysis, measurement with medical diagnostic equipment, or similar.
- Biomarker levels within a tissue sample may be determined by a range of techniques including macromolecule microarray analysis, mass spectrometry (MS) proteomic profiling, quantitative RT-PCR, ELISA or other antibody-based assays, and chromatographic or spectrophotometric techniques.
- the described methods can be implemented via one or more computer systems - e.g. in silico.
- an apparatus comprising one or more memories and one or more processors is provided, wherein the one or more memories and the one or more processors are in electronic communication with each other, the one or more memories tangibly encoding a set of instructions for implementing the described methods of the invention.
- the invention provides a computer readable medium containing program instructions for implementing the method of the invention, wherein execution of the program instructions by a controller comprising one or more processors of a computer system causes the one or more processors to carry out the steps as described herein.
- the data may be stored in a database, and accessed via a server.
- the server is provided with communication modules to receive and send information, and processing modules to carry out the steps described herein.
- the data is provided through a cloud service.
- the method is accessible as a web service.
- users may access the service for recordal or retrieval of scores via a website, in a browser.
- Networking of computers permits various aspects of the invention to be carried out, stored in, and shared amongst one or more computer systems locally and at remote sites.
- two or more computer systems may be linked using wired or wireless means and may communicate with one another or with other computer systems directly and/or using a publicly available networking system such as the Internet.
- the computer system includes at least: an input device, an output device, a storage medium, and a microprocessor.
- Possible input devices include a keyboard, a computer mouse, a touch screen, and the like.
- Possible output devices include a computer monitor, a liquid-crystal display (LCD), a light emitting diode (LED or OLED) computer monitor, a virtual reality (VR) headset and the like.
- information can be output to a user, a user interface device (e.g. tablet PC, mobile phone), a computer-readable storage medium, or another local or networked computer.
- Storage media include various types of memory such as a hard disk, RAM, flash memory, and other magnetic, optical, physical, or electronic memory devices.
- the microprocessor is a computer microprocessor (e.g.
- the computer processor may comprise an artificial neural network (ANN).
- the computer processor may comprise a machine learning algorithm, suitably a machine learning algorithm that has been trained against one or more appropriate data sets.
- one or more devices may be configured to perform the processes of the invention.
- the device may include a computer system.
- the system can comprise one or more processors and one or more computer-readable storage media.
- the computer readable storage media can have stored thereon computer-executable instructions that are executable by the one or more processors to cause the computer system to perform the methods and procedures described herein.
- the processor(s) controls and adjusts a laboratory protocol that may be implemented via operation of an automated liquid handling system. In this way process parameters within the protocol are adjusted in response to data inputs or parameters.
- liquid handling systems suitable for the performance of automated laboratory protocols may include Freedom EVO (Tecan), Fluent (Tecan), JANUS® (PerkinElmer), Biomek® (Beckman Coulter), Microlab STAR® (Hamilton Robotics) Microlab VANTAGE® (Hamilton Robotics), EpMotion® (Eppendorf), Echo® (LabCyte), Mosquito® (TTP Labtech), OT-1 and OT-2 (Opentrons), LYNX® (Dynamic Devices), PIPETMAX® (Gilson), and Bravo (Agilent).
- dispensers suitable for the performance of automated laboratory protocols may include SPT Dragonfly Discovery®, Formulatrix Mantis®, and Thermo Scientific Multidrop.
- acoustic liquid handlers suitable for the performance of automated laboratory protocols may include Beckman Coulter Echo Acoustic series liquid handlers.
- optofluidic systems suitable for the performance of automated laboratory protocols include the Berkeley Lights Beacon®, Lightning™ and Culture Station™ platforms.
- the automated liquid handling system may comprise one or more of: a pipette; a pipette tip feeder; a plate reader; a plate handling system; a thermocycler; an agitating/vibrational mixer; an aspirator; an ultrasound mixer; an incubator; a chiller unit; and a fluid dispenser.
- Reporting of the output data from a modelling system of the invention may be achieved via a graphical user interface (GUI) or via an output file that may comprise a .csv file or spreadsheet, such as Microsoft Excel™ (Microsoft Corp., Redmond (WA), USA) or Google Sheets (Google LLC., Mountain View (CA), USA).
- graphical representations such as dashboards, charts, pictograms or graphs may be added if applicable.
- risk profiles may include one or more predictive or probability-based data representations, for example pie charts or scores, which are created based on the output data comprised within the worksheet and formatted individually based on user selections such as number format, dashboard arrangement and the colour 'skin' chosen before displaying the data.
- output data is comprised within a relational database.
- a simulator algorithm may be comprised as part of an organisational workflow as it can then write directly into a corporate database, for example. This enables formatting and visualisation and data analytics to be customised by the user.
- An exemplary whole exome sequence (WES) bioinformatics analysis pipeline is shown in Figure 3. After receiving a whole exome sequence reads file (.fastq format) the sequenced data files generated from unedited (control) and edited cells are identified and segregated from each other. The files will typically undergo a quality check to evaluate per sequence and per base quality scores along with sequence length distribution and adapter content.
- the next step is to align the reads to a reference genome assembly, for example the Genome Reference Consortium Human Build 37 (GRCh37).
- This step creates data processed in Sequence Alignment/Map format (.sam) and the corresponding compressed binary version (.bam) files.
- All the above steps may be performed in parallel on both the control and edited cell data files.
- the .bam files after removing duplicates are then used as an input together into a Bayesian somatic genotyping model to identify somatic short mutations via local assembly of haplotypes, e.g.
- the filtered data is then annotated using a functional annotator that analyzes given variants for their function (as retrieved from a set of data sources - e.g. literature) and produces the analysis in a specified output file - for example, Funcotator (available from https://gatk.broadinstitute.org/).
- This analysis provides an output that includes all the short mutations, such as SNPs and Indels, that have been created in the given cell line because of the gene editing event.
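As a simplified illustration of the final step, the sketch below compares variant calls from control and edited cells and reports those seen only in the edited sample, labelled by type. It is not the Bayesian somatic genotyping model referred to above, which considers both samples jointly. The record layout, function names and example variants are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position on the reference assembly
    ref: str   # reference allele
    alt: str   # alternate allele


def variant_type(v: Variant) -> str:
    """Classify a variant by allele lengths."""
    if len(v.ref) == len(v.alt):
        return "SNP" if len(v.ref) == 1 else "MNP"
    return "indel"


def edit_specific(control: Iterable[Variant], edited: Iterable[Variant]) -> List[Variant]:
    """Variants called in the edited sample but absent from the control sample."""
    return sorted(set(edited) - set(control))


# Hypothetical calls, for illustration only.
control_calls = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 500, "C", "T")]
edited_calls = [
    Variant("chr1", 1000, "A", "G"),     # shared with control, so not edit-specific
    Variant("chr4", 3074, "CAG", "C"),   # deletion seen only after editing
    Variant("chr7", 42, "T", "C"),       # substitution seen only after editing
]

for v in edit_specific(control_calls, edited_calls):
    print(v.chrom, v.pos, v.ref, v.alt, variant_type(v))
```

Exact matching on position and alleles is a strong simplification; in practice, equivalent indels can be represented differently and would need normalising before comparison.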
- the data may be represented in graphical form to the user or in any other suitable format that provides information regarding the efficiency and efficacy of the edit, including whether off-target events have occurred.
- changes to three dimensional (3D) genome organization, chromatin structures, intra- and inter- chromosomal interactions as well as transcriptional changes as a result of gene editing are assessed.
- an assessment of the secondary effect of gene editing is performed by determining intra-chromosomal contact frequency.
- Sequenced Hi-C reads are mapped to a reference genome using read mapping software such as the Burrows-Wheeler Aligner (BWA).
- Duplicates are removed and aligned reads are filtered to obtain valid reads using pairtools, a commonly used command-line framework for processing sequencing data.
- Technical and/or biological replicates are then merged.
- Contact frequency is calculated and normalized for downstream analysis using software such as Juicebox and cooler (Durand et al. Cell Syst. 2016;3:99-101; and Abdennur & Mirny Bioinformatics. 2019; 36:311-6).
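To make the idea of a binned contact frequency concrete, the following minimal sketch counts valid read pairs per pair of genomic bins on one chromosome and divides by the total number of contacts. This is a plain sequencing-depth normalisation chosen for brevity, and it is an assumption of this sketch: it does not reproduce the normalisation applied by the tools named above. The function names and coordinates are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

BinPair = Tuple[int, int]


def bin_contacts(pairs: Iterable[Tuple[int, int]], bin_size: int) -> Counter:
    """Count read pairs per (bin_i, bin_j) with bin_i <= bin_j, for one chromosome."""
    counts: Counter = Counter()
    for pos1, pos2 in pairs:
        i, j = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(i, j)] += 1
    return counts


def contact_frequency(counts: Counter) -> Dict[BinPair, float]:
    """Express each bin-pair count as a fraction of all valid contacts."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical intra-chromosomal read-pair coordinates (bp), for illustration only.
pairs = [(100, 250_000), (120_000, 40_000), (10, 20), (2_500_000, 50), (260_000, 99)]
freq = contact_frequency(bin_contacts(pairs, bin_size=100_000))
for pair in sorted(freq):
    print(pair, freq[pair])
```

Dividing by total contacts makes samples of different sequencing depth comparable, but it does not correct for bin-specific biases such as mappability or restriction-site density.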
- the methods of the invention are used to assess long term safety of gene editing in diseases / disorders amenable to gene therapy.
- diseases / disorders include but are not limited to: cancer including breast cancer, colorectal cancer, lung cancer, ovarian cancer, liver cancer, gastric cancer, pancreatic cancer, acute myeloid leukaemia, chronic myeloid leukaemia, osteosarcoma, squamous cell carcinoma, peripheral nerve sheath tumours, schwannoma, head and neck cancer, bladder cancer, oesophageal cancer, Barrett's oesophageal cancer, glioblastoma, clear cell sarcoma of soft tissue, malignant mesothelioma, neurofibromatosis, renal cancer, melanoma and prostate cancer; autoimmune or inflammatory conditions (including arthritis (for example rheumatoid arthritis, arthritis chronica progrediente and arthritis deformans) and rheumatic diseases, including inflammatory conditions and rheumatic diseases involving bone loss,
- ulcerative colitis, Crohn's disease and irritable bowel syndrome
- endocrine ophthalmopathy, Graves' disease, sarcoidosis, multiple sclerosis, systemic sclerosis, fibrotic diseases, primary biliary cirrhosis, juvenile diabetes (diabetes mellitus type I), uveitis, keratoconjunctivitis sicca and vernal keratoconjunctivitis, interstitial lung fibrosis, periprosthetic osteolysis, glomerulonephritis, autoimmune thyroid diseases, hypersensitivity (including both airways hypersensitivity and dermal hypersensitivity) and allergies; metabolic disorders including obesity, diabetes mellitus type II, atherosclerosis and other cardiovascular diseases including dilated cardiomyopathy, myocarditis, and dyslipidemia; ocular disorders including retinitis pigmentosa, maculopathies, Leber's congenital amaurosis, Leber's hereditary optic neuropathy, early onset severe retinal dyst
- the methods of the invention are used to assess unintended transcriptional changes associated with gene editing.
- An unintended transcriptional change may result in an increase in gene transcription, or a decrease in gene transcription, relative to a benchmark level identified in normal healthy tissue or in tissue prior to the gene editing event. Changes in gene transcription may be global, or limited to specific genes. Changes in gene transcription may be temporary or permanent. The skilled person is able to determine changes in gene transcription within a tissue biopsy or sample using standard high throughput transcriptomics analysis (e.g. RNA-seq, and/or microarray analysis).
- the methods of the invention are used to determine clinical outcomes of gene therapies. Correlating the effects of gene therapy to clinical outcomes in accordance with the invention can be performed using in silico analytical techniques. Suitable software-based approaches can utilise accurate off-target analysis using chromatin accessibility and 3D genome proximity data, prediction of transcriptional change based on epigenetics, and assessment of long term safety of gene editing using predicted off-target and transcriptional changes.
- Example 1 An analysis of gene edit impact post CRISPR/Cas editing to capture the global changes in a genome and estimate its associated long-term impact
- Figure 1 illustrates an exemplary pipeline followed in order to estimate the long-term impact of a gene edit.
- Genomic DNA sequence information may be obtained using the exemplary WES workflow described above.
- Transcriptomic data may be obtained for the control and edited cells using the RNA-seq workflow described in more detail below.
- RNA-seq on control and post-edit cells: Total RNA, including microRNAs, is isolated using the TRIzol method. Two biological replicates of RNA are subjected to reverse transcription using hexamer nucleotides and reverse transcriptase enzyme. The resultant cDNA is subjected to Illumina library preparation followed by paired-end sequencing. The data is used to derive the genes expressed under the various conditions mentioned. Differential expression analysis is performed between cellular states (wild type cells without any gene editing component, wild type cells with gene editing). This data is further used to derive differentially expressed genes between control and post-edit cells. A workflow for RNA-seq analysis is shown in Figure 2.
- ATAC-seq on control and post-edit cells: ATAC-seq is performed using a kit. Briefly, cells are crosslinked with 1% HCHO and subjected to a transposase reaction. The resultant cells are de-crosslinked and the DNA is purified using conventional phenol-chloroform methods. The resultant DNA is subjected to PCR amplification with Illumina sequencing-ready primer linkers. The library is subjected to Illumina paired-end sequencing. The reads are aligned to the genome and peaks of enriched regions are used to identify the regions of hypersensitivity. Differential peak analysis is performed between control and post-edit cells.
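The comparison between control and post-edit expression can be sketched as a log2 fold change computed on replicate means. This is a minimal illustration under stated assumptions: the pseudocount, the fold-change cut-off, the gene names and the values are hypothetical, and no statistical test is performed. With only two replicates per state, a dedicated differential expression method would be needed to attach significance to any call.

```python
import math
from statistics import mean
from typing import Dict, List


def log2_fold_change(control: List[float], edited: List[float], pseudocount: float = 1.0) -> float:
    """log2 ratio of mean edited to mean control expression, with a pseudocount."""
    return math.log2((mean(edited) + pseudocount) / (mean(control) + pseudocount))


def differential_genes(
    control: Dict[str, List[float]],
    edited: Dict[str, List[float]],
    min_abs_lfc: float = 1.0,
) -> Dict[str, float]:
    """Genes whose absolute log2 fold change meets the (assumed) cut-off."""
    result = {}
    for gene in sorted(control.keys() & edited.keys()):
        lfc = log2_fold_change(control[gene], edited[gene])
        if abs(lfc) >= min_abs_lfc:
            result[gene] = lfc
    return result


# Hypothetical normalised expression values for two replicates per state.
control_expr = {"GENE_A": [7, 7], "GENE_B": [15, 15], "GENE_C": [31, 31]}
edited_expr = {"GENE_A": [31, 31], "GENE_B": [15, 15], "GENE_C": [7, 7]}

print(differential_genes(control_expr, edited_expr))
```

The pseudocount keeps the ratio defined for genes with zero expression in one state, at the cost of shrinking fold changes for lowly expressed genes.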
- ChIP seq (H3K4me3, H3K36me3, H3K9me3 and H3K27me3) between control and post-edit cells: ChIP is performed by using two active histone modifications (H3K4me3, H3K36me3) and two inactive histone modifications (H3K9me3 and H3K27me3).
- the resultant DNA is subjected to Illumina paired end sequencing. This data is used to derive peaks of enrichment. Differential expression peak analysis is performed between control and post-edit cells.
- HiC between control and post-edit cells: The HiC method is performed as described (Rao, Suhas SP, et al. "A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping." Cell 159.7 (2014): 1665-1680.). Briefly, 1 million cells of each cellular state are cross-linked with 1% HCHO. Restriction enzymes are used to digest the chromatin, followed by end filling with biotinylated CTP. This is followed by intra-nuclear ligation with T4 DNA ligase. After de-cross-linking, the resultant DNA is subjected to sonication, biotin enrichment and Illumina paired-end library preparation followed by sequencing.
- This data is used to derive chromatin condensation states (Chandradoss, Keerthivasan Raanin, et al. BMC genomics 21.1 (2020): 1 - 15.). Further, this data is used to derive genome wide contact matrix at 1 Mb resolution to provide chromosome neighbourhood data and chromosome architecture at 40kb resolution. Differential analysis is performed between control wild type and post-edit cells.
- Integrative analysis of RNA-seq, ATAC-seq, ChIP-seq and HiC datasets: To understand the relationship between gene expression, chromatin accessibility, histone modification and long-range contact dynamics, an integrative analysis of data derived from RNA-seq, ATAC-seq, ChIP-seq and HiC is performed. Differential enrichment analysis of multiple -omics data is performed between control and post-edit cells.
- the edit impact is analysed based on the analysis of the assays that are performed on the edited cells.
- The analysis captures RNA-seq, ATAC-seq, ChIP-seq (H3K4me3 and H3K36me3 for transcribed genes; H3K9me3 and H3K27me3 for non-transcribed genes) and HiC data sets across human cell types from cellular assays and publicly available data.
- RNA-seq data identifies differentially expressed genes between the edited and unedited cells and screens for the edit impact on the transcribed genes, i.e. it provides a transcriptomic data input.
- ATAC seq data provides input data that helps to identify the differences in chromatin accessibility, chromatin openness between edited and unedited cells.
- Integrating ATAC-seq data with RNA-seq data identifies the regions which have experienced real impact due to the edit and helps to identify true off-targets with higher confidence.
- this integrative analysis of ATAC seq and RNA-seq data also identifies the regions which are not transcribed (hence would be missed from the RNA seq screening) but due to high accessibility are efficiently edited.
- This also provides data concerning highly transcribed regions impacted by the edit which are less accessible, highlighting low edit impact on certain off targets.
- the order of edit impact is ranked based on a region which is (i) highly transcribed + highly accessible, (ii) non-transcribed + highly accessible, (iii) highly transcribed + least accessible, (iv) non-transcribed + non-accessible.
- ChIP-seq data with active histone modifications is integrated with RNA-seq and ATAC-seq data to further delineate the extent of edit impact on the global genome. It is determined that (i) regions (sequence matched) which are highly transcribed + highly accessible with active histone modifications are the most highly edited, followed by (ii) non-transcribed + highly accessible with active histone modifications, followed by (iii) non-transcribed + highly accessible with silent histone modifications, followed by (iv) highly transcribed + least accessible with active histone modifications, and finally (v) non-transcribed + non-accessible with silent histone modifications.
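The five-level ordering that results from adding histone state can be written as a lookup table. This sketch assumes each region has already been reduced to three labels (transcribed or not, accessible or not, and an "active" or "silent" histone state); how those labels are derived from the underlying data is not specified here. Combinations that the text does not enumerate are deliberately returned as `None` rather than guessed.

```python
from typing import Optional

# (transcribed, accessible, histone state) -> rank, 1 = most highly edited.
# Only the five combinations enumerated in the text are included.
EDIT_IMPACT_RANK = {
    (True, True, "active"): 1,
    (False, True, "active"): 2,
    (False, True, "silent"): 3,
    (True, False, "active"): 4,
    (False, False, "silent"): 5,
}


def edit_impact_rank(transcribed: bool, accessible: bool, histone: str) -> Optional[int]:
    """Return the rank for a region, or None if the combination is not enumerated."""
    return EDIT_IMPACT_RANK.get((transcribed, accessible, histone))


print(edit_impact_rank(True, True, "active"))    # highest expected edit impact
print(edit_impact_rank(False, False, "silent"))  # lowest expected edit impact
print(edit_impact_rank(True, True, "silent"))    # not enumerated in the text
```

Keeping the table explicit makes it easy to see which combinations of evidence the ordering covers and which would need a separate decision.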
- RNA-seq is integrated with HiC data to understand the impact of edit on physical proximity of chromosome territories and genes.
- This allows identification of physical gene networks (genes working together in a cell type specific manner). These networks of genes can be integrated with RNA-seq, ATAC-seq and ChIP-seq data to determine whether there is any relationship between these networking genes in terms of their transcription, accessibility and histone modifications.
- the analysis pipeline further categorises the inferred changes at the transcriptional and post-transcriptional level on the genome to their clinical consequences and helps provide an estimate of the safety both immediate and long term.
- the bioinformatics pipeline is built to provide the insights regarding the pathogenicity of a change in the genome due to the edit both at the on target and off-target by going into the depths of chromatin remodelling, chromatin accessibility, regulation of crucial elements and their cascading effects which overall determine the clinical significance of a particular edit on the entire genome.
- Figure 4 shows a diagrammatic view of the potential impact of a single gene editing event on a transcription factory leading to network destabilisation and downstream unintended effects.
- CRISPR/Cas edited loci data was compared to Hi-C data from the corresponding cell line.
- HeLa cells, which are a line of cancer cells derived from a cervical tumour, have been widely used in cancer research and are a valuable tool for studying the mechanisms of cancer development and progression. They are especially widely used as an in vitro model system for cervical cancer.
- Hi-C data from HeLa cells were obtained from the 4DN public repository (https://data.4dnucleome.org/) under accession 4DNESLZVKJ7V.
- Hi-C data may be generated as described above or obtained from other public repositories such as ENCODE (https://www.encodeproject.org/) and GEO (https://www.ncbi.nlm.nih.gov/geo/).
- Huntington's disease is a rare, progressive neurodegenerative disorder that affects a person's ability to move, think, and behave. It is caused by a mutation in the huntingtin (HTT) gene, also called the IT15 gene, on human chromosome 4, which leads to an abnormal build up of a toxic protein called mutant huntingtin. This toxic protein damages nerve cells in the brain, particularly in areas involved in movement, cognition, and emotion.
- The age of onset and the progression of Huntington's disease can vary widely, but the disease typically begins in mid-life and can lead to significant disability over time. It is inherited in an autosomal dominant pattern, meaning that a person with a single copy of the mutated HTT gene from one parent will develop the disease.
- The aim of CRISPR gene editing in the context of Huntington's disease is to specifically target and modify the mutant HTT gene that produces the mutant huntingtin protein. It is thought that this could potentially halt or slow the progression of the disease by reducing the accumulation of toxic protein in the brain.
- gRNA published guide RNA
- HTT Huntington gene
- GNAL Gene 4 1 High Intellectual disability, Hippocampus, Astrocytes,
- One of the predicted off-targets, the EN1 gene, is only considered to be chromatin accessible in astrocytes but not in the other three cell types when the ATAC-seq peaks were compared (see Figure 7).
- FIG. 8 illustrates contact frequency within 2 Mbp of the RP11-469H8.6 gene.
- K562 cells are a type of immortalized human myelogenous leukaemia cell line that was originally isolated from a patient with chronic myeloid leukaemia. These cells have been widely used as a model system for studying various aspects of cellular biology, including gene expression, signal transduction, cell cycle regulation, and hematopoiesis. K562 cells are particularly useful for studying erythropoiesis, the process by which red blood cells are produced in the bone marrow. They can be differentiated into red blood cells under appropriate culture conditions, making them a valuable research tool for investigating the molecular mechanisms underlying this process.
- K562 cells have also been employed in drug discovery and development. For example, they have been used to screen for compounds that can induce differentiation or apoptosis in leukaemia cells, which could potentially lead to the development of new therapies for leukaemia and other types of cancer. Overall, the versatility and ease of use of K562 cells have made them a popular choice for many different types of research.
- RNA-seq changes in the CRISPR/Cas assay were compared with the intra-chromosomal contact frequency of the K562 cells.
- RNA-seq data from a CRISPR/Cas assay targeting the HS2 enhancer at the human β-globin locus was obtained.
- The log2 fold change in RNA-seq expression between control and case was determined by comparing the means of the two samples in each group.
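The log2 fold-change step can be illustrated with a short sketch. The source does not state the direction of the ratio or whether a pseudocount was used, so the case-over-control direction, the `pseudocount` parameter and the function name are assumptions.

```python
import math
from typing import Sequence


def log2_fold_change(control: Sequence[float], case: Sequence[float],
                     pseudocount: float = 1.0) -> float:
    """log2(mean(case) / mean(control)) for one gene.

    The pseudocount (an assumption, not from the source) avoids division by
    zero and log of zero for genes with no reads in one group.
    """
    mean_control = sum(control) / len(control)
    mean_case = sum(case) / len(case)
    return math.log2((mean_case + pseudocount) / (mean_control + pseudocount))


if __name__ == "__main__":
    # Two samples per group, as in the text; the values are made up.
    control_expr = [10.0, 14.0]   # mean 12
    case_expr = [40.0, 56.0]      # mean 48
    print(log2_fold_change(control_expr, case_expr, pseudocount=0.0))  # 2.0
```

A positive value indicates higher mean expression in the edited (case) group under the assumed direction.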
- Promoter region(s) of each gene were obtained from the Eukaryotic Promoter Database (https://epd.epfl.ch//index.php).
- Contact frequencies between the HS2 enhancer and each gene’s promoter region were extracted from Hi-C data of the K562 cells (from the 4DN data portal, https://data.4dnucleome.org/, experiment set 4DNESI7DEJTM).
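Extracting enhancer-promoter contact frequencies from a binned Hi-C matrix can be sketched as follows. This is an illustration under stated assumptions: the matrix is a plain nested list for a single chromosome, the bin size (`resolution`) and all coordinates are hypothetical, and each contact is normalised to the mean over the promoters being compared, because the source reports values "relative to the average" without specifying which average.

```python
from typing import Dict, List


def bin_index(position: int, resolution: int) -> int:
    """Map a genomic coordinate to its Hi-C bin."""
    return position // resolution


def relative_contacts(matrix: List[List[float]], target_pos: int,
                      promoter_pos: Dict[str, int],
                      resolution: int) -> Dict[str, float]:
    """Contact frequency between the edited site and each promoter,
    divided by the mean contact over all listed promoters."""
    target_bin = bin_index(target_pos, resolution)
    raw = {gene: matrix[target_bin][bin_index(pos, resolution)]
           for gene, pos in promoter_pos.items()}
    mean_contact = sum(raw.values()) / len(raw)
    if mean_contact == 0:
        raise ValueError("no contacts between target and any promoter")
    return {gene: value / mean_contact for gene, value in raw.items()}


if __name__ == "__main__":
    # Toy 4x4 symmetric contact matrix at 1 kb resolution (made-up values).
    toy = [
        [0, 6, 1, 2],
        [6, 0, 3, 1],
        [1, 3, 0, 4],
        [2, 1, 4, 0],
    ]
    promoters = {"geneA": 1500, "geneB": 3500}  # hypothetical coordinates
    print(relative_contacts(toy, target_pos=500,
                            promoter_pos=promoters, resolution=1000))
```

In practice the matrix would be read from the downloaded Hi-C file with a dedicated reader; that step is omitted here.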
- Figure 9 illustrates distribution of transcriptional changes by CRISPR/Cas assay against contact frequency between the gene’s promoter and the CRISPR target.
- The largest up-regulation (log2 fold-change of 1.61) was observed for the MS4A3 gene (Membrane Spanning 4-Domains A3), which also had the highest contact frequency (5.16x relative to the average).
- The transcriptional change of the gene to a more active state suggests a change in chromatin condensation around the gene from heterochromatin to euchromatin.
- transcriptome RNA-seq
- Hi-C genome structure
- the TRAC gene is a critically important gene for the development and function of CD4+ and CD8+ T cells, which are key players in the adaptive immune response.
- the TRAC gene encodes the T cell receptor (TCR) alpha chain, which forms a heterodimer with the TCR beta chain to create a functional TCR that can recognize and respond to foreign antigens.
- the expression of TRAC is tightly regulated to ensure the proper assembly and expression of the TCR complex. Defects in TRAC expression or function can result in impaired T cell development and function, which can have significant consequences for the immune system's ability to fight infections and cancers.
- the TRAC gene plays a critical role in the development and function of CD4+/CD8+ T cells, and its dysregulation can have significant consequences for the immune system's ability to mount an effective immune response.
- A CRISPR/Cas guide RNA was designed to target the TRAC gene in CD4+/CD8+ primary T cells (Lazzarotto et al., 2020; Nature Biotechnology 38, pages 1317-1327).
- 197 off-target loci were identified by considering the overall mismatch, the PAM-proximal region mismatch, and the chromatin accessibility of multiple types of T cells using ATAC-seq data. A subset of the off-targets was validated by targeted deep sequencing, and indel frequencies were calculated (Lazzarotto et al., 2020).
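The two sequence-based criteria mentioned above, overall mismatch and PAM-proximal mismatch, can be computed as in the sketch below. It assumes a guide and candidate site of equal length with the PAM at the 3' end, so that the PAM-proximal region is the last `pam_proximal_len` bases; the default of 10 bases, the function name and the example sequences are illustrative assumptions, not values from the source.

```python
from typing import Dict


def mismatch_profile(guide: str, site: str,
                     pam_proximal_len: int = 10) -> Dict[str, int]:
    """Count total and PAM-proximal mismatches between a guide and a site.

    Both sequences exclude the PAM and are read 5' to 3', so the
    PAM-proximal region is taken as the 3' end (an assumption).
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    mismatched = [i for i, (g, s) in enumerate(zip(guide.upper(), site.upper()))
                  if g != s]
    proximal_start = len(guide) - pam_proximal_len
    return {
        "total": len(mismatched),
        "pam_proximal": sum(1 for i in mismatched if i >= proximal_start),
    }


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"      # made-up 20-nt protospacer
    candidate = "TCGTACGTACGTACGTACGA"  # differs at the first and last base
    print(mismatch_profile(guide, candidate))
```

A chromatin accessibility flag derived from ATAC-seq peaks could then be combined with this profile when ranking candidate sites; how the source weighs the three criteria is not stated.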
- the off-targets identified using chromatin accessibility included high-mismatch (e.g.
- a method for determining the effect of a gene editing event upon the genome of a cell that has been modified by a gene editing event comprising: i. on a sample comprising the modified cell, undertaking a first genomic analysis; ii. on a sample comprising a reference cell that has not been subjected to a gene editing event, undertaking a second genomic analysis; iii. comparing the first genomic analysis with the second genomic analysis; and iv. determining whether there has been a change in the genome of the modified cell.
- the ChIP-Seq analysis comprises determination of the presence of one or more histone modifications within the genome selected from the group consisting of: H3K27Ac; H3K4Me1; H3K4Me3; H3K9Me3; H3K36Me3; and H3K27Me3.
- the epigenetic modification is determined by an integrated analysis of the combination of at least two or more of the group consisting of: ATAC-Seq; Hi-C; ChIP-Seq; Methyl-Seq and DNase I hypersensitivity analysis.
- ChIP-Seq analysis comprises determination of the presence of one or more histone modifications within the genome selected from the group consisting of: H3K27Ac; H3K4Me1; H3K4Me3; H3K9Me3; H3K36Me3; and H3K27Me3.
- the modified cell is selected from the group consisting of a cell from: muscle; liver; central nervous system (CNS); brain; breast; endothelium; pancreas; oesophagus; colon; gastrointestinal organs; kidney; lung; spleen; skin; heart; thyroid; lymphatic tissue; cardiovascular; eye; bone marrow; blood; connective tissue; bladder; reproductive organs; and placenta.
- the diseased cell type is selected from a pre-neoplastic or a neoplastic cell type and wherein neoplastic cell type is selected from the group consisting of: a primary tumour cell; a secondary tumour cell; a metastatic tumour cell; and a cancer stem cell.
- the reference cell is selected from a wild type cell, or a pre-neoplastic or a neoplastic cell type and wherein neoplastic cell type is selected from the group consisting of: a primary tumour cell; a secondary tumour cell; a metastatic tumour cell; and a cancer stem cell.
- a computer-implemented method of determining the effect of a gene editing event upon the genome of a cell that has been modified by a gene editing event comprising an integrated analysis of a combination of at least two or more of the group consisting of: ATAC-Seq; chromosome conformation capture; ChIP-Seq; Methyl-Seq and DNase accessibility analysis.
- a method for predicting clinical outcome of a guide RNA-based gene therapy in a patient comprising: a) determining the chromatin accessibility of a target site in the genome of the patient; b) determining the chromatin accessibility of at least one off-target site proximal to the target site, and; c) determining the mismatch between the guide RNA and the off-target site wherein a low mismatch is indicative of a negative clinical outcome in the patient.
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Biomedical Technology (AREA)
- Chemical & Material Sciences (AREA)
- Molecular Biology (AREA)
- Organic Chemistry (AREA)
- Biotechnology (AREA)
- General Engineering & Computer Science (AREA)
- Zoology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Wood Science & Technology (AREA)
- Microbiology (AREA)
- Plant Pathology (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to methods for determining the effect of a gene editing event upon the genome of a cell that has been modified by a gene editing event, the method comprising: on a sample comprising the modified cell, undertaking a first genomic analysis; on a sample comprising a reference cell that has not been subjected to a gene editing event, undertaking a second genomic analysis; comparing the first genomic analysis with the second genomic analysis; and determining whether there has been a change in the genome of the modified cell. The invention also relates to methods of therapy that use the methods as diagnostic or prognostic indicators.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IN202221029165 | 2022-05-20 | | |
| US202263368937P | 2022-07-20 | 2022-07-20 | |
| US63/368,937 | 2022-07-20 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2023225352A2 (fr) | 2023-11-23 |
| WO2023225352A3 (fr) | 2023-12-28 |
Family
ID=88836012
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2023/022982 Ceased WO2023225352A2 (fr) | 2022-05-20 | 2023-05-19 | Methods for evaluating gene editing effects |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2023225352A2 (fr) |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6984522B2 (en) * | 2000-08-03 | 2006-01-10 | Regents Of The University Of Michigan | Isolation and use of solid tumor stem cells |
| US11401544B2 (en) * | 2017-05-05 | 2022-08-02 | The United States Of America, As Represented By The Secretary, Department Of Health And Human Services | Methods of preparing a re-usable single cell and methods for analyzing the epigenome, transcriptome, and genome of a single cell |
| EP3640330B1 (fr) * | 2018-10-15 | 2021-12-08 | Consiglio Nazionale Delle Ricerche | Method for the sequential analysis of macromolecules |
-
2023
- 2023-05-19 WO PCT/US2023/022982 patent/WO2023225352A2/fr not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| WO2023225352A3 (fr) | 2023-12-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Shen et al. | Predictable and precise template-free CRISPR editing of pathogenic variants | |
| Owens et al. | Microhomologies are prevalent at Cas9-induced larger deletions | |
| Brinkman et al. | Easy quantification of template-directed CRISPR/Cas9 editing | |
| Gonzalez-Pena et al. | Accurate genomic variant detection in single cells with primary template-directed amplification | |
| Gebert et al. | Large Drosophila germline piRNA clusters are evolutionarily labile and dispensable for transposon regulation | |
| Hosono et al. | Oncogenic role of THOR, a conserved cancer/testis long non-coding RNA | |
| Hart et al. | Evaluation and design of genome-wide CRISPR/SpCas9 knockout screens | |
| Hwang et al. | Lineage tracing using a Cas9-deaminase barcoding system targeting endogenous L1 elements | |
| Schweiger et al. | Genome-wide massively parallel sequencing of formaldehyde fixed-paraffin embedded (FFPE) tumor tissues for copy-number-and mutation-analysis | |
| Fu et al. | Systematic decomposition of sequence determinants governing CRISPR/Cas9 specificity | |
| Kadota et al. | Multifaceted Hi-C benchmarking: what makes a difference in chromosome-scale genome scaffolding? | |
| Erdel et al. | Generalized nucleation and looping model for epigenetic memory of histone modifications | |
| Costa et al. | Genome editing using engineered nucleases and their use in genomic screening | |
| JP2021511309A (ja) | 核酸を解析するための方法および組成物 | |
| US11584930B2 (en) | Methods and kits for identifying cancer treatment targets | |
| Bodai et al. | Targeting double-strand break indel byproducts with secondary guide RNAs improves Cas9 HDR-mediated genome editing efficiencies | |
| Tasan et al. | Targeting specificity of the CRISPR/Cas9 system | |
| Han et al. | Transposable element profiles reveal cell line identity and loss of heterozygosity in Drosophila cell culture | |
| D'Antonio et al. | Identifying DNase I hypersensitive sites as driver distal regulatory elements in breast cancer | |
| Diroma et al. | New insights into mitochondrial DNA reconstruction and variant detection in ancient samples | |
| Brown et al. | Multiplexed targeted genome engineering using a universal nuclease-assisted vector integration system | |
| Trivedi et al. | Analyzing CRISPR screens in non-conventional microbes | |
| Melamed et al. | An information theoretic method to identify combinations of genomic alterations that promote glioblastoma | |
| Lazzarotto et al. | Population-scale cellular GUIDE-seq-2 and biochemical CHANGE-seq-R profiles reveal human genetic variation frequently affects Cas9 off-target activity | |
| Huang et al. | Polymorphic transposable elements contribute to variation in recombination landscapes |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23808406; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 23808406; Country of ref document: EP; Kind code of ref document: A2 |