WO2018175350A1 - Validated small RNA spike-in set for exRNA analysis - Google Patents
- Publication number
- WO2018175350A1 (application PCT/US2018/023191)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mir
- hsa
- spike
- sequences
- rna
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/166—Oligonucleotides used as internal standards, controls or normalisation probes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/178—Oligonucleotides characterized by their use miRNA, siRNA or ncRNA
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
Definitions
- Extracellular RNA (exRNAs) in human biofluids are of great interest for many of the groups within the Extracellular RNA Communication Consortium (ERCC) as a largely untapped source of accurate prognostic and diagnostic biomarkers for the human disease states being investigated by the Consortium's research groups.
- ERCC Extracellular RNA Communication Consortium
- accurate quantification of the exRNAs contained in these biofluids is confounded by the broad range of expression levels for different exRNAs and the variability introduced by the multiple "wet lab" steps involved in obtaining exRNA quantification data.
- biases can be introduced during the RNA extraction and the RNA preparation steps used in downstream processing.
- different biases can be encountered not only among different analytical methods (e.g.
- the Exiqon small RNA spike-in set is the most comparable commercially available product that attempts to address some of the QC and normalization challenges associated with small RNA NGS discovery work.
- This small RNA-seq spike-in set consists of 52 small RNAs of approximately 20-21 nt, mixed at a range of concentrations, to be added at either the RNA isolation or the library generation phase.
- Miltenyi Biotec spike-ins are intentionally identical to known human, mouse, rat and viral miRNA sequences found within miRBase, are appropriate for use within a microarray setting, and thus have limited use as spike-in controls in NGS small RNA discovery work.
- the External RNA Controls Consortium spike-ins 3 and the Sequins 2 are restricted to long RNAs (>200nt) and thus are biochemically incompatible with most small RNA-seq NGS library preparation pipelines.
- the spike-in set developed by the Tuschl lab consists of two equimolar pools of 10 synthetic 22nt small RNAs with no matches to the human or mouse genomes.
- One pool of 10 oligos is added during RNA isolation and the second pool is added at the beginning of library preparation.
- These pools allow for QC of both the RNA isolation and library preparation steps.
- the pools consist of relatively few oligos and the size is appropriate only for monitoring recovery of miRNA-sized fragments as opposed to other classes of small RNAs.
- it is unclear whether any oligos in the set have matches to sequences in databases beyond the human and mouse genome, which could make them problematic for use in libraries where exogenous RNAs are of interest.
- exRNAs include fragments of both coding and non-coding long RNAs. Furthermore, as this type of discovery work is starting to encompass exogenous RNAs from non-human organisms found in human biofluids, it is important that sequences from non-human organisms also be avoided.
- RNA spike-in standards can be used to both standardize exRNA expression across experiments, as well as to provide quality control (QC) metrics for the various procedural steps in sample preparation.
- aspects of this disclosure relate to artificial short RNA sequences that (1) have a base composition that mimics that of endogenous human miRNAs; (2) have broad sequence diversity; (3) cover a range of sequence lengths of between about 16 and 70 nucleotides; and/or (4) do not share sequence identity with known endogenous sequences in human, animal, plant, fungus, bacterial, and/or viral genomes, as well as methods of identifying such sequences.
- the artificial short RNA sequences have a nucleotide frequency, optionally GC content, of approximately 25% for each nucleotide, which mimics the nucleotide frequency of endogenous human miRNAs in miRBase.
- the artificial short RNA sequences have a dinucleotide frequency that mimics the dinucleotide frequency of endogenous human miRNAs in miRBase.
- the artificial short RNA sequences are selected from the group of artificial short RNA sequences provided in Table 3 or Figure 2.
- the artificial short RNA is detectably labeled, optionally 5' phosphorylated.
- RNAs as "spike-ins."
- the aim is 0.5-1% or 1-5% spike-in reads per library (e.g., at 1%, a library of 10M total reads contains a total of 100,000 spike-in reads). In some embodiments, this is accomplished using a starting point of 5% spike-in with ongoing experiments to optimize this value.
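The read budget implied by these target percentages is simple arithmetic. The following is a minimal sketch; the function name `spike_in_reads` is hypothetical and only illustrates the calculation for the percentages quoted above.

```python
def spike_in_reads(total_reads: int, fraction: float) -> int:
    """Expected number of spike-in reads for a target fraction of the library."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return round(total_reads * fraction)


# Target percentages quoted in the text, applied to a 10M-read library.
for pct in (0.5, 1.0, 5.0):
    reads = spike_in_reads(10_000_000, pct / 100)
    print(f"{pct}% of 10,000,000 reads = {reads:,} spike-in reads")
```

At the 1% target this reproduces the 100,000-read figure given in the text.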
- Related aspects relate to a pool of artificial short RNA sequences disclosed above, referred to herein as a "spike-in set" or a "set of spike-in RNAs."
- the set of spike-in RNAs comprises one or more of the artificial short RNA sequences, optionally between about 10 to 100 sequences.
- the set of spike-in RNAs all have the same length. In some embodiments, the set of spike-in RNAs have a range of lengths from 16 to 70 nucleotides.
- exRNA small extracellular RNAs
- Some aspects relate to the addition of the artificial short RNA sequences to biofluids and/or RNA samples at different steps of experimental procedures or the normalization of endogenous RNA among different samples.
- Expression levels of these artificial short RNA sequences can be determined using qRT-PCR, hybridization or next-generation sequencing (NGS).
- NGS next-generation sequencing
- equimolar or ratiometric molar concentrations of the spike ins are used.
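The normalization use mentioned above can be sketched as follows. This is an illustrative example only, assuming that the same spike-in amount was added to every sample and that read counts scale proportionally with input; the function name, sample names, and counts are hypothetical and do not come from the disclosure.

```python
def normalize_to_spike_ins(endogenous_counts, spike_in_counts, scale=1000):
    """Rescale endogenous read counts by the total spike-in reads in the same sample.

    Assumes an identical spike-in amount was added to each sample, so the
    spike-in total serves as a per-sample scaling reference.
    """
    total_spike = sum(spike_in_counts.values())
    if total_spike == 0:
        raise ValueError("no spike-in reads detected; cannot normalize")
    factor = scale / total_spike
    return {name: count * factor for name, count in endogenous_counts.items()}


# Hypothetical counts: sample B was sequenced twice as deeply as sample A.
sample_a = normalize_to_spike_ins({"miR-x": 500}, {"spike1": 600, "spike2": 400})
sample_b = normalize_to_spike_ins({"miR-x": 1000}, {"spike1": 1200, "spike2": 800})
print(sample_a, sample_b)
```

After rescaling, the two samples report the same value for the hypothetical `miR-x`, because the difference in raw counts is explained by sequencing depth alone.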
- FIG. 1 depicts an exemplary 32nt oligo phylogenetic tree. Red Line: Groups (6). Blue Square: Chosen oligo.
- FIG. 2 lists spike-in sequences. These sequences were run through the exceRpt pipeline and had no alignments to known endogenous nor to known exogenous (non-human) sequences. RNA oligos were ordered from IDT with a 5' phosphate and HPLC purification. The RNA oligos were diluted to 1 ⁇ and pooled into an A and a B pool.
- FIG. 3 provides a characterization of libraries made with the spike-ins.
- set A oligos were added to Qiazol at indicated amounts before adding to the sample to extract the RNA.
- FIG. 4 shows spike-in oligos on a 10% Acrylamide TBE-urea gel.
- FIG. 5 shows measurements of the total miRNA concentration in a female plasma pool based on known spike-in quantity added to the plasma pool during RNA isolation.
- the term “comprising” is intended to mean that the compositions and methods include the recited elements, but do not exclude others.
- the transitional phrase "consisting essentially of" (and grammatical variants) is to be interpreted as encompassing the recited materials or steps "and those that do not materially affect the basic and novel characteristic(s)" of the recited embodiment. See, In re Herz, 537 F.2d 549, 551-52, 190 U.S.P.Q. 461, 463 (CCPA 1976) (emphasis in the original); see also MPEP § 2111.03.
- nucleic acid sequences refers to a polynucleotide which is said to "encode" a polypeptide if, in its native state or when manipulated by methods well known to those skilled in the art, it can be transcribed and/or translated to produce the mRNA for the polypeptide and/or a fragment thereof.
- the antisense strand is the complement of such a nucleic acid, and the encoding sequence can be deduced therefrom.
- Homology refers to sequence similarity between two peptides or between two nucleic acid molecules. Homology can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same base or amino acid, then the molecules are homologous at that position. A degree of homology between sequences is a function of the number of matching or homologous positions shared by the sequences. As used herein, referring to a sequence that "does not share identity” intends that a sequence (oligo) shares less than 100% identity with or alternatively has one or more mismatches when compared to another sequence across the length of the sequence (oligo).
- ortholog is used in reference of another gene or protein and intends a homolog of said gene or protein that evolved from the same ancestral source. Orthologs may or may not retain the same function as the gene or protein to which they are orthologous.
- polynucleotides are transcribed into mRNA and/or the process by which the transcribed mRNA is subsequently being translated into peptides, polypeptides, or proteins. If the polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell. The expression level of a gene may be determined by measuring the amount of mRNA or protein in a cell or tissue sample; further, the expression level of multiple genes can be determined to establish an expression profile for a particular sample.
- nucleic acid sequence and "polynucleotide" are used interchangeably to refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
- ribonucleic acid or "RNA" are used interchangeably to refer to a nucleic acid comprising a ribonucleic acid backbone.
- bases adenine (A), uracil (U), guanine (G), and cytosine (C) are found in RNA.
- peptide and “polypeptide” are used interchangeably and in their broadest sense to refer to a compound of two or more subunits of amino acids, amino acid analogs or peptidomimetics.
- the subunits may be linked by peptide bonds.
- the subunit may be linked by other bonds, e.g., ester, ether, etc.
- a protein or peptide must contain at least two amino acids and no limitation is placed on the maximum number of amino acids which may comprise a protein's or peptide's sequence.
- amino acid refers to either natural and/or unnatural or synthetic amino acids, including glycine and both the D and L optical isomers, amino acid analogs and peptidomimetics.
- spike-in refers to a nucleic acid sequence, e.g. RNA, added to a sample as a control to assess performance of a nucleic acid quantification technology such as qPCR, next-generation sequencing (NGS), or a microarray.
- microarray refers to a collection of nucleic acids attached to a surface used to measure expression level of multiple genes simultaneously.
- the term "subject" is intended to mean any animal.
- the subject may be a mammal; in further embodiments, the subject may be a human, mouse, or rat.
- tissue is used herein to refer to tissue of a living or deceased organism or any tissue derived from or designed to mimic a living or deceased organism.
- the tissue may be healthy, diseased, and/or have genetic mutations.
- the biological tissue may include any single tissue (e.g., a collection of cells that may be interconnected) or a group of tissues making up an organ or part or region of the body of an organism.
- the tissue may comprise a homogeneous cellular material or it may be a composite structure such as that found in regions of the body including the thorax which for instance can include lung tissue, skeletal tissue, and/or muscle tissue.
- Exemplary tissues include, but are not limited to those derived from liver, lung, thyroid, skin, pancreas, blood vessels, bladder, kidneys, brain, biliary tree, duodenum, abdominal aorta, iliac vein, heart and intestines, including any combination thereof.
- biological fluid refers to a biological fluid, such as but not limited to those excreted (e.g. urine or sweat), secreted (e.g. breast milk or bile), circulating (e.g. blood, blood components such as plasma, or cerebrospinal fluid), and/or developed as a result of a pathological process in a subject (e.g. pus or other blister or cyst fluids).
- Non-limiting examples of such fluids found in a human subject include amniotic fluid, aqueous humour, vitreous humour, bile, blood, blood plasma, blood serum, breast milk, cerebrospinal fluid, cerumen (earwax), chyle, chyme, endolymph, perilymph, exudates, feces, ejaculate (male or female), gastric acid, gastric juice, lymph, mucus, pericardial fluid, peritoneal fluid, pleural fluid, pus, rheum, saliva, sebum, serous fluid, semen, smegma, sputum, synovial fluid, sweat, tears, urine, vaginal secretion, vomit, intracellular fluid, extracellular fluid, intravascular fluid, interstitial fluid, lymphatic fluid, and transcellular fluid.
- the fluid may be from a subject of any age, such as but not limited to an adult, child, infant, neonate, or fetus.
- the biofluid may optionally be one or more of amniotic fluid, cerebrospinal fluid, adult serum, adult plasma, cord blood plasma, cord blood serum, bronchoalveolar lavage fluid, saliva, sputum, and adult urine.
- treating or “treatment” of a disease in a subject refers to (1) preventing the symptoms or disease from occurring in a subject that is predisposed or does not yet display symptoms of the disease; (2) inhibiting the disease or arresting its
- beneficial or desired results can include, but are not limited to, one or more of: alleviation or amelioration of one or more symptoms; diminishment of the extent of a condition (including a disease); a stabilized (i.e., not worsening) state of a condition (including disease); delay or slowing of the progression of a condition (including disease); amelioration or palliation of the condition (including disease) state; and remission (whether partial or total), whether detectable or undetectable.
- vector intends a recombinant vector that retains the ability to infect and transduce non-dividing and/or slowly-dividing cells and integrate into the target cell's genome.
- the vector may be derived from or based on a wild-type virus. Aspects of this disclosure relate to an adeno-associated virus vector.
- label intends a directly or indirectly detectable compound or composition that is conjugated directly or indirectly to the composition to be detected, e.g., N-terminal histidine tags (N-His), magnetically active isotopes (e.g., 115Sn, 117Sn and 119Sn), non-radioactive isotopes such as 13C and 15N, or a polynucleotide or protein such as an antibody, so as to generate a "labeled" composition.
- the term also includes sequences conjugated to the polynucleotide that will provide a signal upon expression of the inserted sequences, such as green fluorescent
- while the term label generally intends compositions covalently attached to the composition to be detected, it specifically excludes naturally occurring nucleosides and amino acids that are known to fluoresce under certain conditions (e.g. temperature, pH, etc.) and generally any natural fluorescence that may be present in the composition to be detected.
- the label may be detectable by itself (e.g., radioisotope labels or fluorescent labels) or, in the case of an enzymatic label, may catalyze chemical alteration of a substrate compound or composition which is detectable.
- the labels can be suitable for small scale detection or more suitable for high-throughput screening.
- suitable labels include, but are not limited to magnetically active isotopes, nonradioactive isotopes, radioisotopes, fluorochromes, chemiluminescent compounds, dyes, and proteins, including enzymes.
- the label may be simply detected or it may be quantified.
- a response that is simply detected generally comprises a response whose existence merely is confirmed.
- a response that is quantified generally comprises a response having a quantifiable (e.g., numerically reportable) value such as an intensity, polarization, and/or other property.
- the detectable response may be generated directly using a luminophore or fluorophore associated with an assay component actually involved in binding, or indirectly using a luminophore or fluorophore associated with another (e.g., reporter or indicator) component.
- luminescent labels that produce signals include, but are not limited to bioluminescence and chemiluminescence.
- Detectable luminescence response generally comprises a change in, or an occurrence of a luminescence signal.
- Suitable methods and luminophores for luminescently labeling assay components are known in the art and described for example in Haugland, Richard P. (1996) Handbook of Fluorescent Probes and Research Chemicals (6th ed.).
- Examples of luminescent probes include, but are not limited to, aequorin and luciferases.
- sequence space represented by endogenous small RNAs is largely limited to ⁇ 75 nt. This increases the probability of a randomly generated small RNA standard either being mis-called as an endogenously expressed small RNA, or that the base composition of the synthetic sequences may diverge excessively from the known small RNA sequence space to the point that it reduces the user's ability to make reliable inferences on relative small RNA target expression.
- aspects of this disclosure relate to artificial short RNA sequences that (1) have a base composition that mimics that of endogenous human miRNAs; (2) have broad sequence diversity; (3) cover a range of sequence lengths of between about 16 and 70 nucleotides; and/or (4) do not share sequence identity with known endogenous sequences in human, animal, plant, fungus, bacterial, and/or viral genomes, as well as methods of identifying such sequences.
- Non-limiting exemplary artificial short RNA sequences include those provided in Table 3 or Figure 2.
- the sequences further comprise degenerate sequences thereof.
- degenerate sequences meet the criteria (1) to (4) noted above and vary from the artificial short RNA sequence, optionally by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 nucleotides.
- the invariant sequence of the degenerate sequence comprises the portion of the base composition that mimics that of endogenous miRNA from the artificial short RNA sequence and may be contiguous or non-contiguous.
- compositions that mimic that of endogenous human miRNAs are generally based on analytics from miRNA databases.
- miRBase is the art-recognized standard database that contains high-quality human small RNAs, many of which are grouped into families with similar seed sequences. However, it is appreciated that any suitable miRNA database may be used for the purposes disclosed herein.
- exemplary miRNAs from each family can be sampled to gain a representative population distribution of the di-nucleotide frequencies for naturally occurring human miRNAs. Using these miRNAs as a template for random small RNA generation can yield populations with similar di-nucleotide frequencies as those found within the
- This matched di-nucleotide frequency distribution can extend to synthetic spike-in oligos of any size, permitting detailed quality control validation of various aspects of RNA extraction and/or the size selection portion of the NGS library generation.
- This process has the potential to reduce the occurrence of "jackpotting", where a single spike-in oligo sequences particularly well and reduces the signal of the other oligo standards. This "jackpotting" phenomenon has been previously observed on Ion Torrent (Thermo Scientific) based instruments, and on Illumina-based instruments.
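The di-nucleotide frequency comparison described above can be illustrated with a short sketch. The function name `kmer_frequencies` and the input sequences are hypothetical placeholders (not miRBase entries); overlapping windows are counted, which is one reasonable convention among several.

```python
from collections import Counter


def kmer_frequencies(sequences, k=2):
    """Relative frequency of each overlapping k-mer across a set of RNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Placeholder sequences standing in for a reference set and a candidate spike-in.
reference = ["AUGCAUGCAUGCAUGCAUGCAU", "UUGACCGAUAGGCAUCGAUAGC"]
candidate = ["GAUCGCAUAGCUUAGCAUGCAA"]
ref_freq = kmer_frequencies(reference, k=2)
cand_freq = kmer_frequencies(candidate, k=2)
for dinuc in sorted(ref_freq):
    print(dinuc, round(ref_freq[dinuc], 3), round(cand_freq.get(dinuc, 0.0), 3))
```

Setting `k=1` gives the single-nucleotide composition, which can be checked against the approximately 25% per nucleotide figure mentioned elsewhere in the text.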
- a novel sequence can be generated in such a way as to sample the frequency of any target database of sequences.
- This database of sequences can be of an entire species, sequences from multiple species, or subsets of sequences of either. The sequences include, but are not limited to, whole or truncated versions of mRNAs, miRNAs, piRNAs, tRNAs, rRNAs, YRNAs, DNAs, peptides, amino acids, or other sequences.
- the base pair frequency can be of single nucleotide, di-nucleotide, tri-nucleotide, or any number of nucleotides or amino acids as desired by the user.
- the resultant sequence can be screened for any number of sequence motifs of any length.
- These motifs consist of consecutive strings of nucleotides or amino acids whose identities are provided within the IUPAC codes for nucleotides or amino acids, which additionally represents subsets of specific nucleotides or amino acids at each position within a sequence.
- sequence motifs can be homopolymers of two nucleotides, three nucleotides, four nucleotides, or any number of nucleotides or amino acids.
- sequence motifs can be patterns of two, three, four, or any number of sequences.
- These motifs can include repeats of motifs of two, three, four, or any number.
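The motif screen described above can be sketched with regular expressions. This is an illustrative example, not the disclosed implementation: the function names, the default thresholds (a run of four identical bases, three tandem copies of a two-base unit), and the example banned motif are all assumptions chosen for demonstration.

```python
import re

# Standard IUPAC nucleotide codes, written here for RNA (U in place of T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as 'GUR' into a regular expression."""
    return "".join(IUPAC_RNA[ch] for ch in motif.upper())


def has_homopolymer(seq, min_run=4):
    """True if any single base is repeated at least min_run times in a row."""
    return re.search(r"(.)\1{%d,}" % (min_run - 1), seq) is not None


def has_tandem_repeat(seq, unit_len=2, min_copies=3):
    """True if any unit of unit_len bases occurs min_copies times back to back."""
    pattern = r"(.{%d})\1{%d,}" % (unit_len, min_copies - 1)
    return re.search(pattern, seq) is not None


def passes_motif_screen(seq, banned_motifs=(), min_run=4):
    """Reject sequences with homopolymers, tandem repeats, or banned IUPAC motifs."""
    seq = seq.upper()
    if has_homopolymer(seq, min_run) or has_tandem_repeat(seq):
        return False
    return not any(re.search(iupac_to_regex(m), seq) for m in banned_motifs)


print(passes_motif_screen("ACGUACGA"))                         # no banned motif supplied
print(passes_motif_screen("ACGUACGA", banned_motifs=["GUR"]))  # contains GUA
print(passes_motif_screen("ACGUUUUA"))                         # contains UUUU
```

Expressing motifs in IUPAC codes keeps the user-facing motif list compact while still allowing ambiguous positions, in line with the description above.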
- the overrepresentation of related sequences within the database can be reduced through a pre-clustering of the sequences within the database into groups or families.
- This clustering can be performed de novo through a number of sequence similarity discovery algorithms, such as, but not limited to, those embodied by the BLAST, BLAT, MUSCLE, ClustalW programs, or by the Smith-Waterman algorithm, or by the Needleman-Wunsch algorithm.
- this clustering into families is available through the metadata associated with each sequence in the database, such as the naming system within miRBase.
- reducing sequence overrepresentation is important to ensure increased sequence diversity within a selected oligo pool and reduces the impact of miRNA family expansion that has occurred over evolutionary timescales within eukaryotic species.
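Grouping by sequence name, as mentioned above for miRBase, might look like the following sketch. The name-parsing rule is a simplifying assumption made for illustration: it treats the species prefix, the let/miR type, and the leading number as the family key, which only approximates curated family annotations. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Heuristic: species prefix, then "let" or "miR"/"mir", then the family number.
NAME_RE = re.compile(r"^[a-z]{3,4}-(let|mir)-(\d+)", re.IGNORECASE)


def family_key(name):
    """Derive an approximate family key from a miRBase-style name."""
    match = NAME_RE.match(name)
    if match is None:
        return name  # names that do not parse are kept as their own family
    return f"{match.group(1).lower()}-{match.group(2)}"


def group_by_family(names):
    """Map each approximate family key to the list of member names."""
    families = defaultdict(list)
    for name in names:
        families[family_key(name)].append(name)
    return dict(families)


names = ["hsa-let-7a-5p", "hsa-let-7b-5p", "hsa-miR-29a-3p",
         "hsa-miR-29b-3p", "hsa-miR-21-5p"]
for family, members in group_by_family(names).items():
    print(family, members)
```

Sampling one member per group before computing frequencies keeps a large family from dominating the composition statistics.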
- the database of sequences is catenated into a single string of nucleotide or amino acid sequences, with a nonsense text marker (i.e., "N" or "X") placed between each pair of adjacent sequences.
- the user designates the length of the stretch of nucleotides or amino acids over which the generated sequence should have the same frequency representation as the database of sequences (i.e., the user chooses a single nucleotide, two nucleotides, three nucleotides, or any number of nucleotides or amino acids).
- the user also selects the final length for their desired randomly-generated oligo.
- the length of that oligo is evenly divisible by the length of the stretch of nucleotides or amino acids. For instance, a selected length of three nucleotides (tri-nucleotide) will yield a final randomly sampled oligo of length 3, 6, 9, or 12 nucleotides, or any other number of nucleotides evenly divisible by three.
- a random number is generated, and that position is chosen within the catenated string of nucleotides or amino acids. The residue at that position is selected together with the residues that follow it, numbering the length chosen by the user minus 1, so that the selection as a whole has the chosen length. If this set of nucleotides or amino acids contains the nonsense text marker (i.e., "N" or "X"), then the entire selected sequence is discarded, and the process is restarted with a new random number. If the selected sequence does not contain the nonsense marker, it is screened against the previously chosen motifs (repeats, homopolymers, etc.) and discarded if it matches any of those sequences, and the process repeats with a new randomly generated number.
- If the chosen oligo does not fail either of the prior two filtering steps, it is selected for catenation. This process is repeated with a newly generated random number, and the oligos that pass the filters are catenated to the previous oligo. This newly generated oligo is then filtered against the undesired motifs, and if those are found, the previously catenated oligos are removed. This process is repeated until the target number of oligos of the specified length that pass the filters are generated.
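The generation procedure described in the steps above can be sketched as follows. This is one possible reading rather than the disclosed implementation: the function name, the default banned pattern (a run of four identical bases), the retry limit, and the choice to drop only the most recently added chunk when the growing oligo hits a banned motif are all assumptions. The input sequences are placeholders.

```python
import random
import re


def generate_oligo(database, k, length, banned=r"(.)\1{3,}", rng=None,
                   max_tries=100_000):
    """Build one oligo by sampling k-length chunks from a concatenated database."""
    if length % k != 0:
        raise ValueError("length must be evenly divisible by k")
    rng = rng or random.Random()
    marker = "N"  # nonsense marker separating database entries
    pool = marker.join(seq.upper() for seq in database)
    banned_re = re.compile(banned)
    oligo = ""
    tries = 0
    while len(oligo) < length:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not build an oligo that passes the filters")
        start = rng.randrange(len(pool) - k + 1)  # random position in the pool
        chunk = pool[start:start + k]
        if marker in chunk:             # chunk spans two database entries
            continue
        if banned_re.search(chunk):     # chunk itself matches a banned motif
            continue
        candidate = oligo + chunk
        if banned_re.search(candidate):  # junction created a banned motif
            continue                     # assumption: drop only the newest chunk
        oligo = candidate
    return oligo


# Placeholder database; a real run would use sequences sampled per family.
database = ["AUGCAUGGCAUCGAUAGCUAGC", "UUGACCGAUAGGCAUCGAUAGC"]
rng = random.Random(0)
pool_of_oligos = [generate_oligo(database, k=2, length=22, rng=rng) for _ in range(3)]
print(pool_of_oligos)
```

Because chunks are drawn from random positions in the concatenated database, each k-mer is sampled roughly in proportion to how often it occurs there, which is how the generated oligos inherit the database's k-mer frequencies.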
- the artificial short RNA sequences have a nucleotide frequency of approximately 25% for each nucleotide, which mimics the nucleotide frequency of endogenous human miRNAs in miRBase.
- for sequence diversity, a variety of algorithms may be used for alignment and subsequent diversity scoring.
- alignment may be performed using any suitable alignment program, including but not limited to a MUSCLE algorithm.
- Sequence diversity can likewise be analyzed using a suitable model such as the Maximum Composite Likelihood method with Substitutions to include Transitions+Transversions in MEGA 7, assuming a uniform pattern among lineages. It is appreciated that sequences having a diversity metric calculated by this method above 4.00 have suitably "broad" sequence diversity.
- An ordinary skilled artisan can appreciate that equivalents to this 4.00 threshold are contemplated should an alternative, equivalent model be used and would appreciate how to convert the threshold value in view of the model and corresponding assumptions.
- phylogenetic trees can be generated to group sequences of the same length into families. Sequences can then be chosen based on the generation of phylogenetic trees. For example, sequences can be chosen based on their position in a phylogenetic tree in such a way to both maximize inter-sequence diversity and representation of the total diversity within the pool of sequences.
- the program generating the tree (e.g., MEGA7) can be configured so that approximately the same number of related sequences are categorized into each subtree.
- the sequence with the highest degree of similarity to the other sequences within its subtree can be selected from each subtree. Not to be bound by theory, this sequence will likely be the basal sequence in the subtree.
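Selecting the member most similar to the rest of its subtree can be sketched as follows. As a simplification, this example uses position-by-position identity between equal-length sequences as a stand-in for the tree-based distances produced by a program such as MEGA7; the function names and the toy group are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def representative(group):
    """Return the sequence with the highest mean identity to the rest of its group."""
    if len(group) == 1:
        return group[0]

    def mean_identity(i):
        others = [seq for j, seq in enumerate(group) if j != i]
        return sum(identity(group[i], other) for other in others) / len(others)

    best = max(range(len(group)), key=mean_identity)
    return group[best]


# Toy subtree of three 4-nt sequences; one representative is chosen per subtree.
subtrees = [["ACGU", "ACGA", "ACCU"], ["GGUA"]]
print([representative(group) for group in subtrees])
```

Applying this once per subtree yields one oligo per group, which matches the stated aim of spreading the selected oligos across the diversity of the pool.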
- the short artificial RNA sequences disclosed herein may vary in length between 16 and 70 nucleotides, i.e., having at least 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, but not more than 70 nucleotides; and/or at most 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26,
- the Extracellular RNA Communication Consortium (ERCC) has developed a comprehensive and standardized small RNA seq mapping pipeline available through Genboree (www.genboree.org), called the exceRpt pipeline.
- This pipeline can be utilized to screen randomly generated synthetic putative small RNA sequences to remove those that match the human genome (endogenous) or known non-human (exogenous) sequences. This greatly reduces the chance that a spike-in sequence will be miscalled as a relevant small RNA within a sample.
- RNA sequences such as those from the Erie or Galas groups and/or those available on one of the other databases known in the art.
- generated oligo sequences can be curated to avoid mapping to human, animal, plant, fungus, bacterial, and/or viral genomes based on the selected data set.
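The idea of discarding candidates that match known sequences can be illustrated with a simple exact-match screen. This sketch is not the exceRpt pipeline and does not use its interface; it swaps in a shared k-mer check as a crude stand-in for read alignment. The function names, the value of `k`, and the sequences are hypothetical.

```python
def build_kmer_index(references, k):
    """Collect every k-mer present in the reference sequences."""
    index = set()
    for ref in references:
        ref = ref.upper().replace("T", "U")
        for i in range(len(ref) - k + 1):
            index.add(ref[i:i + k])
    return index


def shares_kmer(candidate, index, k):
    """True if the candidate contains any k-mer found in the reference index."""
    cand = candidate.upper().replace("T", "U")
    return any(cand[i:i + k] in index for i in range(len(cand) - k + 1))


def screen(candidates, references, k=16):
    """Keep only candidates with no exact k-mer match to any reference sequence."""
    index = build_kmer_index(references, k)
    return [c for c in candidates if not shares_kmer(c, index, k)]


# Toy data with a short k so the effect is visible.
references = ["ACGUACGUAA"]
candidates = ["GGGACGUCCC", "GGCCGGCCUU"]
print(screen(candidates, references, k=4))
```

A real screen would rely on an aligner that tolerates mismatches and on full genome and transcriptome references; note also that a candidate shorter than `k` has no k-mers and therefore passes this check trivially.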
- the Erie and Galas group datasets provide those exRNAs found in biofluids, such as amniotic fluid, cerebrospinal fluid, adult serum, adult plasma, cord blood plasma, bronchoalveolar lavage fluid, saliva, sputum, and adult urine.
- RNAs as "spike-ins."
- the aim is 0.5-1% or 1-5% spike-in reads per library (e.g., at 1%, a library of 10M total reads contains a total of 100,000 spike-in reads). In some embodiments, this is accomplished using a starting point of 5% spike-in with ongoing experiments to optimize this value.
- RNA sequences that (1) have a base composition that mimics that of endogenous human miRNAs; (2) have broad sequence diversity; (3) cover a range of sequence lengths of between about 16 and 70 nucleotides; and/or (4) do not share sequence identity with known endogenous sequences in human, animal, plant, fungus, bacterial, and/or viral genomes, i.e., those artificial short RNA sequences disclosed herein above.
- the pool of these sequences is referred to herein as a "spike-in set" or a "set of spike-in RNAs."
- pools may be generated by the same methods.
- the pools may comprise between about 2 and 100 unique artificial short RNA sequences, optionally at least 50 unique artificial short RNA sequences selected according to the metrics disclosed herein above.
- the set of spike-in RNAs comprises one or more of the artificial short RNA sequences, optionally at least 5 sequences, at least 10 sequences, at least 15 sequences, at least 20 sequences, at least 25 sequences, at least 30 sequences, at least 40 sequences, at least 45 sequences, at least 50 sequences, at least 55 sequences, at least 60 sequences, at least 65 sequences, at least 70 sequences, at least 75 sequences, at least 80 sequences, at least 85 sequences, at least 90 sequences, at least 95 sequences, or at least 100 sequences.
- the pools may comprise a unique artificial short RNA sequence and degenerate sequences thereof.
- Said degenerate sequences meet the criteria (1) to (4) noted above and vary from the artificial short RNA sequence, optionally by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 nucleotides.
- the invariant sequence of the degenerate sequence comprises the portion of the base composition that mimics that of endogenous miRNA from the artificial short RNA sequence and may be contiguous or noncontiguous.
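As a rough illustration of how a degenerate sequence relates to its parent, the sketch below counts the positions at which a variant differs and lists the invariant positions, which may be noncontiguous. It assumes substitution-only variants of equal length, and it does not test criteria (1) to (4); the function names and the example sequences are hypothetical.

```python
def count_differences(parent: str, variant: str) -> int:
    """Number of positions at which the variant differs from the parent."""
    if len(parent) != len(variant):
        raise ValueError("this sketch only handles equal-length sequences")
    return sum(p != v for p, v in zip(parent, variant))


def invariant_positions(parent: str, variant: str) -> list[int]:
    """Zero-based positions shared by parent and variant (may be noncontiguous)."""
    return [i for i, (p, v) in enumerate(zip(parent, variant)) if p == v]


# Toy sequences, not taken from the disclosed spike-in set.
parent = "ACGUACGU"
variant = "ACGAACGG"
print(count_differences(parent, variant))
print(invariant_positions(parent, variant))
```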
- the set of spike-in RNAs all have the same length.
- all of the sequences have 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 nucleotides.
- the length is dependent on the target biofluid or desired RNA class. For example, for plasma miRNA, the target range would be 20 to 24 nucleotides.
- spike-in oligo sets will be the most comprehensive available for human small RNA-seq experiments and have high value both within the U01 laboratories, as well as across the larger scientific community. Bioinformatic exclusion of known human and non-human sequences within the synthetic pools reduces the chance of misidentifying reads from a spike-in standard as relevant small RNAs, and will be more stringent than currently available oligo spike-in sets.
- the dinucleotide selection procedure in the synthetic oligo generation process can better match the nucleotide biases seen in miRBase families of small RNAs, and increase the chance that the spike-in oligos behave in a similar manner on the NGS sequencing platform.
- although the disclosed spike-in set is being designed with NGS considerations in mind, the spike-ins may be used in other small RNA analysis methods.
- exRNA small extracellular RNAs
- the artificial short RNA sequences relate to the addition of the artificial short RNA sequences to biofluids and/or RNA samples at different steps of experimental procedures or the normalization of endogenous RNA among different samples.
- the artificial short RNA is detectably labeled, optionally 5' phosphorylated.
- Levels of these artificial short RNA sequences can be determined using qRT-PCR, microarray or next-generation sequencing (NGS). In some embodiments, equimolar or ratiometric molar concentrations are used.
- RNA oligo spike-ins disclosed herein contains a sufficient number of oligos (~100) such that distinct spike-in sets, with a useful range of sizes and/or concentrations, can be added at both the RNA isolation and at the NGS library generation phases. Furthermore, the proposed spike-in oligos have a wide range of lengths (16 to 70 nt), allowing for both the detection of size bias in the RNA isolation steps, and providing the ability to gauge the effectiveness of the size selection processes during library preparation.
- this larger size range increases the flexibility of the spike-ins to include all relevant small RNA species (i.e., miRNA, piRNA, snoRNA, snRNA, Y RNA fragments, etc.) whereas the currently available spike-ins (Exiqon) are restricted to only the miRNA class of small RNAs.
- spike-ins disclosed herein include but are not limited to (1) evaluating RNA yield through RNA isolation, (2) evaluating the amount of input RNA into library prep using sequence data (e.g., variation % calibrator in sequencing reads), (3) evaluating library quality (e.g., overt failures, limitation of detection/complexity), (4) where multiple sets of calibrators are used, detecting cross-contamination or sample
- the spike-in set in Table 3 was tested with a representative and broad range of NGS library preparation techniques and chemistries, and is, thus, expected to perform in a similar fashion to other kits not specifically tested here, as well as future NGS library preparation chemistries.
- This method can, in turn, be optimized for other uses with other analytical methods (e.g., qRT-PCR, microarray, etc.) by performing the filtering step using the planned analytical method and adapting the criteria for removal accordingly.
- a biofluid may be divided into aliquots of different volumes and spiked with a range of concentrations of a set of spike-in oligonucleotides;
- the biofluid may be aliquoted into volumes ranging from about 100 µL to about 450 µL and the corresponding spike-in set may be introduced in a range of concentrations from about 1×10⁻¹⁷ to about 1×10⁻¹⁸ moles per 100 µL.
- the range is adjusted to match the expected specific biofluid content.
- Extracellular RNA can then be isolated or analyzed using methods including but not limited to qRT-PCR, microarray, NanoString, or targeted small RNA sequencing.
- the read count of spike-in RNAs is then compared to the total miRNA read count in each library.
- the absolute concentration of miRNA in the plasma can be calculated.
Examples
- Example 1 Design and synthesis of a set of small RNA spike-in synthetic oligos for use in NGS analysis and other small RNA analytical methods.
- dinucleotide frequencies of the synthetic oligos were matched to those found in miRBase, a high quality human miRNA database. Specifically, these small RNAs were extracted from the database and grouped by family. A random selection of a proportional number of sequences from each family was used to form a single synthetic concatemer sequence that represented the dinucleotide frequency of the population (Table 1).
- Nonsense sequences were placed in between each small RNA sequence in the
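The dinucleotide-matching idea can be sketched as follows. The text does not state how candidate oligos were drawn from the concatemer, so the first-order Markov chain used here is an assumption chosen only because it preserves dinucleotide frequencies on average; the function names and the toy concatemer are hypothetical and are not data from Table 1.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each overlapping dinucleotide in `seq`."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts.values())
    return {dinuc: n / total for dinuc, n in counts.items()}


def generate_oligo(concatemer: str, length: int, rng: random.Random) -> str:
    """Draw one oligo from a first-order Markov chain fitted to the concatemer.

    Assumption: sampling each base conditional on the previous base is used
    here as one simple way to reproduce the concatemer's dinucleotide bias.
    """
    transitions = defaultdict(list)
    for current, following in zip(concatemer, concatemer[1:]):
        transitions[current].append(following)
    oligo = rng.choice(concatemer)
    while len(oligo) < length:
        # Fall back to the overall base composition if a base has no successor.
        candidates = transitions.get(oligo[-1]) or list(concatemer)
        oligo += rng.choice(candidates)
    return oligo


# Toy placeholder standing in for the family-weighted miRBase concatemer.
concatemer = "UGAGGUAGUAGGUUGUAUAGUUACGCAUUAGCUUAUCAGACUGAUGUUGA"
rng = random.Random(0)
candidate = generate_oligo(concatemer, 22, rng)
print(candidate, len(candidate))
```

Candidates generated this way would still need to pass the mapping filters described next before being considered for synthesis.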
- oligos were processed through the Genboree exceRpt small RNA mapping pipeline against endogenous (human) and exogenous reference libraries at a stringency of 0 allowed mismatches, rather than the default of 1 allowed mismatch.
- the oligos that were found to have no hits using this pipeline were then used as a reference library against which human biofluid exRNA small RNA-seq datasets from the Erie and Galas groups were mapped to ensure that no reads from actual datasets mapped to these sequences.
- the biofluids represented in these datasets were: amniotic fluid, cerebrospinal fluid, adult serum, adult plasma, cord blood plasma, bronchoalveolar lavage fluid, saliva, sputum, and adult urine.
- Applicants then selected the spike-in oligonucleotides for synthesis from the collection of oligos that passed these mapping filters.
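The zero-mismatch exclusion step can be illustrated with a deliberately simplified stand-in. This is not the exceRpt pipeline, which relies on dedicated aligners and large reference libraries; the sketch only checks for exact substring matches, and checking the reverse complement as well is an assumption. All names and sequences are hypothetical.

```python
def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string over the alphabet A, C, G, U."""
    return rna.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def passes_exact_match_filter(oligo: str, references: list[str]) -> bool:
    """True if the oligo has no exact (0-mismatch) hit on either strand."""
    rc = reverse_complement(oligo)
    return not any(oligo in ref or rc in ref for ref in references)


# Toy references standing in for endogenous and exogenous libraries.
references = ["AAACCCGAUC", "GGGAUAUAUCCC"]
candidates = ["ACGUACGA", "CCCGAU", "AUCGGG"]
kept = [c for c in candidates if passes_exact_match_filter(c, references)]
print(kept)
```

In this toy case only the first candidate survives: the second occurs in a reference directly, and the third matches through its reverse complement.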
- the collection was divided according to oligo length and each length category was aligned using MUSCLE, and phylogenetic trees were generated using MEGA7.
- the optimal substitution model was determined to be Jukes-Cantor, which was then used to create maximum likelihood trees with a bootstrap value of 100. Regions of the alignment that lacked complete coverage across the sequences were considered uninformative and these data were eliminated in branch determination.
- the resultant trees (example tree in Figure 1) were inspected visually and a random oligo was chosen from each apparent subgroup of the tree until the target number of oligos for each size range was achieved (Table 3).
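The selection above was made by visual inspection of MUSCLE/MEGA7 trees. Purely as an illustration of the underlying goal (picking mutually dissimilar oligos within one length class), the sketch below swaps in a different and much cruder technique: greedy max-min selection on Hamming distance. It is not the tree-based procedure described here, and the names and sequences are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def pick_diverse(oligos: list[str], n: int) -> list[str]:
    """Greedily pick up to n oligos, each as far as possible from those chosen.

    Starts from the first oligo, then repeatedly adds the remaining oligo
    whose smallest distance to the chosen set is largest.
    """
    if not oligos or n <= 0:
        return []
    chosen = [oligos[0]]
    remaining = list(oligos[1:])
    while remaining and len(chosen) < n:
        best = max(remaining, key=lambda s: min(hamming(s, c) for c in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy length class of four 4-nt sequences.
print(pick_diverse(["AAAA", "AAAU", "UUUU", "AAUU"], 2))
```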
- the spike-ins will be added with the aim to account for 1-5% of the total number of reads in a small RNA-seq library.
- the different synthetic oligos provide sequence data at different efficiencies. Applicants start with a larger number of oligos than expected to be included in the final spike-in sets to allow for drop-out during the validation process.
- the selected sequences are synthesized as 5'phosphorylated RNA oligonucleotides to mimic the chemical characteristics of endogenous miRNAs. The synthesis is performed at the smallest
- Applicants have designed and synthesized a spike-in oligonucleotide set that mimics the chemistry and base composition of biologically important exRNAs, while having sequences that are distinct from endogenous RNAs and genomes of humans and the other species represented in the Genboree exceRpt pipeline.
- a small-scale equimolar test pool with all of the synthesized spike-in oligos is prepared and sequenced to identify oligos that are unsuitable for inclusion within subsequent pools due to overrepresentation in the resulting read counts, or "jackpotting."
- pooling and dilution strategy previously used by the Galas lab for development of a small set of spike-in oligos forms the basis of the strategy used here.
- separate pools were made for pre-RNA isolation spike-in and for direct spike-in during library generation. This allows for QC of both the RNA isolation and library preparation steps independently.
- For the direct library spike-in, a final count of ~1×10⁻¹⁹ moles of pooled oligo per library was added, and for the pre-RNA isolation spike-in, ~3×10⁻¹⁸ moles of pooled oligo was diluted in the Qiazol lysis reagent used in the RNA extraction procedure.
- RNA Storage Solution (RNase-free 1 mM sodium citrate buffer, pH 6.4; ThermoFisher, Carlsbad, CA). Only "low nucleic acid binding" tubes and RNA Storage Solution are used in the oligo dilution process.
- 10 µL of each 100 µM oligo will be diluted to 10 µM in a 100 µL final volume. This 1:10 dilution is repeated to give a 1 µM RNA oligo stock. The 10 µM and 1 µM stocks are used for subsequent oligo pooling operations.
- 1 µL of each of these 1 µM stocks is pooled by equal volume to yield a pool concentration of 1 µM.
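The serial dilution and equal-volume pooling follow the usual C1·V1 = C2·V2 relation. The sketch below works through the numbers, reading the volumes as microlitres and the concentrations as micromolar (an assumption about the units in this passage). The count of 100 oligos is illustrative only, and the function names are hypothetical.

```python
def diluted_concentration(stock_uM: float, stock_uL: float, final_uL: float) -> float:
    """Concentration after diluting `stock_uL` of stock to `final_uL` total."""
    return stock_uM * stock_uL / final_uL


def per_oligo_concentration(stock_uM: float, n_oligos: int) -> float:
    """Concentration of each oligo after pooling equal volumes of n stocks.

    Pooling equal volumes of stocks at the same concentration leaves the
    total oligo concentration unchanged and divides each oligo by n.
    """
    return stock_uM / n_oligos


first = diluted_concentration(100.0, 10.0, 100.0)   # first 1:10 dilution
second = diluted_concentration(first, 10.0, 100.0)  # second 1:10 dilution
print(first, second)

# Illustrative pool size; the actual number of oligos pooled may differ.
print(per_oligo_concentration(second, 100))
```

Under these assumptions the pool stays at 1 µM total oligo, while each individual oligo is present at 1 µM divided by the number of oligos pooled.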
- RNA-seq libraries are generated from the test pool using the Illumina TruSeq, NEB NEBNext, Galas 4N, Erie 4N, and Clontech SMARTer small RNA-seq library preparation methods and sequenced. The resulting data are analyzed and highly
- a maximum of 1/2 of the available synthesis of any given oligo is used to generate the pools described below, reserving the remaining synthesis for generation of future pools.
- ratiometric standards can be used for accurate absolute quantification of target RNA molecules as observed with the Sequins, and to a lesser extent with the Zebrafish miRNA standards and the Exiqon standards.
- Table 5 delineates the ratiometric pooling strategy.
- Two mixes are made in which the oligos are distributed in opposing molar dilution ladders across oligos of different lengths; these are designated Ratiometric SetA Mix1 and Ratiometric SetA Mix2. The same is done for the oligos present in Equimolar SetB, and these mixes are designated Ratiometric SetB Mix1 and Ratiometric SetB Mix2. Oligo size classes with a larger number of oligos/class (20 nt-28 nt) are included across a larger range of dilutions, while those with a smaller number of oligos/class (16 nt, 18 nt, 32 nt-70 nt) are included near the mean oligo concentration of the pool.
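One way to picture "opposing molar dilution ladders" is sketched below: each oligo receives a relative amount from a ladder in the first mix and the mirror-image amount in the second, so the expected ratio between mixes differs from oligo to oligo. The ladder values and oligo identifiers are hypothetical; the actual assignments are those of Table 5, which is not reproduced here.

```python
def opposing_ladders(oligo_ids: list[str], ladder: list[float]):
    """Assign ladder values in order for mix 1 and in reverse order for mix 2."""
    n = len(ladder)
    mix1 = {o: ladder[i % n] for i, o in enumerate(oligo_ids)}
    mix2 = {o: ladder[n - 1 - (i % n)] for i, o in enumerate(oligo_ids)}
    return mix1, mix2


# Hypothetical three-step ladder of relative molar amounts.
mix1, mix2 = opposing_ladders(["oligo_a", "oligo_b", "oligo_c"], [1.0, 0.1, 0.01])
for oligo in mix1:
    ratio = mix1[oligo] / mix2[oligo]
    print(oligo, mix1[oligo], mix2[oligo], ratio)
```

Comparing observed read-count ratios between the two mixes against these expected ratios is what makes such a design useful for checking quantitative accuracy.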
- RNA-seq libraries using two fixed adaptor sequence small RNA library preparation methods (the NEB NEBNext or Illumina TruSeq small RNA methods), two degenerate adaptor small RNA library preparation methods (the methods previously developed by the Erie and Galas labs), and one template-switching method (Clontech SMARTer small RNA). Libraries will be generated from eight serial 1:10 dilutions of this pool. This multiplexed library pool is quantified, balanced, and sequenced on a single MiSeq PE150 (Illumina) run. Oligos that jackpot or which are not detected in libraries prepared using any method are excluded from subsequent pools.
- An oligo was considered to be "jackpotting" when, in an equimolar spike-in pool consisting only of spike-in oligos, that oligo represented over 4% of the resultant reads in any of the tested NGS library preparation techniques. In such an experiment, an oligo was considered to have failed when it received <0.02% of the available reads.
- the library preparation kits tested were the NEBNext Small RNA kit (NEB), the TruSeq Small RNA kit (Illumina), the Clontech SMARTer kit (Takara), and a "homebrew" 4N based method.
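These two thresholds translate directly into a per-library classification. The sketch below is a minimal version, treating the failure criterion as a share below 0.02% of reads (an assumption about the comparison intended). The function name and the read counts are hypothetical.

```python
def classify_oligos(read_counts: dict[str, int],
                    jackpot_pct: float = 4.0,
                    fail_pct: float = 0.02) -> dict[str, str]:
    """Label each oligo in one equimolar-pool library by its share of reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("library contains no spike-in reads")
    labels = {}
    for oligo, count in read_counts.items():
        pct = 100.0 * count / total
        if pct > jackpot_pct:
            labels[oligo] = "jackpot"
        elif pct < fail_pct:
            labels[oligo] = "failed"
        else:
            labels[oligo] = "ok"
    return labels


# Hypothetical library: 25 well-behaved oligos, one overrepresented, one nearly absent.
counts = {f"oligo_{i}": 380 for i in range(25)}
counts["overrepresented"] = 499
counts["nearly_absent"] = 1
labels = classify_oligos(counts)
print(labels["overrepresented"], labels["nearly_absent"], labels["oligo_0"])
```

Because the criterion applies to any of the tested preparation methods, an oligo would be excluded if it is flagged in at least one library, so the function would be run once per library and the flags combined.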
- the NEB and Illumina library preparation kits utilize similar chemistries for preparation of the small RNA material for deep sequencing, with the primary differences being in how each kit removes unwanted chemical side products.
- the "4N" method uses adaptor ligation methods similar to those of the NEB and Illumina kits; however, it includes a randomized adaptor that significantly reduces bias in annealing the adaptors to both small RNAs in solution as well as to the spike-in oligos.
- the Takara library preparation kit utilizes a ligation-free approach to NGS library preparation which is orthogonal to other available methods.
- the spike-in oligos have been tested with a representative and broad range of NGS library preparation techniques and chemistries, and would be expected to perform in a similar fashion to other kits not specifically tested here, as well as future NGS library preparation chemistries.
- Ratiometric SetA Mix2, Ratiometric SetB Mix1, and Ratiometric SetB Mix2 are initially evaluated by generating small RNA-seq libraries using the NEBNext (by the Laurent group) and the 4N (by the Galas group) methods on the pure pools at 4-8 serial 1:10 dilutions and sequenced on a MiSeq. These experiments determine whether the read counts for the component oligos correspond well to the expected numbers based on the pooling ratios.
- the three SetA pools (Equimolar SetA, Ratiometric SetA Mix1, Ratiometric SetA Mix2) will then be spiked individually into three biofluids (serum, plasma, and urine) for RNA isolation using the miRNeasy micro kit at concentrations approximating 1%, 5%, and 10% of the miRNA concentration in the biofluid sample.
- the three SetB pools (Equimolar SetB, Ratiometric SetB Mix1 and Ratiometric SetB Mix2) are spiked in at concentrations approximating 1%, 5%, and 10% of the miRNA concentration in the RNA samples, in a corresponding fashion (e.g., RNA samples from the biofluid samples spiked with Equimolar SetA are spiked with Equimolar SetB), and small RNA-seq libraries are generated using the NEBNext (by the Laurent group) and the 4N (by the Galas group) methods and sequenced on a HiSeq (Laurent) or NextSeq (Galas).
- Ratiometric SetA Mix1 is spiked into a non-pregnant female serum sample and Ratiometric SetA Mix2 is spiked into a pregnant female serum sample.
- RNA will be isolated using the miRNeasy micro kit.
- Ratiometric SetB Mix1 is spiked into the non-pregnant female RNA sample and Ratiometric SetB Mix2 is spiked into the pregnant female RNA sample.
- RNAseq libraries are generated using the NEBNext (by the Laurent group) and the 4N (by the Galas group) methods and sequenced on a HiSeq (Laurent) or NextSeq (Galas). These experiments demonstrate that we can use the ratiometric pools to normalize data from different samples.
- Applicants develop a rigorously validated series of spike-in small RNA sets, made from the spike-in RNA oligos designed and synthesized under a separate proposal ("Design and Synthesis of Small RNA Oligonucleotide Spike-ins"). Not to be bound by theory, Applicants believe that the results are critical, in that they can generate tools that can be easily adopted by both highly experienced and less experienced laboratories whose experiments include exRNA isolation and/or analysis.
- FIG. 5 shows the results of an experiment in which a large volume human plasma sample was divided into aliquots of different volumes (from 100 µL to 450 µL) and spiked with a range of concentrations of a set of spike-in oligonucleotides (1×10⁻¹⁷ to 1×10⁻¹⁸ moles per 100 µL of plasma). Extracellular RNA was then isolated from each spiked sample and subjected to small RNA sequencing.
- the read count of spike-in RNAs in the range of 20-24nt long was compared to the total miRNA read count in each library.
- the absolute concentration of miRNA in the plasma is calculated. Since the source of plasma was the same for all of the libraries, the finding that the estimated input miRNA concentration is relatively consistent across samples shows that the use of the spike-ins to estimate the miRNA concentration is robust to variations in sample input volume and spike-in concentration.
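The calculation is not spelled out in the text, so the following is only a sketch of one plausible form. It assumes that spike-in oligos and endogenous miRNAs are recovered and sequenced with equal efficiency, so that read counts are proportional to input moles. The function name and the example numbers are hypothetical and are not data from FIG. 5.

```python
def estimate_mirna_moles_per_100uL(mirna_reads: int,
                                   spike_in_reads: int,
                                   spike_in_moles_added: float,
                                   plasma_volume_uL: float) -> float:
    """Estimate endogenous miRNA (moles per 100 uL of plasma) from read ratios.

    Assumption: reads are proportional to input moles for both spike-ins and
    miRNAs, so miRNA moles = spike-in moles * (miRNA reads / spike-in reads).
    """
    if spike_in_reads <= 0 or plasma_volume_uL <= 0:
        raise ValueError("spike-in reads and plasma volume must be positive")
    moles_in_sample = spike_in_moles_added * mirna_reads / spike_in_reads
    return moles_in_sample * 100.0 / plasma_volume_uL


# Hypothetical library: 2M miRNA reads, 100k spike-in reads,
# 2e-17 mol of spike-in added to a 200 uL aliquot.
estimate = estimate_mirna_moles_per_100uL(2_000_000, 100_000, 2e-17, 200.0)
print(f"{estimate:.2e} mol per 100 uL")
```

If the assumption holds, estimates computed this way from aliquots of different volumes and spike-in amounts should agree, which is the consistency check the experiment reports.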
Abstract
Certain aspects of the present invention relate to artificial short RNA sequences that (1) have a base composition that mimics that of endogenous human miRNAs; (2) have broad sequence diversity; (3) cover a range of sequence lengths of between about 16 and 70 nucleotides; and (4) do not share sequence identity with known endogenous sequences in human, animal, plant, fungus, bacterial, and/or viral genomes, and to methods of using them.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201762473935P | 2017-03-20 | 2017-03-20 | |
| US62/473,935 | 2017-03-20 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018175350A1 (fr) | 2018-09-27 |
Family
ID=63585748
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2018/023191 Ceased WO2018175350A1 (fr) | 2017-03-20 | 2018-03-19 | Validated small RNA spike-in set for exRNA analysis |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2018175350A1 (fr) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150011413A1 (en) * | 2012-01-13 | 2015-01-08 | Micromedmark Biotech Co., Ltd. | Internal reference genes for micrornas normalization and uses thereof |
2018
- 2018-03-19 WO PCT/US2018/023191 patent/WO2018175350A1/fr not_active Ceased
Non-Patent Citations (1)
| Title |
|---|
| LOCATI ET AL.: "Improving small RNA-seq by using a synthetic spike-in set for size-range quality control together with a set for data normalization", NUCLEIC ACIDS RESEARCH, vol. 43, no. 14, 18 August 2015 (2015-08-18), pages 1 - 10, XP055386708 * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18771633; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18771633; Country of ref document: EP; Kind code of ref document: A1 |