WO2025163086A1 - Methods for preparing processed RNA samples and their use in the preparation of RNA vaccines - Google Patents
Methods for preparing processed RNA samples and their use in the preparation of RNA vaccines
- Publication number
- WO2025163086A1 (application PCT/EP2025/052429)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- rna
- sample
- cdna
- molecules
- disease
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61P—SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
- A61P35/00—Antineoplastic agents
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61P—SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
- A61P39/00—General protective or antinoxious agents
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Definitions
- the invention relates to methods for analysing nucleic acid samples including methods for identifying and using biomarkers.
- the invention also relates to methods for determining a set of RNA sequences associated with a disease or condition and producing a database of RNA sequences associated with a disease or condition. Methods for diagnosing a disease and producing an RNA vaccine for a subject with a disease are also provided.
- RNA sequencing has become a powerful tool for understanding biology (Stark, R., Grzelak, M. & Hadfield, J. RNA sequencing: the teenage years. Nat. Rev. Genet. 20, 631-656 (2019)). Its applications range from drug development to improving agriculture. Most cells and tissues share many of the same highly expressed genes, which are commonly known as housekeeping genes. These genes are typically responsible for basic cell functions and thus do not provide cell-specific characteristics. Since these housekeeping genes typically make up a large fraction of the RNA within a sample, RNA sequencing data is usually dominated by sequencing reads from these non-informative RNAs. This phenomenon has two main negative effects on RNA sequencing projects: first, genes and isoforms which are specific to the condition in question are difficult to detect, and second, the data generated is, in large part, redundant.
- the first main negative effect has two consequences. The first is that the amount of sequencing required to detect genes of interest must be large enough to handle sampling inefficiencies caused by the low relative abundance of genes of interest. The second is that, in some cases, low abundance target genes may be simply impractical to identify. This is evidenced by the still ongoing efforts to annotate the human genome: even after thousands of sequencing projects, the full human transcriptome remains elusive, with novel isoforms and genes being reported regularly. Since eukaryotic transcriptomes derive their complexity from alternative splicing, which generates combinatorial permutations, the search for novel RNA will likely be a constant endeavour.
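The sampling inefficiency described above can be illustrated with a simple calculation. The sketch below assumes that each sequencing read is an independent draw and that a transcript of interest makes up a fixed fraction of the RNA in the sample; this binomial model and the function names are illustrative assumptions, not part of the described methods.

```python
import math


def detection_probability(relative_abundance: float, n_reads: int) -> float:
    """Probability that at least one of n_reads originates from the transcript."""
    return 1.0 - (1.0 - relative_abundance) ** n_reads


def reads_needed(relative_abundance: float, target_probability: float) -> int:
    """Smallest number of reads that reaches the target detection probability."""
    # Solve 1 - (1 - p)^n >= target for n and round up to a whole read.
    return math.ceil(
        math.log(1.0 - target_probability) / math.log(1.0 - relative_abundance)
    )
```

Under this model, a transcript at one part per million needs on the order of millions of reads to be seen at least once with high probability, which is why reducing the share of housekeeping RNA lowers the sequencing depth required.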
- the second main negative effect also has two main consequences.
- the first is that more data requires more processing time which increases overall cost and time of RNA sequencing experiments. These costs are both in terms of energy from additional computation required and work time from bioinformaticians that are tasked with processing the data.
- the second consequence is that redundant data results in the need for more storage.
- As sequencing is becoming more widespread, data storage has become a significant problem. For RNA sequencing technology to take on more roles, more efficient data generation is necessary to reduce storage requirements.
- the invention is based on methods that can generate full-length sequences from RNA extracted from blood without fragmenting the RNA or cDNA products before sequencing.
- This provides a transcriptome representing any RNA that makes its way into the circulatory system. This includes RNA from typical blood cells such as red blood cells, white blood cells and other immune cells. It also includes RNA from cells that are typically uncommon in the circulatory system, such as cancer cells or cells that have somehow been dislodged into the circulatory system. It further includes extracellular RNA, which could have originated from any cell within the body.
- With RNA from all these sources, and at full length, the invention enables each RNA to be ascribed to a cell/tissue of origin as well as a state of cell or tissue behaviour.
- the transcription start site, end site, and splicing are features which typically represent unique combinations used by different cell types. This information can also be used to directly identify the protein isoform that would be translated from messenger RNA. The ability to detect protein isoforms and identify those that are associated with a disease allows for the development of treatments such as RNA vaccines.
- the invention provides a method for analysing a nucleic acid sample, the method comprising:
- the nucleic acid sample may be obtained from a biological sample of any suitable form including any material, biological fluid, tissue, or cell obtained or otherwise derived from a subject.
- the nucleic acid sample may be obtained from (cancer) cells or genetic material (DNA or RNA) derived from the (cancer) cells, to include cell-free genetic material (e.g. found in the peripheral blood).
- the nucleic acid sample may be obtained from a biopsy sample, optionally a solid biopsy sample and/or a liquid biopsy sample.
- the nucleic acid sample may be obtained from biological fluid or a fluid or lysate generated from a biological material.
- the nucleic acid sample may be obtained from blood.
- the nucleic acid sample may be obtained by extracting RNA from a biological sample (e.g. blood, optionally whole blood) obtained from a subject. cDNA may then be synthesized using the RNA as a template (i.e. by reverse transcription).
- Blood samples may be readily and frequently obtained, allowing for repeated and non-invasive sampling of a patient.
- the invention provides a method for analysing a nucleic acid sample, the method comprising:
- processing the RNA sample, optionally comprising synthesizing cDNA using the RNA as a template
- the blood sample may be whole blood.
- the subject may have cancer.
- Sequencing the processed RNA or cDNA generates a sequencing output.
- the sequencing output may provide a transcriptome representing any RNA in the blood.
- the sequencing output from the methods defined herein may comprise sequences of RNA from blood cells, other cells dislodged into the blood, and/or cancer cells, as well as extracellular RNA.
- the sequencing output may be used to ascribe each RNA sequence/molecule to a cell or tissue of origin and/or to a state of cell or tissue behaviour.
- the invention provides a method for analysing a nucleic acid sample, the method comprising:
- providing a nucleic acid sample comprising an RNA molecule, optionally wherein the nucleic acid sample is extracted from a blood sample obtained from a subject;
- sequencing the processed RNA or cDNA wherein the sequencing output is used to identify the cell type and/or tissue type from which the RNA molecule is derived.
- the invention provides a method for analysing a nucleic acid sample, the method comprising:
- providing a nucleic acid sample comprising an RNA molecule, optionally wherein the nucleic acid sample is extracted from a blood sample obtained from a subject;
- the sequencing output from the methods defined herein may be used to identify the transcription start site, end site and/or splicing.
- the methods defined herein may comprise identifying at least one transcription start site, end site and/or splice junction after sequencing the processed RNA or cDNA (i.e. in the sequencing output), optionally identifying the transcription start site, end site and all splice junctions in one or more (2, 3, 4, 5, 10, 20, 100 or 1000 or more) transcript(s).
- These features typically represent a unique combination used by different cell types enabling the cell type to be identified.
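As an illustration of how these features might be read off a full-length sequencing read, the sketch below assumes the read has already been aligned to a genome and is represented by its exon blocks as 0-based, half-open coordinates. The input representation and the function name `transcript_structure` are assumptions made for this example.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 0-based half-open genomic coordinates


def transcript_structure(exons: List[Exon], strand: str = "+"):
    """Return (start_site, end_site, splice_junctions) for one full-length read."""
    ordered = sorted(exons)
    # A splice junction runs from the end of one exon to the start of the next.
    junctions = tuple(
        (ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)
    )
    left, right = ordered[0][0], ordered[-1][1]
    # On the minus strand, transcription starts at the rightmost coordinate.
    start_site, end_site = (left, right) if strand == "+" else (right, left)
    return start_site, end_site, junctions
```

The resulting tuple can serve as a key for grouping reads that share the same start site, end site and splice junctions.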
- This information can also be used to identify the protein isoform that would be translated from an RNA sequence/molecule/transcript.
- the sequencing output may be used to identify the presence of an RNA transcript sequence.
- the sequencing output may be used to identify one or more isoforms of a protein.
- the protein may have 2, 3, 4, 5, 10, 15 or 20 or more isoforms and the method may identify which of the protein isoforms is encoded by the RNA molecule
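Identifying which protein isoform an RNA molecule encodes can be sketched as translating the transcript and comparing the result with known isoform sequences. The example below uses the standard genetic code and, as a simplifying assumption, translates from the first AUG to the first in-frame stop codon; the function names and the dictionary of known isoforms are hypothetical, and the sequence is expected to contain only A, C, G and U (or T).

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_orf(rna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    rna = rna.upper().replace("T", "U")  # accept cDNA as well as RNA
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


def match_isoforms(rna: str, known_isoforms: dict) -> list:
    """Return the names of known isoforms whose sequence equals the translation."""
    protein = translate_first_orf(rna)
    return [name for name, seq in known_isoforms.items() if seq == protein]
```

Exact matching is the simplest possible comparison; a practical workflow would likely need to tolerate sequencing errors and incomplete annotation.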
- An RNA (transcript) sequence/molecule or set of RNA sequences may be associated with a disease or condition.
- An RNA (transcript) sequence/molecule or set of RNA sequences associated with a disease or condition is an RNA (transcript) sequence/molecule or set of RNA sequences that is correlated with the disease or condition.
- An RNA (transcript) sequence/molecule or set of RNA sequences associated with a disease or condition may be (uniquely) present in a subject with the disease or condition, optionally absent in a subject without the disease or condition.
- An RNA (transcript) sequence/molecule or set of RNA sequences associated with a disease or condition may be (uniquely) absent in a subject with the disease or condition, optionally present in a subject without the disease or condition.
- the RNA (transcript) sequence may have a coding sequence that is disease (for example cancer) specific.
- An RNA (transcript) sequence may encode a protein isoform that is associated with the disease or condition.
- One or more peptide sequence(s) may be identified in the protein isoform that are present only in cells from subjects with the disease, optionally cancer cells.
- One or more peptide sequence(s) may be identified in the protein isoform that are antigenic peptides. These peptide sequences (antigenic peptides) are encoded by sections of the RNA (transcript) sequence.
- the invention provides a method for determining a set of RNA sequences associated with a disease or condition, the method comprising:
- the RNA sample may be extracted from any material, biological fluid, tissue, or cell obtained or otherwise derived from the subject.
- the sample may be a (solid) biopsy sample.
- the sample may be a liquid biopsy sample.
- the sample may be obtained from a biological fluid or a fluid or lysate generated from a biological material.
- the RNA sample may be extracted from a blood sample, optionally a whole blood sample, obtained from the subject.
- the invention provides a method for determining a set of RNA sequences associated with a disease or condition, the method comprising:
- RNA sequences in the set may encode an antigenic peptide, optionally each RNA sequence in the set encodes an antigenic peptide.
- Each RNA sequence in the set may be (uniquely) expressed in a subject with the disease or condition and, optionally, is not expressed in subjects without the disease or condition.
- By "set of RNA sequences" is meant 2 or more, optionally 3, 4, 5, 6, 7, 8, 9, 10, 20, 50 or 100 or more RNA sequences.
- Each RNA sequence in the set may be more than 10 bp, 20 bp, 50 bp, 100 bp, 500 bp, 1000 bp long, optionally 10 to 10000 bp, 20 to 1000 bp or 50 to 500 bp long.
- Each RNA sequence in the set may be less than 10 bp, 20 bp, 50 bp, 100 bp, 500 bp, or 1000 bp long.
- One or more, optionally 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50 or 100 or more RNA sequences from the set may be used to produce a (first) RNA vaccine for the subject.
- Each RNA sequence in the set may be between about 10 and about 1000 nucleotides in length, between about 10 and about 100 nucleotides in length, between about 15 and about 80 nucleotides in length or between about 18 and about 75 nucleotides in length.
- the invention provides a method for producing an RNA vaccine for a subject with a disease, the method comprising:
- providing an RNA sample obtained from the subject
- processing the RNA sample optionally comprising synthesizing cDNA using the RNA as a template
- the RNA sample may be extracted from any material, biological fluid, tissue, or cell obtained or otherwise derived from the subject.
- the sample may be a (solid) biopsy sample.
- the sample may be a liquid biopsy sample.
- the sample may be obtained from a biological fluid or a fluid or lysate generated from a biological material.
- the RNA sample may be extracted from a blood sample, optionally a whole blood sample, obtained from the subject.
- the invention provides a method for producing an RNA vaccine for a subject with a disease, the method comprising:
- processing the RNA sample, optionally comprising synthesizing cDNA using the RNA as a template
- sequencing the processed RNA or cDNA wherein the sequencing output is used to identify the presence of an RNA (transcript) sequence associated with the disease
- processing the RNA or cDNA sample comprises:
- (i) contacting the RNA or cDNA sample with a DNA array comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- the method may further comprise:
- RNA sample optionally comprising synthesizing cDNA using the RNA as a template
- RNA sequencing output is used to identify the presence of an RNA (transcript) sequence associated with the disease
- RNA vaccine for the subject.
- the first RNA vaccine and the further RNA vaccine may be the same.
- the first RNA vaccine and the further RNA vaccine may be different, optionally the first RNA vaccine and the further RNA vaccine differ in the antigenic peptides they encode.
- the RNA (transcript) sequence associated with the disease may be different in the RNA sample and the further RNA sample.
- the RNA (transcript) sequence associated with the disease may be the same in the RNA sample and the further RNA sample.
- the RNA (transcript) sequence may be between about 10 and about 1000 nucleotides in length, between about 10 and about 100 nucleotides in length, between about 15 and about 80 nucleotides in length or between about 18 and about 75 nucleotides in length.
- the RNA transcript may encode a protein isoform present in the subject with the disease.
- Using the identified RNA transcript sequence to produce the RNA vaccine may comprise producing an RNA molecule comprising at least a portion of the sequence, wherein the RNA molecule comprises an open reading frame (ORF), optionally encoding at least one antigenic peptide.
- the RNA molecule may further comprise a 5' UTR, 3' UTR, a polyA tail and/or a 5' cap.
- the 5' cap may have the Cap 0 structure or the Cap 1 structure.
- the Cap 0 structure may include a methyl-7 guanine nucleotide linked to the 5' position through a 5' triphosphate.
- the Cap 1 structure may be achieved by the methylation of the mRNA first nucleotide at the ribose 2'-O position.
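The layout of such an RNA molecule can be sketched as a simple concatenation of its sequence elements. The example below is a minimal illustration: the function name, the default poly(A) length and any UTR sequences passed in are hypothetical placeholders, and the 5' cap is not represented because it is a chemical modification added to the molecule and not part of the nucleotide string.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def assemble_mrna(orf: str, five_utr: str, three_utr: str,
                  poly_a_length: int = 100) -> str:
    """Concatenate 5' UTR, ORF, 3' UTR and a poly(A) tail after basic ORF checks."""
    orf = orf.upper().replace("T", "U")  # accept a cDNA-style ORF
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    if not orf.startswith("AUG"):
        raise ValueError("ORF must begin with the start codon AUG")
    if orf[-3:] not in STOP_CODONS:
        raise ValueError("ORF must end with a stop codon")
    return five_utr + orf + three_utr + "A" * poly_a_length
```

The checks only confirm that the open reading frame is well formed; they say nothing about translational performance or about which UTRs are suitable.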
- the RNA molecule may comprise one or two (optionally uridine-based) RNA strands, optionally with non-coding sequences optimised for translational performance.
- the RNA molecule in the RNA vaccine may comprise at least a portion of the wild-type sequence of the RNA transcript or may comprise a modified sequence. For example, the sequence may be adapted with respect to its codon usage.
- Adaptation of codon usage can increase translation efficacy and half-life of the RNA. At least 25%, preferably at least 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or even 100% of uridine present in the RNA sequence of each RNA molecule in the RNA vaccine may be replaced by pseudouridine or N1-methylpseudouridine or 5-methyluridine or 2-thiouridine.
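The percentage of uridine replaced can be made concrete with a small sequence-level sketch. The example below marks a chosen fraction of the uridines in a sequence with a placeholder symbol; the symbol, the random choice of positions and the function name are assumptions for illustration only, since in practice modified nucleotides are introduced during RNA synthesis and not by editing a string.

```python
import math
import random


def replace_uridines(rna: str, fraction: float, symbol: str = "Ψ",
                     seed: int = 0) -> str:
    """Mark at least `fraction` of the uridines in `rna` as modified."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    positions = [i for i, base in enumerate(rna) if base == "U"]
    # Rounding up guarantees that at least the requested fraction is replaced.
    n_replace = math.ceil(fraction * len(positions))
    chosen = set(random.Random(seed).sample(positions, n_replace))
    return "".join(symbol if i in chosen else base for i, base in enumerate(rna))
```

For a sequence with seven uridines and a fraction of 0.25, two uridines are marked, which satisfies the "at least 25%" condition.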
- the RNA vaccine may comprise conventional (non-replicating) mRNA, self-amplifying mRNA and/or trans-amplifying mRNA (taRNA).
- Self-amplifying mRNA may be based on the addition of a viral replicase gene to enable the mRNA to self-replicate.
- RNA vaccine refers to a vaccine comprising an RNA molecule as defined herein.
- the vaccine may comprise, however, other substances and molecules which are required or which are advantageous when the vaccine is administered to an individual (e.g. pharmaceutical excipients).
- the RNA vaccine may comprise the RNA molecule in a buffer solution.
- the RNA molecule may be formulated in a lipid-based carrier, polymer or peptide, optionally a lipid nanoparticle.
- the RNA molecule may be formulated in a lipoplex nanoparticle comprising the synthetic cationic lipid (R)-N,N,N-trimethyl-2,3-dioleyloxy-1-propanaminium chloride (DOTMA) and the phospholipid 1,2-dioleoyl-sn-glycero-3-phosphatidylethanolamine (DOPE).
- the RNA vaccine may be based on uridine mRNA-lipoplex nanoparticles (as described in Luis Rojas et al., Nature 2023; Jun 618(7963): 144-150 doi: 10.1038/s41586-023-06063-y, which is hereby incorporated by reference).
- the RNA vaccine may be delivered via transfection of dendritic cells.
- the RNA vaccine may encode more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or 20 antigenic peptides.
- the RNA vaccine may be manufactured as described in Sara Sousa Rosa et al. Vaccine. 2021 Apr 15;39(16):2190-2200 doi: 10.1016/j.vaccine.2021.03.038, which is hereby incorporated by reference.
- the disease may be cancer or an infectious disease (e.g. COVID-19).
- the RNA vaccine may be an RNA cancer vaccine.
- the RNA transcript sequence may have a coding region that is disease (cancer) specific.
- the RNA transcript sequence may encode a protein isoform that is (uniquely) expressed in subjects with cancer.
- the set of RNA sequences in the transcript may encode neoantigens (cancer specific peptide sub-sequences). The neoantigens may be expressed on the cell surface via MHC complexes.
- Neoantigens may be produced by alternative splicing and/or RNA editing and may be predicted from RNA sequencing output (Jiyeon Park and Yeun-Jun Chung, Genomics and Informatics 2019;17(3):e23 DOI: https://doi.org/10.5808/GI.2019.17.3.e23, which is hereby incorporated by reference).
- the neoantigens may be targets for therapy (for example immunotherapy), optionally a cancer vaccine, adoptive cell therapy and/or antibody-based therapy (Na Xie et al., Signal Transduction and Targeted Therapy (2023)8:9 https://doi.org/10.1038/s41392-022-01270-x, which is hereby incorporated by reference).
- the neoantigen(s) may be targets for one or more targeted therapies.
- the neoantigen(s) may be targets for one or more antibody-drug conjugates.
- the neoantigen(s) may be targets for one or more radiopharmaceuticals.
- 2 or more, 5 or more, 10 or more or 15 or more neoantigens may be targeted by an RNA vaccine.
- 2 or more, 5 or more, 10 or more or 15 or more neoantigens may be targeted by one or more of the following: a targeted therapy, immunotherapy, adoptive cell therapy, antibody-based therapy, antibody-drug conjugate(s) and/or radiopharmaceutical(s).
- the invention provides an RNA vaccine for use in therapy, wherein the RNA vaccine is produced using the methods defined herein.
- a subject may receive an immune checkpoint inhibitor, optionally atezolizumab, prior to treatment with the RNA vaccine.
- the RNA vaccine may be administered to the subject 1, 2, 3, 4, 5, 6, 7, 8, or 9 or more times.
- the invention provides a method for discovering a biomarker for a disease comprising: (i) providing a (test) RNA sample obtained from a subject with the disease;
- processing the RNA sample, optionally comprising synthesizing cDNA using the RNA as a template
- the RNA sample may be extracted from a blood sample, optionally a whole blood sample.
- the invention provides a method for discovering a biomarker for a disease comprising:
- Processing the RNA or cDNA sample may comprise:
- (i) contacting the RNA or cDNA sample with a DNA array comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- the amino acid sequence data may comprise amino acid sequence data obtained by sequencing RNA from a sample from a subject without the disease, determining one or more amino acid sequences corresponding to an RNA sequence and partitioning the one or more amino acid sequences into a plurality of segments of a defined length.
- the (test) RNA sample may be extracted from a blood sample, optionally a whole blood sample.
- the segments may be between 1 and 50, 2 and 40, 3 and 35, 4 and 30, 5 and 28, or 6 and 25 amino acids in length.
- the segments may be 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 or 25 amino acids long.
- the invention provides a method for discovering a biomarker for a disease comprising:
- processing the control RNA sample, optionally comprising synthesizing cDNA using the RNA as a template
- the method may further comprise identifying a segment that is present in the test sample but not in the control sample or vice versa and/or identifying a segment whose level differs between the samples.
- Processing the test and/or control RNA or cDNA sample may comprise:
- (i) contacting the RNA or cDNA sample with a DNA array comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and (ii) extracting the unannealed RNA or cDNA molecules, thereby generating processed RNA or cDNA.
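The logic of this processing step (molecules that anneal to the array are retained on it, and the unannealed molecules are carried forward) can be mimicked in silico. The sketch below is a deliberately crude model that treats annealing as the presence of an exact reverse-complement match to an array molecule; real hybridisation depends on length, mismatches and reaction conditions, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extract_unannealed(cdna_molecules, array_molecules):
    """Return the cDNA molecules that do not anneal to any array molecule."""
    # A cDNA is modelled as annealing if it contains the exact reverse
    # complement of an array molecule (an assumption for illustration).
    targets = [reverse_complement(probe) for probe in array_molecules]
    unannealed = []
    for molecule in cdna_molecules:
        upper = molecule.upper()
        if not any(target in upper for target in targets):
            unannealed.append(molecule)
    return unannealed
```

If the array molecules correspond to highly abundant housekeeping transcripts, the returned list models the processed cDNA that goes on to sequencing.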
- the (test and/or control) RNA sample may be extracted from a blood sample, optionally a whole blood sample.
- the invention provides a method for discovering a disease biomarker, the method comprising:
- processing the RNA or cDNA sample comprises:
- (i) contacting the RNA or cDNA sample with a DNA array comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- the method may further comprise:
- processing the control RNA sample, optionally comprising synthesizing cDNA using the RNA as a template
- the methods may comprise comparing the sequencing output for the test RNA sample and control RNA samples by analysing the sequencing output, optionally by: (i) determining one or more amino acid sequences corresponding to an RNA or cDNA sequence from the sequencing output for the test RNA sample and control RNA sample;
- the method may comprise identifying a segment as a disease biomarker when it is present in one sample but not in the other and/or identifying a segment as a disease biomarker when its level differs between the samples.
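The comparison between test and control samples can be sketched as a comparison of segment counts. The example below assumes the amino acid sequences have already been partitioned into segments, uses raw occurrence counts as a stand-in for "level", and applies a hypothetical fold-change threshold `min_fold`; none of these choices is specified in the text.

```python
from collections import Counter


def compare_segments(test_segments, control_segments, min_fold: float = 2.0):
    """Return segments unique to each sample and shared segments whose levels differ."""
    test = Counter(test_segments)
    control = Counter(control_segments)
    only_in_test = set(test) - set(control)
    only_in_control = set(control) - set(test)
    # For shared segments, flag those whose counts differ by at least min_fold.
    differing = {
        segment
        for segment in set(test) & set(control)
        if max(test[segment], control[segment])
        >= min_fold * min(test[segment], control[segment])
    }
    return only_in_test, only_in_control, differing
```

A real analysis would normally normalise counts for sequencing depth before comparing levels between samples.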
- the RNA or cDNA sequence from the sequencing output for the test RNA sample and control RNA sample may be from the same gene/transcript.
- the segments may be between 1 and 50, 2 and 40, 3 and 35, 4 and 30, 5 and 28, or 6 and 25 amino acids in length.
- the segments may be 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 or 25 amino acids long.
- Discovering a disease biomarker means identifying a novel biomarker (an indicator of a biological state) for a particular disease, for example uncovering a previously unknown biomarker that can be developed into a test for the disease.
- the disease biomarker may be suitable for use in diagnosing the disease, characterising the disease, predicting response to therapy, detecting minimal residual disease and/or prognosing the disease.
- By characterisation is meant classification and evaluation of the disease.
- Prognosis refers to predicting the likely outcome of the disease for the subject.
- the characterisation of and/or prognosis for the disease may comprise determining the grade and/or stage of the disease.
- the characterisation of the disease may comprise determining the sub-type of the disease.
- the disease biomarker may be suitable for use in indicating the likelihood that a subject with a particular disease will benefit from a specific therapy.
- the disease may be cancer.
- the characterisation of and/or prognosis for the cancer may comprise determining the presence or absence of metastases. Metastasis, or metastatic disease, is the spread of a cancer from one organ or part to another non-adjacent organ or part. The new occurrences of disease thus generated are referred to as metastases. Characterisation of and/or prognosis for the disease may also comprise predicting biochemical recurrence and/or determining whether the cancer is aggressive and/or determining whether the cancer has spread to the lymph nodes. Aggressive refers to a cancer that is fast growing, more likely to spread, more likely to recur and/or shows resistance to treatment.
- the invention provides a method for monitoring a subject comprising:
- Monitoring a subject may comprise monitoring response to treatment for a disease, for example monitoring whether treatment is successful and/or monitoring for adverse reactions/complications.
- the first time point may be prior to starting treatment and the second time point may be during or after treatment.
- An immune system related transcript may be detected in the sequencing output.
- the first time point may be prior to starting treatment with an immunotherapy and the second time point may be during or after treatment with the immunotherapy.
- the disease biomarker may be a cDNA sequence or an RNA sequence.
- the cDNA sequence will correspond to an RNA sequence.
- the cDNA/RNA sequence may correspond to a protein or peptide.
- the method may further comprise identifying an RNA, transcript, transcript model, gene, protein and/or peptide corresponding to a cDNA sequence.
- the disease biomarker may, therefore, be a cDNA molecule (of a specific sequence), DNA molecule (of a specific sequence), RNA molecule (of a specific sequence), transcript, transcript model, protein or peptide.
- the method may comprise discovering more than one disease biomarker, optionally more than 10, 100, 1000, 10000, 100000, 1 million or 10 million disease biomarkers.
- the method may comprise discovering between 1 and 10, 1 and 100, 1 and 1000, 1 and 10000, 1 and 100000, 1 and 1 million or 1 and 10 million disease biomarkers.
- Two or more of the disease biomarkers may be compiled to form a database.
- the invention provides a method for producing a database of disease biomarkers, the method comprising:
- processing the RNA samples, optionally comprising synthesizing cDNA using the RNA as a template
- the RNA samples may each be extracted from a blood sample, optionally a whole blood sample.
- the invention provides a method for producing a database of disease biomarkers, the method comprising:
- processing the RNA samples, optionally comprising synthesizing cDNA using the RNA as a template
- Each disease biomarker may be an RNA transcript, optionally wherein the RNA transcript encodes a protein isoform.
- the protein isoform may be present or absent in subjects with the disease.
- the invention provides a method for producing a database of RNA sequences associated with a disease or condition, the method comprising:
- the RNA samples may each be extracted from a blood sample, optionally a whole blood sample.
- the invention provides a method for producing a database of RNA sequences associated with a disease or condition, the method comprising:
- one or more RNA sequences in the set may encode an antigenic peptide, optionally each RNA sequence in the set encodes an antigenic peptide.
- Identifying a set of RNA sequences in the RNA transcript may comprise:
- Prior to the comparing in step (iii), the amino acid sequence data may be partitioned into a plurality of segments of a defined length.
- the amino acid sequence data in step (iii) above may comprise amino acid sequence data obtained by sequencing RNA from a sample from a subject with a disease (for example cancer) and determining one or more amino acid sequences corresponding to an RNA sequence.
- the amino acid sequence data may comprise amino acid sequence data obtained by sequencing RNA from a sample from a subject without a disease (for example cancer) and determining one or more amino acid sequences corresponding to an RNA sequence.
- a segment may be identified as a (disease) biomarker based on its presence or absence in the amino acid sequence data, for example where a particular segment is uniquely present or at a higher level in subjects with a particular disease the segment may be identified as a disease biomarker.
- the two or more segments may be segments that are present in amino acid sequence data obtained by sequencing RNA from a sample from a subject with a disease (for example cancer) and determining one or more amino acid sequences corresponding to an RNA sequence, and absent in amino acid sequence data obtained by sequencing RNA from a sample from a subject without a disease (for example cancer) and determining one or more amino acid sequences corresponding to an RNA sequence.
- the segments may be between 1 and 50, 2 and 40, 3 and 35, 4 and 30, 5 and 28, or 6 and 25 amino acids in length.
- the segments may be 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 or 25 amino acids long.
- the segments (k-mers) may be overlapping, for example such that each amino acid of the one or more amino acid sequences is the start of a segment (k-mer) (insofar as the length of the one of more amino acid sequences and the length of the segments allows).
- the sequencing output from a sample analysed and/or processed according to the methods defined herein may be compared to a database produced using a method defined herein in order to identify the presence of a disease biomarker or a set of RNA sequences associated with a disease or condition.
- the database of disease biomarkers and the database of RNA sequences associated with a disease or condition may be used to guide treatment decisions and to aid in the development of new treatments.
- the disease biomarker and/or set of RNA sequences may be a suitable target for a therapeutic agent, for example a vaccine, an RNA therapy and/or gene editing.
- the discovery of a disease biomarker specific to cancer cells can be an initial step in identifying a cancer specific antigen for a cancer vaccine to target.
- the method may further comprise identifying a transcript or protein/peptide corresponding to the disease biomarker as a target for therapy, optionally a cancer vaccine target.
- the therapy may be an antibody-drug conjugate and/or a radiopharmaceutical.
- the method may further comprise developing a therapy, for example a vaccine, optionally an RNA vaccine, directed to the target.
- the method may further comprise developing an antibody-drug conjugate directed to the target.
- the method may further comprise developing a radiopharmaceutical directed to the target.
- the methods may comprise providing 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40 or more RNA/cDNA samples from different subjects with the disease and/or providing 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40 or more RNA/cDNA samples from different subjects without the disease.
- the methods comprise providing 30 or more RNA/cDNA samples from different subjects with the disease (e.g. with a cancer) and providing 30 or more RNA/cDNA samples from different subjects without the disease (e.g. without the cancer).
- the methods may further comprise:
- RNA from biological fluid (e.g. blood) or a fluid or lysate generated from a biological material from a subject with a disease and from a subject without the disease;
- a subject with a disease means the subject has the disease at the time the (biological) sample (biological fluid or biological material) from which the RNA/cDNA sample is derived is taken from the subject.
- a subject without a disease means the subject does not have the disease at the time the (biological) sample (biological fluid or biological material) from which the RNA/cDNA sample is derived is taken from the subject.
- the subject without the disease may be a healthy subject.
- the disease biomarker may be a cDNA molecule (of a specific sequence), RNA molecule (of a specific sequence), protein or peptide that is detectable in a sample from a subject with a disease but not in a sample from a subject without the disease.
- the disease biomarker may be a cDNA molecule (of a specific sequence), RNA molecule (of a specific sequence), protein or peptide that is not detectable in a sample from a subject with a disease but is detectable in a sample from a subject without the disease.
- the cancer vaccine target may be a cDNA molecule (of a specific sequence), RNA molecule (of a specific sequence), protein or peptide that is detectable in a sample from a subject with a cancer but not in a sample from a subject without the cancer.
- the disease biomarker may be a transcript that is (uniquely) present in subjects with a particular disease.
- the disease biomarker is a transcript that is (uniquely) absent in subjects with a particular disease.
- the cancer vaccine target is a transcript that is uniquely present in subjects with a particular cancer.
- the transcript/RNA/cDNA sequence may correspond to a particular protein or peptide. At least a portion of the protein or peptide may form an antigen comprised in a cancer vaccine.
- the present invention enables the identification of transcripts found only in subjects with a disease, optionally cancer.
- Such transcript models can be identified through comparison with subjects without the disease (e.g. benign patients) and, optionally, public transcriptome annotation databases.
- Sequencing output and/or resulting transcriptomic profile(s) from a subject with a particular disease e.g. breast cancer
- sequencing output and/or resulting transcriptomic profile(s) from a subject with a different disease for example, ovarian and/or colorectal cancer
- the invention provides a method for diagnosing a disease in a subject comprising:
- RNA sample optionally comprising synthesizing cDNA using the RNA as a template
- the RNA sample may be extracted from a blood sample, optionally a whole blood sample, obtained from the subject.
- the invention provides a method for diagnosing a disease or condition, wherein the method comprises:
- the RNA sample may be extracted from a blood sample, optionally a whole blood sample.
- the invention provides a method for diagnosing a disease or condition, wherein the method comprises:
- the disease may be an autoimmune disease, a cancer, diabetes, coronary disease, a metabolic disease, Alzheimer’s disease, dementia, and/or an infectious disease.
- the disease may be a viral infection, bacterial infection and/or fungal infection.
- the disease may be COVID-19.
- the disease may be identified at a (very) early stage, optionally before symptoms have developed.
- the disease may be cancer and the cancer may be identified at a (very) early stage, optionally before significant tumour growth has occurred.
- the disease may be cancer (e.g. a hematological cancer such as leukemia, lymphoma or multiple myeloma) and diagnosing the disease may comprise detecting minimal residual disease (MRD).
- MRD may be defined as cancer cells that remain in the subject during or after treatment.
- the invention provides a method for diagnosing cancer in a subject, the method comprising:
- processing the RNA or cDNA sample comprises:
- (i) contacting the RNA or cDNA sample with a DNA array comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- the sequencing output may be used to determine the presence or absence of one or more RNA molecules/sequences/transcripts in the RNA sample in order to identify whether the subject has cancer.
- the disease may be an autoimmune disease.
- the invention provides a method for diagnosing an autoimmune disease in a subject comprising:
- RNA sample optionally comprising synthesizing cDNA using the RNA as a template
- the RNA sample may be extracted from a blood sample, optionally a whole blood sample.
- the invention provides a method for diagnosing an autoimmune disease in a subject comprising: (i) providing a (test) RNA sample extracted from a blood sample obtained from the subject;
- RNA sample optionally comprising synthesizing cDNA using the RNA as a template
- autoimmune disease herein is a disease or disorder wherein the immune system of a subject mounts an immune response to the subject’s own tissue.
- the autoimmune disease may be arthritis, celiac disease, diabetes mellitus type 1, Graves' disease, inflammatory bowel disease, multiple sclerosis, alopecia areata, Addison's disease, pernicious anemia, psoriasis, systemic lupus erythematosus, myasthenia gravis, Hashimoto’s thyroiditis, vitiligo, Sjögren’s syndrome, myositis, chronic inflammatory demyelinating polyneuropathy (CIDP), dermatomyositis, Guillain-Barré syndrome, ulcerative colitis, Crohn’s disease and/or vasculitis.
- the sequencing output may be used to identify whether the subject has the disease by comparing to a database of biomarkers, optionally wherein the database of biomarkers is produced using a method defined herein.
- By “diagnosing” is meant determining that a subject has the disease at the time of testing.
- the methods defined herein may further comprise selecting a treatment appropriate for the disease and, optionally, treating the disease with the selected treatment. Treating the subject may start at an early stage of disease progression, optionally before symptoms have appeared or significant tumour growth has occurred.
- the treatment may be a vaccine, an antibody-drug conjugate, a radiopharmaceutical, an immune checkpoint inhibitor and/or immunotherapy.
- the invention provides a method for characterising and/or prognosing a disease in a subject comprising:
- the invention provides a method for selecting a treatment for a disease in a subject comprising:
- sequencing the processed (normalized) RNA/cDNA sample wherein the sequencing output is used to provide a diagnosis, characterisation of and/or a prognosis for the disease
- the invention provides a method for predicting the responsiveness of a subject with a disease to a therapeutic agent comprising:
- the therapeutic agent may be an immune checkpoint inhibitor and/or immunotherapy, optionally CAR-T therapy.
- the therapeutic agent may be an antibody-drug conjugate and/or a radiopharmaceutical.
- the RNA/cDNA sample may comprise full-length RNA/cDNA and/or the processed RNA/cDNA sample comprises full-length RNA/cDNA.
- the methods as described herein may further comprise treating the subject.
- the subject may be treated with a vaccine, an immune checkpoint inhibitor, immunotherapy, CAR-T therapy, an antibody-drug conjugate and/or a radiopharmaceutical.
- the methods may comprise comparing the sequencing output for the processed (normalized) RNA/cDNA sample to one or more reference sequences or to the sequencing output of one or more control samples, optionally wherein the one or more control samples are from one or more subjects with and/or without the disease.
- the methods comprise comparing the sequencing output for the processed (normalized) RNA/cDNA sample to the sequencing output of one or more control samples from one or more subjects with the disease.
- By “sequencing output” is meant one or more sequences obtained from sequencing the processed (normalized) RNA/cDNA.
- the sequence(s) may be raw sequence(s) or may be further processed. For example, low quality reads may be filtered and/or adapter sequences may be filtered and removed.
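The read-filtering step mentioned above can be illustrated with a minimal sketch. It assumes reads are held as `(name, sequence, quality)` tuples with Phred+33 quality strings; the function names and the thresholds (`min_mean_q`, `min_length`) are hypothetical and are not taken from the specification. Averaging Phred scores is a deliberate simplification, and adapter removal is not shown.

```python
from statistics import mean


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding is an assumption)."""
    if not quality:
        return 0.0
    return mean(ord(ch) - offset for ch in quality)


def filter_reads(records, min_mean_q: float = 10.0, min_length: int = 1):
    """Keep (name, sequence, quality) records that pass both illustrative thresholds."""
    kept = []
    for name, seq, qual in records:
        if len(seq) >= min_length and mean_phred(qual) >= min_mean_q:
            kept.append((name, seq, qual))
    return kept


reads = [("read1", "ACGT", "IIII"), ("read2", "ACGT", "!!!!")]
high_quality = filter_reads(reads)
```

In this example `read1` (quality 40 at every base) is retained and `read2` (quality 0 at every base) is dropped.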
- the (processed) sequence(s) may be mapped to the human reference genome (for example, using Minimap2) to prepare transcriptome profile(s).
- One or more transcript models may be identified in the transcriptome profile(s) (sequence(s) mapped to the genome).
- the transcript model represents a specific transcript i.e. a particular RNA isoform or splice variant produced from a gene.
- the sequencing output that is used in the methods defined herein may be transcript(s), transcriptome profile(s) and/or transcript model(s).
- Using the sequencing output to identify whether the subject has the disease may comprise detecting a disease biomarker.
- Using the sequencing output to identify whether the subject has the disease may comprise detecting more than one disease biomarker, optionally more than 10, 100, 1000, 10000, 100000, 1 million or 10 million disease biomarkers.
- Using the sequencing output to identify whether the subject has the disease may comprise detecting between 1 and 10, 1 and 100, 1 and 1000, 1 and 10000, 1 and 100000, 1 and 1 million or 1 and 10 million disease biomarkers. Detecting the disease biomarker may comprise determining the presence or absence of the disease biomarker.
- Using the sequencing output to identify whether the subject has the disease may comprise determining the presence or absence of more than one disease biomarker, optionally more than 10, 100, 1000, 10000, 100000, 1 million or 10 million disease biomarkers. Using the sequencing output to identify whether the subject has the disease may comprise determining the presence or absence of between 1 and 10, 1 and 100, 1 and 1000, 1 and 10000, 1 and 100000, 1 and 1 million or 1 and 10 million disease biomarkers.
- the presence of a particular RNA/cDNA sequence in the sequencing output may indicate that the subject has the disease, for example where a particular transcript (corresponding to the cDNA molecule) is uniquely present in subjects with a particular disease.
- the presence of a particular RNA/cDNA sequence in the sequencing output may indicate a characterisation of and/or a prognosis for the disease.
- the presence of a particular RNA/cDNA sequence in the sequencing output may allow prediction of the responsiveness of a subject with a disease to a therapeutic agent, for example where a particular transcript has been found to correlate with responsiveness of a subject with a disease to a particular therapeutic agent.
- the absence of a particular RNA/cDNA sequence in the sequencing output may indicate that the subject has the disease, for example where a particular transcript (corresponding to the cDNA molecule) is absent in subjects with a particular disease.
- the absence of a particular RNA/cDNA sequence in the sequencing output may indicate a characterisation of and/or a prognosis for the disease.
- the absence of a particular RNA/cDNA sequence in the sequencing output may allow prediction of the responsiveness of a subject with a disease to a therapeutic agent, for example where a particular transcript has been found to correlate with responsiveness of a subject with a disease to a particular therapeutic agent.
- the sequencing output may be analysed to identify unique RNA sequences (transcripts), optionally substantially all unique RNA sequences (transcripts).
- the sequencing output may be analysed to identify unique RNA sequences (transcripts) as described in Kuo, R.I., Cheng, Y., Zhang, R. et al. BMC Genomics 21, 751 (2020) https://doi.org/10.1186/s12864-020-07123-7, which is hereby incorporated by reference.
- One or more amino acid sequences may be identified that can be translated from an RNA sequence, optionally 3 amino acid sequences are identified corresponding to the 3 longest open reading frames of an RNA sequence (full translation, first to last codon, without start or stop codon selection).
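One possible reading of the translation step is that each of the three forward reading frames is translated from its first to its last complete codon, with no start or stop codon selection. The sketch below follows that reading, which is an assumption; the names `CODON_TABLE`, `translate_frame` and `three_frame_translations` are hypothetical. Stop codons are written as `*` and unrecognised codons as `X`.

```python
from itertools import product

# Standard genetic code, written with T in place of U.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(rna: str, frame: int) -> str:
    """Translate one reading frame end to end, without start/stop selection."""
    seq = rna.upper().replace("U", "T")[frame:]
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def three_frame_translations(rna: str) -> list:
    """Return the translations of the three forward reading frames."""
    return [translate_frame(rna, frame) for frame in range(3)]


frames = three_frame_translations("AUGGCCUAA")
```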
- the one or more amino acid sequences may be split into two or more (peptide) segments (k-mers).
- the (peptide) segments (k-mers) may be between 1 and 50, 2 and 40, 3 and 35, 4 and 30, 5 and 28, or 6 and 25 amino acids in length.
- the (peptide) segments (k-mers) may be between 6 and 25 amino acids in length, optionally 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 or 25 amino acids in length.
- the (peptide) segments (k-mers) may be overlapping, for example such that each amino acid of the one or more amino acid sequences is the start of a (peptide) segment (k-mer) (insofar as the length of the one or more amino acid sequences and the length of the segments allows).
- the (peptide) segments (k-mers) may be compiled into a database.
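The overlapping segmentation described above, in which every amino acid starts a segment as far as the sequence length allows, can be sketched as a sliding window. The function names and the use of an in-memory `set` as the "database" are illustrative assumptions.

```python
def kmers(sequence: str, k: int) -> list:
    """All overlapping segments of length k; empty if the sequence is shorter than k."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_kmer_database(sequences, k: int) -> set:
    """Compile the distinct k-mers from several amino acid sequences."""
    database = set()
    for sequence in sequences:
        database.update(kmers(sequence, k))
    return database


segments = kmers("MKTAYIAK", 6)  # ['MKTAYI', 'KTAYIA', 'TAYIAK']
```

A sequence of length n yields n - k + 1 segments, so the last k - 1 amino acids cannot start a segment.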
- (Peptide) segments (k-mers) identified from sequencing output obtained from a subject with a disease (for example cancer) may be compared to (peptide) segments (k-mers) identified from sequencing output obtained from a subject without a disease (for example cancer). This comparison may be used to identify one or more (peptide) segments (k-mers) associated with the disease (for example cancer).
- This comparison may be used to identify one or more (peptide) segments (k-mers) that are present in a subject with the disease (for example cancer) and/or absent in a subject without the disease (for example cancer).
- the comparison may be used to identify one or more (peptide) segments (k-mers) that are at an increased level in a subject with the disease (for example cancer) compared to a subject without the disease (for example cancer).
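The comparisons described above can be sketched over two cohorts, each represented as a list of per-subject k-mer sets. This is a sketch under stated assumptions: "level" is approximated here by the fraction of subjects carrying a k-mer rather than by abundance, and the names and the `min_disease_fraction` threshold are hypothetical.

```python
from collections import Counter


def kmer_prevalence(samples) -> dict:
    """Fraction of samples in which each k-mer occurs."""
    counts = Counter()
    for sample in samples:
        counts.update(set(sample))
    return {kmer: n / len(samples) for kmer, n in counts.items()}


def candidate_segments(disease_samples, control_samples, min_disease_fraction=0.5):
    """Return (k-mers absent from all controls, k-mers more prevalent in disease)."""
    disease = kmer_prevalence(disease_samples)
    control = kmer_prevalence(control_samples)
    unique = {
        kmer for kmer, frac in disease.items()
        if kmer not in control and frac >= min_disease_fraction
    }
    enriched = {
        kmer for kmer, frac in disease.items()
        if kmer in control and frac > control[kmer]
    }
    return unique, enriched


disease_cohort = [{"AAAAAA", "CCCCCC", "GGGGGG"}, {"AAAAAA", "CCCCCC"}]
control_cohort = [{"CCCCCC"}, {"TTTTTT"}]
unique, enriched = candidate_segments(disease_cohort, control_cohort)
```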
- One or more (peptide) segments (k-mers) identified by a comparison as described above may be identified as a biomarker, a neoantigen target, a target for an antibody drug conjugate, a target for a neoantigen therapy, a target for a vaccine and/or a target for a radiopharmaceutical.
- Two or more (peptide) segments (k-mers) identified by a comparison as described above may be combined to form a longer amino acid sequence which may be identified as a biomarker, a neoantigen target, a target for an antibody drug conjugate, a target for a neoantigen therapy, a target for a vaccine and/or a target for a radiopharmaceutical.
- two or more overlapping (peptide) segments (k-mers) identified by a comparison as described above may be combined to form a longer amino acid sequence including the overlapping and non-overlapping amino acids (i.e. the two or more overlapping (peptide) segments (k-mers) are not combined in series but overlapped to recreate the sequence from which they could be segmented).
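Combining overlapping segments by overlap rather than in series can be sketched as follows. The sketch assumes the segments are supplied in their original order and that each one overlaps the sequence built so far; it uses the longest suffix/prefix match, which may be ambiguous for repetitive sequences. The function name is hypothetical.

```python
def merge_overlapping(segments) -> str:
    """Rebuild a sequence from ordered, overlapping segments."""
    if not segments:
        return ""
    merged = segments[0]
    for segment in segments[1:]:
        # Find the longest prefix of the segment that ends the merged sequence.
        for size in range(min(len(merged), len(segment)), 0, -1):
            if merged.endswith(segment[:size]):
                merged += segment[size:]  # append only the non-overlapping part
                break
        else:
            raise ValueError(f"segment {segment!r} does not overlap {merged!r}")
    return merged


rebuilt = merge_overlapping(["MKTAYI", "KTAYIA", "TAYIAK"])  # 'MKTAYIAK'
```

Concatenating the same three segments in series would give an 18-residue string, whereas overlapping them recovers the 8-residue source sequence.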
- the methods may comprise analysing the sequencing output by:
- Prior to the comparing in step (iii), the amino acid sequence data may be partitioned into a plurality of segments of a defined length.
- the amino acid sequence data in step (iii) above may comprise amino acid sequence data obtained by sequencing RNA from a sample from a subject with a disease (for example cancer) and determining one or more amino acid sequences corresponding to an RNA sequence.
- the amino acid sequence data may comprise amino acid sequence data obtained by sequencing RNA from a sample from a subject without a disease (for example cancer) and determining one or more amino acid sequences corresponding to an RNA sequence.
- the one or more segments may be identified as a (disease) biomarker based on its presence or absence in the amino acid sequence data, for example where a particular segment is uniquely present or at a higher level in amino acid sequence data from subjects with a particular disease the segment may be identified as a disease biomarker.
- the presence of a particular segment in the amino acid sequence data may indicate that the subject has the disease, for example where a particular segment is uniquely present in amino acid sequence data from subjects with a particular disease.
- the absence of a particular segment in the amino acid sequence data may indicate that the subject has the disease, for example where a particular segment is uniquely absent in amino acid sequence data from subjects with a particular disease.
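The presence/absence logic above can be illustrated with a small screening sketch. It assumes two reference sets of segments, one expected to be uniquely present in the disease and one expected to be uniquely absent; the function name and the returned dictionary layout are hypothetical, and a real decision rule would need validated thresholds.

```python
def screen_sample(sample_kmers, present_markers, absent_markers) -> dict:
    """Report which presence-type and absence-type markers a sample matches."""
    sample = set(sample_kmers)
    return {
        # markers uniquely present in disease that were found in the sample
        "present_hits": sorted(sample & set(present_markers)),
        # markers uniquely absent in disease that are missing from the sample
        "absent_hits": sorted(set(absent_markers) - sample),
    }


report = screen_sample(
    {"AAAAAA", "CCCCCC"},
    present_markers={"AAAAAA", "GGGGGG"},
    absent_markers={"TTTTTT", "CCCCCC"},
)
```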
- the amino acid sequence data may be obtained or derived from a database, for example a public database such as GTEx, TCGA and/or CPTAC.
- the invention provides a method for analysing a nucleic acid sample, the method comprising:
- RNA sample optionally comprising synthesizing cDNA using the RNA as a template
- the methods may further comprise comparing one or more of the segments to amino acid sequence data to determine if the segment is present or absent in the amino acid sequence data.
- the amino acid sequence data may comprise sequences from a plurality of segments of a defined length.
- the amino acid sequence data may be obtained from a subject with or without a disease or condition.
- the amino acid sequence data may be obtained from a subject with or without cancer.
- the subject may have cancer.
- the methods may be carried out using a sample from a subject with cancer and using a sample from a subject without cancer.
- the methods may further comprise comparing one or more of the segments obtained by carrying out the methods using a sample from a subject with cancer to one or more of the segments obtained by carrying out the methods using a sample from a subject without cancer.
- a segment that is present in the subject with cancer but not in the subject without cancer or is present at a higher level in the subject with cancer than the subject without cancer may be identified as a biomarker or a target for therapy.
- the segment may be comprised within a longer sequence that is identified as a biomarker or a target for therapy.
- the therapy may be a vaccine, an antibody-drug conjugate and/or a radiopharmaceutical.
- Two or more segments (k-mers) may be combined to form a longer amino acid sequence which may be identified as a biomarker or a target for therapy.
- the target for therapy may be a neoantigen target, a target for an antibody drug conjugate, a target for a neoantigen therapy, a target for a vaccine and/or a target for a radiopharmaceutical.
- two or more overlapping segments (k-mers) (identified by a comparison as described above) may be combined to form a longer amino acid sequence including the overlapping and non-overlapping amino acids (i.e. the two or more overlapping segments (k-mers) are not combined in series but overlapped to re-create the sequence from which they could be segmented).
- the segments may be between 1 and 50, 2 and 40, 3 and 35, 4 and 30, 5 and 28, or 6 and 25 amino acids in length.
- the segments may be 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 or 25 amino acids long.
- the segments (k-mers) may be overlapping, for example such that each amino acid of the one or more amino acid sequences is the start of a segment (k-mer) (insofar as the length of the one or more amino acid sequences and the length of the segments allows).
- the plurality of segments may be 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, 20 or more, 50 or more, 100 or more or 1000 or more segments.
- the plurality of segments may be 2 or more segments.
- RNA samples may be obtained from biological samples of any suitable form including any material, biological fluid, tissue, or cell obtained or otherwise derived from a subject.
- the sample may include cancer cells or genetic material (DNA or RNA) derived from the cancer cells, to include cell-free genetic material (e.g. found in the peripheral blood).
- the sample may comprise a biopsy sample (e.g. a formalin-fixed paraffin-embedded biopsy sample).
- the sample may comprise a fresh/frozen (FF) sample.
- the sample may comprise tumour (cancer) tissue, optionally breast tumour (cancer) tissue.
- the sample may comprise tumour (cancer) cells, optionally breast tumour (cancer) cells.
- the tissue sample may be obtained by any suitable technique. Examples include a biopsy procedure, optionally a fine needle aspirate biopsy procedure.
- Body fluid samples may also be utilised.
- Suitable sample types include blood (including whole blood, leukocytes, peripheral blood mononuclear cells, buffy coat, plasma, and serum), sputum, tears, mucus, nasal washes, nasal aspirate, breath, urine, semen, saliva, meningeal fluid, amniotic fluid, glandular fluid, lymph fluid, nipple aspirate, bronchial aspirate, synovial fluid, joint aspirate, ascites, cells, a cellular extract, and cerebrospinal fluid. This also includes experimentally separated fractions of all of the preceding.
- a blood sample can be fractionated into serum or into fractions containing particular types of blood cells, such as red blood cells or white blood cells (leukocytes).
- a sample can be a combination of samples from a subject, such as a combination of a tissue and fluid sample.
- the term “sample” also includes materials containing homogenized solid material, such as from a stool sample, a tissue sample, or a tissue biopsy, for example.
- the term “sample” also includes materials derived from a tissue culture or a cell culture, including tissue resection and biopsy samples.
- Example methods for obtaining a sample include phlebotomy and swabbing (e.g., a buccal swab).
- Samples can also be collected, e.g., by micro dissection (e.g., laser capture micro dissection (LCM) or laser micro dissection (LMD)), bladder wash, smear (e.g., a PAP smear), or ductal lavage.
- a “sample” obtained or derived from a subject includes any such sample that has been processed in any suitable manner after being obtained from the subject.
- the methods of the invention as defined herein may begin with an obtained sample and thus do not necessarily incorporate the step of obtaining the sample from the patient.
- the RNA/cDNA sample may be obtained from a tumour (e.g. a solid biopsy) or from biological fluid or a fluid or lysate generated from a biological material.
- the RNA/cDNA sample may be obtained from blood.
- the RNA/cDNA sample may be obtained by extracting RNA from a biological sample (e.g. blood) obtained from the subject. cDNA is then synthesized using the RNA as a template (i.e. by reverse transcription).
- the methods may further comprise:
- RNA from biological fluid (e.g. blood) or a fluid or lysate generated from a biological material from the subject;
- the methods may further comprise reporting to the subject the outcome of the method.
- the result may be a diagnosis or prognosis for the disease.
- the result may be a specific grade or stage of a disease, such as a cancer.
- sequence may refer to all of the individual nucleic acid (e.g. cDNA or RNA) molecules having a 100% identical nucleotide sequence.
- sequence may refer to all of the individual nucleic acid (e.g. cDNA or RNA) molecules having more than 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55% or 50% identity to one another.
- % identity between a query nucleic acid sequence and a subject nucleic acid sequence may be calculated using a suitable algorithm (e.g. BLASTN, FASTA, Needleman-Wunsch, Smith-Waterman, LALIGN, or GenePAST/KERR) or software (e.g. DNASTAR Lasergene, GenomeQuest, EMBOSS needle or EMBOSS infoalign), over the entire length of the query sequence after a pair-wise global sequence alignment has been performed using a suitable algorithm (e.g. Needleman-Wunsch or GenePAST/KERR) or software (e.g. DNASTAR Lasergene or GenePAST/KERR).
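The percent identity calculation described above can be sketched for a pair of sequences that have already been globally aligned (for example by Needleman-Wunsch, which is not implemented here). The function name and the use of `-` as the gap character are assumptions; identity is expressed over the entire ungapped length of the query.

```python
def percent_identity(aligned_query: str, aligned_subject: str, gap: str = "-") -> float:
    """Percent identity over the full query length, given a global alignment."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    query_length = sum(1 for ch in aligned_query if ch != gap)
    if query_length == 0:
        raise ValueError("query contains no residues")
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != gap
    )
    return 100.0 * matches / query_length


identity = percent_identity("ACGT", "ACCT")  # 75.0
```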
- “unique sequence” or “unique cDNA sequence” or “unique RNA sequence” may refer to all of the individual nucleic acid (e.g. cDNA or RNA as appropriate) molecules which meet or exceed a threshold % identity (e.g. 100%, 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55% or 50% identity to one another).
- the “unique sequence” or “unique cDNA sequence” or “unique RNA sequence” may differ from the other sequences present in the sample (for example, by at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50 or 100 nucleotides or by at least 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45% or 50% of their sequence).
- the disease may not be an infectious disease.
- the disease may be cancer.
- the cancer may be an epithelial cancer.
- the cancer may be breast, ovarian and/or colorectal cancer.
- the cancer is breast cancer.
- the methods may be used to diagnose more than one disease in a single process, for example through detection of multiple RNA or cDNA molecules (derived from transcripts) that are each uniquely present in subjects with a particular disease.
- the methods may be used to diagnose more than one autoimmune disease.
- the methods may be used to diagnose more than one cancer type.
- the methods may be used to distinguish between breast cancer and a benign breast condition.
- Sequencing may comprise the use of long read sequencing. By using long read sequencing, it is possible to detect full-length RNA/cDNA, which will provide better information for identifying the tissue source of each RNA and its specific function. Long read sequencing may be single-molecule long read sequencing (e.g. PacBio® HiFi or Oxford Nanopore Technologies nanopore sequencing). Long read sequencing may be single-molecule nanopore sequencing. Long read sequencing may comprise tagmentation (e.g. Illumina Complete Long Read sequencing technology). Long read sequencing may produce reads of more than 1 kb, more than 5 kb, more than 10 kb or more than 20 kb.
- the RNA may be full-length RNA.
- the processed RNA or cDNA that is sequenced may be full length.
- By “full length” is meant that the sequence of the whole length (or at least 99%, 98%, 95%, 90% or 80% of the length) of the RNA/cDNA molecule may be obtained, i.e. RNA or cDNA molecules are not fragmented before sequencing. Entire spliced isoforms may be directly observed.
- the methods defined herein may not comprise a step of (actively) fragmenting RNA and/or cDNA prior to sequencing.
- the RNA sample may comprise full-length RNA.
- Sequencing may comprise the use of long-read, full-length RNA sequencing. This allows for direct observation of entire spliced isoforms.
- Blood samples may be placed in blood tubes designed to preserve RNA integrity.
- the present inventors have developed a method for processing a blood sample when RNA extraction is not carried out on the day of blood collection. Specific steps for blood sample freezing, storage and thawing improve the condition of samples, particularly if they are to be subjected to long read sequencing.
- the invention provides a method for processing a blood sample comprising:
- the blood sample may be a liquid (i.e. non-dried) blood sample.
- the blood sample may be whole blood.
- the blood sample may be in a sample tube.
- the blood sample may not be absorbed into a material such as a sponge.
- the blood sample may be stored at between -15°C and -80°C, -15°C and -70°C, -15°C and -60°C, -15°C and -50°C, -15°C and -40°C, -15°C and -30°C or -15°C and -20°C.
- the blood sample may be stored at -15°C or below within 12 hours, 8 hours, 5 hours, 2 hours, 1 hour, 30 minutes, 15 minutes, 5 minutes or 1 minute of collection.
- the blood sample is stored at -15°C or below within 12 hours of collection (i.e. taking the blood sample from the subject).
- the blood sample may be stored at -20°C or below.
- the blood sample may be stored at -20°C or below within 12 hours, 8 hours, 5 hours, 2 hours, 1 hour, 30 minutes, 15 minutes, 5 minutes or 1 minute of collection.
- the blood sample is stored at -20°C or below within 12 hours of collection (i.e. taking the blood sample from the subject).
- the blood sample may be stored at -15°C or below (optionally -20°C or below) for at least 24 hours (and optionally for no more than 72 hours, 1 week, 2 weeks, 4 weeks, 1 month or 2 months) before storing at -70°C or below (optionally -80°C or below) for no more than 4, 5, 6, 7, 8, 9, 10, 11 or 12 months or 2, 3, 4 or 5 years.
- storage at -70°C or below (optionally -80°C or below) is for no more than 5 years.
- thawing of the blood sample takes place on the same day RNA is to be extracted.
- the blood sample may be thawed at 16 to 29°C, 17 to 28°C, 18 to 27°C, 18 to 26°C or 18 to 25°C.
- the blood sample is thawed at 18 to 25°C.
- the duration of the thawing step may be 1 to 5 hours, 2 to 5 hours, 1 to 4 hours, 2 to 4 hours, 1 to 3 hours, 2 to 3 hours or 1 to 2 hours.
- the blood sample may be thawed at 18 to 25°C for 1 to 3 hours. More preferably, the blood sample is thawed at 18 to 25°C for 3 hours.
- the sample tube may be inverted at least 5, 6, 7, 8, 9 or 10 times. Preferably, the sample tube is inverted 10 times.
- the blood sample may then be incubated at 18 to 25°C for around 2 hours prior to RNA extraction.
- RNA may be extracted from the (thawed) blood sample using the Qiagen Paxgene Blood RNA Kit.
- prior to providing a (test) RNA sample (i.e. prior to step (a) or (i)), the blood may be received in a container (optionally a Paxgene Blood RNA Tube) at room temperature (5 to 30°C, preferably 18 to 25°C).
- the container may be inverted at least 5, 6, 7, 8, 9 or 10 times immediately after blood collection.
- the container is inverted 10 times immediately after blood collection.
- the blood sample may be stored at -15°C or below (or -20°C or below) immediately after being inverted.
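The handling preferences listed above lend themselves to a simple record check. The following is a minimal sketch only, assuming the values stated in this section (frozen storage at -15°C or below within 12 hours of collection, thawing at 18 to 25°C for 1 to 3 hours, at least 5 tube inversions). The class `BloodSampleRecord`, its field names and the function `handling_issues` are hypothetical and are not part of the described methods.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BloodSampleRecord:
    """Hypothetical record of how one blood sample was handled."""
    hours_to_freezing: float  # time from collection to frozen storage
    storage_temp_c: float     # frozen storage temperature, in deg C
    thaw_temp_c: float        # thawing temperature, in deg C
    thaw_hours: float         # duration of the thawing step
    inversions: int           # number of tube inversions


def handling_issues(rec: BloodSampleRecord) -> List[str]:
    """Return the handling preferences from this section that a record misses."""
    issues = []
    if rec.hours_to_freezing > 12:
        issues.append("frozen more than 12 hours after collection")
    if rec.storage_temp_c > -15:
        issues.append("stored warmer than -15 deg C")
    if not 18 <= rec.thaw_temp_c <= 25:
        issues.append("thawed outside 18 to 25 deg C")
    if not 1 <= rec.thaw_hours <= 3:
        issues.append("thawing step outside 1 to 3 hours")
    if rec.inversions < 5:
        issues.append("tube inverted fewer than 5 times")
    return issues


# Example: a sample handled according to the preferred values.
preferred = BloodSampleRecord(2.0, -20.0, 22.0, 3.0, 10)
print(handling_issues(preferred))
```

An empty list means the record is consistent with the ranges assumed here; the thresholds can be tightened to the narrower alternatives given in the text.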
- the invention provides a method for processing a blood sample comprising:
- the blood sample may be no more than 5 ml, 3 ml, 2.5 ml or 2 ml. Preferably, the blood sample is no more than 2.5 ml.
- the blood sample may be between 0.1 ml and 5 ml, 0.5 ml and 5 ml, 1 ml and 4 ml, or 2 ml and 3 ml.
- the methods for processing a blood sample may be combined with the methods outlined herein that employ an RNA or cDNA sample.
- the RNA or cDNA sample may be obtained from blood by following the steps of the methods for processing a blood sample described herein.
- the methods may comprise a step of synthesizing cDNA using the extracted RNA as a template (i.e. converting the extracted RNA into cDNA, reverse transcription).
- the first RNA/cDNA sample and the second RNA/cDNA sample may be obtained from blood by following the steps of the methods for processing a blood sample described herein.
- the methods may comprise a step of synthesizing cDNA using the extracted RNA as a template (i.e. converting the extracted RNA into cDNA, reverse transcription).
- the invention provides a method for discovering an (autoimmune) disease biomarker comprising:
- (iv) extracting RNA from the thawed first and second blood samples to form a first RNA sample from the subject with the disease and a second RNA sample from the subject without the disease; (v) processing (normalizing) the first and the second RNA samples;
- the invention provides a method for diagnosing an (autoimmune) disease in a subject comprising:
- the cDNA sample may comprise no more than 800 ng, 700 ng, 500 ng, 100 ng, 20 ng, 10 ng, 5 ng or 1 ng of starting cDNA.
- the cDNA sample may comprise 1-800 ng, 1-500 ng, 5-100 ng, or 10-50 ng of starting cDNA.
- RNA from a sample may be firstly reverse transcribed to cDNA.
- Sample types include blood samples (in particular from plasma, and also serum), other bodily fluids such as saliva, urine or lymph fluid. Other sample types include solid tissues, including frozen tissue or formalin fixed, paraffin embedded (FFPE) material.
- the RNA may be messenger RNA (mRNA), microRNA (miRNA) etc.
- the RNA may be reverse transcribed using a reverse transcriptase enzyme to form a complementary DNA (cDNA) molecule.
- Methods for reverse transcribing RNA to cDNA using a reverse transcriptase are well-known in the art. Any suitable reverse transcriptase can be used, examples of suitable reverse transcriptases being widely available in the art.
- the initial cDNA molecule may be single stranded until DNA polymerase has been used to generate the complementary strand.
- kits such as NEBNext® Single Cell/Low Input cDNA Synthesis & Amplification Module
- Primers based on the 5’ and 3’ adapters can be used to add phosphate groups to the cDNA.
- a cDNA purification step (for example with ProNex or Ampure beads) may be carried out prior to use of the cDNA as a cDNA sample or prior to sequencing.
- the RNA sample may comprise no more than 3.5 µg, 3 µg, 2 µg, 1 µg, 500 ng, 100 ng, 10 ng or 1 ng of starting RNA.
- the RNA sample may comprise 1 ng-3 µg, 10 ng-2 µg, or 100 ng-1 µg of starting RNA.
- Processing the RNA sample may comprise normalization (reducing the variability in the levels of different RNA or cDNA sequences in the sample).
- the processed RNA or cDNA sample may be a normalized RNA or cDNA sample.
- Processing the RNA sample may comprise equalizing the sample.
- the relative abundance of all the unique RNA or cDNA sequences may be more equal.
- the levels of the unique sequences in the processed RNA or cDNA sample may vary by less than 50%, less than 40%, less than 30%, less than 20%, or less than 10%.
- Processing an RNA or cDNA sample may reduce the variability in the levels of the RNA or cDNA (e.g. by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%). Processing RNA or cDNA may achieve a more uniform distribution of cDNA sequences. The difference in abundance between the most abundant RNA/cDNA and the least abundant RNA/cDNA in the sample may be reduced (e.g. by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%).
- Processing the RNA or cDNA sample may reduce the number of molecules (copy number) of the (1, 10, 100, 1000, or 10000) most abundant RNA or cDNA molecule(s) by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%.
- the number of molecules (copy number) of the most abundant RNA or cDNA molecule in the (first and/or second) RNA or cDNA sample may be reduced by at least 50% in the processed RNA or cDNA.
- the relative abundance of the (1, 10, 100, 1000, or 10000) least abundant RNA or cDNA molecule(s) may be increased by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%.
- the number of molecules (copy number) of the least abundant RNA or cDNA molecule in the (first and/or second) cDNA sample may be increased by at least 50% in the processed RNA or cDNA.
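The percentage changes described above can be computed from per-sequence copy numbers measured before and after processing. The following is a minimal sketch, assuming copy numbers are available as a mapping from sequence identifier to count; the function names and the example counts are hypothetical and chosen only for illustration.

```python
def relative_abundance(counts):
    """Return each sequence's share of the total copy number."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def top_n_copy_reduction(before, after, n=1):
    """Percent reduction in summed copy number of the n most abundant sequences.

    The n most abundant sequences are chosen from the sample before processing.
    """
    top = sorted(before, key=before.get, reverse=True)[:n]
    copies_before = sum(before[name] for name in top)
    copies_after = sum(after.get(name, 0) for name in top)
    return 100.0 * (copies_before - copies_after) / copies_before


def bottom_n_relative_increase(before, after, n=1):
    """Percent increase in relative abundance of the n least abundant sequences."""
    bottom = sorted(before, key=before.get)[:n]
    rel_before = relative_abundance(before)
    rel_after = relative_abundance(after)
    share_before = sum(rel_before[name] for name in bottom)
    share_after = sum(rel_after.get(name, 0.0) for name in bottom)
    return 100.0 * (share_after - share_before) / share_before


# Illustrative (made-up) copy numbers before and after processing.
before = {"seq_A": 1000, "seq_B": 100, "seq_C": 10}
after = {"seq_A": 400, "seq_B": 100, "seq_C": 10}
print(top_n_copy_reduction(before, after))        # most abundant sequence
print(bottom_n_relative_increase(before, after))  # least abundant sequence
```

Note that in this example the least abundant sequence gains relative abundance even though its copy number is unchanged, because the most abundant sequence was reduced.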
- the processed RNA or cDNA sample may be more readily analysable. It may be more efficiently sequenced because the relative representation or levels of less abundant sequences is increased.
- Normalizing an RNA or cDNA sample results in production of a normalized RNA or cDNA sample. Normalizing may comprise (selectively) increasing the relative abundance of less abundant sequences without targeting specific sequences based on their nucleotide sequence (i.e. identity or homology to a known sequence).
- a normalized RNA or cDNA sample may be one in which the amount of each unique RNA or cDNA sequence is more uniform than in the same sample prior to normalization i.e. a normalized RNA or cDNA sample is closer to achieving each unique RNA or cDNA sequence having the same abundance (relative to other unique RNA or cDNA sequences within the normalized RNA or cDNA sample) than the same sample prior to normalization.
- the relative representation or levels of less abundant sequences may be increased and/or the relative representation or levels of more abundant sequences may be decreased.
- the increase in less abundant sequences/decrease in more abundant sequences is selective in the sense that if all sequences were increased/decreased to the same degree the relative abundance would stay the same.
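The point about selectivity can be written out explicitly. In this short worked derivation the symbols $n_i$ (copy number of unique sequence $i$), $r_i$ (its relative abundance) and $c$ (a uniform scaling factor) are introduced here for illustration only:

```latex
r_i = \frac{n_i}{\sum_j n_j},
\qquad
r_i' = \frac{c\,n_i}{\sum_j c\,n_j} = \frac{n_i}{\sum_j n_j} = r_i .
```

A uniform factor $c$ therefore cancels and leaves every $r_i$ unchanged; relative abundance shifts only when the change applied differs between sequences.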
- the relative representation or levels of less abundant sequences may be increased and/or the relative representation or levels of more abundant sequences may be decreased without targeting (for example, using pre-defined probes) specific sequences based on their nucleotide composition (i.e. based on their identity or homology to a known sequence).
- the less abundant sequences may be the unique sequences with an amount that is below a threshold, for example they are present in the RNA or cDNA sample prior to normalization in an amount that is below the mean amount for a unique sequence in the sample.
- the less abundant sequences may be present in the RNA or cDNA sample prior to normalization at an amount that is 0.1%, 1%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90% below the mean amount for a unique sequence in the sample.
- the more abundant sequences may be the unique sequences with an amount that is above a threshold, for example they are present in the RNA or cDNA sample prior to normalization in an amount that is above the mean amount for a unique sequence in the sample.
- the more abundant sequences may be present in the RNA or cDNA sample prior to normalization at an amount that is 0.1%, 1%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90% above the mean amount for a unique sequence in the sample.
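The mean-based thresholds above can be illustrated with a short sketch. It assumes one reading of the text, namely that a sequence counts as less (or more) abundant when its amount lies more than a chosen percentage below (or above) the mean amount per unique sequence. The function name `split_by_mean`, the `margin_pct` parameter and the example counts are hypothetical.

```python
from statistics import mean


def split_by_mean(counts, margin_pct=0.0):
    """Classify unique sequences relative to the mean amount per sequence.

    Returns (less_abundant, more_abundant) as sorted lists of identifiers.
    A sequence is 'less abundant' if its amount is more than margin_pct
    percent below the mean, and 'more abundant' if more than margin_pct
    percent above it. Sequences in between fall in neither list.
    """
    m = mean(counts.values())
    low_cut = m * (1.0 - margin_pct / 100.0)
    high_cut = m * (1.0 + margin_pct / 100.0)
    less = sorted(name for name, n in counts.items() if n < low_cut)
    more = sorted(name for name, n in counts.items() if n > high_cut)
    return less, more


# Illustrative (made-up) amounts for four unique sequences; the mean is 250.
counts = {"ALB": 900, "GENE_X": 60, "GENE_Y": 30, "GENE_Z": 10}
print(split_by_mean(counts))                 # threshold at the mean itself
print(split_by_mean(counts, margin_pct=80))  # at least 80% away from the mean
```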
- By relative abundance is meant abundance relative to other unique sequences in the sample.
- a normalized RNA or cDNA sample may comprise RNA or cDNA sequences having substantially the same levels. For example, wherein the levels of the sequences of the normalized RNA or cDNA vary by less than 50%, less than 40%, less than 30%, less than 20%, or less than 10%.
- the normalized RNA or cDNA may be a normalized RNA or cDNA sample in which at least a portion of the 10, 100, 1000, or 10000 most abundant (unique) sequences in the RNA or cDNA sample have been removed or reduced (by at least 1%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90%) in copy number.
- the normalized RNA or cDNA may be a normalized RNA or cDNA sample in which levels of at least a portion of the 10, 100, 1000, or 10000 least abundant (unique) sequences in the RNA or cDNA sample have been increased e.g. by at least 1%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90% in copy number.
- the methods for normalizing RNA or cDNA sample(s) described herein may be methods for equalizing cDNA sample(s) i.e. equalizing the relative abundances of each unique sequence.
- Normalizing the RNA or cDNA sample(s) may increase the amount (copy number) of at least a portion of the low abundance RNA or cDNA sequences within the (first and/or the second) cDNA sample(s).
- the low abundance RNA or cDNA sequences may be the 50%, 40%, 30%, 20%, 10% or 1% of (unique) sequences with the lowest copy number.
- normalizing may comprise selectively increasing the amount of low abundance RNA or cDNA within each RNA or cDNA sample.
- Normalized RNA or cDNA may be RNA or cDNA that is more readily analysable. It may be more efficiently sequenced because the relative representation of less abundant sequences is increased. Normalizing the RNA or cDNA sample(s) may not comprise removing abundant (more abundant) RNA or cDNA molecules/sequences (such as those corresponding to Albumin, IgG, Apolipoprotein A-I, Transferrin, Apolipoprotein A-II, α1-Proteinase inhibitor, α1-Acid glycoprotein, Transthyretin, Haptoglobin and/or Hemopexin) from the sample(s) (for example, using duplex-specific nuclease or sequence targeted methods).
- Normalizing may not comprise targeting specific (unique) sequences (such as those corresponding to Albumin, IgG, Apolipoprotein A-I, Transferrin, Apolipoprotein A-II, α1-Proteinase inhibitor, α1-Acid glycoprotein, Transthyretin, Haptoglobin and/or Hemopexin).
- Normalizing an RNA or cDNA sample may be non-targeted i.e. it may not involve targeting specific sequences based on their nucleotide sequence (for example, it may not involve targeting a particular sequence based on its identity or homology to a known sequence).
- Processing the RNA sample(s) may improve detection of low-abundance RNA or cDNA, optionally wherein processing the RNA sample(s) comprises increasing the amount of low abundance RNA or cDNA within each sample.
- Methods that may be used for processing an RNA or cDNA sample include depletion methods (such as CRISPR-based depletion methods), methods that comprise inhibiting reverse transcription of abundant RNA sequences (e.g. inhibition of cDNA synthesis using oligo blockers) and normalization (e.g. cDNA or RNA normalization) methods.
- Depletion methods comprise removing unwanted RNA or cDNA molecules/sequences from the sample(s). These may be contaminating RNA or cDNA molecules/sequences (for example bacterial transcripts, optionally bacterial ribosomal RNA) or abundant (more abundant) RNA or cDNA molecules/sequences (such as those corresponding to ribosomal, mitochondrial, globin and housekeeping genes, optionally those corresponding to Albumin, IgG, Apolipoprotein A-I, Transferrin, Apolipoprotein A-II, α1-Proteinase inhibitor, α1-Acid glycoprotein, Transthyretin, Haptoglobin and/or Hemopexin).
- CRISPR-Cas9 may be used to degrade abundant sequences.
- CRISPR-Cas9 complexes are formed with a pool of designed guide RNAs, and the complexes are mixed with a cDNA sample. After the unwanted abundant sequences are cut, they cannot be substrates for PCR amplification and subsequent sequencing.
- Example products include CRISPRclean™ Stranded Total RNA Prep with rRNA Depletion from Jumpcode Genomics.
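The logic of guide-directed depletion described above can be sketched in a few lines. This is a deliberately simplified illustration, not a model of any commercial kit: it assumes that a cDNA molecule is cut whenever it contains an exact match to a guide target on either strand, and it ignores PAM requirements, mismatch tolerance and cutting efficiency. The function names, the toy guide and the toy sequences are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def amplifiable_after_depletion(cdna, guide_targets):
    """Return the cDNA molecules that would remain intact for PCR.

    A molecule is treated as cut (and so dropped) if any guide target
    occurs in it or in its reverse complement.
    """
    kept = {}
    for name, seq in cdna.items():
        rc = reverse_complement(seq)
        is_cut = any(g in seq or g in rc for g in guide_targets)
        if not is_cut:
            kept[name] = seq
    return kept


# Toy example: one guide target, two abundant molecules carrying it
# (one on each strand) and one unrelated molecule.
guides = ["GATTACAGGC"]
cdna = {
    "abundant_fwd": "TTTGATTACAGGCAAA",
    "abundant_rev": "CCCGCCTGTAATCTTT",
    "rare": "ATGCATGCATGCATGC",
}
print(sorted(amplifiable_after_depletion(cdna, guides)))
```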
- Methods that comprise inhibition of reverse transcription may use high-affinity RNA-binding oligonucleotides to block reverse transcription and/or PCR amplification of specific RNA transcripts (see, for example, Everaert C et al., Biological Procedures Online 25, Article number: 7 (2023) https://doi.org/10.1186/s12575-023-00193-3, which is hereby incorporated by reference).
- An LNA-modified oligonucleotide complementary to an unwanted RNA can be designed, which can block reverse transcription and/or PCR amplification when bound downstream of the priming site.
- cDNA normalization (Alex S. Shcheglov, Pavel A. Zhulidov, Ekaterina A. Bogdanova, D. A. S. Normalization of cDNA Libraries, Nucleic Acids Hybrid. CHAPTER 5, (2014)) addresses issues with high abundance house-keeping genes reducing sampling efficiency for genes of interest. Since RNA sequencing typically relies on the conversion of RNA to double stranded cDNA, cDNA normalization takes advantage of the biochemical properties of cDNA to generate a uniform distribution of unique genes and isoforms within a cDNA library. In theory, the maximum non-targeted sampling efficiency is produced if all unique RNA sequences are represented at the same relative abundance. Thus, the objective of normalization is to re-distribute a cDNA library (sample) to meet this criterion as closely as possible.
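The statement that sampling efficiency is greatest when all unique sequences have the same relative abundance can be illustrated numerically. The sketch below assumes a simple model in which each sequencing read is drawn independently in proportion to relative abundance, so that a sequence with relative abundance p is seen at least once in N reads with probability 1 - (1 - p)^N. The function name and the example distributions are hypothetical.

```python
def expected_unique_detected(rel_abundances, n_reads):
    """Expected number of unique sequences seen at least once in n_reads reads.

    Assumes reads are independent draws with probability equal to each
    sequence's relative abundance (the abundances should sum to 1).
    """
    return sum(1.0 - (1.0 - p) ** n_reads for p in rel_abundances)


# Four unique sequences, sampled with only 10 reads.
skewed = [0.97, 0.01, 0.01, 0.01]   # one dominant sequence
uniform = [0.25, 0.25, 0.25, 0.25]  # fully normalized

print(expected_unique_detected(skewed, 10))
print(expected_unique_detected(uniform, 10))
```

Under this model the uniform library is expected to reveal more of its unique sequences for the same number of reads, which is the sense in which normalization improves non-targeted sampling efficiency.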
- Complementary DNA (cDNA) normalization may be full length cDNA normalization.
- Complementary DNA (cDNA) normalization may be performed by the Duplex Specific Nuclease (DSN) method (see e.g. Zhulidov, P. A. et al. Simple cDNA normalization using Kamchatka crab duplex-specific nuclease. Nucleic Acids Res. 32, e37 (2004)) or the hydroxyapatite column method (see e.g. Andrews-Pfannkoch, C., Fadrosh, D. W., Thorpe, J. & Williamson, S. J. Hydroxyapatite-mediated separation of double-stranded DNA, single-stranded DNA, and RNA genomes from natural viral assemblages).
- processing the RNA sample may comprise synthesizing double stranded cDNA using the RNA as a template and then denaturing and re-hybridizing the cDNA strands.
- the difference between the DSN method and the hydroxyapatite column method lies in their approach for isolating the single stranded cDNA library from the re-hybridized double stranded cDNA molecules.
- In the DSN method, an enzyme which specifically cleaves double stranded DNA is used to decompose all double stranded cDNA within the solution.
- the solution is then purified and size-selected for cDNA sequences above a certain length. These sequences are then amplified using the Polymerase Chain Reaction (PCR).
- In the hydroxyapatite column method, the denatured and re-hybridized cDNA library is passed through a heated column filled with hydroxyapatite granules.
- the hydroxyapatite preferentially binds to larger DNA molecules.
- the size of DNA that is bound is controlled by the concentration of phosphate buffer in which the cDNA library is dissolved.
- the concentration of phosphate buffer must be tuned specifically for cDNA molecules within a certain range of sequence length.
- the cDNA is eluted through the column using increasing concentrations of phosphate buffer to extract increasing sizes of DNA molecules.
- Since the single stranded cDNA will be roughly one half the size of the re-hybridized cDNA, elution of the single stranded fraction can be managed if the mean cDNA sequence length is known. The resulting eluate is intended to be enriched for the single stranded cDNA, which is then amplified using PCR.
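The normalizing effect of denaturing and re-hybridizing, on which both the DSN and the hydroxyapatite approaches rely, can be illustrated with idealized kinetics. The sketch below assumes textbook second-order renaturation, in which each sequence re-anneals only with its own complement, so that the amount still single stranded after time t is c / (1 + k·c·t) for starting amount c. The rate constant, time and amounts are arbitrary illustrative values, and the function name is hypothetical.

```python
def single_stranded_after_reannealing(amounts, rate_const, time):
    """Amount of each sequence still single stranded after re-hybridization.

    Assumes ideal second-order kinetics: remaining = c / (1 + k * c * t),
    so abundant sequences re-anneal faster than rare ones.
    """
    return {
        name: c / (1.0 + rate_const * c * time)
        for name, c in amounts.items()
    }


# Arbitrary units: a 1000-fold difference in starting abundance.
start = {"abundant": 1000.0, "rare": 1.0}
remaining = single_stranded_after_reannealing(start, rate_const=0.01, time=10.0)

ratio_before = start["abundant"] / start["rare"]
ratio_after = remaining["abundant"] / remaining["rare"]
print(ratio_before, ratio_after)
```

Because the single stranded fraction is what is retained (by degrading or separating away the double stranded molecules), the gap between abundant and rare sequences narrows in the recovered library under these assumptions.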
- the invention provides methods and devices for preparing processed nucleic acid samples with a more uniform distribution of sequences, including methods for RNA and cDNA normalization.
- a first nucleic acid sample is used to produce a probe set based on the intrinsic sequence abundances in the sample. Abundant sequences will produce more probes.
- when a second nucleic acid sample is applied to the probes, more of the abundant sequences will bind to the probes, enabling these sequences to be separated from the sample. In this manner the present invention enables normalization of full-length RNA, as well as cDNA.
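The idea of a probe set that mirrors the sample's own abundances can be explored with a toy calculation. The sketch below is an assumption-laden illustration rather than the mechanism stated in the text: it assumes mass-action capture, in which the fraction of a sequence left unannealed falls exponentially with the number of matching probes, and it assumes the probe counts are proportional to abundances in the first sample. The function name, the capture constant and the counts are hypothetical.

```python
import math


def unannealed_after_capture(test_counts, probe_counts, capture_const):
    """Copies of each sequence left unannealed after contact with the probes.

    Toy mass-action model: remaining = n * exp(-capture_const * probes),
    so sequences with more matching probes lose a larger fraction.
    """
    return {
        name: n * math.exp(-capture_const * probe_counts.get(name, 0.0))
        for name, n in test_counts.items()
    }


# The first (preparatory) sample defines the probes, the second is processed.
probe_counts = {"abundant": 1000.0, "rare": 10.0}
test_counts = {"abundant": 1000.0, "rare": 10.0}

left = unannealed_after_capture(test_counts, probe_counts, capture_const=0.002)
spread_before = test_counts["abundant"] / test_counts["rare"]
spread_after = left["abundant"] / left["rare"]
print(spread_before, spread_after)
```

Under these assumptions the unannealed fraction has a narrower spread between abundant and rare sequences than the input, without any sequence having been selected by reference to a known sequence.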
- the method for processing nucleic acid may comprise:
- a nucleic acid sample e.g. an RNA sample
- an oligonucleotide array e.g. a DNA, optionally a cDNA array
- Processing the RNA or cDNA sample may comprise: (i) contacting the RNA or cDNA sample with a DNA array, comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- the array may be produced by a method comprising:
- the array may comprise two or more oligonucleotides with sequences comprising oligo-dT followed by a cDNA sequence.
- the DNA array may be produced by a method comprising:
- oligonucleotide array comprises two or more oligonucleotides linked to a surface, and wherein two or more RNA molecules from the RNA sample anneal to the oligonucleotides of the oligonucleotide array;
- the preparatory RNA sample and the test RNA sample may be derived from the same subject, optionally wherein the preparatory RNA sample and the test RNA sample are derived from the same blood sample.
- the array may be produced by a method comprising:
- the array may comprise two or more oligonucleotides with sequences comprising oligo-dT followed by a DNA sequence.
- the method for processing nucleic acid may comprise:
- oligonucleotide array comprises two or more oligonucleotides linked to a surface, and wherein two or more nucleic acid molecules from the first nucleic acid sample anneal to the oligonucleotides of the oligonucleotide array;
- the nucleic acid is not limiting according to the invention. Any suitable nucleic acid molecule may be processed using the devices, kits and methods of the invention.
- the nucleic acid may be double stranded or single stranded, optionally when the nucleic acid molecules are double-stranded, the double stranded nucleic acid molecules are first denatured to produce single stranded nucleic acid molecules.
- the nucleic acid may be DNA.
- the DNA may be genomic DNA, mitochondrial DNA, cDNA etc. cDNA is preferred.
- the DNA may be purified from any suitable sample. Sample types include blood samples (in particular from plasma, and also serum), other bodily fluids such as saliva, urine or lymph fluid. Other sample types include solid tissues, including frozen tissue or formalin fixed, paraffin embedded (FFPE) material.
- the DNA molecule may be a double-stranded DNA (dsDNA) molecule.
- the DNA molecule may be a single-stranded DNA (ssDNA) molecule. ssDNA may already have been denatured in situ in the original sample. For example, the ssDNA may be purified from FFPE material.
- the nucleic acid sample may comprise both ssDNA and dsDNA molecules.
- the DNA may include both ssDNA and dsDNA.
- the DNA may be found in, or derived from cells in a sample. Alternatively the DNA may be circulating, or “cell-free”, DNA (cfDNA).
- Such DNA can be obtained from a range of bodily fluids including blood samples (in particular from plasma, and also serum), other bodily fluids such as saliva, urine or lymph fluid.
- the nucleic acid may also be RNA.
- RNA may be obtained from the same sample types as DNA, as discussed above.
- the RNA may be messenger RNA (mRNA), microRNA (miRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), long non-coding RNA (lncRNA), small interfering RNA (siRNA), small nucleolar RNA (snoRNA), piwi-interacting RNA (piRNA), tRNA-derived small RNA (tsRNA), small rDNA-derived RNA (srRNA), viral RNA etc.
- Processing the RNA sample may comprise:
- oligonucleotide array comprises two or more oligonucleotides linked to a surface, and wherein two or more RNA molecules from the first RNA sample anneal to the oligonucleotides of the oligonucleotide array;
- the first RNA sample may also be referred to as a preparatory RNA sample.
- the second RNA sample may also be referred to as a test RNA sample.
- the RNA sample to be processed may be split to form the first/preparatory RNA sample and second/test RNA sample.
- the invention provides a method for determining a set of RNA sequences associated with a disease or condition, the method comprising:
- processing the test RNA or cDNA sample comprises:
- oligonucleotide array comprises two or more oligonucleotides linked to a surface, and wherein two or more RNA molecules from the preparatory RNA sample anneal to the oligonucleotides of the oligonucleotide array; (ii) extending two or more of the oligonucleotides by reverse transcription using the annealed RNA molecules as templates to generate a DNA array comprising two or more DNA (cDNA) molecules;
- the invention provides a method for producing a database of RNA sequences associated with a disease or condition, the method comprising:
- processing each test RNA or cDNA sample comprises:
- oligonucleotide array comprises two or more oligonucleotides linked to a surface, and wherein two or more RNA molecules from the preparatory RNA sample anneal to the oligonucleotides of the oligonucleotide array;
- the invention provides a method for diagnosing a disease or condition, wherein the method comprises:
- processing the test RNA or cDNA sample comprises:
- oligonucleotide array comprises two or more oligonucleotides linked to a surface, and wherein two or more RNA molecules from the preparatory RNA sample anneal to the oligonucleotides of the oligonucleotide array;
- the invention provides a method for producing an RNA vaccine for a subject with a disease, the method comprising:
- test RNA sample extracted from a blood sample obtained from the subject
- processing the test RNA sample optionally comprising synthesizing cDNA using the RNA as a template
- processing the test RNA or cDNA sample comprises:
- oligonucleotide array comprises two or more oligonucleotides linked to a surface, and wherein two or more RNA molecules from the preparatory RNA sample anneal to the oligonucleotides of the oligonucleotide array;
- the invention provides a method for discovering a disease biomarker, the method comprising:
- processing the test RNA or cDNA sample comprises: (i) contacting the test RNA or cDNA sample with a DNA array, comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the test RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- oligonucleotide array comprises two or more oligonucleotides linked to a surface, and wherein two or more RNA molecules from the preparatory RNA sample anneal to the oligonucleotides of the oligonucleotide array;
- the invention provides a method for diagnosing cancer in a subject, the method comprising:
- processing the test RNA or cDNA sample comprises:
- extracting the unannealed RNA or cDNA molecules thereby generating processed RNA or cDNA; and wherein the DNA array was produced by a method comprising: (i) contacting a preparatory RNA sample with an oligonucleotide array, wherein the oligonucleotide array comprises two or more oligonucleotides linked to a surface, and wherein two or more RNA molecules from the preparatory RNA sample anneal to the oligonucleotides of the oligonucleotide array;
- the preparatory RNA sample may be extracted from a blood sample obtained from a subject, optionally the same subject as the test RNA sample.
- the preparatory RNA sample and test RNA sample may be derived from the same blood sample. Where there is more than one test RNA sample, each test RNA sample may have a corresponding preparatory RNA sample derived from the same blood sample.
- the methods defined herein may further comprise steps of providing a blood sample obtained from a subject and extracting a preparatory RNA sample and a test RNA sample from the blood sample.
- the invention provides use of a method for processing a test RNA or cDNA sample in a method for determining soil microfauna composition, wherein the method for processing an RNA or cDNA sample comprises:
- oligonucleotide array comprises two or more oligonucleotides linked to a surface, and wherein two or more RNA molecules from the preparatory RNA sample anneal to the oligonucleotides of the oligonucleotide array; (ii) extending two or more of the oligonucleotides by reverse transcription using the annealed RNA molecules as templates to generate a DNA array comprising two or more DNA (cDNA) molecules;
- test RNA sample and the preparatory RNA sample may be derived from soil sample(s), optionally the test RNA sample and the preparatory RNA sample are derived from the same soil sample.
- the methods defined herein may further comprise steps of providing a soil sample and extracting a preparatory RNA sample and a test RNA sample from the soil sample.
- the method for processing cDNA may comprise:
- oligonucleotide array comprises two or more oligonucleotides linked to a surface, and wherein two or more cDNA molecules from the first cDNA sample anneal to the oligonucleotides of the oligonucleotide array;
- By array is meant a collection or arrangement of oligonucleotide (DNA, optionally cDNA) molecules linked or attached to a (solid) surface.
- Methods for linking oligonucleotides to a surface (for example amine-modified oligonucleotides covalently linked to an activated carboxylate group or succinimidyl ester, thiol-modified oligonucleotides covalently linked via an alkylating reagent such as an iodoacetamide or maleimide, Digoxigenin NHS Ester, cholesterol-TEG, biotin-modified oligonucleotides captured by immobilized streptavidin) are well-known to the skilled person.
- the link may be covalent or non-covalent.
- the link may be direct or indirect.
- the DNA array may be a cDNA array.
- the method for processing RNA may reduce the variability in the levels of the RNA (e.g. by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%).
- the method for processing RNA may achieve a more uniform distribution of RNA sequences.
- the difference in abundance between the most abundant RNA and the least abundant RNA may be reduced (e.g. by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%).
- the method for processing RNA may reduce the number of molecules (copy number) of the (1, 10, 100, 1000, or 10000) most abundant RNA molecule(s) by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%.
- the number of molecules (copy number) of the most abundant RNA molecule in the (second) RNA sample may be reduced by at least 50% in the processed RNA.
- the relative abundance of the (1, 10, 100, 1000, or 10000) least abundant RNA molecule(s) may be increased by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%.
- the method for processing RNA may be a method for normalizing RNA.
- the method for processing cDNA may reduce the variability in the levels of the cDNA (e.g. by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%).
- the method for processing cDNA may achieve a more uniform distribution of cDNA sequences.
- the difference in abundance between the most abundant cDNA and the least abundant cDNA may be reduced (e.g. by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%).
- the method for processing cDNA may reduce the number of molecules (copy number) of the (1, 10, 100, 1000, or 10000) most abundant cDNA molecule(s) by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%.
- the number of molecules (copy number) of the most abundant cDNA molecule in the (second) cDNA sample may be reduced by at least 50% in the processed cDNA.
- the relative abundance of the (1, 10, 100, 1000, or 10000) least abundant cDNA molecule(s) may be increased by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%.
- the method for processing cDNA may be a method for normalizing cDNA.
- the method for processing nucleic acid may reduce the variability in the levels of the nucleic acid (e.g. by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%).
- the method for processing nucleic acid may achieve a more uniform distribution of nucleic acid sequences.
- the difference in abundance between the most abundant nucleic acid and the least abundant nucleic acid may be reduced (e.g. by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%).
- the method for processing nucleic acid may reduce the number of molecules (copy number) of the (1, 10, 100, 1000, or 10000) most abundant nucleic acid molecule(s) by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%.
- the number of molecules (copy number) of the most abundant nucleic acid molecule in the (second) nucleic acid sample may be reduced by at least 50% in the processed nucleic acid.
- the relative abundance of the (1, 10, 100, 1000, or 10000) least abundant nucleic acid molecule(s) may be increased by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%.
- the method for processing nucleic acid may be a method for normalizing nucleic acid.
- the maximum non-targeted sampling efficiency is produced if all unique nucleic acid sequences are represented at the same relative abundance.
- the objective of normalization is to re-distribute a nucleic acid sample to meet this criterion as closely as possible.
- Processed RNA, DNA or nucleic acid may be RNA, DNA or nucleic acid that is more readily analysable. It may be more efficiently sequenced because the relative representation of less abundant sequences is increased.
- the processed RNA, DNA or nucleic acid may be normalized RNA, DNA or nucleic acid, respectively.
- Processed RNA may comprise RNA sequences having substantially the same levels. For example, wherein the levels of the sequences of the processed RNA vary by less than 50%, less than 40%, less than 30%, less than 20%, or less than 10%.
- the processed RNA may be a processed RNA sample in which at least a portion of the (1, 10, 100, 1000, or 10000) most abundant sequence(s) in the second RNA sample have been removed.
- Processed cDNA may comprise cDNA sequences having substantially the same levels.
- the processed cDNA may be a processed cDNA sample in which at least a portion of the (1, 10, 100, 1000, or 10000) most abundant sequence(s) in the second cDNA sample have been removed.
- Processed nucleic acid may comprise nucleic acid sequences having substantially the same levels. For example, wherein the levels of the sequences of the processed nucleic acid vary by less than 50%, less than 40%, less than 30%, less than 20%, or less than 10%.
- the processed nucleic acid may be a processed nucleic acid sample in which at least a portion of the (1, 10, 100, 1000, or 10000) most abundant sequence(s) in the second nucleic acid sample have been removed.
- the invention provides a method for preparing normalized RNA comprising:
- the invention provides a method for preparing normalized cDNA comprising:
- Normalizing a nucleic acid sample results in production of a normalized nucleic acid sample.
- By normalized is meant that the levels of RNA or cDNA sequences in the sample are more equal. To achieve this the relative representation or levels of less abundant sequences may be increased and/or the relative representation or levels of more abundant sequences may be decreased.
- Normalized RNA or cDNA may comprise RNA or cDNA sequences having substantially the same levels. For example, wherein the levels of the sequences of the normalized RNA or cDNA vary by less than 50%, less than 40%, less than 30%, less than 20%, or less than 10%.
- the normalized RNA or cDNA may be a normalized RNA or cDNA sample in which at least a portion of the 10, 100, 1000, or 10000 most abundant sequences in the second RNA or cDNA sample have been removed.
- the methods for processing nucleic acid described herein may be methods for equalizing nucleic acid samples.
- the methods of the invention may be employed with both RNA and DNA.
- the use of double stranded cDNA may require a denaturation step to produce single stranded DNA molecules.
- a strand selection may also be employed as part of the processing of double stranded cDNA. Oligo-dT molecules will only bind to the cDNA strand comprising the poly(A) sequence.
- the method for processing cDNA may further comprise following the last step (step (viii)):
- the oligonucleotide(s) may be DNA molecules.
- the oligonucleotide(s) may comprise oligo-dT sequences (optionally 2 to 200, 5 to 200, 2 to 100, 5 to 50, 7 to 25 or 12 to 18 nucleotides long).
- the oligonucleotide(s) may be oligo-dT molecule(s).
- By oligo-dT molecule is meant a molecule comprising a stretch of deoxythymidine.
- the oligo-dT molecule may be of any length appropriate to bind to the poly(A) tail (a sequence of adenine nucleotides) of messenger RNA or the second strand of a double stranded cDNA molecule.
- the oligo-dT molecule(s) may be 2 to 100, 5 to 50, 7 to 25 or 12 to 18 nucleotides long.
- the oligo-dT molecule(s) may be at least 2, at least 5, at least 7, at least 12, at least 18 or at least 25 nucleotides long.
- the oligonucleotide(s) may be immobilized on the surface.
- the surface may be two-dimensional, such as a glass slide, or three-dimensional, such as micro-beads or microspheres.
- the surface may be one or more beads or spheres, optionally magnetic beads.
- the methods of the invention may also be carried out in a microfluidic flowcell.
- the RNA (the first RNA sample and/or the second RNA sample) may comprise full length RNA.
- the surface may comprise two or more oligonucleotides and the oligonucleotides may be optimally spaced so that the DNA molecules they prime do not interact with each other.
- the oligonucleotides may be optimally spaced so that the DNA (cDNA) molecules of the DNA array do not interact with each other.
- the optimal spacing for a given sample type may be determined based on the length of the DNA (cDNA) molecule expected to be produced. This is in turn determined by the (maximum) length of the RNA molecules in the first RNA sample or biological sample or cDNA molecules in the first cDNA sample or nucleic acid molecules in the (first) nucleic acid sample.
- the spacing between the oligonucleotides may be at least 1, at least 1.1, at least 1.2, at least 1.3, at least 1.4, at least 1.5, at least 1.6, at least 1.7, at least 1.8, at least 1.9, at least 2, at least 2.5, at least 3, at least 4 or at least 5 times the (maximum) length of the RNA molecules in the first RNA sample or biological sample or cDNA molecules in the first cDNA sample or nucleic acid molecules in the (first) nucleic acid sample.
- the spacing between the oligonucleotides may be between 1 and 5, between 1.3 and 3.5, between 1.4 and 3, or between 1.5 and 2.5 times the (maximum) length of the RNA molecules in the first RNA sample or biological sample or cDNA molecules in the first cDNA sample or nucleic acid molecules in the (first) nucleic acid sample.
- the spacing between the oligonucleotides may be 2 times the (maximum) length of the RNA molecules in the first RNA sample or biological sample or cDNA molecules in the first cDNA sample or nucleic acid molecules in the (first) nucleic acid sample.
- the spacing between the oligonucleotides may be at least 2 times the (maximum) length of the RNA molecules in the first RNA sample.
- the oligonucleotides may be optimally spaced if the density of oligonucleotides (of the oligonucleotide array) is between 0.01 oligonucleotides per 1 micrometer squared and 10000 oligonucleotides per 1 micrometer squared, preferably between 0.1 oligonucleotides per 1 micrometer squared and 1000 oligonucleotides per 1 micrometer squared, more preferably between 1 oligonucleotide per 1 micrometer squared and 100 oligonucleotides per micrometer squared.
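The relationship between molecule length, spacing and surface density described above can be sketched as follows. Two assumptions are not from the source: oligonucleotides are placed on a regular square lattice, and a nucleotide count is converted to a physical length with a caller-supplied `nm_per_nt` value. The example uses 0.34 nm, the rise per base pair of B-form double-stranded DNA, purely for illustration; single-stranded molecules may differ. All function names are hypothetical.

```python
def oligo_spacing_um(max_length_nt, spacing_factor, nm_per_nt):
    """Spacing in micrometers as a multiple of the maximum molecule length.

    nm_per_nt is an assumed conversion from nucleotides to nanometers.
    """
    if max_length_nt <= 0 or spacing_factor <= 0 or nm_per_nt <= 0:
        raise ValueError("all inputs must be positive")
    return spacing_factor * max_length_nt * nm_per_nt / 1000.0


def square_lattice_density(spacing_um):
    """Oligonucleotides per square micrometer, assuming a square lattice."""
    if spacing_um <= 0:
        raise ValueError("spacing must be positive")
    return 1.0 / (spacing_um ** 2)


def density_in_range(density, low=0.01, high=10000.0):
    """Check a density against a range; defaults are the broadest range in the text."""
    return low <= density <= high


# Illustrative values: 5000 nt maximum length, spacing of 2 times that length.
spacing = oligo_spacing_um(max_length_nt=5000, spacing_factor=2, nm_per_nt=0.34)
density = square_lattice_density(spacing)
print(spacing, density, density_in_range(density))
```

Under these assumptions, a longer maximum molecule length gives a wider spacing and therefore a lower density, so the spacing multiple and the density range should be checked together for a given sample type.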
- the first/preparatory RNA sample and the second/test RNA sample may be derived from the same (biological) sample.
- the first cDNA sample and the second cDNA sample may be derived from the same (biological) sample.
- the first nucleic acid sample and the second nucleic acid sample may be derived from the same (biological) sample.
- a portion may be removed to form the first RNA sample and a further portion removed to form the second RNA sample.
- a portion may be removed to form the first cDNA sample and a further portion removed to form the second cDNA sample.
- RNA sample or first cDNA sample or first nucleic acid sample
- second RNA sample or second cDNA sample or second nucleic acid sample
- the first RNA sample (or first cDNA sample or first nucleic acid sample) and the second RNA sample (second cDNA sample or second nucleic acid sample) may be derived from blood.
- the method may further comprise sequencing the processed RNA, cDNA or nucleic acid.
- the method for processing nucleic acid may be a method for preparing nucleic acid (cDNA, RNA) for sequencing.
- Sequencing may be RNA or DNA sequencing.
- RNA may be reverse transcribed to cDNA prior to sequencing.
- Sequencing may detect and/or quantify the (target) nucleic acid molecules.
- Such methods comprise processing according to the invention followed by sequencing of the processed products, optionally using a next generation sequencing (NGS) platform.
- NGS platforms include Illumina sequencing (such as HiSeq and MiSeq), SMRT sequencing (Pacific Biosciences), Nanopore sequencing, SOLiD sequencing, pyrosequencing (e.g. Roche 454) and Ion Torrent (Thermo Fisher), which are well known to the skilled person.
- the invention is also concerned with RNA extraction.
- the invention provides a method comprising:
- oligonucleotide array comprises two or more oligonucleotides linked to a surface, wherein one or more RNA molecules from the biological sample anneal to the oligonucleotides of the oligonucleotide array;
- (b) removing the unannealed sample from the surface; and (c) disassociating the annealed RNA molecule(s) from the oligonucleotides to obtain an RNA sample.
- the invention provides a method comprising:
- RNA sample may be reverse transcribed to cDNA.
- the oligonucleotide(s) may comprise one or more oligo-dT molecules.
- the oligonucleotide(s) may be oligo-dT molecules. Oligo-dT molecules will anneal with mRNA molecules with a poly(A) tail.
- the oligonucleotide(s) may comprise random or unique sequences to capture a range of RNAs in addition to mRNA.
- Custom oligonucleotide(s) may be designed to capture specific target RNA molecules (with complementary sequences).
- RNA molecules may be polyadenylated following extraction if they do not comprise a poly(A) tail.
- the method may further comprise disassociating the annealed RNA molecules from the cDNA molecules (or the annealed cDNA molecules from the DNA molecules).
- the disassociated molecules may be removed (optionally disposed of) leaving a surface comprising the cDNA molecules (or the DNA molecules).
- a further RNA or cDNA sample may then be processed using the surface.
- the method for processing RNA may further comprise, following step (vi) disassociating the annealed RNA molecules from the cDNA molecules and removing the disassociated RNA molecules from the surface and, optionally, repeating steps (v) and (vi) with a further RNA sample.
- the method for processing cDNA may further comprise, following step (viii) disassociating the annealed cDNA molecules from the DNA molecules and removing the disassociated cDNA molecules from the surface and, optionally, repeating steps (vi), (vii) and (viii) with a further cDNA sample.
- the oligonucleotide(s) may be at least 5 nucleotides, at least 10 nucleotides, at least 100 nucleotides, at least 200 nucleotides or at least 500 nucleotides in length.
- the oligonucleotide(s) may consist of 5 to 200 nucleotides.
- the oligonucleotide array or surface may comprise at least 10, at least 100, at least 1000, at least 10000, at least 100000 or at least 1 million oligonucleotides.
- the oligonucleotide array or surface may comprise at least 1.1, at least 1.2, at least 1.3, at least 1.4, at least 1.5, at least 1.6, at least 1.7, at least 1.8, at least 1.9, at least 2, at least 3, at least 4, at least 5, at least 10, at least 100, or at least 1000 times as many oligonucleotides as there are RNA molecules in the first and/or second RNA sample, cDNA molecules in the first and/or second cDNA sample or nucleic acid molecules in the nucleic acid sample.
- the oligonucleotide array or surface may comprise at least 10, at least 100, at least 1000, at least 10000, at least 100000 or at least 1 million oligonucleotides with unique sequences (i.e. no two sequences are identical).
- the oligonucleotide(s) may comprise sequences complementary to the 10, 20, 50, 100, 1000 or 10000 most abundant RNAs (mRNAs) in a given sample, optionally the 10, 20, 50, 100, 1000 or 10000 most abundant RNAs (mRNAs) in human blood.
- the oligonucleotide(s) may comprise one or more sequences complementary to the mRNA coding for human serum albumin, one or more alpha globulins (for example haptoglobin), one or more beta globulins (for example plasminogen) and/or one or more gamma globulins.
- the number of RNA molecules in the (first and/or second) RNA sample or cDNA molecules in the (first and/or second) cDNA sample or nucleic acid molecules in the (first and/or second) nucleic acid sample may not exceed the number of oligonucleotides in the oligonucleotide array and/or DNA molecules in the DNA array.
- the amount of RNA molecules in the second RNA sample may not exceed the number of cDNA molecules in the DNA array.
- the (biological) sample may comprise a biological fluid or a fluid or lysate generated from a biological material.
- the biological fluid may comprise blood. Blood may be processed on the same day as collection, no more than 72 hours after collection, no more than 2 weeks after collection, no more than 4 weeks after collection or 4-12 months after collection. Blood may be stored at -80°C prior to processing. Plasma, and also serum, samples are envisaged.
- the sample may be a human sample. Sample types include other biological fluids such as saliva, urine or lymph fluid. Other sample types include solid tissues, including frozen tissue or formalin fixed, paraffin embedded (FFPE) material. These samples may be processed to lyse cells.
- FFPE formalin fixed, paraffin embedded
- the RNA may be messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), long non-coding RNA (lncRNA), small interfering RNA (siRNA), small nucleolar RNA (snoRNA), piwi-interacting RNA (piRNA), tRNA-derived small RNA (tsRNA), small rDNA-derived RNA (srRNA), microRNA (miRNA), or viral RNA etc.
- mRNA messenger RNA
- tRNA transfer RNA
- rRNA ribosomal RNA
- lncRNA long non-coding RNA
- siRNA small interfering RNA
- snoRNA small nucleolar RNA
- piRNA piwi-interacting RNA
- tsRNA tRNA-derived small RNA
- srRNA small rDNA-derived RNA
- miRNA microRNA
- the invention provides a system or device for performing a method as described herein.
- the RNA sample may be processed using an RNA processing device for producing processed RNA from a biological sample (e.g. blood), the device comprising:
- a first module to receive the biological sample wherein the first module comprises:
- the oligonucleotides may comprise oligo-dT sequences (optionally 2 to 200, 5 to 200, 2 to 100, 5 to 50, 7 to 25 or 12 to 18 nucleotides long).
- the oligonucleotides of the first module and/or the oligonucleotides of the second module may be oligo-dT molecules.
- By oligo-dT molecule is meant a molecule comprising a stretch of deoxythymidine.
- the oligo-dT molecule may be of any length appropriate to bind to the poly(A) tail (a sequence of adenine nucleotides) of messenger RNA or the second strand of a double stranded cDNA molecule.
- the oligo-dT molecule(s) may be 2 to 100, 5 to 50, 7 to 25 or 12 to 18 nucleotides long.
- the oligonucleotides may comprise random or unique sequences to capture a range of RNAs in addition to mRNA. Custom oligonucleotides may be designed to capture specific target RNA molecules (with complementary sequences).
- the oligonucleotides (oligo-dT molecules) of the first module may be linked to a first surface and the oligonucleotides (oligo-dT molecules) of the second module may be linked to a second surface.
- the first module may further comprise a sample inlet through which the biological sample is capable of entering the first module.
- the first module may further comprise a first reagent inlet through which reagents are capable of entering the first module and/or the second module may further comprise a second reagent inlet through which reagents are capable of entering the second module.
- the RNA processing device may further comprise temperature control means for adjusting the temperature of the first module and/or the second module.
- the first module may comprise a flow cell and/or the second module may comprise a flow cell.
- the oligonucleotides may be optimally spaced. Optimal spacing is discussed above.
- the spacing between the oligonucleotides of the first module and/or the second module may be at least 2 times the (maximum) length of the RNA molecules in the biological sample.
- the oligonucleotides may be optimally spaced if the density of oligonucleotides (linked to the first and/or second surface) is between 0.01 oligonucleotides per 1 micrometer squared and 10000 oligonucleotides per 1 micrometer squared, preferably between 0.1 oligonucleotides per 1 micrometer squared and 1000 oligonucleotides per 1 micrometer squared, more preferably between 1 oligonucleotide per 1 micrometer squared and 100 oligonucleotides per 1 micrometer squared.
- the RNA processing device may further comprise a third module to receive the processed RNA, wherein the third module may comprise reagents for preparing the processed RNA for sequencing.
- the RNA processing device may further comprise a fourth module to receive the RNA prepared for sequencing, wherein the fourth module comprises sequencing reagents.
- the devices disclosed herein may further comprise means for sequencing the processed RNA or cDNA, for example a sequencing machine or sequencer.
- the devices disclosed herein may further comprise means for uploading the sequencing output to a cloud server.
- the invention provides use of an RNA processing device as described herein in a method of normalizing RNA.
- the method for processing nucleic acid may be a method for removing nucleic acid from a sample.
- the method for processing RNA may be a method for removing (abundant) RNA from the second RNA sample.
- the method for processing cDNA may be a method for removing (abundant) cDNA from the second cDNA sample.
- the target nucleic acid can bind to the one or more oligonucleotides. In this manner the target nucleic acid may be removed from a sample.
- the target nucleic acid may also be subjected to further processing such as sequencing.
- the method for processing nucleic acid may comprise contacting a nucleic acid sample with a surface, wherein the surface comprises one or more oligonucleotides complementary to a target nucleic acid wherein the one or more oligonucleotides is at least 100 nucleotides in length and wherein the target nucleic acid anneals to the one or more oligonucleotides.
- the oligonucleotide(s) may be at least 200 nucleotides in length, optionally at least 500 nucleotides in length.
- the surface may comprise two or more oligonucleotides.
- the oligonucleotide(s) may be linked to the surface.
- the oligonucleotide(s) complementary to a target nucleic acid may be complementary to the full length (or at least 70%, at least 80%, at least 90% of the full length) of the target nucleic acid.
- the methods may also comprise use of an RNA processing device for producing processed RNA from a biological sample, the device comprising: (i) a first module to receive the biological sample, wherein the first module comprises:
- a target RNA outlet through which the target RNA can be obtained following disassociation from the one or more oligonucleotide molecules wherein the first module and the second module together define a flow path along which a sample is capable of flowing.
- the one or more oligonucleotides complementary to the target RNA in the sample may be at least 100 nucleotides in length, preferably at least 200 nucleotides in length, more preferably at least 500 nucleotides in length.
- the target nucleic acid may be from an RNA virus.
- the target nucleic acid may be (transcribed from) a bacterial gene such as an antibiotic resistance gene.
- the target nucleic acid may be a biomarker for a disease.
- the magnetic beads for use in the claimed methods may also be provided in the form of a kit.
- the methods may comprise use of a kit for processing an RNA sample, the kit comprising:
- (c) a reverse transcriptase. Any suitable reverse transcriptase may be included in the kit. Suitable buffers are also well known and commercially available.
- the invention provides use of a kit as described herein in a method of normalizing RNA.
- the invention provides a kit for processing a DNA sample, the kit comprising:
- Examples of DNA polymerases include thermostable polymerases such as Taq or Pfu polymerase and the various derivatives of those enzymes. Suitable buffers are also well known and commercially available.
- the invention provides use of a kit as described herein in a method of normalizing cDNA.
- the methods may comprise use of a kit for detection of a target nucleic acid in a sample, the kit comprising:
- kits of the invention may further comprise one or more, up to all, of deoxynucleotide triphosphates (dNTPs), MgCl2 and a buffer.
- dNTPs deoxynucleotide triphosphates
- the oligonucleotide(s) may comprise sequences complementary to the 10, 20, 50, 100, 1000 or 10000 most abundant RNAs (mRNAs) in a given sample, optionally the 10, 20, 50, 100, 1000 or 10000 most abundant RNAs (mRNAs) in human blood.
- the oligonucleotide(s) may comprise one or more sequences complementary to the mRNA coding for human serum albumin, one or more alpha globulins (for example haptoglobin), one or more beta globulins (for example plasminogen) and/or one or more gamma globulins.
- Methods of RNA extraction and processing may be combined and incorporated into pipelines for analysing biological samples.
- the invention provides a method of analysing a biological sample from a subject, the method comprising:
- the RNA may be full length RNA.
- the biological sample may comprise a biological fluid or a fluid or lysate generated from a biological material.
- the biological sample may be a liquid biopsy.
- the biological sample may be a blood sample, optionally a human blood sample.
- Preparing a processed RNA sample may comprise RNA normalization (reducing the variability in the levels of different RNA sequences in the sample).
- the processed RNA sample may be a normalized RNA sample.
- By normalized is meant that the levels of RNA sequences in the sample are more equal. To achieve this the relative representation or levels of less abundant sequences may be increased and/or the relative representation or levels of more abundant sequences may be decreased.
- a normalized RNA sample may comprise RNA sequences having substantially the same levels. For example, wherein the levels of the sequences of the normalized RNA sample vary by less than 50%, less than 40%, less than 30%, less than 20%, or less than 10%.
- the normalized RNA may be a normalized RNA sample in which at least a portion of the 10, 100, 1000, or 10000 most abundant sequences in the sample have been removed.
- Preparing a processed RNA sample may comprise equalizing the RNA sample.
- the relative abundance of all the unique RNA sequences may be more equal.
- the levels of the unique sequences in the processed RNA sample may vary by less than 50%, less than 40%, less than 30%, less than 20%, or less than 10%.
- Preparing a processed RNA sample may reduce the variability in the levels of the RNA (e.g. by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%). Preparing a processed RNA sample may achieve a more uniform distribution of RNA sequences. In the processed RNA sample the difference in abundance between the most abundant RNA and the least abundant RNA may be reduced (e.g. by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%).
- Preparing a processed RNA sample may reduce the number of molecules (copy number) of the (1, 10, 100, 1000, or 10000) most abundant RNA molecule(s) by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90%.
- the number of molecules (copy number) of the most abundant RNA molecule in the RNA sample may be reduced by at least 50% in the processed RNA.
- the relative abundance of the (1, 10, 100, 1000, or 10000) least abundant RNA molecule(s) may be increased by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% in the processed RNA.
- the processed RNA sample may be more readily analysable. It may be more efficiently sequenced because the relative representation of less abundant sequences is increased.
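The percentage criteria above (reduction in copy number of the most abundant molecules, increase in relative abundance of the least abundant molecules) can be checked with a short sketch. It assumes per-sequence copy numbers are known before and after processing. The function names and the counts are hypothetical and serve only as an illustration.

```python
def percent_reduction_top_n(before, after, n=1):
    """Percent drop in combined copy number of the n most abundant sequences.

    The ranking is taken from the sample before processing.
    """
    top = sorted(before, key=before.get, reverse=True)[:n]
    copies_before = sum(before[s] for s in top)
    copies_after = sum(after.get(s, 0) for s in top)
    if copies_before == 0:
        raise ValueError("top sequences have zero copies before processing")
    return 100.0 * (copies_before - copies_after) / copies_before


def percent_increase_bottom_n(before, after, n=1):
    """Percent rise in combined relative abundance of the n least abundant sequences."""
    bottom = sorted(before, key=before.get)[:n]
    share_before = sum(before[s] for s in bottom) / sum(before.values())
    share_after = sum(after.get(s, 0) for s in bottom) / sum(after.values())
    if share_before == 0:
        raise ValueError("bottom sequences have zero abundance before processing")
    return 100.0 * (share_after - share_before) / share_before


# Hypothetical copy numbers (illustrative only).
rna_before = {"albumin": 9000, "transcript_b": 900, "transcript_c": 100}
rna_after = {"albumin": 3000, "transcript_b": 900, "transcript_c": 100}

reduction = percent_reduction_top_n(rna_before, rna_after, n=1)
increase = percent_increase_bottom_n(rna_before, rna_after, n=1)
print(reduction, increase)
```

In this example only the most abundant sequence is depleted, yet the relative abundance of the least abundant sequence still rises, because relative abundance depends on the total number of molecules remaining in the sample.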
- the method may comprise diagnosing a disease in the subject.
- the invention provides a method for diagnosing a disease in a subject, the method comprising:
- By diagnosing is meant determining that a subject has the disease at the time of testing.
- the methods may comprise predicting a disease or identifying an increased risk of developing a disease.
- the invention provides a method for predicting a disease or identifying an increased risk of developing a disease in a subject, the method comprising:
- the increased risk may be a risk higher than the average risk for the population.
- the increased risk may be a risk above a pre-calculated threshold level.
- the threshold level may be the point above which the benefits of increased monitoring and/or prophylactic treatment outweigh the negatives of potentially unnecessary intervention.
- the increased risk may be a percentage lifetime risk of greater than 1.5%, greater than 2%, greater than 5%, greater than 10%, greater than 50% or greater than 75%.
- the methods may comprise selecting a treatment for a subject having a disease, predicting the responsiveness of a subject with a disease to a therapeutic agent and/or determining the clinical prognosis of a subject with a disease.
- Sequencing the processed RNA allows the presence or absence and/or level of one or more RNA molecules to be determined.
- the presence or absence of one or more RNA molecules in the processed RNA sample may be used to identify whether the subject has the disease.
- the level of one or more RNA molecules in the processed RNA sample may be used to identify whether the subject has the disease.
- a comparison with a reference point or value may be used to diagnose or predict a clinical condition or outcome. While as few as one specific RNA molecule (an RNA molecule with a specific sequence) may be used to diagnose or predict a clinical prognosis or response to a therapeutic agent, the specificity and sensitivity of the diagnosis, or the accuracy of the prediction, may increase when more RNA molecules (of other specific sequences) are used.
- the RNA extracted from the sample may comprise cell-free RNA.
- Extracting RNA from the biological sample may comprise:
- oligonucleotide array comprises two or more oligonucleotides linked to a surface and wherein one or more RNA molecules from the sample anneal to the oligonucleotides of the oligonucleotide array;
- the oligonucleotides may comprise one or more oligo-dT sequences.
- the oligonucleotides may be oligo-dT molecules.
- Extracting RNA from the biological sample produces extracted RNA.
- Preparing a processed RNA sample may comprise following the steps of the method(s) for processing RNA defined above.
- Preparing a processed RNA sample may comprise taking a portion of the extracted RNA formed by extracting RNA from the biological sample to be a first RNA sample and a portion of the extracted RNA to be a second RNA sample and following the steps of the method(s) for processing RNA defined above.
- Preparing a processed RNA sample may comprise taking a portion of the extracted RNA to be a first RNA sample and a portion of the extracted RNA to be a second RNA sample and:
- oligonucleotide array comprises two or more oligonucleotides linked to a surface, and wherein two or more RNA molecules from the first RNA sample anneal to the oligonucleotides of the oligonucleotide array;
- Preparing a processed RNA sample may comprise:
- oligonucleotide array e.g. a DNA, optionally a cDNA array
- the method of analysing a biological sample from a subject may comprise use of one or more of the RNA processing device(s) and kit(s) of the present invention.
- sequencing the processed RNA comprises long-read sequencing.
- the invention provides a peptide or protein comprising, consisting of, or consisting essentially of one or more amino acid sequences selected from SEQ ID NO: 1 to 15, subsequences, portions, homologues, variants and derivatives thereof.
- the invention provides a polynucleotide that encodes a peptide or protein comprising, consisting of, or consisting essentially of one or more amino acid sequences selected from SEQ ID NO: 1 to 15, subsequences, portions, homologues, variants and derivatives thereof.
- the one or more amino acid sequences may be selected from SEQ ID NO: 3, 6 and 15.
- the polynucleotide may be an RNA or DNA molecule.
- the subsequences, portions, homologues, variants or derivatives may have about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or higher identity with the relevant sequence i.e. 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or higher identity with one or more amino acid sequences selected from SEQ ID NO: 1 to 15.
- a "percentage of sequence identity" may be determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences.
- the percentage may be calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
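The calculation described above can be sketched in a few lines. The sketch assumes the two sequences have already been optimally aligned and padded to equal length with a gap character. It does not perform the alignment itself, and the function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over a comparison window.

    Counts positions where both sequences carry the identical residue,
    divides by the total number of positions in the window (gap columns
    included) and multiplies by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matched / len(aligned_a)


# Illustrative pre-aligned sequences; the first contains one gap.
window_a = "ACGT-ACGT"
window_b = "ACGTTACCT"
print(round(percent_identity(window_a, window_b), 2))
```

Because gap columns count toward the window length but never toward the matches, each addition or deletion lowers the resulting percentage.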
- the invention provides a method for diagnosing and/or prognosing cancer in a subject comprising measuring the level of PMS2, APC or at least one peptide thereof in a sample from the subject wherein the level of the protein or peptide is used to provide a diagnosis of and/or a prognosis for the cancer.
- the invention provides a method for diagnosing and/or prognosing cancer in a subject comprising measuring the level of one or more amino acid sequences selected from SEQ ID NO: 1 to 15 or a polynucleotide that encodes a peptide or protein comprising, consisting of, or consisting essentially of one or more amino acid sequences selected from SEQ ID NO: 1 to 15 in a sample from the subject wherein the level of the amino acid sequence or polynucleotide is used to provide a diagnosis of and/or a prognosis for the cancer.
- the polynucleotide may be an RNA or DNA molecule.
- the invention provides use of a method, device or kit as described herein in a method for determining soil microfauna composition.
- the invention provides use of a method for processing a (test) RNA or cDNA sample in a method for determining soil microfauna composition, wherein the method for processing an RNA or cDNA sample comprises:
- (i) contacting the RNA or cDNA sample with a DNA array comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- Soil microfauna composition may be determined through the identification of microorganisms and/or viruses present in a soil sample. This may be achieved through the identification of sequences in the sequencing output.
- the invention provides use of a method, device or kit as described herein in ecological research.
- the invention provides use of a method for processing a (test) RNA or cDNA sample in ecological research, wherein the method for processing an RNA or cDNA sample comprises:
- (i) contacting the RNA or cDNA sample with a DNA array comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- Ecological research may comprise sequencing processed RNA or cDNA from one or more species.
- Ecological research may comprise obtaining information regarding the transcriptome of one or more species.
- Ecological research may lead to improved understanding of the overall biology of one or more species and/or how one or more species impact their local ecology.
- the invention provides use of a method, device or kit as described herein in a method for assessing water quality.
- the invention provides use of a method for processing a (test) RNA or cDNA sample in a method for assessing water quality, wherein the method for processing an RNA or cDNA sample comprises:
- (i) contacting the RNA or cDNA sample with a DNA array, comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- Water quality may be assessed through the identification/detection of microorganisms (for example bacteria and/or fungi) and/or viruses. Water quality may be assessed by extracting RNA from a water sample from a water source. This may be achieved through the identification of sequences in the sequencing output.
- the invention provides use of a method, device or kit as described herein in a method for screening for dangerous biological material.
- the invention provides use of a method for processing a (test) RNA or cDNA sample in a method for screening for dangerous biological material, wherein the method for processing an RNA or cDNA sample comprises:
- (i) contacting the RNA or cDNA sample with a DNA array, comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- the dangerous biological material may comprise a fungus, bacterium and/or a virus, optionally a pathogenic fungus, bacterium or virus, and/or a fungal, plant or animal derived toxin or drug that may still contain traces of nucleic acid (for example RNA).
- the screening may take place at a travel gateway.
- the invention provides use of a method, device or kit as described herein in a method for confirming the identity of a subject.
- the invention provides use of a method for processing a (test) RNA or cDNA sample in a method for confirming the identity of a subject, wherein the method for processing an RNA or cDNA sample comprises:
- (i) contacting the RNA or cDNA sample with a DNA array, comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- Confirming the identity of a subject may be achieved through the identification of sequences in the sequencing output. Confirming the identity of a subject may comprise DNA fingerprinting or include steps from a DNA fingerprinting method. Confirming the identity of a subject may be achieved through analysis of repetitive sequences that are highly variable, for example variable number tandem repeats, optionally short tandem repeats. The method may take place at a travel gateway.
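As an illustration of the tandem repeat analysis mentioned above, the following is a minimal sketch that counts the longest uninterrupted run of a repeat motif in a sequencing read and compares two such profiles. The function names, the motif and the read are hypothetical and chosen only for illustration; the source does not specify how repeats are counted or which loci are used.

```python
import re


def longest_tandem_repeat(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


def repeat_profile(sequence: str, motifs: list[str]) -> dict[str, int]:
    """Map each motif to its longest tandem run in the sequence."""
    return {motif: longest_tandem_repeat(sequence, motif) for motif in motifs}


def profiles_match(reference: dict[str, int], test: dict[str, int]) -> bool:
    """Exact equality of repeat counts (an assumed, deliberately strict criterion)."""
    return reference == test


# Hypothetical read containing three consecutive copies of the motif GATA.
read = "AAGATAGATAGATACC"
profile = repeat_profile(read, ["GATA"])
```

A real identity check would compare many loci and tolerate sequencing error; this sketch only shows the counting step.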
- the invention provides use of a method, device or kit as described herein in a process of RNA or DNA sequencing, optionally for discovery of new RNA and/or detection of low abundance RNA, further optionally wherein the sequencing is single cell sequencing.
- the invention provides use of a method, device or kit as described herein in a process of metagenomic sequencing for discovery of new microbes and/or detection of low abundance microbes.
- the invention provides use of a method, device or kit as described herein in a process of screening DNA or RNA samples, or screening genetic samples for the presence of infectious diseases.
- the invention provides use of a method, device or kit as described herein in a process of detecting a nucleic acid biomarker, optionally a disease biomarker, further optionally a cancer biomarker.
- the method may further comprise reporting the result.
- the result may be in the form of an RNA or DNA sequence, an indication of the presence or absence of a microbe or disease and/or an indication of the presence or absence or level of a disease biomarker.
- a method for determining a set of RNA sequences associated with a disease or condition comprising:
- a method for diagnosing a disease or condition comprising:
- RNA vaccine for a subject with a disease, the method comprising:
- processing the RNA or cDNA sample comprises: (i) contacting the RNA or cDNA sample with a DNA array, comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- RNA transcript encodes a protein isoform present in the subject with the disease.
- RNA vaccine comprises producing an RNA molecule comprising at least a portion of the RNA transcript sequence, wherein the RNA molecule comprises an open reading frame (ORF) encoding at least one antigenic peptide.
- RNA vaccine for use in therapy, wherein the RNA vaccine is produced using the method of any of clauses 4 to 6.
- a method for discovering a disease biomarker comprising:
- processing the RNA or cDNA sample comprises:
- (i) contacting the RNA or cDNA sample with a DNA array, comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- a method for diagnosing cancer in a subject comprising: (a) providing a test RNA sample extracted from a blood sample obtained from the subject;
- processing the RNA or cDNA sample comprises:
- (i) contacting the RNA or cDNA sample with a DNA array, comprising two or more DNA molecules, wherein one or more RNA or cDNA molecules from the RNA or cDNA sample anneal to the DNA molecules of the DNA array; and
- oligonucleotide array comprises two or more oligonucleotides linked to a surface, and wherein two or more RNA molecules from the RNA sample anneal to the oligonucleotides of the oligonucleotide array; (ii) extending two or more of the oligonucleotides by reverse transcription using the annealed RNA molecules as templates to generate a DNA array comprising two or more cDNA molecules;
- control RNA sample optionally comprising synthesizing cDNA using the RNA as a template
- Figure 1 A schematic representation of a magnetic bead with oligo-dT primers attached.
- RNA molecules anneal to the oligo-dT molecules and reverse transcription creates a cDNA copy attached to the bead.
- Figure 2 A schematic representation of a magnetic bead with cDNA probes attached.
- the magnetic bead is re-introduced to another batch of RNA.
- a portion of the RNA anneals to the probes (captured RNA).
- the beads are then immobilized and the solution extracted comprising the normalized RNA.
- Figure 3 A schematic representation of a microfluidic flowcell in use for RNA processing according to the invention. RNA flows through and the temperature is cooled down to allow for poly-A annealing to the oligo-dT forest.
- Figure 4 A schematic representation of a microfluidic flowcell in use for RNA processing according to the invention. Reverse transcription materials are added and incubation for reverse transcription carried out.
- Figure 5 A schematic representation of a microfluidic flowcell in use for RNA processing according to the invention. Heating disassociates RNA which is then flushed out.
- Figure 6 A schematic representation of a microfluidic flowcell in use for RNA processing according to the invention showing the cDNA forest ready for RNA to be normalized.
- Figure 7 A schematic representation of a microfluidic flowcell in use for RNA processing according to the invention.
- New RNA is added and incubated between 45°C and 75°C (for example, at around 68°C) for association.
- the abundant RNA anneals to the cDNA forest while the free normalized RNA flows through.
- Figure 8 A schematic representation of a microfluidic flowcell in use for RNA processing according to the invention. Heating to between 80°C and 100°C (for example, 98°C) disassociates RNA which flows through to waste. The cycle starting from Figure 7 can then be repeated.
- Figure 9 A schematic representation of RNA extraction according to the invention.
- a sample with lysed cells flows over the surface. Cooling down the temperature allows for poly-A annealing to the oligo-dT forest. The flowcell is flushed leaving only bound RNA. Heating up disassociates RNA which flows through to the next step.
- Figure 10 A schematic representation of an RNA processing device according to the invention.
- Figure 11 A schematic representation of RNA processing according to the invention where the oligonucleotides linked to the surface are complementary to target RNA (designed probe cDNA forest).
- a sample with lysed cells flows through. Incubation between 45°C and 75°C (for example, at around 68°C) allows for full length association. The cell is flushed leaving only bound RNA. Heating up disassociates the RNA which flows through for further processing.
- Figure 12 A schematic representation of DNA processing according to the invention where the oligonucleotides linked to the surface are complementary to target DNA (designed probe cDNA forest).
- a sample with lysed cells and fragmented DNA flows through. Heating to between 80°C and 100°C (for example, 98°C) disassociates double strands. Incubation between 45°C and 75°C (for example, at around 68°C) allows for full length association. The cell is flushed leaving only bound DNA. Heating up disassociates the DNA which flows through for further processing.
- Figure 13 An example of a cancer specific transcript (novel transcript) from PMS2 with a novel exon that creates a peptide sequence that is specific to cancer.
- Figure 14 An example of a cancer specific transcript (novel transcript) from APC with a novel exon that creates a peptide sequence that is specific to cancer.
- Figure 15 An example workflow for identifying cancer-associated k-mers.
- RNA sequences from cancer and control patients are converted to the possible peptide sequences that can be translated from the RNA i.e. three possible open reading frames (ORFs).
- the resulting peptide sequences are then split into k-mers.
- the k-mers from the cancer patients are compared to k-mers found in control patients to identify k-mers that are either only present in cancer patients or highly enriched in cancer patients.
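The comparison step of the Figure 15 workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' implementation: the function names, the per-sample fraction as the measure of presence, and the fold-enrichment threshold `min_fold` are all hypothetical, and the value of k is left to the caller because the text above does not fix it.

```python
from collections import Counter


def peptide_kmers(peptide: str, k: int) -> set[str]:
    """All overlapping k-mers of one peptide sequence."""
    return {peptide[i:i + k] for i in range(len(peptide) - k + 1)}


def kmer_sample_fractions(samples: list[list[str]], k: int) -> dict[str, float]:
    """Fraction of samples (each a list of peptides) in which each k-mer occurs."""
    if not samples:
        return {}
    counts: Counter = Counter()
    for peptides in samples:
        seen: set[str] = set()
        for peptide in peptides:
            seen |= peptide_kmers(peptide, k)
        counts.update(seen)  # count each k-mer once per sample
    return {kmer: n / len(samples) for kmer, n in counts.items()}


def cancer_associated_kmers(cancer_samples, control_samples, k, min_fold):
    """Return (k-mers seen only in cancer samples, k-mers enriched in cancer samples)."""
    cancer = kmer_sample_fractions(cancer_samples, k)
    control = kmer_sample_fractions(control_samples, k)
    cancer_only = {kmer for kmer in cancer if kmer not in control}
    enriched = {
        kmer for kmer, fraction in cancer.items()
        if kmer in control and fraction / control[kmer] >= min_fold
    }
    return cancer_only, enriched


# Toy data: two cancer samples and four control samples, with k = 3.
only, enriched = cancer_associated_kmers(
    [["ABCD"], ["ABCD"]],
    [["ABCX"], ["QQQQ"], ["QQQQ"], ["QQQQ"]],
    k=3,
    min_fold=2.0,
)
```

In the toy data, `BCD` appears only in the cancer samples, while `ABC` appears in every cancer sample but in only one of four controls.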
- Figure 16 Overlapping k-mers.
- the sequence DQPSQHGETLSLLKI is formed of multiple overlapping k-mers.
- the sequence may be split into the overlapping k-mers.
- the overlapping k-mers may be combined to make up the sequence of a full neoantigen region.
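The splitting and recombination described for Figure 16 can be sketched as below, using the sequence DQPSQHGETLSLLKI from the text. The choice of k = 9 and the function names are assumptions made for illustration, and the merge assumes the simplest case: consecutive k-mers overlap by k-1 residues and the chain is unambiguous.

```python
def split_into_kmers(sequence: str, k: int) -> list[str]:
    """Overlapping k-mers of a sequence, in order of position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def merge_overlapping_kmers(kmers) -> str:
    """Rebuild one region from k-mers that overlap their neighbours by k-1 residues."""
    kmers = set(kmers)
    k = len(next(iter(kmers)))
    if k < 2:
        raise ValueError("k must be at least 2 for overlaps to exist")
    by_prefix = {kmer[:k - 1]: kmer for kmer in kmers}
    suffixes = {kmer[1:] for kmer in kmers}
    # The first k-mer is the one whose prefix is not the suffix of any other k-mer.
    starts = [kmer for kmer in kmers if kmer[:k - 1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("k-mers do not form a single unambiguous chain")
    region = starts[0]
    used = {starts[0]}
    while region[-(k - 1):] in by_prefix:
        following = by_prefix[region[-(k - 1):]]
        if following in used:  # guard against cycles
            break
        region += following[-1]  # each step contributes one new residue
        used.add(following)
    return region


kmers = split_into_kmers("DQPSQHGETLSLLKI", 9)
region = merge_overlapping_kmers(kmers)
```

With k = 9 the 15-residue sequence yields seven k-mers, and merging them recovers the original sequence.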
- the invention is based on methods that take advantage of the ability to generate full length sequences from RNA extracted from blood without fragmenting RNA or cDNA products before sequencing.
- This provides a transcriptome representing any RNA that makes its way into the circulatory system, including RNA from typical blood cells (like red blood cells, white blood cells, other immune cells, etc.), RNA from any other cells that are typically uncommon in the circulatory system, such as cancer cells or cells that somehow dislodged into the circulatory system, and extracellular RNA which could have originated from any cell within the body.
- the transcription start site, end site, and splicing are features which typically represent unique combinations used by different cell types. This information can also be used to directly identify the protein isoform that would be translated from messenger RNA.
- the ability to detect low level signals of full length RNA which can then be translated into protein isoforms means the present inventors can detect unique protein isoforms that are expressed exclusively by certain cells, for example cancer cells. These isoforms will have sections of their peptide sequence which are unique to the specific cells, for example the cancer cells (Figures 13 and 14). These cancer specific peptide sub-sequences are also known as neo-antigens because they are typically presented on the cell surface via MHC complexes.
- Detecting isoform based neo-antigens has particular advantages. For example, it means that instead of having to design a new vaccine for each patient based on the unique mutation presenting in each patient, the present methods make it possible to identify a set of isoform based neo-antigens that are represented across a large percentage of the population. This means that RNA cancer vaccines can be based on that set instead of having to design new vaccines for each patient. With this paradigm, the development, safety testing, production, QC, and delivery of RNA cancer vaccines can be managed much more economically and with less risk of negative effects. This means that RNA cancer vaccines can be more affordable and also means that RNA cancer vaccines could be used more often since they would be safer and more cost effective.
- With tumour samples there is a limited amount of material, which may be difficult to obtain and is only available at a particular time point.
- targets are based on the removed primary cells when the aim is to target the remaining distal cells that are likely to be genetically different. If targets are not identified that span all remaining cancer cells it creates selective pressure to allow cancer cells that do not present the targets to thrive and cause recurrence. This may even encourage the development of cancers with higher mutation rates that could be more problematic.
- RNA cancer vaccine therapy could be used early and often. It can be applied multiple times depending on the neo-antigen readouts obtained from the present methods until it is observed that there are no longer any neo-antigens presenting in the blood of the patient.
- Neo-antigen targets also represent unique structures in cancer proteins that could be exploited for new drug development. Since a large percentage of targeted cancer therapies interact with proteins, data obtained from the present methods could be used to inform usefulness of a wide range of cancer therapies, including check point inhibitors.
- the present methods can detect diseases at the earliest stages and indicate the best therapies, all from one testing paradigm. This allows not only earlier treatment of diseases, thus increasing success rates, but also a reduced time to treatment after diagnosis.
- the process could also be applied to prevent the formation of tumours or damaging cancers by screening the population and applying RNA cancer vaccines before cancer cells have the chance to grow to any meaningful size.
- the data collected using the present methods can be used to compile an extensive full length RNA human database which can be used to data mine new potential drug targets.
- RNA cancer vaccines can be developed, tested, and produced for each of the targets in the set of targets and applied individually or in combination when detected using the methods herein. This vastly reduces the cost and increases the safety, which means RNA cancer vaccines could be used early and often. This means that it is not necessary to destroy all the cancer cells in one go.
- the present methods allow for monitoring and adjusting treatment until all neoantigens in the blood disappear.
- the present methods could also guide complementary treatments like checkpoint inhibitors so that combination therapies could be applied when needed.
- the present methods could be applied to both prevent cancer formation and treat already formed cancer.
- RNA or cDNA samples are typically dominated by sequences from highly expressed genes. Normalization to achieve a more uniform distribution of sequences can increase the efficiency of sequencing for transcript discovery and/or detection. First, genes and isoforms which are specific to the condition in question are easier to detect, and second, there is less redundancy in data generated reducing data storage requirements.
- the methods provided herein may include a processing step in the form of RNA or cDNA normalization.
- Figure 1 shows schematically an array of oligonucleotides, in this case oligo-dT molecules linked to a magnetic bead.
- a first RNA sample is contacted with the magnetic bead and RNA molecules comprising a poly-A tail anneal to the oligo-dT molecules.
- the oligo-dT molecules are extended by reverse transcription using the annealed RNA molecules as templates to generate cDNA molecules linked to the bead (a DNA array).
- the annealed RNA molecules are disassociated from the cDNA molecules and the first RNA sample removed from the magnetic bead leaving the cDNA molecules linked to the bead.
- This stage of the method to generate the cDNA molecules linked to the bead (DNA array) involves the following steps:
- a second RNA sample is then contacted with the bead comprising the linked cDNA molecules.
- RNA molecules from the second RNA sample anneal to the cDNA molecules with the complementary sequence.
- As abundant RNA molecules produce more cDNA molecules in the stage shown in Figure 1, more of the abundant RNA molecules in the second RNA sample will be captured by the cDNA molecules than will be the case for the less abundant RNA molecules.
- the RNA molecules that do not anneal to the cDNA molecules therefore have a more uniform distribution of sequences - the RNA is normalized as it is no longer dominated by a few very abundant sequences.
- the magnetic bead is then immobilized and the unannealed RNA molecules are extracted thereby generating processed RNA.
- RNA molecules in the second RNA sample should ideally not exceed the number of DNA molecules in the DNA array (cDNA forest) for each reaction cycle. If the DNA array outnumbers each pass of RNA it ensures there are enough probes to anneal to the high abundance RNA.
- This stage of the method to generate the processed RNA involves the following steps:
- the present invention makes possible the normalization of full length RNA.
- the advantages of analysing RNA directly include that it is not necessary to do PCR (which saves time and reagents and avoids PCR artefacts), that there is a lack of bias, and that nanopore sequencing can directly detect modifications present in RNA (modifications change the way in which RNA moves through pores).
- the oligonucleotide array may be linked to any appropriate surface and the present invention is not limited to the use of magnetic beads.
- the method may also be carried out in a microfluidic flowcell.
- Figure 3 shows schematically an array of oligonucleotides, in this case oligo-dT molecules (also termed oligo-dT forest herein), linked to the surface of a flowcell.
- a first RNA sample flows through the flowcell and RNA molecules comprising a poly-A tail anneal to the oligo-dT molecules.
- the temperature is cooled down to below 65°C (i.e. 65°C or below, optionally between 30°C and 65°C) for the oligo-dT molecules to anneal to the poly-A tails of the RNA.
- RNA sequences that occur more frequently in the sample will produce more cDNA molecules.
- RNA molecules from the second RNA sample anneal to the cDNA molecules with the complementary sequence.
- As abundant RNA molecules produce more cDNA molecules in the step shown in Figure 4, more of the abundant RNA molecules in the second RNA sample will be captured by the cDNA molecules than will be the case for the less abundant RNA molecules.
- RNA molecules that do not anneal to the cDNA molecules therefore have a more uniform distribution of sequences - the RNA is normalized as it is no longer dominated by a few very abundant sequences.
- the unannealed RNA molecules flow through thereby generating processed RNA.
- Figure 8 illustrates a further step of disassociating the annealed RNA molecules from the cDNA molecules by heating to 98°C.
- the disassociated RNA flows through to waste.
- the surface comprising the cDNA molecules can then be re-used with further RNA samples to generate more processed RNA.
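The normalizing effect of abundance-proportional capture can be illustrated with a toy model. This is a sketch under loudly stated assumptions and not a kinetic description from the source: the number of cDNA probes per transcript is taken to be proportional to that transcript's abundance in the first sample, and the captured fraction of the second sample is assumed to follow a simple saturation curve `probes / (probes + half_saturation)`. The parameter names and values are hypothetical, and the probe-to-RNA ratio requirement discussed above is not modelled.

```python
def build_cdna_forest(first_sample: dict[str, float], capture_efficiency: float) -> dict[str, float]:
    """Probes per transcript, assumed proportional to abundance in the first sample."""
    return {name: count * capture_efficiency for name, count in first_sample.items()}


def normalize_pass(second_sample: dict[str, float], forest: dict[str, float],
                   half_saturation: float) -> dict[str, float]:
    """Molecules that flow through unannealed after one pass over the forest."""
    return {
        name: count * half_saturation / (forest.get(name, 0.0) + half_saturation)
        for name, count in second_sample.items()
    }


def fold_range(sample: dict[str, float]) -> float:
    """Ratio between the most and least abundant transcript."""
    return max(sample.values()) / min(sample.values())


# Hypothetical transcript counts spanning four orders of magnitude.
rna = {"abundant": 100_000.0, "medium": 1_000.0, "rare": 10.0}
forest = build_cdna_forest(rna, capture_efficiency=1.0)
processed = normalize_pass(rna, forest, half_saturation=100.0)
before, after = fold_range(rna), fold_range(processed)
```

In this toy run the spread between the most and least abundant transcript falls from 10,000-fold to roughly 11-fold, while most of the rare transcript flows through.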
- RNA extraction As illustrated in Figure 9 a similar principle can be applied to RNA extraction.
- a biological sample with lysed cells comprising RNA, DNA, proteins etc. flows over an array of oligonucleotides, in this case oligo-dT molecules (also termed oligo-dT forest herein) linked to the surface of a flowcell.
- the temperature is cooled down to below 65°C (i.e. 65°C or below, optionally between 30°C and 65°C) for the oligo-dT molecules to anneal to the poly-A tails of the RNA.
- the flowcell is flushed to leave only the annealed RNA.
- the temperature is then increased to between 80°C and 100°C (for example, 98°C) to disassociate the annealed RNA molecules from the oligonucleotides to obtain an RNA sample.
- the RNA then flows through for further processing.
- RNA extraction and RNA processing can be linked through combining microfluidic flowcells.
- One flowcell (also termed module or reaction chamber herein) extracts RNA, which is then processed in a further flowcell (or module).
- An RNA processing device is illustrated schematically in Figure 10, which comprises two flowcells.
- the biological sample is input through a sample inlet in the first flowcell.
- the biological sample may be a sample of lysed cells comprising RNA, DNA, proteins etc.
- Reagents enter through a reagent inlet, for example buffer and/or RNA stabilising reagents.
- the surface of the first flowcell is as shown in Figure 9 i.e. an array of oligonucleotides, in this case oligo-dT molecules, linked to the surface of the flowcell.
- Both the first and second flowcells comprise temperature control means (thermocontrol) for adjusting the temperature.
- the temperature control means allow the temperature to be cooled down to below 65°C (i.e. 65°C or below, optionally between 30°C and 65°C) for the oligonucleotides to anneal to the RNA.
- the first flowcell comprises a first waste outlet to remove unannealed sample such that when the first flowcell is flushed only the annealed RNA is left.
- the temperature control means then allow the temperature to be increased to between 80°C and 100°C (for example, 98°C) to disassociate the annealed RNA molecules from the oligonucleotides to obtain an RNA sample.
- the first flowcell comprises a sample outlet through which the RNA sample is capable of flowing following disassociation from the oligonucleotides.
- the first flowcell and the second flowcell together define a flow path along which the sample is capable of flowing.
- the first and second flowcells are joined by a connector, for example a tube, that allows the RNA to flow from the first flowcell to the second flowcell for further processing.
- the RNA sample enters through an RNA sample inlet in the second flowcell.
- the second flowcell also comprises a second reagent inlet through which reagents are capable of entering the second flowcell.
- the surface of the second flowcell comprises an array of oligonucleotides (for example oligo-dT molecules) linked to the surface of the flowcell.
- the RNA sample flows through the flowcell and RNA molecules anneal to the oligonucleotides.
- Reverse transcription reagents are added through the second reagent inlet.
- the oligonucleotides are extended by reverse transcription using the annealed RNA molecules as templates to generate cDNA molecules linked to the surface (a DNA array).
- the annealed RNA molecules are disassociated from the cDNA molecules by heating using the temperature control means.
- the temperature control means are capable of heating the RNA molecules to 98°C.
- the second flowcell comprises a waste RNA outlet to remove one or more RNA molecules. The waste RNA outlet allows the RNA sample to be flushed out leaving the cDNA molecules linked to the surface.
- a further RNA sample then enters the second flowcell, contacts the surface comprising the linked cDNA molecules and is incubated between 45°C and 75°C (for example, at around 68°C). RNA molecules from the further RNA sample anneal to the cDNA molecules with the complementary sequence.
- the second flowcell comprises a processed RNA outlet through which the unannealed RNA molecules flow, thereby generating processed (normalized) RNA.
- the surface or oligonucleotide array comprises one or more oligonucleotides with a sequence that is complementary to a target nucleic acid of interest
- the target nucleic acid can bind to the one or more oligonucleotides. In this manner the target nucleic acid may be removed from a sample.
- the target nucleic acid may also be subjected to further processing such as sequencing.
- a further flowcell may be included in the RNA processing device discussed above that comprises one or more oligonucleotides with a sequence that is complementary to a target nucleic acid of interest.
- the second flowcell may comprise one or more oligonucleotides with a sequence that is complementary to a target nucleic acid of interest.
- the target nucleic acid may also be directly extracted from a biological sample.
- Figure 11 shows a surface or array that comprises oligonucleotides complementary to a target RNA (also termed designed probe cDNA forest herein).
- the oligonucleotides are at least 100 nucleotides in length.
- a biological sample with lysed cells comprising RNA, DNA, proteins etc. flows over the array of oligonucleotides. Incubation between 45°C and 75°C (for example, at around 68°C) allows for full length association of target RNA with the oligonucleotides.
- the flowcell is flushed to leave only the annealed RNA.
- the temperature is then increased to between 80°C and 100°C (for example, 98°C) to disassociate the annealed RNA molecules from the oligonucleotides to obtain the target RNA.
- the RNA then flows through for further processing.
- the remaining RNA, DNA and protein may then be discarded or further processed.
- the methods, devices and kits described herein can be used to target RNA viruses, bacterial genes such as antibiotic resistance genes and RNA biomarkers for disease. Coupled with RNA sequencing this allows for precise diagnostics.
- the DNA array (probe forest) can be reused.
- the device can be used as a quick reusable screening for viral infection if coupled with (Nanopore) sequencing or another detection method (PCR, LAMP, etc.).
- the methods, kits and devices can also be used in agritech to monitor crops and livestock for diseases. Sample processing is fast and efficient.
- Figure 12 shows a surface or array that comprises oligonucleotides complementary to a target DNA (also termed designed probe cDNA forest herein).
- the oligonucleotides are at least 100 nucleotides in length.
- a biological sample with lysed cells comprising RNA, fragmented DNA, proteins etc. flows over the array of oligonucleotides. The sample is heated to between 80°C and 100°C (for example, 98°C) to disassociate the double stranded DNA.
- the remaining RNA, DNA and protein may then be discarded or further processed.
- the methods, devices and kits described herein can be used to target DNA viruses, for bacterial identification (and identification of other microbes), to detect DNA biomarkers for disease and for rapid DNA identification as a means of validating individual identities.
- the DNA array (probe forest) can be reused.
- the device can be used as a quick reusable screening for viral infection if coupled with (Nanopore) sequencing or another detection method (PCR, LAMP, etc.).
- the methods, kits and devices can also be used in agritech to monitor crops and livestock for diseases. Sample processing is fast and efficient.
- oligonucleotide(s) complementary to a target nucleic acid may be complementary to the full length (or at least 70%, at least 80%, or at least 90% of the full length) of the target nucleic acid. This differs from typical probe based systems which only use a short oligonucleotide sequence to target nucleic acid.
- In the DNA array there is an optimum distance between the DNA molecules so that they do not interact with each other. This distance is influenced by the expected length of the cDNA: optimally, any two points are separated by about twice the length of the longest cDNA. For example, when the biological sample is (human) blood, the maximum length of RNA is around 5 kb so the maximum length of the cDNA produced therefrom will be around 5 kb. Thus, at least 10 kb (6000 nm) would be the optimal spacing between the oligonucleotides in the oligonucleotide array and/or cDNA molecules in the DNA array.
- the distance between the oligonucleotides can be smaller as the known sequences allow for designing of the oligonucleotide sequences so there is minimal interaction. Accordingly where oligonucleotides complementary to a target DNA or RNA are used, the distance between the oligonucleotides may be at least 1.1, at least 1.2, at least 1.3, at least 1.4, at least 1.5, at least 1.6, at least 1.7, at least 1.8 or at least 1.9 times the length of the oligonucleotides.
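The spacing arithmetic above can be written out as a short calculation. This is a sketch only: the conversion factor of 0.6 nm per nucleotide is not stated in the text but is inferred from its equating 10 kb with 6000 nm, and the function names are hypothetical. The factor of 2 applies to arrays built from an uncharacterised sample, while a factor of at least 1.1 applies to designed oligonucleotides complementary to a known target.

```python
NM_PER_NUCLEOTIDE = 0.6  # assumption inferred from "10 kb (6000 nm)" in the text


def spacing_for_unknown_sample(longest_cdna_nt: float) -> float:
    """Spacing in nucleotides: about twice the longest expected cDNA."""
    return 2.0 * longest_cdna_nt


def spacing_for_designed_probes(oligo_length_nt: float, factor: float = 1.1) -> float:
    """Spacing in nucleotides for designed probes, at least 1.1 times the oligonucleotide length."""
    if factor < 1.1:
        raise ValueError("the text gives 1.1 as the smallest factor")
    return factor * oligo_length_nt


def nucleotides_to_nm(length_nt: float) -> float:
    """Convert a length in nucleotides to nanometres using the assumed factor."""
    return length_nt * NM_PER_NUCLEOTIDE


# Blood example from the text: longest RNA, and hence cDNA, of around 5 kb.
blood_spacing_nt = spacing_for_unknown_sample(5_000)
blood_spacing_nm = nucleotides_to_nm(blood_spacing_nt)
```

For the blood example this reproduces the 10 kb, or about 6000 nm, spacing given in the text.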
- the density of oligonucleotides in the oligonucleotide array influences the density of DNA molecules in the DNA array.
- one means of preventing the DNA molecules in the DNA array from interacting with each other is using a certain spacing (i.e. a maximum density) of oligonucleotides in the array as discussed above.
- the density of DNA molecules in the DNA array is also influenced by the concentration of RNA or cDNA molecules in the first RNA or first cDNA sample respectively. This concentration influences how many oligonucleotides in the oligonucleotide array capture an RNA molecule or cDNA molecule, as appropriate. This in turn influences how many DNA molecules are synthesised using the captured RNA or DNA as a template.
- the concentration of RNA or cDNA molecules in the first RNA or first cDNA sample may be adjusted to prevent the DNA molecules in the DNA array from interacting with each other.
- Performance is also based on the ratio between the RNA and the DNA array (cDNA forest). Thermal control and kinetic control are relevant to optimum performance.
- a micropump can be used to generate laminar flow or turbulent flow.
- RNA extraction and processing as described above may be combined and incorporated into pipelines for analysing biological samples.
- the method comprises:
- RNA stabilizing reagents are commercially available and include RNAlater® (Sigma-Aldrich) and RNAprotect (Qiagen).
- a suitable buffer contains EDTA, sodium citrate and ammonium sulfate.
- Incubation for cell lysis can be, for example, 1 minute to 3 hours.
- the sample is then added to an RNA processing device as described above. First RNA extraction (purification) takes place.
- oligonucleotides are bound to the surface of the chamber to make an oligo-forest.
- the chamber is heated to somewhere between 30°C and 75°C (optionally between 30°C and 65°C or between 60°C and 65°C) to allow for annealing of RNA to the oligo-forest.
- the remaining fluid is flushed out to a waste channel.
- the chamber is then heated to above 75°C (for example, between 80°C and 100°C) to release the remaining annealed RNA. This is then pumped through to the second reaction chamber.
- the next step is preparing a processed RNA sample, in this case RNA normalization.
- the second reaction chamber (flowcell, module) has another oligo-forest.
- the purified RNA is cooled down in this chamber to below 65°C (i.e. 65°C or below, for example between 30°C and 65°C), optionally below 60°C, to allow for annealing to the oligo-forest.
- Reverse transcriptase and buffer are then added to create a complementary DNA strand using the oligo-forest as primers.
- the chamber is heated to between 80°C and 100°C (optionally above 90°C) to disassociate the RNA from the cDNA-forest.
- the solution is then flushed to the waste channel.
- RNA is then pumped into the second chamber with the cDNA-forest.
- the chamber is heated to between 45°C and 75°C (optionally between 60°C and 75°C) to allow for full length annealing of RNA to cDNA.
- RNA is pumped into the collection chamber for further processing (to be sequenced).
- the second chamber is then heated to between 80°C and 100°C (optionally above 90°C) to release the RNA.
- the disassociated RNA is flushed to the waste channel. This process is repeated until an adequate amount of normalized RNA is produced for sequencing.
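The temperature windows given for the extraction and normalization stages can be written down as a small configuration and checked against planned setpoints. The following is a minimal sketch only: the names `ThermalStep`, `EXTRACTION`, `NORMALIZATION` and `out_of_range` are hypothetical, the step labels are paraphrases of the description, and where the description says "above 75°C" or "below 65°C" the example ranges quoted in the text (80°C to 100°C and 30°C to 65°C) are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThermalStep:
    """One heating or cooling step with its permitted window in degrees Celsius."""
    name: str
    min_c: float
    max_c: float

    def accepts(self, setpoint_c: float) -> bool:
        return self.min_c <= setpoint_c <= self.max_c


# First reaction chamber (RNA extraction). Ranges are the ones quoted in the text;
# 80-100 C is the example range given for "above 75 C".
EXTRACTION = [
    ThermalStep("anneal RNA to oligo-forest", 30.0, 75.0),
    ThermalStep("release annealed RNA", 80.0, 100.0),
]

# Second reaction chamber (RNA normalization).
NORMALIZATION = [
    ThermalStep("anneal purified RNA to oligo-forest", 30.0, 65.0),
    ThermalStep("dissociate RNA from cDNA-forest", 80.0, 100.0),
    ThermalStep("anneal RNA to cDNA-forest", 45.0, 75.0),
    ThermalStep("release RNA from cDNA-forest", 80.0, 100.0),
]


def out_of_range(program, setpoints):
    """Return the names of steps whose setpoint is missing or outside its window."""
    problems = []
    for step in program:
        value = setpoints.get(step.name)
        if value is None or not step.accepts(value):
            problems.append(step.name)
    return problems
```

For example, `out_of_range(NORMALIZATION, {...})` returns an empty list when every planned setpoint lies inside its window, and otherwise names the offending steps so that a run can be stopped before the chamber is heated.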
- the next stage is preparation for sequencing.
- the normalized RNA can then be pumped into additional reaction chambers (flowcells, modules) which will prepare the sequencing libraries. Depending on the sequencing technology this could involve the ligation of adapters, second strand synthesis and/or any other required modifications to allow for sequencing.
- the sequencing libraries will then be pumped into a sequencing chamber for sequencing.
- the next stage is sequencing and data processing. During sequencing the raw data can be uploaded to cloud servers for data processing and archiving.
- RNA extraction and processing in this way provides a device that can be used for immediate processing of blood or other samples minimizing issues with RNA degradation.
- RNA sequences are then converted to the possible peptide sequences that can be translated from the RNA (Figure 15), i.e. one for each of the three possible open reading frames (ORFs). These are full translations, from first to last codon, without start codon selection.
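The three-frame translation described here can be sketched as follows. This is an illustrative sketch rather than the applicant's implementation: the function names are hypothetical, the standard genetic code is assumed, stop codons are kept in the output as `*` (the description does not say how they are handled), and codons containing unrecognised bases are written as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order for each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(rna: str, frame: int) -> str:
    """Translate every complete codon from offset `frame`, with no start codon selection."""
    seq = rna.upper().replace("T", "U")  # accept cDNA-style input as well
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translations(rna: str) -> list:
    """Return the translations for reading frames 0, 1 and 2."""
    return [translate_frame(rna, frame) for frame in range(3)]
```

Calling `three_frame_translations("AUGGCCUAA")` yields one peptide string per frame; incomplete trailing codons are dropped.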
- k length of peptides
- Overlapping k-mers are shown in Figure 16.
- a database of these k-mers is created with each k-mer representing a potential target. Counts for how many times each k-mer is presented in the data are also included.
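A minimal sketch of the k-mer database, assuming an in-memory counter stands in for the database and that peptides arrive as plain strings. The function names are hypothetical; the choice of k is left to the caller.

```python
from collections import Counter


def peptide_kmers(peptide: str, k: int) -> list:
    """Return all overlapping k-mers of a peptide, in order of position."""
    return [peptide[i:i + k] for i in range(len(peptide) - k + 1)]


def build_kmer_counts(peptides, k: int) -> Counter:
    """Count how many times each k-mer occurs across a collection of peptides."""
    counts = Counter()
    for peptide in peptides:
        counts.update(peptide_kmers(peptide, k))
    return counts
```

For instance, `peptide_kmers("MAGLW", 3)` gives the overlapping windows `MAG`, `AGL` and `GLW`, and `build_kmer_counts` accumulates the per-k-mer counts that the database stores alongside each potential target.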
- the k-mers from the cancer patients are compared to k-mers found in controls and/or other sources of data to identify k-mers that are either only present in cancer patients or highly enriched in cancer patients.
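One way to express this comparison is sketched below. The description does not define "highly enriched", so the fold-change threshold `min_fold` and the `pseudocount` are assumptions made purely for illustration, as is the function name; counts are normalised by the total number of k-mers in each group before the ratio is taken.

```python
def disease_specific_kmers(cancer, control, min_fold=10.0, pseudocount=1.0):
    """Label k-mers as 'cancer_only' or 'enriched' relative to controls.

    `cancer` and `control` map k-mer -> count. `min_fold` and `pseudocount`
    are illustrative assumptions, not values taken from the description.
    """
    cancer_total = sum(cancer.values()) or 1
    control_total = sum(control.values()) or 1
    labels = {}
    for kmer, n_cancer in cancer.items():
        n_control = control.get(kmer, 0)
        if n_control == 0:
            labels[kmer] = "cancer_only"
            continue
        cancer_freq = (n_cancer + pseudocount) / cancer_total
        control_freq = (n_control + pseudocount) / control_total
        if cancer_freq / control_freq >= min_fold:
            labels[kmer] = "enriched"
    return labels
```

K-mers that are neither absent from controls nor above the threshold are simply left out of the result.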
- the resulting cancer-specific k-mers represent possible neoantigen targets, targets for antibody drug conjugates, and targets for radiopharmaceuticals. Additional data from public sources is used to annotate the k-mers and their source genes to identify and rank the k-mers in terms of suitability as a cancer therapy target. For example, for neoantigen targets, k-mers are compared with the guidelines on MHC presenting signatures (as described in Shao XM, Bhattacharya R, Huang J, Sivakumar I KA, Tokheim C, Zheng L, Hirsch D, Kaminow B, Omdahl A, Bonsack M, Riemer AB, Velculescu VE, Anagnostou V, Pagel KA, Karchin R.
- a new cancer specific exon could code for a peptide sequence that is already represented in the control proteome in another gene. This phenomenon is well known with respect to gene paralogues where similar transcript sequences are found in different loci on the genome which produce very similar proteins. However, by looking at new sequences in the context of peptide k-mers it is possible to check that the specific sequence is truly not represented in the control proteome.
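The check against the control proteome can be sketched as a set lookup across all control proteins, so that a k-mer already encoded by any gene (including a paralogue) is rejected. The function names and the toy gene identifiers in the usage note are hypothetical.

```python
def proteome_kmer_set(proteome, k: int) -> set:
    """Collect every k-mer found in any protein of the control proteome.

    `proteome` maps a protein or gene identifier to its amino acid sequence.
    """
    return {
        seq[i:i + k]
        for seq in proteome.values()
        for i in range(len(seq) - k + 1)
    }


def novel_kmers(candidates, proteome, k: int) -> list:
    """Keep only candidate k-mers that occur nowhere in the control proteome."""
    known = proteome_kmer_set(proteome, k)
    return [kmer for kmer in candidates if kmer not in known]
```

With a toy proteome such as `{"GENE_A": "MAGLWK", "GENE_A_PARALOGUE": "MSGLWK"}`, the candidate `GLWK` is discarded because it is already represented, whichever locus the new exon came from, while a k-mer absent from both sequences is retained.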
- RNA sample collection procedure for full-length RNA extraction
- cDNA is DNA synthesized from an RNA template.
- quality of a cDNA sample is related to the RNA from which it is reverse transcribed.
- Prior art processes for handling blood samples prior to RNA extraction involve overnight thawing of the frozen blood samples. The inventor has found that this leads to significant RNA degradation which negatively impacts long-read sequencing.
- the following protocol to process blood samples prior to RNA extraction minimizes degradation and optimizes RNA extraction for long-read sequencing.
- If RNA is to be extracted on the same day as sample collection, store the blood sample upright at room temperature (18-25°C) for 2-3 hours, then continue with RNA extraction immediately using the PAXgene Blood RNA Kit.
- If RNA extraction is not carried out on the day of blood collection, follow the instructions below for sample freezing, storage, and thawing:
- the blood sample should be stored at -20°C or below immediately after collection. For long-term storage, freeze the blood sample at -20°C for 24 hours before transferring to a -70°C or -80°C freezer.
- RNA integrity is improved with a shorter (3-hour) thawing period relative to overnight thawing.
- The RNA Integrity Number (RIN) was calculated as in Schroeder et al. (The RIN: an RNA integrity number for assigning integrity values to RNA measurements. BMC Molecular Biology 7, 3 (2006)).
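The blood sample handling rules above amount to a short decision procedure, sketched here for illustration. The function name and its two flags are hypothetical, and the final thawing step is an inference from the statement that a 3-hour thaw preserves RNA integrity better than overnight thawing; the full thawing instructions are not reproduced in this section.

```python
def blood_handling_steps(extract_same_day: bool, long_term_storage: bool = False) -> list:
    """Return the handling steps for a blood sample prior to RNA extraction."""
    if extract_same_day:
        return [
            "store upright at room temperature (18-25 C) for 2-3 hours",
            "extract RNA immediately (PAXgene Blood RNA Kit)",
        ]
    steps = ["store at -20 C or below immediately after collection"]
    if long_term_storage:
        steps.append("keep at -20 C for 24 hours")
        steps.append("transfer to a -70 C or -80 C freezer")
    # Assumption: thaw duration inferred from the 3-hour versus overnight comparison.
    steps.append("thaw for about 3 hours rather than overnight before extraction")
    return steps
```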
Landscapes
- Chemical & Material Sciences (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Zoology (AREA)
- Engineering & Computer Science (AREA)
- Wood Science & Technology (AREA)
- Analytical Chemistry (AREA)
- Immunology (AREA)
- Genetics & Genomics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Pathology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biochemistry (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Veterinary Medicine (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Oncology (AREA)
- Medicinal Chemistry (AREA)
- Public Health (AREA)
- Animal Behavior & Ethology (AREA)
- Hospice & Palliative Care (AREA)
- General Chemical & Material Sciences (AREA)
- Pharmacology & Pharmacy (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to methods of determining a set of disease-associated RNA sequences and of producing a database of disease-associated RNA sequences. The invention also relates to methods of discovering disease biomarkers and of diagnosing a disease. The present invention also relates to methods of processing RNA and cDNA sequences and to uses of these methods. The invention also relates to methods of producing RNA vaccines.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP24155610.9A EP4596050A1 (fr) | 2024-02-02 | 2024-02-02 | Methods of preparing processed RNA samples and their use in the production of RNA vaccines |
| EP24155610.9 | 2024-02-02 | | |
| EP24207928.3 | 2024-10-21 | | |
| EP24207928 | 2024-10-21 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025163086A1 (fr) | 2025-08-07 |
Family
ID=94476412
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2025/052429 Pending WO2025163086A1 (fr) | 2024-02-02 | 2025-01-30 | Methods of preparing processed RNA samples and their use in the preparation of RNA vaccines |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025163086A1 (fr) |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5891637A (en) * | 1996-10-15 | 1999-04-06 | Genentech, Inc. | Construction of full length cDNA libraries |
| WO2020132144A1 (fr) * | 2018-12-18 | 2020-06-25 | Grail, Inc. | Methods for detecting disease using RNA analysis |
| WO2021172990A1 (fr) * | 2020-02-28 | 2021-09-02 | Frame Pharmaceuticals B.V. | Hidden frame neoantigens |
| WO2022229128A1 (fr) * | 2021-04-26 | 2022-11-03 | The University Court Of The University Of Edinburgh | Single-stranded DNA amplification |
| WO2022240867A1 (fr) * | 2021-05-11 | 2022-11-17 | Genomic Expression Inc. | Identification and design of cancer therapies based on RNA sequencing |
| WO2024028505A1 (fr) * | 2022-08-04 | 2024-02-08 | Wobble Genomics Limited | Methods of preparing normalised nucleic acid samples, kits and devices for use in the method |
| WO2024084059A1 (fr) * | 2022-10-21 | 2024-04-25 | Wobble Genomics Limited | Methods and products for biomarker identification |
- 2025-01-30 WO PCT/EP2025/052429 patent/WO2025163086A1/fr active Pending
Non-Patent Citations (14)
| Title |
|---|
| ANDREWS-PFANNKOCH, C.; FADROSH, D. W.; THORPE, J.; WILLIAMSON, S. J.: "Hydroxyapatite-mediated separation of double-stranded DNA, single-stranded DNA, and RNA genomes from natural viral assemblages", APPL. ENVIRON. MICROBIOL, vol. 76, 2010, pages 5039 - 5045, XP002724958, DOI: 10.1128/AEM.00204-10 |
| BARCZAK WOJCIECH ET AL: "Long non-coding RNA-derived peptides are immunogenic and drive a potent anti-tumour response", NATURE COMMUNICATIONS, vol. 14, no. 1, 25 February 2023 (2023-02-25), UK, XP093126144, ISSN: 2041-1723, Retrieved from the Internet <URL:https://www.nature.com/articles/s41467-023-36826-0> DOI: 10.1038/s41467-023-36826-0 * |
| BARCZAK WOJCIECH ET AL: "Supplementary Materials and Figures Long non-coding RNA-derived peptides are immunogenic and drive a potent anti-tumour response", NATURE COMMUNICATIONS, 25 February 2023 (2023-02-25), XP093270452, Retrieved from the Internet <URL:https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-023-36826-0/MediaObjects/41467_2023_36826_MOESM1_ESM.pdf> * |
| EVERAERT C ET AL., BIOLOGICAL PROCEDURES ONLINE 25, no. 7, 2023, Retrieved from the Internet <URL:https://doi.org/10.1186/s12575-023-00193-3> |
| JIYEON PARK; YEUN-JUN CHUNG, GENOMICS AND INFORMATICS, vol. 17, no. 3, 2019, pages e23 |
| KUO, R.I.; CHENG, Y.; ZHANG, R. ET AL., BMC GENOMICS, vol. 21, 2020, pages 751, Retrieved from the Internet <URL:https://doi.org/10.1186/s12864-020-07123-7> |
| LARSON MATTHEW H. ET AL: "A comprehensive characterization of the cell-free transcriptome reveals tissue- and subtype-specific biomarkers for cancer detection - with supplementary material", vol. 12, no. 1, 21 April 2021 (2021-04-21), UK, XP093183562, ISSN: 2041-1723, Retrieved from the Internet <URL:https://www.nature.com/articles/s41467-021-22444-1> DOI: 10.1038/s41467-021-22444-1 * |
| LUIS ROJAS ET AL., NATURE, vol. 618, no. 7963, June 2023 (2023-06-01), pages 144 - 150 |
| NA XIE ET AL., SIGNAL TRANSDUCTION AND TARGETED THERAPY, vol. 8, 2023, pages 9, Retrieved from the Internet <URL:https://doi.org/10.1038/s41392-022-01270-x> |
| SARA SOUSA ROSA ET AL., VACCINE, vol. 39, no. 16, 15 April 2021 (2021-04-15), pages 2190 - 2200 |
| SCHROEDER ET AL.: "The RIN: an RNA integrity number for assigning integrity values to RNA measurements", BMC MOLECULAR BIOLOGY, vol. 7, no. 3, 2006, Retrieved from the Internet <URL:https://doi.org/10.1186/1471-2199-7-3> |
| SHAO XM; BHATTACHARYA R; HUANG J; SIVAKUMAR IKA; TOKHEIM C; ZHENG L; HIRSCH D; KAMINOW B; OMDAHL A; BONSACK M, CANCER IMMUNOL RES, vol. 8, no. 3, March 2020 (2020-03-01), pages 396 - 408 |
| STARK, R.; GRZELAK, M.; HADFIELD, J.: "RNA sequencing: the teenage years", NAT. REV. GENET, vol. 20, 2019, pages 631 - 656, XP036906438, DOI: 10.1038/s41576-019-0150-2 |
| ZHULIDOV, P. A ET AL.: "Simple cDNA normalization using kamchatka crab duplex-specific nuclease", NUCLEIC ACIDS RES., vol. 32, no. e37, 2004 |
Similar Documents
| Publication | Title |
|---|---|
| US12448649B2 (en) | Single cell nucleic acid detection and analysis |
| Enderle et al. | Characterization of RNA from exosomes and other extracellular vesicles isolated by a novel spin column-based method |
| CN118291584A (zh) | Nuclease-based RNA depletion |
| JP2019162102A (ja) | System and method for detecting RNA altered by cancer in peripheral blood |
| Del Chierico et al. | Choice of next-generation sequencing pipelines |
| KR20250047294A (ko) | Methods of preparing normalised nucleic acid samples, kits and devices for use in the method |
| AU2023362576A1 (en) | Methods and products for biomarker identification |
| US20230032847A1 (en) | Method for performing multiple analyses on same nucleic acid sample |
| EP4596050A1 (fr) | Methods of preparing processed RNA samples and their use in the production of RNA vaccines |
| WO2025163086A1 (fr) | Methods of preparing processed RNA samples and their use in the preparation of RNA vaccines |
| EP4596712A1 (fr) | Methods of identifying and using biomarkers |
| WO2025163087A1 (fr) | Methods of identifying and using biomarkers |
| Norton | Next-generation sequencing technologies and formalin-fixed paraffin-embedded tissue: Application to clinical cancer research |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 25702839; Country of ref document: EP; Kind code of ref document: A1 |
| | DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (PCT application filed from 20040101) | |