
WO2025085495A1 - Methods and compositions for generation of sequencing libraries - Google Patents

Methods and compositions for generation of sequencing libraries

Info

Publication number
WO2025085495A1
Authority
WO
WIPO (PCT)
Prior art keywords
probes
sample
cancer
sequencing
nucleic acids
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/051531
Other languages
French (fr)
Inventor
Shujun Luo
Fang Liu
Binggang Xiang
Yong Huang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Predicine Inc
Original Assignee
Predicine Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Predicine Inc


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Definitions

  • Cancer is a leading cause of death worldwide. Detection of cancer in individuals may be critical for providing treatment and improving patient outcomes. Cancer may be caused by genetic aberrations, which may lead to unregulated growth of cells. Detection of the genetic aberrations may be important for the detection of cancer. Sequencing of nucleic acids in a sample from a patient may be used to detect genetic aberrations.
  • the sequencing libraries may be used to detect the presence of nucleic acid sequences.
  • the presence or absence of nucleic acid sequences may be used to detect the presence or absence of a disease or disorder, such as cancer, in a subject.
  • the systems and methods provided herein may allow for polynucleotides to be assayed to identify biomarkers of cancers in a subject. Detection of a type of cancer or the specific biomarkers for a given cancer may allow an effective treatment to be provided to an individual and may result in improved outcomes.
  • Non-invasive testing for cancer may allow for improved detection of cancer without needing to perform invasive procedures or sampling from a tumor.
  • non-invasive sampling and testing may allow for detection of particular biomarkers that indicate a particular cancer type (or subtype) and may be used to identify a prognosis for an individual suffering from the cancer.
  • sample preparation that generates accurate data should be used.
  • the detection of a cancer or a cancer parameter
  • the detection of a cancer may be improved and may allow for the recommendation of an effective treatment and may also allow for the prognosis to be more accurate.
  • methods that improve detection of cancer associated biomarkers are needed to improve the accuracy, sensitivity, or efficiency of cancer detection.
  • the present disclosure provides a method of preparing a sequencing library, the method comprising: (a) obtaining a biological sample derived from a subject, wherein the biological sample comprises nucleic acid molecules; (b) contacting the biological sample with (i) a first set of probes comprising a pulldown moiety and (ii) a second set of probes without a pulldown moiety, to anneal the first set of probes and the second set of probes to at least one or more subsets of the nucleic acid molecules, thereby generating (A) a first set of annealed nucleic acids comprising probes from the first set of probes and a first subset of the nucleic acid molecules annealed thereto, and (B) a second set of annealed nucleic acids comprising probes from the second set of probes and a second subset of the nucleic acid molecules annealed thereto, wherein one or more probes of the first set of probes and one or more probes
  • the separating further comprises performing a pulldown reaction at least in part by contacting the pulldown moiety with one or more pulldown moiety binding agents, thereby selectively binding the first set of annealed nucleic acids.
  • the pulldown moiety binding agents comprise streptavidin, or a functional derivative thereof. In some embodiments, the pulldown moiety binding agents are attached to a support. In some embodiments, the support is a bead. In some embodiments, the bead is a magnetic bead.
  • the separating comprises removing the second set of annealed nucleic acids. In some embodiments, the removing comprises subjecting the second set of annealed nucleic acids to a wash buffer.
  • the pulldown moiety comprises biotin, or a functional derivative thereof.
  • the method further comprises removing the second set of annealed nucleic acids.
  • the method further comprises, prior to (b), amplifying the nucleic acid molecules. In some embodiments, the amplifying comprises reverse transcription. In some embodiments, the method further comprises, subsequent to (c), amplifying the first set of annealed nucleic acids, thereby generating enriched nucleic acids. In some embodiments, the amplifying comprises universal amplification or targeted amplification. In some embodiments, the amplifying comprises polymerase chain reaction (PCR). In some embodiments, the PCR comprises digital PCR, droplet PCR, or digital droplet PCR. In some embodiments, the amplifying comprises reverse transcription.
  • the method further comprises subjecting the amplified nucleic acids, or derivatives thereof, to a sequencing reaction thereby generating a set of sequencing reads.
  • the sequencing reaction comprises whole exome sequencing. In some embodiments, the sequencing reaction comprises targeted sequencing.
  • the method further comprises analyzing the set of sequencing reads to determine a presence or an absence of a disease, disorder, or condition of the subject.
  • the disease, disorder, or condition comprises cancer.
  • the method further comprises detecting the presence of the cancer of the subject.
  • the method further comprises, responsive to detecting the presence of the cancer of the subject, administering a cancer therapy to the subject.
  • the cancer therapy is selected from the group consisting of a surgical tumor removal, a chemotherapy, a radiation therapy, a targeted therapy, an immunotherapy, and a combination thereof.
  • the method further comprises analyzing the set of sequencing reads to determine one or more expression levels of one or more genes.
  • the method further comprises comparing the one or more expression levels to expression levels derived from a non-cancer control.
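The comparison of expression levels against a non-cancer control can be illustrated with a minimal sketch. The source does not specify a comparison metric; the log2 fold change with a pseudocount used here, the function names, and the gene labels and values are all assumptions chosen for illustration.

```python
import math


def log2_fold_change(sample_level, control_level, pseudocount=1.0):
    """Return log2((sample + pseudocount) / (control + pseudocount)).

    The pseudocount (an assumption, not from the source) avoids division
    by zero when a gene is not detected in the control.
    """
    return math.log2((sample_level + pseudocount) / (control_level + pseudocount))


def compare_to_control(sample_levels, control_levels, pseudocount=1.0):
    """Compare per-gene expression levels against a non-cancer control.

    Only genes present in both dictionaries are compared.
    """
    shared_genes = sorted(set(sample_levels) & set(control_levels))
    return {
        gene: log2_fold_change(sample_levels[gene], control_levels[gene], pseudocount)
        for gene in shared_genes
    }


# Hypothetical expression levels (arbitrary units) for two placeholder genes.
sample = {"GENE_A": 31.0, "GENE_B": 3.0}
control = {"GENE_A": 7.0, "GENE_B": 3.0}
changes = compare_to_control(sample, control)
```

A positive value suggests higher expression in the sample than in the control, a negative value lower expression, and a value near zero no change.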
  • the splice variant comprises a splice variant of androgen receptor (AR).
  • the splice variant of AR is an AR-V1, AR-V7, AR-V12, or combination thereof.
  • the one or more expression levels comprise expression levels of a gene fusion or rearrangement.
  • the method further comprises, prior to (b), attaching sequencing adapters to the cell-free nucleic acids, or derivatives thereof.
  • the attaching comprises ligation.
  • the first set of probes are complementary to one or more exomes.
  • the second set of probes are complementary to a subset of one or more exomes.
  • the second set of probes are complementary to one or more genes selected from the group consisting of HBA1, HBA2, HBB, HBD, B2M, CD74, and RN7SL.
  • the second set of probes are complementary to one or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, and RPS21.
  • one or more probes of the first set of probes and one or more probes of the second set of probes comprise a same sequence.
  • a probe of the first set of probes and a probe of the second set of probes comprise different sequences and target a same exome.
  • the second set of probes comprises one or more probes configured to bind to transcripts that are deemed unwanted or uninformative.
  • the nucleic acids comprise cell-free RNA (cfRNA).
  • the biological sample is selected from the group consisting of: a cell-free deoxyribonucleic acid (cfDNA) sample, a cell-free ribonucleic acid (cfRNA) sample, a plasma sample, a serum sample, a buffy coat sample, a peripheral blood mononuclear cell (PBMC) sample, a red blood cell sample, a urine sample, a urine cell pellet sample, a saliva sample, tissue biopsy, pleural fluid sample, peritoneal fluid sample, amniotic fluid sample, cerebrospinal fluid sample, bile sample, lymphatic fluid sample, sweat sample, tear sample, semen sample, or any derivative thereof, and any combination thereof.
  • the biological sample comprises the plasma sample.
  • the biological sample comprises a blood sample.
  • the biological sample is obtained or derived from the subject using an ethylenediaminetetraacetic acid (EDTA) collection tube, a cell-free RNA collection tube, a cell-free deoxyribonucleic acid (DNA) collection tube, another blood collection tube, or a CTC collection tube.
  • the sample further comprises cell-free DNA (cfDNA) molecules.
  • the method further comprises amplifying the cfDNA molecules thereby generating amplified DNA molecules.
  • the method further comprises subjecting the amplified DNA molecules, or derivatives thereof, to a DNA sequencing reaction thereby generating a set of DNA sequencing reads.
  • the DNA sequencing reaction comprises a whole genome sequencing reaction. In some embodiments, the method further comprises processing the DNA sequencing reads to determine a copy number parameter. In some embodiments, the method further comprises, based at least on the copy number parameter, identifying the subject as having a copy number gain or a copy number loss at loci as compared to a reference copy number.
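One way to picture a copy number parameter is as a log2 ratio of sequencing depth at a locus relative to a reference. The sketch below is an illustration only: the source does not define the parameter, and the function names and the ±0.3 thresholds are assumptions.

```python
import math


def copy_number_parameter(sample_depth, reference_depth):
    """Return the log2 ratio of sample depth to reference depth at a locus."""
    if sample_depth <= 0 or reference_depth <= 0:
        raise ValueError("depths must be positive")
    return math.log2(sample_depth / reference_depth)


def classify_locus(log2_ratio, gain_threshold=0.3, loss_threshold=-0.3):
    """Label a locus as 'gain', 'loss', or 'neutral'.

    The thresholds are hypothetical defaults, not values from the source.
    """
    if log2_ratio >= gain_threshold:
        return "gain"
    if log2_ratio <= loss_threshold:
        return "loss"
    return "neutral"


# Hypothetical normalized depths at three loci (sample, reference).
calls = [
    classify_locus(copy_number_parameter(300.0, 200.0)),
    classify_locus(copy_number_parameter(100.0, 200.0)),
    classify_locus(copy_number_parameter(200.0, 200.0)),
]
```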
  • the present disclosure provides a composition comprising: (i) a first set of probes comprising a pulldown moiety, and (ii) a second set of probes without a pulldown moiety, wherein the first set of probes are complementary to one or more human exomes, and wherein the second set of probes comprises one or more probes complementary to HBA1, HBA2, HBB, HBD, B2M, CD74, and RN7SL.
  • the present disclosure provides a composition comprising: (i) a first set of probes comprising a pulldown moiety, and (ii) a second set of probes without a pulldown moiety, wherein the first set of probes are complementary to one or more human exomes, and wherein the second set of probes comprises one or more probes complementary to HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, and RPS21.
  • Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
  • Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto.
  • the computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
  • FIG. 1 shows a schematic of an example method of the disclosure.
  • FIGs. 2A-2C show plots of transcripts per million of samples using depletion probes.
  • FIG. 3A shows a chart of the expression levels of genes with or without depletion.
  • FIG. 3B shows a chart of the number of transcripts with or without depletion.
  • FIG. 4 shows a chart relating to the concordance of transcripts in libraries with or without depletion.
  • FIG. 5 shows a chart of the exonic rate of cfRNA whole transcriptome libraries.
  • FIGs. 6A-6E show charts of the efficiency of depletion of genes targeted for depletion.
  • FIG. 7 shows a graph relating to the genes that were upregulated, downregulated, or had no significant change for cancer versus non-cancer samples.
  • FIG. 8 shows a graph of the output of the principal component analysis of differential gene expression of cancer versus non-cancer.
  • FIG. 9 shows a chart of the androgen receptor splice variant expression levels in different cancer and non-cancer samples.
  • FIG. 10 shows a computer system 1001 that is programmed or otherwise configured to implement a method of the present disclosure.
  • the methods may be used to assay for the presence or absence of cancer in a subject.
  • the systems and methods provided herein comprise assaying polynucleotides to identify biomarkers of cancers in a subject.
  • the biomarkers may be processed in order to identify the presence or absence of cancer.
  • the method described herein may be used to generate or prepare a sequencing library.
  • the present disclosure provides a method of preparing a sequencing library, the method comprising: (a) obtaining a biological sample derived from a subject, wherein the biological sample comprises cell-free ribonucleic acids; (b) contacting the biological sample with (i) a first set of probes comprising a pulldown moiety and (ii) a second set of probes without a pulldown moiety, to anneal the first set of probes and the second set of probes to at least a portion of the cell-free ribonucleic acids, thereby generating (A) a first set of annealed nucleic acids comprising probes from the first set of probes and nucleic acids annealed thereto, and (B) a second set of annealed nucleic acids comprising probes from the second set of probes and nucleic acids annealed thereto, wherein one or more probes of the first set of probes and one or more probes of the second set of probes are
  • Nucleic acids derived from samples may be generally processed to generate a sequencing library.
  • the sequencing library may comprise nucleic acids derived from the sample as well as additional regions that are able to interface with the sequencer and allow for the nucleic acids to be sequenced.
  • the nucleic acids may be subjected to additional reactions to generate the sequencing library. For example, amplification or enrichment reactions may be performed such that specific sequences of interest are sequenced. For example, in the case of disease, such as cancer, specific genes may be indicative of a disease and specific sequencing of those specific genes may be desired.
  • whole genome e.g., Illumina or Pacific Biosciences of California
  • whole transcriptome sequencing e.g., Illumina
  • in cell-free nucleic acids, nucleic acids derived from specific genes may be more prevalent than those derived from other genes.
  • cancer cells may shed cell-free nucleic acids into a biological fluid of a subject; however, these cell-free nucleic acids may be at a low concentration or may be otherwise difficult to detect.
  • Non-cancer cells or nucleic acids that are not correlated with cancer may also be in the biological fluid. These non-cancer associated nucleic acids may outcompete or otherwise make it difficult for the cancer-associated nucleic acids to be detected.
  • sequencing adapters may be attached to the cell-free nucleic acids and then may be subjected to a sequencing reaction.
  • the sequencing adapter may be attached to significantly more non-cancer associated cell-free nucleic acids than cancer associated cell-free nucleic acids and result in more sequencing reads to non-cancer associated cell-free nucleic acids.
  • These non-cancer associated sequencing reads may be uninformative for detecting cancer, and may therefore result in a reduction of efficiency for detecting cancer in a subject. This loss of efficiency may result in increased costs resulting from additional sampling, sample preparation, and sequencing. Reducing the number of uninformative or otherwise unwanted nucleic acids in a sample may improve the efficiencies of the assays.
  • certain transcripts may be more abundant than others.
  • hemoglobin transcripts may be highly present in blood.
  • sequencing libraries that are heavily populated by these more abundant (e.g., hemoglobin) transcripts.
  • These more abundant transcripts, when sequenced, may take up the majority of the outputted sequencing reads, effectively reducing the number of reads that are informative of a disease (e.g. cancer) or other genetic variation of interest.
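The effect of abundant transcripts on the share of informative reads can be made concrete with a small calculation. This is a sketch under stated assumptions: the read counts are invented for illustration, and treating every read outside the abundant set as "informative" is a simplification not made in the source.

```python
def informative_fraction(read_counts, abundant_genes):
    """Return the fraction of reads that do not map to the abundant genes.

    read_counts maps gene name to read count; abundant_genes is the set of
    genes treated as uninformative (e.g., hemoglobin transcripts).
    """
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    abundant = sum(
        count for gene, count in read_counts.items() if gene in abundant_genes
    )
    return (total - abundant) / total


# Hypothetical read counts; GENE_X and GENE_Y are placeholder names.
without_depletion = {"HBB": 700, "HBA1": 200, "GENE_X": 60, "GENE_Y": 40}
with_depletion = {"HBB": 70, "HBA1": 30, "GENE_X": 540, "GENE_Y": 360}
hemoglobin = {"HBB", "HBA1"}

before = informative_fraction(without_depletion, hemoglobin)
after = informative_fraction(with_depletion, hemoglobin)
```

With the same total number of reads, the second library spends far more of its sequencing output on transcripts other than hemoglobin.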
  • the methods disclosed herein may use probes to anneal to specific sequences.
  • the probes may bind to nucleic acids of a sample allowing for manipulation of the bound nucleic acids.
  • the methods may comprise contacting a sample with one or more pulldown probes.
  • the pulldown probe may allow for enrichment of sequences.
  • the pulldown probes may bind to sequences and then allow for nucleic acid with these sequences to be “pulled down” or enriched by binding the sequences to a solid substrate.
  • the unbound sequences can then be washed out of the sample, with the pulled down nucleic acids available for additional downstream processing.
  • the pulldown probes may comprise a pulldown moiety.
  • the pulldown probe may comprise a nucleic acid sequence that is able to hybridize to a nucleic acid (e.g. cfRNA) in a sample.
  • pulldown probes may comprise exome sequences.
  • the pulldown moiety may be a biotin, ligand, or chemical moiety.
  • the pulldown probes or pulldown moiety binding agents may be attached to a support.
  • the support may be a bead.
  • the support may be a magnetic bead.
  • the support may comprise an array or may be part of an array.
  • the support may comprise a flow cell or may be a part of a flow cell.
  • the pulldown probes may be subjected to pulldown reactions by binding the pulldown moiety to a substrate or support.
  • the pulldown moiety may be a biotin and a streptavidin (or other avidin) may be added to the solution to bind the biotin.
  • the streptavidin may be attached to a support, thereby binding the probes and the annealed nucleic acids.
  • the pulldown probes may be attached to a magnetic bead.
  • the pulldown probes may anneal to nucleic acids and the pulldown probes may be attached to a magnetic bead.
  • the magnetic bead may be subjected to a magnetic field and separated from a solution via the magnetic field.
  • the pulldown probes may comprise probes that enrich for sequences that may be uninformative for a given use case or assay.
  • hemoglobin may be a highly prevalent transcript in samples derived from blood and may be generally uninformative for detection of cancer in a subject.
  • whole transcriptome sequencing kits may contain pulldown probes for hemoglobin which may be useful for sequencing of tissue, but may result in too many transcripts of hemoglobin in a sequencing assay using blood (or a sample derived from blood).
  • probes without pulldown moieties may also be used. These probes without pulldown moieties may be probes (i.e., depletion probes) that compete with the pulldown probes.
  • depletion probes, as they do not have a pulldown moiety, may not be pulled down when subjected to a pulldown moiety binding agent, thereby allowing for the pulldown probes to be separated from the depletion probes.
  • probes without pulldown moieties may be generated that bind to the unwanted transcripts. Since the unwanted transcripts are bound to the probes without pulldown moieties, the unwanted transcripts cannot also bind to the probes with pulldown moieties, thus reducing the number of unwanted transcripts that are pulled down.
  • the probes without a pulldown moiety may bind to any sequence.
  • the depletion probe may bind to a gene.
  • the depletion probe may bind to an exome.
  • the probes without a pulldown moiety may comprise a same sequence as a probe with a pulldown moiety, and may directly compete for a same target.
  • the probes without a pulldown moiety may comprise a different sequence than a probe with a pulldown moiety, but may target a same exome.
  • the depletion probes may be designed to bind to any sequence of a transcript that is deemed unwanted or uninformative.
  • a set of depletion probes may be designed to target a plurality of different genes.
  • the set of depletion probes may be designed to target a plurality of different genes that are abundant in blood samples.
  • the depletion probes may bind to sequences of one or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of two or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of three or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of four or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of five or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of six or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of seven or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of eight or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of nine or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of ten or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of eleven or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of twelve or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of thirteen or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of fourteen or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of fifteen or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of sixteen or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of seventeen or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of eighteen or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL.
  • the depletion probes may bind to sequences of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21, and RN7SL.
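The "N or more genes selected from the group" language above amounts to a set-membership check against the listed genes. The sketch below encodes the gene list from the text; the function names and the placeholder target `GENE_X` are assumptions for illustration.

```python
# Genes listed in the text as candidate depletion targets.
DEPLETION_GENES = frozenset({
    "HBB", "HBA1", "HBA2", "HBD", "B2M", "CD74", "ACTB", "PF4", "PPBP",
    "RPS12", "RPS11", "RPS23", "RPS20", "RPL3", "RPS18", "RPS27", "RPS24",
    "RPS21", "RN7SL",
})


def listed_targets(probe_targets):
    """Return the subset of probe targets that appear in the listed group."""
    return set(probe_targets) & DEPLETION_GENES


def targets_at_least(probe_targets, n):
    """Return True if the probe set targets n or more genes from the group."""
    return len(listed_targets(probe_targets)) >= n


# Hypothetical depletion probe set; GENE_X is a placeholder outside the group.
example_targets = ["HBB", "HBA1", "ACTB", "GENE_X"]
```

For the example targets, `targets_at_least(example_targets, 3)` holds, while `targets_at_least(example_targets, 4)` does not, because `GENE_X` is not in the listed group.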
  • the methods disclosed herein may comprise conducting one or more enrichment reactions on one or more nucleic acid molecules in a sample.
  • the enrichment reactions may comprise one or more hybridization reactions.
  • the enrichment reactions may comprise contacting a sample with one or more probes or bait molecules that hybridize to a nucleic acid molecule of the biological sample.
  • the enrichment reaction may comprise differential amplification of a set of nucleic acid molecules.
  • the enrichment reaction may enrich for a plurality of genetic loci or sequences corresponding to genetic loci.
  • the enrichment reactions may comprise the use of primers or probes that may have complementarity to sequences (or sequences upstream or downstream) of a sequence that is to be enriched.
  • a capture probe may comprise sequence complementarity to a set of genomic loci and allow the enrichment of the genomic loci.
  • the enrichment reactions may comprise a plurality of probes or primers.
  • a plurality of probes e.g., a plurality of pulldown probes or a plurality of depletion probes
  • the methods may comprise using two or more probe sets (e.g., a set of probes with pulldown moieties and a set of probes without pulldown moieties).
  • the two or more probe sets may be at the same concentration.
  • the two or more probe sets may be at different concentrations.
  • the set of probes without pulldown moieties may be at a greater concentration than the set of probes with pulldown moieties.
  • the set of probes with pulldown moieties may be at a greater concentration than the set of probes without pulldown moieties.
  • a set of probes may be at a concentration of at least 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 90, 100, or more, times greater than the concentration of another set of probes.
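The relationship between relative probe concentration and pulldown can be sketched with a deliberately simple competition model. The model is an assumption, not something stated in the source: it supposes that a depletion probe and a pulldown probe have the same sequence and affinity, that both are in excess over the target, and that each target molecule is bound by exactly one of them with probability proportional to concentration.

```python
def expected_pulldown_fraction(depletion_to_pulldown_ratio):
    """Expected fraction of a targeted transcript bound by a pulldown probe.

    Under the equal-affinity, single-binding assumption described above,
    the fraction is pulldown / (pulldown + depletion) = 1 / (1 + ratio).
    """
    if depletion_to_pulldown_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return 1.0 / (1.0 + depletion_to_pulldown_ratio)


# Ratios of depletion probe concentration to pulldown probe concentration.
fractions = {ratio: expected_pulldown_fraction(ratio) for ratio in (0, 1, 9, 99)}
```

Under these assumptions, raising the depletion probe concentration relative to the pulldown probes lowers the expected share of the unwanted transcript that is pulled down. Real hybridization kinetics are more complex, so the numbers are illustrative only.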
  • the methods disclosed herein may comprise conducting separation (e.g., purification or isolation) reactions on one or more nucleic acid molecules in a sample.
  • the separation reactions may comprise contacting a sample with one or more beads or bead sets (e.g., beads comprising pulldown probes).
  • the one or more beads may comprise a physical property (e.g., size, shape, composition) that may allow it to be separated or manipulated.
  • the one or more beads may be magnetic beads.
  • the separation reaction may comprise one or more hybridization reactions, enrichment reactions, amplification reactions, sequencing reactions, or a combination thereof.
  • the separation reaction may comprise the use of one or more separators.
  • the one or more separators may comprise a magnetic separator.
  • the separation reaction may comprise separating bead bound nucleic acid molecules from bead free nucleic acid molecules.
  • the separating reaction may comprise separating pulldown probe hybridized nucleic acid molecules from pulldown probe free nucleic acid molecules.
  • the separation reactions may comprise removing or separating a group of nucleic acid molecules from another group of nucleic acids.
  • the separating reactions may comprise performing a wash or using a wash buffer to remove a group of nucleic acids.
  • FIG. 1 shows an example schematic of using the probes without pulldown moieties and the probes with pulldown moieties.
  • FIG. 1 shows a set of “depletion probes” and “exome panel probes”.
  • the “exome panel probes” are shown to comprise a pulldown moiety indicated by the star symbol on each probe, whereas the probes shown for the “depletion probes” do not have a star symbol indicating a lack of a pulldown moiety.
  • When added to the transcripts (shown as the thick lines), the probes are able to bind to transcripts. Those transcripts that bind to only depletion probes are not pulled down, whereas the transcripts that bind to the exome panel probes are able to be pulled down.
  • the pulled-down transcripts can then be subjected to the subsequent reactions (e.g., sequencing reactions, amplification reactions), whereas the transcripts that are not pulled down can be removed from the sample and discarded.
  • the depletion probes can thereby increase the number of transcripts that are not pulled down, and unwanted transcripts can be targeted by depletion probes.
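The separation logic described for FIG. 1 can be sketched as a simple partition of transcripts by the kind of probe they are bound to. The `Transcript` class, its field names, and the example transcripts are hypothetical; the sketch also assumes that a transcript bound to neither probe type is washed away along with the depletion-bound transcripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    bound_pulldown_probe: bool   # bound to a probe carrying a pulldown moiety
    bound_depletion_probe: bool  # bound to a probe lacking a pulldown moiety


def separate(transcripts):
    """Split transcripts into (pulled_down, removed) lists of names.

    A transcript is pulled down only if a probe with a pulldown moiety is
    annealed to it; everything else is removed by the wash.
    """
    pulled_down, removed = [], []
    for transcript in transcripts:
        if transcript.bound_pulldown_probe:
            pulled_down.append(transcript.name)
        else:
            removed.append(transcript.name)
    return pulled_down, removed


# Hypothetical example: one wanted transcript, one depleted, one unbound.
pool = [
    Transcript("wanted", bound_pulldown_probe=True, bound_depletion_probe=False),
    Transcript("HBB", bound_pulldown_probe=False, bound_depletion_probe=True),
    Transcript("unbound", bound_pulldown_probe=False, bound_depletion_probe=False),
]
pulled_down, removed = separate(pool)
```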
  • the biological samples may be subjected to additional reactions or conditions prior to assaying.
  • the biological sample may be subjected to conditions that are sufficient to isolate, enrich, or extract nucleic acids, such as cfDNA molecules or cfRNA molecules.
  • the biological sample may comprise nucleic acids.
  • the biological sample may be a cell-free deoxyribonucleic acid (cfDNA) sample or a cell-free ribonucleic acid (cfRNA) sample.
  • the biological sample may comprise genomic DNA (gDNA) or germline DNA.
  • the nucleic acid may be a DNA (e.g. double-stranded DNA, single-stranded DNA, single-stranded DNA hairpins, cDNA, genomic DNA, germline DNA, circulating tumor DNA (ctDNA), cell-free DNA (cfDNA)), an RNA (e.g.
  • the biological sample may be derived from or contain a tissue.
  • the biological sample may be derived from or contain a biological fluid.
  • the biological sample may be a plasma sample, a serum sample, a buffy coat sample, a peripheral blood mononuclear cell (PBMC) sample, a red blood cell sample, a urine sample, a saliva sample, or other body fluid sample.
  • the biological sample may comprise or be a pleural fluid sample, peritoneal fluid sample, amniotic fluid sample, cerebrospinal fluid sample, bile sample, lymphatic fluid sample, sweat sample, tear sample, semen sample, or any combination of biological fluid.
  • the samples may comprise RNA and DNA.
  • a sample may comprise cfDNA and cfRNA and the cfDNA and cfRNA may be analyzed by methods as described elsewhere herein.
  • the methods disclosed herein may comprise conducting extraction reactions on one or more nucleic acids in a biological sample.
  • the extraction reactions may lyse cells or disrupt nucleic acid interactions with the cell or with cellular proteins, such that the nucleic acids may be isolated, purified, enriched or subjected to other reactions.
  • the methods disclosed herein may comprise amplification or extension reactions.
  • the amplification reactions may comprise polymerase chain reaction.
  • the amplification reaction may comprise PCR-based amplifications, non-PCR-based amplifications, or a combination thereof.
  • the one or more PCR-based amplifications may comprise PCR, qPCR, nested PCR, linear amplification, or a combination thereof.
  • the PCR reaction may be a digital PCR, droplet PCR, or digital droplet PCR.
  • the one or more non-PCR-based amplifications may comprise multiple displacement amplification (MDA), transcription-mediated amplification (TMA), nucleic acid sequence-based amplification (NASBA), strand displacement amplification (SDA), real-time SDA, rolling circle amplification, circle-to-circle amplification or a combination thereof.
  • the amplification reactions may comprise an isothermal amplification.
  • the amplification reaction may comprise an untargeted amplification or a targeted amplification.
  • the amplification reaction may be a universal amplification reaction.
  • the universal amplification reaction may amplify all members of a sequencing library.
  • the amplification reaction may be a reverse transcription reaction.
  • an amplification reaction may amplify RNA (e.g., cfRNA) in a sample and generate cDNA.
  • the method disclosed herein may comprise a barcoding reaction.
  • a barcoding reaction may comprise the addition of a barcode or tag to the nucleic acid.
  • the barcode may be a molecular barcode or a sample barcode.
  • a barcode nucleic acid may comprise a barcode sequence which may be a degenerate n-mer. The sequence may be randomly generated or may be designed so that a specific barcode sequence is synthesized.
  • the barcode nucleic acid may be added to a sample such to label the nucleic acid molecules in the sample.
  • the barcodes may be specific to a sample. For example, a plurality of barcode nucleic acids may be added to a sample in which the barcode sequence is the same.
  • nucleic acids originating from a same sample may have a same barcode sequence, which may allow a nucleic acid to be identified as belonging to a particular or given sample.
  • a molecular barcode may also be used such that each molecule (or a plurality of molecules) in a same volume have a different molecular barcode.
  • This barcode may be subjected to amplification such that all amplicons derived from a molecule have the same barcode. In this way, amplicons originating from a same parent molecule may be identified.
  • the sequence reads may be processed based on the barcode sequences. For example, the processing may reduce errors or allow a molecule to be tracked.
  • Barcode sequences may be appended or otherwise added or incorporated into a sequence by various reactions, for example an amplification, extension, or ligation reaction, and may be performed enzymatically using a nucleic acid polymerase or ligase.
  • the ligation may be an overhang or blunt end ligation and the barcodes may comprise complementarity to nucleic acids to be barcoded. This complementarity may be to a sequence derived from the sample from the subject or to a constant sequence generated via a reaction performed on the nucleic acids in the sample.
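The sample-barcode and molecular-barcode roles described above can be illustrated with a minimal Python sketch. The read layout (sample barcode, then molecular barcode, then insert), the barcode lengths, the barcode sequences, and the names `split_read` and `group_reads` are all hypothetical and chosen only for illustration; the source does not specify a barcode design.

```python
from collections import defaultdict

# Hypothetical read layout: [sample barcode][molecular barcode][insert sequence]
SAMPLE_BARCODE_LEN = 4
MOLECULAR_BARCODE_LEN = 6

# Hypothetical sample barcodes; real barcode sets are assay specific.
SAMPLE_BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}


def split_read(read):
    """Split a read into (sample barcode, molecular barcode, insert)."""
    umi_end = SAMPLE_BARCODE_LEN + MOLECULAR_BARCODE_LEN
    return read[:SAMPLE_BARCODE_LEN], read[SAMPLE_BARCODE_LEN:umi_end], read[umi_end:]


def group_reads(reads):
    """Group insert sequences by sample, then by molecular barcode."""
    families = defaultdict(lambda: defaultdict(list))
    for read in reads:
        sample_bc, molecular_bc, insert = split_read(read)
        sample = SAMPLE_BARCODES.get(sample_bc)
        if sample is None:
            continue  # unrecognized sample barcode: skip the read
        families[sample][molecular_bc].append(insert)
    return families


reads = [
    "ACGT" + "AAAAAA" + "GATTACA",
    "ACGT" + "AAAAAA" + "GATTACA",  # amplicon of the same parent molecule
    "ACGT" + "CCCCCC" + "GATTACA",  # different parent molecule, same sample
    "TGCA" + "AAAAAA" + "TTGACCA",  # different sample
    "NNNN" + "AAAAAA" + "TTGACCA",  # unrecognized sample barcode
]
families = group_reads(reads)
for sample, by_barcode in sorted(families.items()):
    # Number of reads observed per molecular barcode within each sample
    print(sample, {bc: len(inserts) for bc, inserts in by_barcode.items()})
```

Reads that share both a sample barcode and a molecular barcode are treated here as amplicons of one parent molecule, so the number of distinct molecular barcodes per sample approximates the number of original molecules.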
  • the method disclosed herein may comprise adding an adapter to a nucleic acid.
  • the adapter may be a sequencing adapter.
  • the sequencing adapter may allow for the nucleic acids to be attached to a sequencing flow cell or another support from a sequencer.
  • This adapter may be subjected to amplification such that nucleic acids with adapters may be enriched.
  • Adapters may be appended or otherwise added or incorporated into a sequence by various reactions, for example an amplification, extension, or ligation reaction, and may be performed enzymatically using a nucleic acid polymerase or ligase.
  • the ligation may be an overhang or blunt end ligation and the barcodes may comprise complementarity to nucleic acids to be barcoded. This complementarity may be to a sequence derived from the sample from the subject or to a constant sequence generated via a reaction performed on the nucleic acids in the sample.
  • the methods disclosed herein may be used in conjunction with other assays.
  • the samples may comprise multiple types of nucleic acids (e.g., RNA and DNA), and different assays (e.g., sequencing reactions specific to DNA or RNA) may be used.
  • a sample may comprise DNA (e.g., cfDNA) and RNA (e.g., cfRNA).
  • the sample may be subjected to both a sequencing assay on DNA and a sequencing assay on RNA.
  • a sample may be subjected to whole genome sequencing and whole transcriptome sequencing.
  • the sequencing assays may use libraries that may be subjected to depletion via depletion probes.
  • the sequencing library may comprise nucleic acids derived from RNA and nucleic acids derived from DNA.
  • the nucleic acids derived from RNA may be subjected to depletion via depletion probes, whereas the nucleic acids derived from DNA may not be subjected to depletion.
  • the biological sample may comprise multiple components.
  • the biological sample may be a whole blood sample.
  • the biological sample may be subjected to reactions that separate or fractionate the biological sample.
  • a whole blood sample may be fractionated and cell-free nucleic acids may be obtained.
  • the whole blood sample may be fractionated using centrifugation such that blood cells may be separated from the plasma (which may contain cell free nucleic acid).
  • a sample may be subjected to multiple rounds of separation or fractionation.
  • the biological sample may be collected, obtained, or derived from the subject using a collection tube.
  • the collection tube may be an ethylenediaminetetraacetic acid (EDTA) collection tube, a cell-free RNA collection tube, a cell-free deoxyribonucleic acid (DNA) collection tube, a CTC collection tube, or another blood collection tube.
  • the collection tube may be a urine collection cup.
  • the collection tube may comprise additional reagents for stabilizing the nucleic acid molecules or blood cells.
  • the collection tube may keep the nucleic acid or blood cells stable so as to minimize degradation of the biological sample prior to assaying.
  • the additional reagents may comprise buffer salts or chelators.
  • the biological sample may be obtained or derived from a subject at a variety of times.
  • the biological sample may be obtained or derived from a subject prior to the subject receiving a therapy for cancer.
  • the biological sample may be obtained or derived from a subject while the subject is receiving a therapy for cancer.
  • the biological sample may be obtained or derived from a subject after receiving a therapy for cancer.
  • the biological sample may be collected over 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 or more time points.
  • the time points may occur over a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60 or more hour period.
  • the time points may occur over a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60 or more day period.
  • the time points may occur over a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60 or more week period.
  • the time points may occur over a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60 or more month period.
  • the time points may occur over a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60 or more year period.
  • the subject may be suspected of suffering from a cancer.
  • the cancer may be specific to or originate from an organ or other area of the subject.
  • the cancer may be breast cancer, lung cancer, prostate cancer, colorectal cancer, melanoma, bladder cancer, non-Hodgkin lymphoma, kidney cancer, endometrial cancer, leukemia, pancreatic cancer, thyroid cancer, liver cancer, or any combination thereof.
  • the cancer may be a hormone sensitive prostate cancer (HSPC), castrate-resistant prostate cancer (CRPC), metastatic prostate cancer, or a combination thereof.
  • the cancer may comprise biomarkers that are specific to a particular cancer.
  • the specific biomarkers may indicate a presence of a particular cancer.
  • a biomarker may indicate that a castrate-resistant prostate cancer is present.
  • the identification of the presence of a type of cancer may allow the determination of a treatment option or recommendation.
  • the subject may be asymptomatic for cancer.
  • the cancer may not exhibit any symptoms and the subject may be unaware of the presence of cancer.
  • the methods described herein may allow a cancer to be identified at an earlier stage than otherwise.
  • the identification of the presence of the cancer at an earlier stage may allow a treatment option or recommendation to be determined at an earlier stage and may allow the subject to have an improved prognosis.
  • the nucleic acids may be subjected to sequencing reactions.
  • the sequencing reactions may be used on DNA, RNA or other nucleic acid molecules.
  • Examples of sequencing reactions that may be used include capillary sequencing, next generation sequencing, Sanger sequencing, sequencing by synthesis, single molecule nanopore sequencing, sequencing by ligation, sequencing by hybridization, sequencing by nanopore current restriction, or a combination thereof.
  • Sequencing by synthesis may comprise reversible terminator sequencing, processive single molecule sequencing, sequential nucleotide flow sequencing, or a combination thereof.
  • Sequential nucleotide flow sequencing may comprise pyrosequencing, pH-mediated sequencing, semiconductor sequencing or a combination thereof.
  • the sequencing reactions may comprise whole genome sequencing, whole exome sequencing, low-pass whole genome sequencing, targeted sequencing, methylation-aware sequencing, enzymatic methylation sequencing, or bisulfite methylation sequencing.
  • the sequencing reaction may be transcriptome sequencing, mRNA-seq, totalRNA-seq, smallRNA-seq, exosome sequencing, or combinations thereof. Combinations of sequencing reactions may be used in the methods described elsewhere herein.
  • a sample may be subjected to whole genome sequencing and whole transcriptome sequencing.
  • as the samples may comprise multiple types of nucleic acids (e.g., RNA and DNA), sequencing reactions specific to DNA or RNA may be used to obtain sequence reads relating to each nucleic acid type.
  • the sequencing of nucleic acids may generate sequencing read data.
  • the sequencing reads may be processed to generate data of improved quality.
  • the sequencing reads may be generated with a quality score.
  • the quality score may indicate an accuracy of a sequence read or a level of signal above a noise threshold for a given base call.
  • the quality scores may be used for filtering sequencing reads. For example, sequencing reads may be removed that do not meet a particular quality score threshold.
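Quality-score filtering of this kind can be sketched as follows. The sketch assumes Phred-scaled quality scores in the common ASCII offset-33 encoding and a mean-quality threshold of 20; the encoding, the threshold, and the function names are assumptions for illustration and are not taken from the source.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def passes_filter(quality_string, min_mean_q=20):
    """Keep a read only if its mean Phred score meets the threshold."""
    scores = phred_scores(quality_string)
    return bool(scores) and sum(scores) / len(scores) >= min_mean_q


# Hypothetical reads: (name, sequence, quality string)
reads = [
    ("read_1", "GATTACA", "IIIIIII"),  # 'I' encodes Q40 at offset 33
    ("read_2", "GATTACA", "#######"),  # '#' encodes Q2 at offset 33
]
kept = [name for name, _seq, qual in reads if passes_filter(qual)]
print(kept)
```

A mean-quality cutoff is only one possible rule; per-base masking or a minimum-quality rule would fit the same description.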
  • the sequencing reads may be processed to generate a consensus sequence or consensus base call.
  • a given nucleic acid (or nucleic acid fragment) may be sequenced and errors in the sequence may be generated due to reactions prior to or during sequencing. For example, amplification or PCR may generate errors in amplicons such that the sequences are not identical to a parent sequence.
  • error correction may include identifying sequence reads that do not corroborate with other sequences from a same sample or same original parent molecules.
  • the use of barcodes may allow the identification of a same parent molecule or sample.
  • the sequence reads may be processed by performing single strand consensus calling or double strand consensus calling, thereby reducing or suppressing error.
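As a rough illustration of consensus-based error suppression, the Python sketch below takes reads that share a molecular barcode (one "family") and calls each position by majority vote. The majority-vote rule, the 0.6 agreement threshold, and the name `consensus` are assumptions; the source does not specify how consensus calls are made, and double strand consensus calling is not shown.

```python
from collections import Counter


def consensus(reads, min_fraction=0.6):
    """Majority-vote consensus over equal-length reads of one barcode family.

    Positions where no base reaches min_fraction are reported as 'N'.
    """
    if not reads:
        return ""
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads in a family must have equal length")
    called = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        called.append(base if count / len(column) >= min_fraction else "N")
    return "".join(called)


# Hypothetical family: the third read carries a single amplification-like error
family = ["GATTACA", "GATTACA", "GATCACA"]
print(consensus(family))
```

Because the erroneous base is outvoted by the other reads derived from the same parent molecule, it does not appear in the consensus sequence.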
  • the methods as disclosed herein may comprise determining allele frequency or other cancer related metric.
  • the methods may comprise determining a mutant allele frequency of a set of somatic mutations among a set of biomarkers.
  • the mutant allele frequency may be used to determine a circulating tumor DNA (ctDNA) fraction of a cancer of a subject.
  • a plasma tumor mutational burden (pTMB) of a cancer of the subject may be determined based at least in part on the set of mutant allele frequencies. Detection of microsatellite instability may also be used to determine the presence or absence of a cancer or cancer metric. Methylation states may be determined using methods described herein and may be used to identify a presence of a cancer or cancer parameter.
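The two metrics named above can be sketched under explicit assumptions: mutant allele frequency is taken as mutant reads divided by total reads at a locus, and pTMB is taken as the number of somatic mutations above a frequency cutoff per megabase of sequenced territory. The cutoff, the panel size, the read counts, and the gene names other than AR are hypothetical; the source does not define these calculations.

```python
def mutant_allele_frequency(mutant_reads, total_reads):
    """Fraction of reads at a locus that carry the mutant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mutant_reads / total_reads


def plasma_tmb(mafs, panel_size_mb, min_maf=0.005):
    """Mutations at or above min_maf per megabase of sequenced territory."""
    if panel_size_mb <= 0:
        raise ValueError("panel_size_mb must be positive")
    counted = sum(1 for maf in mafs.values() if maf >= min_maf)
    return counted / panel_size_mb


# Hypothetical variant calls: (gene, mutant reads, total reads)
variants = [("TP53", 30, 1000), ("AR", 12, 800), ("PTEN", 5, 500)]
mafs = {gene: mutant_allele_frequency(alt, total) for gene, alt, total in variants}
print(mafs)
print(plasma_tmb(mafs, panel_size_mb=1.5))
```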
  • the methods as disclosed herein may comprise determining one or more expression levels of one or more genes.
  • libraries subjected to depletion may be sequenced (e.g., whole transcriptome sequencing) to generate sequencing reads.
  • the sequencing reads may be processed to determine an expression level of a gene.
  • the expression levels may be compared against a reference expression level.
  • a non-cancer control or reference sample may be analyzed and expression levels for genes of the non-cancer control may be obtained.
  • the expression levels may be compared and a sample may be identified as having an upregulation or down regulation of gene expression of one or more genes. Based at least on the expression levels, the subject may be identified as having a cancer.
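One way to picture the comparison against a reference expression level is a log2 fold change per gene. In the sketch below, the pseudocount, the threshold of one log2 unit, the gene names, and the expression values are all hypothetical; the source does not state how upregulation or downregulation is decided.

```python
import math


def log2_fold_change(sample_level, reference_level, pseudocount=1.0):
    """log2 ratio of sample to reference; the pseudocount avoids division by zero."""
    return math.log2((sample_level + pseudocount) / (reference_level + pseudocount))


def classify_expression(sample, reference, threshold=1.0):
    """Label each gene as upregulated, downregulated, or unchanged."""
    calls = {}
    for gene, level in sample.items():
        lfc = log2_fold_change(level, reference[gene])
        if lfc >= threshold:
            calls[gene] = "upregulated"
        elif lfc <= -threshold:
            calls[gene] = "downregulated"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical normalized expression levels for a test sample and a non-cancer reference
sample = {"GENE_A": 63.0, "GENE_B": 3.0, "GENE_C": 10.0}
reference = {"GENE_A": 15.0, "GENE_B": 15.0, "GENE_C": 9.0}
calls = classify_expression(sample, reference)
print(calls)
```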
  • the expression levels may be the expression levels of genetic aberrations of one or more genes.
  • the expression level may be of a fusion gene or rearrangement.
  • the expression levels may be the expression levels of one or more splice variants. Genes may be spliced differently or incorrectly and the splice variants may be associated with the presence of cancer. Detection of the presence or the expression level of the splice variant may indicate the presence of cancer.
  • splice variants in the androgen receptor (AR) may be indicative of cancer.
  • the splice variant may be an AR-V form such as AR-V1, AR-V7, or AR-V12.
  • sets of biomarkers are processed and data corresponding to the biomarkers are generated.
  • the sets of biomarkers may comprise quantitative measures from a set of cancer-associated genomic loci.
  • the cancer-associated genomic loci may correspond to a set of genes.
  • the sets of biomarkers may correspond to genetic aberration of a genetic locus.
  • the genetic aberration may be a tumor associated alteration.
  • the genetic aberration may be copy number alterations (CNAs), copy number losses (CNLs), copy number gains, single nucleotide variants (SNVs), insertions or deletions (indels), fusion genes, or rearrangements.
  • the set of biomarkers may comprise splice variants.
  • the set of biomarkers may be identified in a variety of nucleic acid types.
  • the tumor associated alteration may be identified in cfDNA or cfRNA.
  • the tumor associated alteration may comprise changes in allelic expression, or gene expression.
  • the methods may comprise identifying the presence of a cancer or a cancer parameter.
  • the methods may comprise determining a probability or a likelihood of the presence of cancer or a cancer parameter. For example, instead of a binary output indicating a presence or absence, an output may be generated that indicates a probability that the subject has cancer. This probability may be determined based on algorithms as described elsewhere herein. Similarly, a probability or likelihood of response to a particular treatment or a probability of relapse may be outputted.
  • the increased cfRNA transcriptional expression of drug resistance-related gene alterations or splicing variants may serve as a predictive biomarker, identifying the response or resistance to therapy.
  • the increased cfRNA transcriptional expression of drug resistance-related AR mutations such as W742C/L and F877L, or splicing variants such as AR-V7 or AR-V9, may serve as a predictive biomarker, identifying the response or resistance to anti-androgen therapy.
  • blood ctRNA-based variant detection (including fusion detection) can be used to more effectively identify known and novel variants, especially fusions, in cancer.
  • blood cfRNA-based detection of TMPRSS2-ERG may provide higher detection sensitivity in prostate cancer.
  • the increased ratio of blood-based cancer variants versus urine-based cancer variants may serve as a prognostic biomarker in genitourinary (GU) cancers, indicating disease aggressiveness and guiding clinical treatment decision making.
  • the increased level of blood-based cancer variants versus urine-based cancer variants may serve as a prognostic biomarker in patients with muscle-invasive bladder cancer (MIBC) and provide evidence for clinical decision making.
  • These cancer variants may include ctDNA, cfRNA, microRNA, methylation, among others.
  • cfRNA and/or microRNA can also be used either alone or in combination with genomic and epigenomic biomarkers for minimal residual disease (MRD) detection, therapy monitoring and early cancer detection.
  • the sets of biomarkers are processed using an algorithm.
  • the algorithm may be a trained algorithm.
  • the trained algorithms may use the sets of biomarkers as an input and generate an output regarding the presence or absence of a cancer.
  • the output may be specific to a type of cancer or subtype of cancer.
  • the output may indicate the presence of a castrate-resistant prostate cancer.
  • the expression levels of one or more genes may be processed using an algorithm (e.g., a trained algorithm).
  • the algorithm (e.g., a trained algorithm) may use the expression levels of one or more genes to generate an output regarding the presence or absence of a cancer.
  • a trained algorithm may be trained on expression levels of healthy samples, non-cancer samples, or cancer samples. The trained algorithm may then be able to analyze a sample and expression levels of one or more genes in the sample to determine that the subject has cancer.
  • the trained algorithm may be trained on multiple samples.
  • the trained algorithm may be trained using at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, or more independent training samples.
  • the trained algorithm may be trained using no more than 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, or fewer independent training samples.
  • the training samples may be associated with a presence or an absence of the cancer.
  • the training samples may be associated with a relapse of cancer.
  • the training samples may comprise a healthy or a non-cancer sample.
  • the training samples may be associated with cancer that is resistant to a particular drug or treatment.
  • An individual training sample may be positive for a particular cancer.
  • An individual training sample may be negative for a particular cancer.
  • the trained algorithm may be able to detect a cancer, determine a probability of recurrence or relapse of a cancer, or determine whether a cancer comprising a set of biomarkers may be resistant to a treatment.
  • the training sample may be associated with additional clinical health data of a subject.
  • additional clinical health data may comprise the gender, weight, height, or levels of metabolites or antibodies in a subject.
  • Additional clinical health data may comprise indication of other diseases, disorders, or disease conditions.
  • the trained algorithms may be trained using multiple sets of training samples.
  • the sets may comprise training samples as described elsewhere herein.
  • the training may be performed using a first set of independent training samples associated with a presence of the cancer and a second set of independent training samples associated with an absence of the cancer.
  • a first set may be associated with relapse and a second set may be associated with the absence of relapse.
  • the trained algorithm may also process additional clinical health data of the subject.
  • additional clinical health data may comprise the gender, weight, height, or levels of metabolites or antibodies in a subject.
  • Additional clinical health data may comprise indication of other diseases, disorders, or disease conditions that the subject may suffer from.
  • the trained algorithm may output a presence or absence of cancer, a probability of relapse, or a resistance to drug treatment that may be different from the output of an algorithm that does not process additional clinical health data.
  • the trained algorithm may be an unsupervised machine learning algorithm.
  • the unsupervised machine learning algorithm may utilize cluster analysis to identify attributes of interest.
  • the trained algorithm may be a supervised machine learning algorithm.
  • the algorithm may be provided with training data so as to generate an expected or desired output.
  • the supervised learning algorithm may comprise a deep learning algorithm, a support vector machine (SVM), a neural network, or a Random Forest.
  • the trained algorithm may be able to identify relationships of biomarkers to a particular cancer prognosis or diagnosis. Without the trained algorithm, it may otherwise be difficult to identify relationships of the biomarkers to accurately identify the presence of a cancer or other parameters associated with the cancer.
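The supervised training described above, with one set of samples associated with the presence of cancer and one with its absence, can be sketched with a dependency-free classifier that outputs a probability rather than a binary call. Logistic regression is used here only as a simple stand-in; it is not one of the algorithms named in the source, and the features, values, learning rate, and epoch count are all hypothetical.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, features):
    """Probability that a sample belongs to the cancer-associated class."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(samples, labels, learning_rate=0.1, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = predict_probability(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical features per sample: [mutant allele frequency in percent, log2 fold change]
cancer_set = [[3.0, 2.0], [2.5, 1.5], [4.0, 2.5]]        # label 1
non_cancer_set = [[0.1, 0.0], [0.2, -0.2], [0.0, 0.1]]   # label 0
samples = cancer_set + non_cancer_set
labels = [1] * len(cancer_set) + [0] * len(non_cancer_set)

weights, bias = train(samples, labels)
p_high = predict_probability(weights, bias, [3.5, 2.2])
p_low = predict_probability(weights, bias, [0.1, 0.0])
print(round(p_high, 3), round(p_low, 3))
```

In practice one of the model families named above (for example a Random Forest or an SVM) would replace the stand-in, with held-out samples used to estimate performance.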
  • the systems and methods may comprise an accuracy, sensitivity, or specificity of detection of the cancer or a parameter of the cancer.
  • the methods or systems may comprise detecting the presence or the absence of cancer (or the presence of a parameter of the cancer, such as recurrence, relapse, or drug resistance) in the subject at an accuracy of at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%.
  • the methods or systems may comprise detecting the presence or the absence of cancer (or the presence of a parameter of the cancer, such as recurrence, relapse, or drug resistance) in the subject at a sensitivity of at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%.
  • the methods or systems may comprise detecting the presence or the absence of cancer (or the presence of a parameter of the cancer, such as recurrence, relapse, or drug resistance) in the subject at a specificity of at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%.
  • the methods or systems may comprise detecting the presence or the absence of cancer (or the presence of a parameter of the cancer, such as recurrence, relapse, or drug resistance) in the subject at a positive predictive value of at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%.
  • the methods or systems may comprise detecting the presence or the absence of cancer (or the presence of a parameter of the cancer, such as recurrence, relapse, or drug resistance) in the subject at a negative predictive value of at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%.
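The performance measures listed above follow from the counts of true positives, false positives, true negatives, and false negatives. The Python sketch below uses the standard definitions; the counts are hypothetical and the function name is chosen for illustration.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard detection metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }


# Hypothetical counts from 100 cancer and 100 non-cancer samples
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```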
  • a clinical intervention or a therapy may be identified at least in part based on the identification of the presence of cancer, or the presence of a parameter of cancer.
  • the clinical intervention may be a plurality of clinical interventions.
  • the clinical intervention may be selected from a plurality of clinical interventions.
  • the clinical intervention may be a surgical tumor removal (e.g., surgical resection), chemotherapy, radiotherapy, immunotherapy, adjuvant therapy, neoadjuvant therapy, androgen deprivation therapy, or a combination thereof.
  • the clinical interventions may be administered to the subject. After administration of the clinical intervention, a sample may be obtained or derived from the subject to monitor the cancer or cancer parameters.
  • the methods and systems disclosed herein may be performed iteratively such that monitoring of a cancer can be performed. Additionally, by performing the methods or systems iteratively, therapies or clinical interventions may be updated based at least in part on the results of the methods.
  • the monitoring of the cancer may include an assessment as well as a difference in assessment from a previously generated assessment.
  • the difference in an assessment of cancer in the subject among a plurality of time points (or samples) may be indicative of one or more clinical indications such as a diagnosis of the cancer, a prognosis of the cancer, or an efficacy or non-efficacy of a course of treatment for treating the cancer of the subject.
  • the prognosis may comprise expected progression-free survival (PFS), overall survival (OS), or other metrics relating to the severity or survivability of a cancer.
  • FIG. 10 shows a computer system 1001 that is programmed or otherwise configured to perform analysis or steps of the methods, for example determine a likelihood of the presence of a cancer based at least in part on a set of biomarkers of an individual or run an algorithm.
  • the computer system 1001 can regulate various aspects of methods and systems of the present disclosure, such as, for example, perform an algorithm, input training data, analyze sets of biomarkers, or output a result for the user as to the presence or absence of cancer.
  • the computer system 1001 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device.
  • the electronic device can be a mobile electronic device.
  • the computer system 1001 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 1005, which can be a single core or multi core processor, or a plurality of processors for parallel processing.
  • the computer system 1001 also includes memory or memory location 1010 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 1015 (e.g., hard disk), communication interface 1020 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 1025, such as cache, other memory, data storage and/or electronic display adapters.
  • the memory 1010, storage unit 1015, interface 1020 and peripheral devices 1025 are in communication with the CPU 1005 through a communication bus (solid lines), such as a motherboard.
  • the storage unit 1015 can be a data storage unit (or data repository) for storing data.
  • the computer system 1001 can be operatively coupled to a computer network (“network”) 1030 with the aid of the communication interface 1020.
  • the network 1030 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
  • the network 1030 in some cases is a telecommunication and/or data network.
  • the network 1030 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
  • the network 1030, in some cases with the aid of the computer system 1001, can implement a peer-to-peer network, which may enable devices coupled to the computer system 1001 to behave as a client or a server.
  • the CPU 1005 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
  • the instructions may be stored in a memory location, such as the memory 1010.
  • the instructions can be directed to the CPU 1005, which can subsequently program or otherwise configure the CPU 1005 to implement methods of the present disclosure. Examples of operations performed by the CPU 1005 can include fetch, decode, execute, and writeback.
  • the CPU 1005 can be part of a circuit, such as an integrated circuit. One or more other components of the system 1001 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).
  • the storage unit 1015 can store files, such as drivers, libraries and saved programs.
  • the storage unit 1015 can store user data, e.g., user preferences and user programs.
  • the computer system 1001 in some cases can include one or more additional data storage units that are external to the computer system 1001, such as located on a remote server that is in communication with the computer system 1001 through an intranet or the Internet.
  • the computer system 1001 can communicate with one or more remote computer systems through the network 1030.
  • the computer system 1001 can communicate with a remote computer system of a user (e.g., a medical professional or patient).
  • remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
  • the user can access the computer system 1001 via the network 1030.
  • Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1001, such as, for example, on the memory 1010 or electronic storage unit 1015.
  • the machine executable or machine readable code can be provided in the form of software.
  • the code can be executed by the processor 1005.
  • the code can be retrieved from the storage unit 1015 and stored on the memory 1010 for ready access by the processor 1005.
  • the electronic storage unit 1015 can be precluded, and machine-executable instructions are stored on memory 1010.
  • the code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime.
  • the code can be supplied in a programming language that can be selected to enable the code to execute in a precompiled or as-compiled fashion.
  • aspects of the systems and methods provided herein can be embodied in programming.
  • Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium.
  • Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk.
  • “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server.
  • another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links.
  • a machine readable medium such as computer-executable code
  • a tangible storage medium such as computer-executable code
  • Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings.
  • Volatile storage media include dynamic memory, such as main memory of such a computer platform.
  • Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system.
  • Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
  • Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data.
  • the computer system 1001 can include or be in communication with an electronic display 1035 that comprises a user interface (UI) 1040 for providing, for example, an input of biomarkers or sequencing data, or a visual output relating to a detection, diagnosis, or prognosis.
  • Examples of UIs include, without limitation, a graphical user interface (GUI) and a web-based user interface.
  • Methods and systems of the present disclosure can be implemented by way of one or more algorithms.
  • An algorithm can be implemented by way of software upon execution by the central processing unit 1005.
  • the algorithm can, for example, determine a presence or absence of a cancer or cancer parameter based on a set of input sequencing data from a sample derived from a subject.
  • Example 1 Sequencing of transcripts after use of depletion probes
  • Sequencing of cfRNA was performed, demonstrating the advantages of the methods and compositions of the present disclosure.
  • Samples of blood comprising cfRNA are subjected to library preparation.
  • cfRNA was subjected to a reverse transcription to generate cDNA.
  • Sequencing adapters are ligated to the cDNA and PCR using primers specific to the sequencing adapters is performed to amplify the cDNA.
  • the resulting cDNA is split into four equal parts to generate four identical libraries.
  • Each library is subjected to an exome panel and a different amount of depletion probes: no depletion probes, 1X depletion probes, 10X depletion probes, and 100X depletion probes.
  • the depletion probes were able to bind HBB, HBA2, HBA1, HBD, B2M, and CD74.
  • the libraries are then subjected to a pulldown and wash to remove unbound nucleic acids and nucleic acid bound to the depletion probes.
  • the resulting libraries were then sequenced and analyzed.
  • Table 1 shows the resulting transcript counts for the genes that were targeted by the depletion probes. Specifically, with increasing amounts of depletion probes, the TPM (transcripts per million) values were shown to decrease, indicating the successful removal of transcripts from the final sequencing reads. Transcripts that were not targeted by depletion probes (e.g., ACTB, TMSB4X, and FTH1) were shown to maintain their number of transcripts and not be depleted.
  • FIGs. 2A-2C show plots of transcripts per million (TPM) for a sample without depletion and samples with depletion. Each gene is shown as a point, with the y-axis indicating the TPM for a sample with 1X (FIG. 2A), 10X (FIG. 2B), or 100X (FIG. 2C) depletion probes, and the x-axis indicating the TPM for a sample with no depletion probes.
  • the plots show a linear relationship with a slope near 1, demonstrating that the depletion probes generally do not affect the sample processing for transcripts that are not targeted with the depletion probes.
  • Example 2 Depletion of transcripts from whole transcriptome libraries of cfRNA
  • [0096] Samples of plasma comprising cfRNA were subjected to library preparation. cfRNA was subjected to reverse transcription to generate cDNA. Sequencing adapters were ligated to the cDNA, and PCR using primers specific to the sequencing adapters was performed to amplify the cDNA. The library was then split into two identical libraries, with depletion probes added to one of the libraries.
  • the depletion probes were able to bind HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21.
  • FIG. 3A shows the expression levels of genes depleted in the library with or without the depletion. As shown, the library with no depletion shows a large number of transcripts that correspond to the genes that are to be depleted. In contrast, the library that was subjected to depletion shows few to no reads corresponding to the depletion probes. This indicates the successful removal of transcripts from the final sequencing reads.
  • FIG. 3B shows this increase in transcripts.
  • Three different samples were used to generate three different libraries. Each of these libraries was divided in half, and one half was subjected to depletion. Each of the libraries demonstrated an increase in the number of unique transcripts for the half subjected to depletion.
  • FIGs. 6A-6E show the numbers of transcripts detected for the various genes that were targeted with the depletion probes. Across every targeted gene there was a significantly lower number of reads (e.g., transcripts per million) in the libraries subjected to depletion versus libraries that were not subjected to depletion.
  • FIG. 7 shows data relating to differentially expressed genes in cancer samples (Padj < 0.01 as determined by DESeq2) as compared to healthy normal samples. It shows that a significant number of genes are downregulated or upregulated in the cancer samples as compared to the normal controls. Principal component analysis also corroborated these results, showing a clear separation between normal and cancer samples (FIG. 8).
  • FIG. 9 shows a chart representing the expression of AR-V forms and full-length AR (AR-FL) in various samples.
  • TMPRSS2::ERG RNA was identified in some cancer clinical samples.
  • Table 2 shows results of select genes of this analysis. As shown in Table 2, copy number gain was correlated with an upregulation of genes, whereas copy number loss was correlated with a downregulation of genes.
  • these assays demonstrate that the depletion probes can be used to generate cfRNA libraries that accurately measure expression profiles, identify splice variants and gene fusions and additionally can be used in conjunction with cfDNA sequencing assays. These assays allow for improved liquid biopsy-based cancer diagnostics.
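
The TPM comparisons described for Example 1 and FIGs. 2A-2C can be illustrated with a minimal Python sketch. This is not the analysis pipeline of the disclosure; the function names `tpm` and `loglog_slope` are hypothetical, and the sketch assumes that per-gene read counts and transcript lengths (in kilobases) are already available. It computes transcripts per million and then the least-squares slope of log10 TPM between a depleted and a non-depleted library, which would be expected to be near 1 for genes not targeted by depletion probes.

```python
import math


def tpm(counts, lengths_kb):
    """Transcripts per million: length-normalize counts, then scale to sum to 1e6.

    counts and lengths_kb are dicts keyed by gene name (hypothetical inputs).
    """
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}


def loglog_slope(x, y):
    """Least-squares slope of log10(y) against log10(x) for paired positive values."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    denominator = sum((a - mean_x) ** 2 for a in lx)
    return numerator / denominator
```

In use, `x` would hold the TPM values of non-targeted genes in the library without depletion probes and `y` the TPM values of the same genes in a library with depletion probes; genes with a TPM of zero would need to be excluded or offset before taking logarithms.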

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Provided herein are methods and compositions for generating sequencing libraries. The methods and compositions may comprise using pulldown probes. The methods may comprise using probes that do not have a pulldown moiety. The probes that do not have pulldown moieties may bind to unwanted transcripts and may reduce the number of sequencing reads to unwanted transcripts.

Description

METHODS AND COMPOSITIONS FOR GENERATION OF SEQUENCING LIBRARIES
CROSS-REFERENCE
[0001] This application claims the benefit of U.S. Provisional Application No. 63/591,012, filed October 17, 2023, which is incorporated by reference herein in its entirety.
BACKGROUND
[0002] Cancer is a leading cause of death worldwide. Detection of cancer in individuals may be critical for providing treatment and improving patient outcomes. Cancer may be caused by genetic aberrations, which may lead to unregulated growth of cells. Detection of the genetic aberrations may be important for the detection of cancer. Sequencing of nucleic acids in a sample from a patient may be used to detect genetic aberrations.
SUMMARY
[0003] Provided herein are systems and methods for generating sequencing libraries. The sequencing libraries may be used to detect the presence of nucleic acid sequences. The presence or absence of nucleic acid sequences may be used to detect the presence or absence of a disease or disorder, such as cancer, in a subject. The systems and methods provided herein may allow for polynucleotides to be assayed to identify biomarkers of cancers in a subject. Detection of a type of cancer or the specific biomarkers for a given cancer may allow an effective treatment to be provided to an individual and may result in improved outcomes. Non-invasive testing for cancer may allow for improved detection of cancer without needing to perform invasive procedures or sampling from a tumor. For multiple types of cancer, non-invasive sampling and testing may allow for detection of particular biomarkers that indicate a particular cancer type (or subtype) and may be used to identify a prognosis for an individual suffering from the cancer. In order to provide accurate detection and prognosis for a cancer, sample preparation that generates accurate data should be used. By increasing the number of analytes (and sets of biomarkers from the analytes) detected or assayed, the detection of a cancer (or a cancer parameter) may be improved and may allow for the recommendation of an effective treatment and may also allow for the prognosis to be more accurate. As such, methods that improve detection of cancer associated biomarkers are needed to improve the accuracy, sensitivity, or efficiency of cancer detection.
[0004] In an aspect, the present disclosure provides a method of preparing a sequencing library, the method comprising: (a) obtaining a biological sample derived from a subject, wherein the biological sample comprises nucleic acid molecules; (b) contacting the biological sample with (i) a first set of probes comprising a pulldown moiety and (ii) a second set of probes without a pulldown moiety, to anneal the first set of probes and the second set of probes to at least one or more subsets of the nucleic acid molecules, thereby generating (A) a first set of annealed nucleic acids comprising probes from the first set of probes and a first subset of the nucleic acid molecules annealed thereto, and (B) a second set of annealed nucleic acids comprising probes from the second set of probes and a second subset of the nucleic acid molecules annealed thereto, wherein one or more probes of the first set of probes and one or more probes of the second set of probes are complementary to a same target; and (c) separating the first set of annealed nucleic acids and the second set of annealed nucleic acids.
[0005] In some embodiments, the separating further comprises performing a pulldown reaction at least in part by contacting the pulldown moiety with one or more pulldown moiety binding agents, thereby selectively binding the first set of annealed nucleic acids.
[0006] In some embodiments, the pulldown moiety binding agents comprise streptavidin, or a functional derivative thereof. In some embodiments, the pulldown moiety binding agents are attached to a support. In some embodiments, the support is a bead. In some embodiments, the bead is a magnetic bead.
[0007] In some embodiments, the separating comprises removing the second set of annealed nucleic acids. In some embodiments, the removing comprises subjecting the second set of annealed nucleic acids to a wash buffer.
[0008] In some embodiments, the pulldown moiety comprises biotin, or a functional derivative thereof.
[0009] In some embodiments, the method further comprises removing the second set of annealed nucleic acids.
[0010] In some embodiments, the method further comprises, prior to (b), amplifying the nucleic acid molecules. In some embodiments, the amplifying comprises reverse transcription.
[0011] In some embodiments, the method further comprises, subsequent to (c), amplifying the first set of annealed nucleic acids, thereby generating enriched nucleic acids. In some embodiments, the amplifying comprises universal amplification or targeted amplification. In some embodiments, the amplifying comprises polymerase chain reaction (PCR). In some embodiments, the PCR comprises digital PCR, droplet PCR, or digital droplet PCR. In some embodiments, the amplifying comprises reverse transcription.
[0012] In some embodiments, the method further comprises subjecting the amplified nucleic acids, or derivatives thereof, to a sequencing reaction thereby generating a set of sequencing reads. In some embodiments, the sequencing reaction comprises whole exome sequencing. In some embodiments, the sequencing reaction comprises targeted sequencing.
[0013] In some embodiments, the method further comprises analyzing the set of sequencing reads to determine a presence or an absence of a disease, disorder, or condition of the subject. In some embodiments, the disease, disorder, or condition comprises cancer. In some embodiments, the method further comprises detecting the presence of the cancer of the subject. In some embodiments, the method further comprises, responsive to detecting the presence of the cancer of the subject, administering a cancer therapy to the subject. In some embodiments, the cancer therapy is selected from the group consisting of a surgical tumor removal, a chemotherapy, a radiation therapy, a targeted therapy, an immunotherapy, and a combination thereof. In some embodiments, the method further comprises analyzing the set of sequencing reads to determine one or more expression levels of one or more genes. In some embodiments, the method further comprises comparing the one or more expression levels to expression levels derived from a non-cancer control. In some embodiments, the splice variant comprises a splice variant of androgen receptor (AR). In some embodiments, the splice variant of AR is an AR-V1, AR-V7, AR-V12, or combination thereof. In some embodiments, the one or more expression levels comprise expression levels of a gene fusion or rearrangement.
[0014] In some embodiments, the method further comprises, prior to (b), attaching sequencing adapters to the cell-free nucleic acids, or derivatives thereof. In some embodiments, the attaching comprises ligation.
[0015] In some embodiments, the first set of probes are complementary to one or more exomes. In some embodiments, the second set of probes are complementary to a subset of one or more exomes. In some embodiments, the second set of probes are complementary to one or more genes selected from the group consisting of HBA1, HBA2, HBB, HBD, B2M, CD74, and RN7SL. In some embodiments, the second set of probes are complementary to one or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, and RPS21. In some embodiments, one or more probes of the first set of probes and one or more probes of the second set of probes comprise a same sequence. In some embodiments, a probe of the first set of probes and a probe of the second set of probes comprise different sequences and target a same exome. In some embodiments, the second set of probes comprises one or more probes configured to bind to transcripts that are deemed unwanted or uninformative.
[0016] In some embodiments, the nucleic acids comprise cell-free RNA (cfRNA). In some embodiments, the biological sample is selected from the group consisting of: a cell-free deoxyribonucleic acid (cfDNA) sample, a cell-free ribonucleic acid (cfRNA) sample, a plasma sample, a serum sample, a buffy coat sample, a peripheral blood mononuclear cell (PBMC) sample, a red blood cell sample, a urine sample, a urine cell pellet sample, a saliva sample, tissue biopsy, pleural fluid sample, peritoneal fluid sample, amniotic fluid sample, cerebrospinal fluid sample, bile sample, lymphatic fluid sample, sweat sample, tear sample, semen sample, or any derivative thereof, and any combination thereof. In some embodiments, the biological sample comprises the plasma sample. In some embodiments, the biological sample comprises a blood sample. In some embodiments, the biological sample is obtained or derived from the subject using an ethylenediaminetetraacetic acid (EDTA) collection tube, a cell-free RNA collection tube, a cell-free deoxyribonucleic acid (DNA) collection tube, another blood collection tube, or a CTC collection tube. In some embodiments, the sample further comprises cell-free DNA (cfDNA) molecules. In some embodiments, the method further comprises amplifying the cfDNA molecules, thereby generating amplified DNA molecules. In some embodiments, the method further comprises subjecting the amplified DNA molecules, or derivatives thereof, to a DNA sequencing reaction, thereby generating a set of DNA sequencing reads. In some embodiments, the DNA sequencing reaction comprises a whole genome sequencing reaction. In some embodiments, the method further comprises processing the DNA sequencing reads to determine a copy number parameter. In some embodiments, the method further comprises, based at least on the copy number parameter, identifying the subject as having a copy number gain or a copy number loss at loci as compared to a reference copy number.
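As a minimal illustration of how a copy number parameter might be compared to a reference copy number, the following Python sketch classifies a locus as a gain, a loss, or neutral from a log2 ratio of read depths. The disclosure does not specify how the copy number parameter is computed; the function name `call_copy_number`, the use of a log2 depth ratio, and the threshold values of +0.2 and -0.2 are all assumptions made for illustration, and the depths are assumed to be already normalized for library size.

```python
import math


def call_copy_number(sample_depth, reference_depth,
                     gain_threshold=0.2, loss_threshold=-0.2):
    """Classify a locus from normalized read depths (thresholds are illustrative).

    Returns "gain", "loss", or "neutral" based on log2(sample / reference).
    """
    if sample_depth <= 0 or reference_depth <= 0:
        raise ValueError("depths must be positive")
    log2_ratio = math.log2(sample_depth / reference_depth)
    if log2_ratio >= gain_threshold:
        return "gain"
    if log2_ratio <= loss_threshold:
        return "loss"
    return "neutral"
```

In practice, thresholds of this kind would need to be calibrated for the sample type, since tumor-derived cfDNA may represent only a small fraction of the total cfDNA in a sample.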
[0017] In another aspect, the present disclosure provides a composition comprising: (i) a first set of probes comprising a pulldown moiety, and (ii) a second set of probes without a pulldown moiety, wherein the first set of probes are complementary to one or more human exomes, and wherein the second set of probes comprises one or more probes complementary to HBA1, HBA2, HBB, HBD, B2M, CD74, and RN7SL.
[0018] In an aspect, the present disclosure provides a composition comprising: (i) a first set of probes comprising a pulldown moiety, and (ii) a second set of probes without a pulldown moiety, wherein the first set of probes are complementary to one or more human exomes, and wherein the second set of probes comprises one or more probes complementary to HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, and RPS21.
[0019] Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
[0020] Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto. The computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
[0021] Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
INCORPORATION BY REFERENCE
[0022] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “figure” and “FIG.” herein), of which:
[0024] FIG. 1 shows a schematic of an example method of the disclosure.
[0025] FIGs. 2A-2C show plots of transcripts per million of samples using depletion probes.
[0026] FIG. 3A shows a chart of the expression levels of genes with or without depletion. FIG. 3B shows a chart of the number of transcripts with or without depletion.
[0027] FIG. 4 shows a chart relating to the concordance of transcripts in libraries with or without depletion.
[0028] FIG. 5 shows a chart of the exonic rate of cfRNA whole transcriptome libraries.
[0029] FIGs. 6A-6E show charts of the efficiency of depletion of genes targeted for depletion.
[0030] FIG. 7 shows a graph relating to the genes that were upregulated, down regulated, or had no significant change for cancer versus non-cancer samples.
[0031] FIG. 8 shows a graph of the output of the principal component analysis of differential gene expression of cancer versus non-cancer.
[0032] FIG. 9 shows a chart of the androgen receptor splice variant expression levels in different cancer and non-cancer samples.
[0033] FIG. 10 shows a computer system 1001 that is programmed or otherwise configured to implement a method of the present disclosure.
DETAILED DESCRIPTION
[0034] While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
[0035] Provided herein are systems and methods for processing samples for assaying. The methods may be used to assay for the presence or absence of cancer in a subject. The systems and methods provided herein comprise assaying polynucleotides to identify biomarkers of cancers in a subject. The biomarkers may be processed in order to identify the presence or absence of cancer. The methods described herein may be used to generate or prepare a sequencing library.
[0036] In an aspect, the present disclosure provides a method of preparing a sequencing library, the method comprising: (a) obtaining a biological sample derived from a subject, wherein the biological sample comprises cell-free ribonucleic acids; (b) contacting the biological sample with (i) a first set of probes comprising a pulldown moiety and (ii) a second set of probes without a pulldown moiety, to anneal the first set of probes and the second set of probes to at least a portion of the cell-free ribonucleic acids, thereby generating (A) a first set of annealed nucleic acids comprising probes from the first set of probes and nucleic acids annealed thereto, and (B) a second set of annealed nucleic acids comprising probes from the second set of probes and nucleic acids annealed thereto, wherein one or more probes of the first set of probes and one or more probes of the second set of probes are complementary to a same target; and (c) separating the first set of annealed nucleic acids and the second set of annealed nucleic acids.
[0037] Nucleic acids derived from samples may be generally processed to generate a sequencing library. The sequencing library may comprise nucleic acids derived from the sample as well as additional regions that are able to interface with the sequencer and allow for the nucleic acids to be sequenced. Depending on the type of sequencing performed or types of nucleic acids that are to be sequenced, the nucleic acids may be subjected to additional reactions to generate the sequencing library. For example, amplification or enrichment reactions may be performed such that specific sequences of interest are sequenced. For example, in the case of disease, such as cancer, specific genes may be indicative of a disease and specific sequencing of those specific genes may be desired. In other cases, whole genome (e.g., Illumina or Pacific Biosciences of California) or whole transcriptome sequencing (e.g., Illumina) may be performed which does not generally enrich for a specific set of genes and may provide data pertaining to a broader set of the genes.
[0038] In the case of cell-free nucleic acids, multiple nucleic acids derived from specific genes may be more prevalent than those derived from other genes. Regarding cancer detection, cancer cells may shed cell-free nucleic acids into a biological fluid of a subject; however, these cell-free nucleic acids may be at a low concentration or may be otherwise difficult to detect. Non-cancer cells or nucleic acids that are not correlated with cancer may also be in the biological fluid. These non-cancer associated nucleic acids may outcompete or otherwise make it difficult for the cancer-associated nucleic acids to be detected. For example, in a sequencing reaction of cell-free nucleic acids, sequencing adapters may be attached to the cell-free nucleic acids, which may then be subjected to a sequencing reaction. The sequencing adapters may be attached to significantly more non-cancer associated cell-free nucleic acids than cancer associated cell-free nucleic acids, resulting in more sequencing reads of non-cancer associated cell-free nucleic acids. These non-cancer associated sequencing reads may be uninformative for detecting cancer, and may therefore result in a reduction of efficiency for detecting cancer in a subject. This loss of efficiency may result in increased costs resulting from additional sampling, sample preparation, and sequencing. Reducing the number of uninformative or otherwise unwanted nucleic acids in a sample may improve the efficiencies of the assays.
[0039] In the case of cell-free RNA (cfRNA) analysis, certain transcripts may be more abundant than others. For example, hemoglobin transcripts may be highly present in blood. Using whole transcriptome analysis on blood derived samples may generate sequencing libraries that are heavily populated by these more abundant (e.g., hemoglobin) transcripts. These more abundant transcripts, when sequenced, may take up the majority of the outputted sequencing reads, effectively reducing the number of reads that are informative of a disease (e.g., cancer) or other genetic variation of interest.
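The effect of abundant transcripts on the read budget can be shown with simple arithmetic. The following Python sketch is illustrative only: the function name `informative_fraction` and its two parameters are hypothetical, and no numerical values are taken from the disclosure. It assumes a library in which a fraction `abundant_fraction` of molecules comes from abundant transcripts and a fraction `removal_efficiency` of those molecules is removed before sequencing, and it returns the share of the remaining library that is informative.

```python
def informative_fraction(abundant_fraction, removal_efficiency):
    """Share of the remaining library that is not from abundant transcripts.

    abundant_fraction: fraction of the starting library from abundant
        transcripts (hypothetical input between 0 and 1).
    removal_efficiency: fraction of those abundant molecules removed
        (hypothetical input between 0 and 1).
    """
    if not (0.0 <= abundant_fraction <= 1.0 and 0.0 <= removal_efficiency <= 1.0):
        raise ValueError("both arguments must be between 0 and 1")
    informative = 1.0 - abundant_fraction
    remaining_abundant = abundant_fraction * (1.0 - removal_efficiency)
    remaining_total = informative + remaining_abundant
    if remaining_total == 0.0:
        raise ValueError("no molecules remain in the library")
    return informative / remaining_total
```

For a fixed sequencing depth, multiplying the total number of reads by this fraction gives a rough estimate of the reads available for informative transcripts under these assumptions.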
[0040] To reduce the unwanted transcripts, the methods disclosed herein may use probes to anneal to specific sequences. The probes may bind to nucleic acids of a sample, allowing for manipulation of the bound nucleic acids. For example, the methods may comprise contacting a sample with one or more pulldown probes. The pulldown probes may allow for enrichment of sequences. The pulldown probes may bind to sequences and then allow for nucleic acids with these sequences to be “pulled down” or enriched by binding the sequences to a solid substrate. The unbound sequences can then be washed out of the sample, with the pulled down nucleic acids available for additional downstream processing. The pulldown probes may comprise a pulldown moiety. The pulldown probe may comprise a nucleic acid sequence that is able to hybridize to a nucleic acid (e.g., cfRNA) in a sample. For example, pulldown probes may comprise exome sequences. The pulldown moiety may be a biotin, ligand, or chemical moiety. The pulldown probes or pulldown moiety binding agents may be attached to a support. For example, the support may be a bead. The support may be a magnetic bead. The support may comprise an array or may be part of an array. The support may comprise a flow cell or may be a part of a flow cell. The pulldown probes may be subjected to pulldown reactions by binding the pulldown moiety to a substrate or support. For example, the pulldown moiety may be a biotin and a streptavidin (or other avidin) may be added to the solution to bind the biotin. The streptavidin may be attached to a support, thereby binding the probes and the annealed nucleic acids. The pulldown probes may be attached to a magnetic bead. For example, the pulldown probes may anneal to a nucleic acid and the pulldown probes may be attached to a magnetic bead. The magnetic bead may be subjected to a magnetic field and separated from a solution via the magnetic field.
[0041] However, the pulldown probes may comprise probes that enrich for sequences that may be uninformative for a given use case or assay. For example, as described above, hemoglobin may be a highly prevalent transcript in samples derived from blood and may be generally uninformative for detection of cancer in a subject. For example, whole transcriptome sequencing kits may contain pulldown probes for hemoglobin, which may be useful for sequencing of tissue, but may result in too many transcripts of hemoglobin in a sequencing assay using blood (or a sample derived from blood). In conjunction with pulldown probes, probes without pulldown moieties may also be used. These probes without pulldown moieties may be probes that compete (i.e., depletion probes) with the pulldown probes. These depletion probes, as they do not have a pulldown moiety, may not be pulled down when subjected to a pulldown moiety binding agent, thereby allowing for the pulldown probes to be separated from the depletion probes. To reduce the number of unwanted transcripts, probes without pulldown moieties may be generated that bind to the unwanted transcripts. Since the unwanted transcripts are bound to the probes without pulldown moieties, the unwanted transcripts cannot also bind to the probes with pulldown moieties, thus reducing the number of unwanted transcripts that are pulled down.
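The competition between depletion probes and pulldown probes described above can be illustrated with a deliberately simplified model. The following Python sketch is not a model from the disclosure: it assumes that each unwanted transcript is bound by exactly one probe, that depletion and pulldown probes have equal affinity for the same target site, and that both are in excess, so that the chance of a transcript being bound by a pulldown probe equals the pulldown probes' share of all competing probes. The function name `captured_fraction` and the ratio parameter are hypothetical.

```python
def captured_fraction(depletion_to_pulldown_ratio):
    """Fraction of unwanted transcripts still pulled down under the toy model.

    depletion_to_pulldown_ratio: amount of depletion probe relative to
        pulldown probe for the same target (hypothetical, non-negative).
    """
    if depletion_to_pulldown_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return 1.0 / (1.0 + depletion_to_pulldown_ratio)


if __name__ == "__main__":
    # Illustrative ratios only; the disclosure does not define its probe
    # amounts in these terms.
    for ratio in (0, 1, 10, 100):
        print(ratio, captured_fraction(ratio))
```

Under these assumptions, the captured fraction falls monotonically as the relative amount of depletion probe increases. Real hybridization kinetics, probe tiling, and wash stringency would change the quantitative relationship.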
[0042] The probes without a pulldown moiety (i.e., depletion probes) may bind to any sequence. For example, the depletion probe may bind to a gene. For example, the depletion probe may bind to an exome. The probes without a pulldown moiety may comprise a same sequence as a probe with a pulldown moiety, and may directly compete for a same target. The probes without a pulldown moiety may comprise a different sequence than a probe with a pulldown moiety, but may target a same exome. The depletion probes may be designed to bind to any sequence of a transcript that is deemed unwanted or uninformative. For example, a set of depletion probes may be designed to target a plurality of different genes. The set of depletion probes may be designed to target a plurality of different genes that are abundant in blood samples. The depletion probes may bind to sequences of one or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of two or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of three or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of four or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of five or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. 
The depletion probes may bind to sequences of six or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of seven or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of eight or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of nine or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of ten or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of eleven or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of twelve or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of thirteen or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. 
The depletion probes may bind to sequences of fourteen or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of fifteen or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of sixteen or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of seventeen or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of eighteen or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21 and RN7SL. The depletion probes may bind to sequences of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21, and RN7SL.
[0043] The methods disclosed herein may comprise conducting one or more enrichment reactions on one or more nucleic acid molecules in a sample. The enrichment reactions may comprise one or more hybridization reactions. For example, as described herein, the enrichment reactions may comprise contacting a sample with one or more probes or bait molecules that hybridize to a nucleic acid molecule of the biological sample. The enrichment reaction may comprise differential amplification of a set of nucleic acid molecules. The enrichment reaction may enrich for a plurality of genetic loci or sequences corresponding to genetic loci. The enrichment reactions may comprise the use of primers or probes that may have complementarity to sequences (or sequences upstream or downstream) of a sequence that is to be enriched. For example, a capture probe may comprise sequence complementarity to a set of genomic loci and allow the enrichment of the genomic loci. The enrichment reactions may comprise a plurality of probes or primers. A plurality of probes (e.g., a plurality of pulldown probes or a plurality of depletion probes) may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265,
270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360,
365, 370, 375, 380, 385, 390, 395, 400, 405, 410, 415, 420, 425, 430, 435, 440, 445, 450, 455,
460, 465, 470, 475, 480, 485, 490, 495, 500, 505, 510, 515, 520, 525, 530, 535, 540, 545, 550,
555, 560, 565, 570, 575, 580, 585, 590, 595, 600, 605, 610, 615, 620, 625, 630, 635, 640, 645,
650, 655, 660, 665, 670, 675, 680, 685, 690, 695, 700, 705, 710, 715, 720, 725, 730, 735, 740,
745, 750, 755, 760, 765, 770, 775, 780, 785, 790, 795, 800, 805, 810, 815, 820, 825, 830, 835,
840, 845, 850, 855, 860, 865, 870, 875, 880, 885, 890, 895, 900, 905, 910, 915, 920, 925, 930,
935, 940, 945, 950, 955, 960, 965, 970, 975, 980, 985, 990, 995, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 200000, 300000, 400000, 500000, 600000, 700000, 800000, 900000, 1000000, or more different probes.
[0044] As described herein, the methods may comprise using two or more probe sets (e.g., a set of probes with pulldown moieties and a set of probes without pulldown moieties). The two or more probe sets may be in the same concentrations. The two or more probe sets may be in different concentrations. For example, the set of probes without pulldown moieties may be at a greater concentration than the set of probes with pulldown moieties. For example, the set of probes with pulldown moieties may be at a greater concentration than the set of probes without pulldown moieties. A set of probes may be at a concentration of at least 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 90, 100, or more, times greater than the concentration of another set of probes.
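As a rough illustration of why the relative concentrations of the two probe sets may matter, the following Python sketch uses a deliberately simplified competition model that is an assumption and not part of the disclosure: depletion probes and pulldown probes are taken to compete for the same target site with equal affinity, with probes in excess over transcripts, so that the fraction of a targeted transcript captured by pulldown probes is approximately the pulldown share of the total probe concentration. The function name and the example ratios are hypothetical.

```python
def expected_pulldown_fraction(pulldown_conc: float, depletion_conc: float) -> float:
    """Approximate fraction of a targeted transcript captured by pulldown probes.

    Assumption (illustration only): both probe types compete for one site
    with equal affinity, and probes are in excess over the transcript.
    """
    if pulldown_conc < 0 or depletion_conc < 0:
        raise ValueError("concentrations must be non-negative")
    total = pulldown_conc + depletion_conc
    if total == 0:
        return 0.0
    return pulldown_conc / total


# Depletion probes at 1x, 2x, 10x, and 100x the pulldown probe concentration.
for ratio in (1, 2, 10, 100):
    fraction = expected_pulldown_fraction(1.0, float(ratio))
    print(f"depletion:pulldown = {ratio}:1 -> ~{fraction:.1%} still pulled down")
```

Under these assumptions, raising the depletion probe concentration relative to the pulldown probe concentration lowers the share of the unwanted transcript that is pulled down. Real hybridization kinetics may differ.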
[0045] The methods disclosed herein may comprise conducting separation (e.g., purification or isolation) reactions on one or more nucleic acid molecules in a sample. The separation reactions may comprise contacting a sample with one or more beads or bead sets (e.g., beads comprising pulldown probes). The one or more beads may comprise a physical property (e.g., size, shape, composition) that may allow them to be separated or manipulated. For example, the one or more beads may be magnetic beads. The separation reaction may comprise one or more hybridization reactions, enrichment reactions, amplification reactions, sequencing reactions, or a combination thereof. The separation reaction may comprise the use of one or more separators. The one or more separators may comprise a magnetic separator. The separation reaction may comprise separating bead-bound nucleic acid molecules from bead-free nucleic acid molecules. The separation reaction may comprise separating pulldown probe hybridized nucleic acid molecules from pulldown probe free nucleic acid molecules. The separation reactions may comprise removing or separating a group of nucleic acid molecules from another group of nucleic acids. The separation reactions may comprise performing a wash or using a wash buffer to remove a group of nucleic acids.
[0046] FIG. 1 shows an example schematic of using the probes without pulldown moieties and the probes with pulldown moieties. FIG. 1 shows a set of “depletion probes” and “exome panel probes”. The “exome panel probes” are shown to comprise a pulldown moiety indicated by the star symbol on each probe, whereas the probes shown for the “depletion probes” do not have a star symbol, indicating a lack of a pulldown moiety. When added to the transcripts (shown as the thick lines), the probes are able to bind to transcripts. Those transcripts that bind to only depletion probes are not pulled down, whereas the transcripts that bind to the exome panel probes are able to be pulled down. The pulled-down transcripts can then be subjected to the subsequent reactions (e.g., sequencing reactions, amplification reactions), whereas the transcripts that are not pulled down can be removed from the sample and discarded. The depletion probes can thus increase the number of transcripts that are not pulled down, and unwanted transcripts can be targeted by depletion probes.
[0047] The biological samples may be subjected to additional reactions or conditions prior to assaying. For example, the biological sample may be subjected to conditions that are sufficient to isolate, enrich, or extract nucleic acids, such as cfDNA molecules or cfRNA molecules.
[0048] The biological sample may comprise nucleic acids. The biological sample may be a cell-free deoxyribonucleic acid (cfDNA) sample or a cell-free ribonucleic acid (cfRNA) sample. The biological sample may comprise genomic DNA or germline DNA (gDNA). The nucleic acid may be a DNA (e.g., double-stranded DNA, single-stranded DNA, single-stranded DNA hairpins, cDNA, genomic DNA, germline DNA, circulating tumor DNA (ctDNA), cell-free DNA (cfDNA)), an RNA (e.g., cfRNA, mRNA, cRNA, miRNA, siRNA, snoRNA, piRNA, tiRNA, snRNA), or a DNA/RNA hybrid. The biological sample may be derived from or contain a tissue. The biological sample may be derived from or contain a biological fluid. For example, the biological sample may be a plasma sample, a serum sample, a buffy coat sample, a peripheral blood mononuclear cell (PBMC) sample, a red blood cell sample, a urine sample, a saliva sample, or other body fluid sample. The biological sample may comprise or be a pleural fluid sample, peritoneal fluid sample, amniotic fluid sample, cerebrospinal fluid sample, bile sample, lymphatic fluid sample, sweat sample, tear sample, semen sample, or any combination of biological fluids. In some cases, the samples may comprise RNA and DNA. For example, a sample may comprise cfDNA and cfRNA and the cfDNA and cfRNA may be analyzed by methods as described elsewhere herein.
[0049] The methods disclosed herein may comprise conducting extraction reactions on one or more nucleic acids in a biological sample. The extraction reactions may lyse cells or disrupt nucleic acid interactions with the cell or with cellular proteins, such that the nucleic acids may be isolated, purified, enriched or subjected to other reactions.
[0050] The methods disclosed herein may comprise amplification or extension reactions. The amplification reactions may comprise polymerase chain reaction (PCR). The amplification reaction may comprise PCR-based amplifications, non-PCR-based amplifications, or a combination thereof. The one or more PCR-based amplifications may comprise PCR, qPCR, nested PCR, linear amplification, or a combination thereof. The PCR reaction may be a digital PCR, droplet PCR, or digital droplet PCR. The one or more non-PCR-based amplifications may comprise multiple displacement amplification (MDA), transcription-mediated amplification (TMA), nucleic acid sequence-based amplification (NASBA), strand displacement amplification (SDA), real-time SDA, rolling circle amplification, circle-to-circle amplification or a combination thereof. The amplification reactions may comprise an isothermal amplification. The amplification reaction may comprise an untargeted amplification or a targeted amplification. For example, the amplification reaction may be a universal amplification reaction. The universal amplification reaction may amplify all members of a sequencing library. The amplification reaction may be a reverse transcription reaction. For example, an amplification reaction may amplify RNA (e.g., cfRNA) in a sample and generate cDNA.
[0051] The method disclosed herein may comprise a barcoding reaction. A barcoding reaction may comprise the addition of a barcode or tag to the nucleic acid. The barcode may be a molecular barcode or a sample barcode. For example, a barcode nucleic acid may comprise a barcode sequence which may be a degenerate n-mer. The sequence may be randomly generated or may be synthesized as a specific barcode sequence. The barcode nucleic acid may be added to a sample so as to label the nucleic acid molecules in the sample. The barcodes may be specific to a sample. For example, a plurality of barcode nucleic acids may be added to a sample in which the barcode sequence is the same. Upon barcoding of the nucleic acids, those originating from a same sample may have a same barcode sequence, which may allow a nucleic acid to be identified as belonging to a particular or given sample. A molecular barcode may also be used such that each molecule (or a plurality of molecules) in a same volume has a different molecular barcode. This barcode may be subjected to amplification such that all amplicons derived from a molecule have the same barcode. In this way, molecules originating from a same molecule may be identified. The sequence reads may be processed based on the barcode sequences. For example, the processing may reduce errors or allow a molecule to be tracked. Barcode sequences may be appended or otherwise added or incorporated into a sequence by various reactions, for example an amplification, extension, or ligation reaction, and may be performed enzymatically using a nucleic acid polymerase or ligase. The ligation may be an overhang or blunt end ligation and the barcodes may comprise complementarity to nucleic acids to be barcoded. This complementarity may be a sequence derived from the sample from the subject or may be a constant sequence generated via a reaction performed on the nucleic acids in the sample.
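The use of sample barcodes and molecular barcodes to process sequence reads can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the processing described in the disclosure: the read layout (a 4-base sample barcode followed by a 6-base molecular barcode at the start of each read), the barcode lengths, the function name, and the example reads are all hypothetical.

```python
from collections import defaultdict

SAMPLE_BARCODE_LEN = 4      # hypothetical layout, for illustration only
MOLECULAR_BARCODE_LEN = 6   # hypothetical layout, for illustration only


def group_reads_by_barcode(reads):
    """Group reads as {sample_barcode: {molecular_barcode: [insert, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    prefix_len = SAMPLE_BARCODE_LEN + MOLECULAR_BARCODE_LEN
    for read in reads:
        if len(read) <= prefix_len:
            continue  # too short to carry both barcodes and an insert
        sample = read[:SAMPLE_BARCODE_LEN]
        molecule = read[SAMPLE_BARCODE_LEN:prefix_len]
        grouped[sample][molecule].append(read[prefix_len:])
    # Convert to plain dicts so missing keys raise instead of being created.
    return {s: dict(m) for s, m in grouped.items()}


reads = [
    "ACGT" + "AAAAAA" + "GATTACA",
    "ACGT" + "AAAAAA" + "GATTACA",   # amplicon of the same parent molecule
    "ACGT" + "CCCCCC" + "GATTACA",   # same sample, different parent molecule
    "TTGA" + "AAAAAA" + "CATCATC",   # different sample
]
families = group_reads_by_barcode(reads)
print(len(families["ACGT"]["AAAAAA"]))  # reads in one molecule family
```

Reads that share a sample barcode are attributed to the same sample. Within a sample, reads that share a molecular barcode are treated as amplicons of one parent molecule.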
[0052] The method disclosed herein may comprise adding an adapter to a nucleic acid. The adapter may be a sequencing adapter. The sequencing adapter may allow for the nucleic acids to be attached to a sequencing flow cell or another support from a sequencer. This adapter may be subjected to amplification such that nucleic acids with adapters may be enriched. Adapters may be appended or otherwise added or incorporated into a sequence by various reactions, for example an amplification, extension, or ligation reaction, and may be performed enzymatically using a nucleic acid polymerase or ligase. The ligation may be an overhang or blunt end ligation and the barcodes may comprise complementarity to nucleic acids to be barcoded. This complementarity may be a sequence derived from the sample from the subject or may be constant sequence generated via a reaction performed on the nucleic acids in the sample.
[0053] The methods disclosed herein may be used in conjunction with other assays. As the samples may comprise multiple types of nucleic acids (e.g., RNA and DNA), different assays (e.g., sequencing reactions specific to DNA or RNA) may be used so as to obtain sequence reads relating to the nucleic acid type. For example, a sample may comprise DNA (e.g., cfDNA) and RNA (e.g., cfRNA). The sample may be subjected to both a sequencing assay on DNA and a sequencing assay on RNA. For example, a sample may be subjected to whole genome sequencing and whole transcriptome sequencing. The sequencing assays may use libraries that may be subjected to depletion via depletion probes. For example, the sequencing library may comprise nucleic acids derived from RNA and nucleic acids derived from DNA. The nucleic acids derived from RNA may be subjected to depletion via depletion probes, whereas the nucleic acids derived from DNA may not be subjected to depletion.
[0054] In some cases, the biological sample may comprise multiple components. For example, the biological sample may be a whole blood sample. The biological sample may be subjected to reactions so as to separate or fractionate the biological sample. For example, a whole blood sample may be fractionated and cell-free nucleic acids may be obtained. The whole blood sample may be fractionated using centrifugation such that blood cells may be separated from the plasma (which may contain cell-free nucleic acids). A sample may be subjected to multiple rounds of separation or fractionation.
[0055] The biological sample may be collected, obtained, or derived from the subject using a collection tube. The collection tube may be an ethylenediaminetetraacetic acid (EDTA) collection tube, a cell-free RNA collection tube, a cell-free deoxyribonucleic acid (DNA) collection tube, a CTC collection tube, or other blood collection tube. The collection tube may be a urine collection cup. The collection tube may comprise additional reagents for stabilizing the nucleic acid molecules or blood cells. The collection tube may allow the nucleic acid or blood cells to be stable so as to minimize degradation of the biological sample prior to assaying. The additional reagents may comprise buffer salts or chelators.
[0056] The biological sample may be obtained or derived from a subject at a variety of times. The biological sample may be obtained or derived from a subject prior to the subject receiving a therapy for cancer. The biological sample may be obtained or derived from a subject during receiving a therapy for cancer. The biological sample may be obtained or derived from a subject after receiving a therapy for cancer. The biological sample may be collected over 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 or more time points. The time points may occur over a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60 or more hour period. The time points may occur over a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60 or more day period. The time points may occur over a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60 or more week period. The time points may occur over a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60 or more month period. The time points may occur over a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60 or more year period.
[0057] The subject may be suspected of suffering from a cancer. The cancer may be specific to or originate from an organ or other area of the subject. For example, the cancer may be breast cancer, lung cancer, prostate cancer, colorectal cancer, melanoma, bladder cancer, non-Hodgkin lymphoma, kidney cancer, endometrial cancer, leukemia, pancreatic cancer, thyroid cancer, liver cancer, or any combination thereof. The cancer may be a hormone sensitive prostate cancer (HSPC), castrate-resistant prostate cancer (CRPC), metastatic prostate cancer, or a combination thereof. The cancer may comprise biomarkers that are specific to a particular cancer. The specific biomarkers may indicate a presence of a particular cancer. For example, a biomarker may indicate that a castrate-resistant prostate cancer is present. The identification of the presence of a type of cancer may allow the determination of a treatment option or recommendation.
[0058] In some cases, the subject may be asymptomatic for cancer. For example, the cancer may not exhibit any symptoms and the subject may be unaware of the presence of cancer. The methods described herein may allow a cancer to be identified at an earlier stage than otherwise. The identification of the presence of the cancer at an earlier stage may allow a treatment option or recommendation to be determined at an earlier stage and may allow the subject to have an improved prognosis.
[0059] In various aspects described throughout the disclosure, the nucleic acids may be subjected to sequencing reactions. The sequencing reactions may be used on DNA, RNA or other nucleic acid molecules. Examples of sequencing reactions that may be used include capillary sequencing, next generation sequencing, Sanger sequencing, sequencing by synthesis, single molecule nanopore sequencing, sequencing by ligation, sequencing by hybridization, sequencing by nanopore current restriction, or a combination thereof. Sequencing by synthesis may comprise reversible terminator sequencing, processive single molecule sequencing, sequential nucleotide flow sequencing, or a combination thereof. Sequential nucleotide flow sequencing may comprise pyrosequencing, pH-mediated sequencing, semiconductor sequencing or a combination thereof. The sequencing reactions may comprise whole genome sequencing, whole exome sequencing, low-pass whole genome sequencing, targeted sequencing, methylation-aware sequencing, enzymatic methylation sequencing, or bisulfite methylation sequencing. The sequencing reaction may be a transcriptome sequencing, mRNA-seq, totalRNA-seq, smallRNA-seq, exosome sequencing, or combinations thereof. Combinations of sequencing reactions may be used in the methods described elsewhere herein. For example, a sample may be subjected to whole genome sequencing and whole transcriptome sequencing. As the samples may comprise multiple types of nucleic acids (e.g., RNA and DNA), sequencing reactions specific to DNA or RNA may be used so as to obtain sequence reads relating to the nucleic acid type.
[0060] The sequencing of nucleic acids may generate sequencing read data. The sequencing reads may be processed so as to generate data of improved quality. The sequencing reads may be generated with a quality score. The quality score may indicate an accuracy of a sequence read or a level of signal above a noise threshold for a given base call. The quality scores may be used for filtering sequencing reads. For example, sequencing reads that do not meet a particular quality score threshold may be removed. The sequencing reads may be processed so as to generate a consensus sequence or consensus base call. A given nucleic acid (or nucleic acid fragment) may be sequenced and errors in the sequence may be generated due to reactions prior to or during sequencing. For example, amplification or PCR may generate errors in amplicons such that the sequences are not identical to a parent sequence. Using sample barcodes or molecular barcodes, error correction may be performed. Error correction may include identifying sequence reads that do not agree with other sequences from a same sample or same original parent molecule. The use of barcodes may allow the identification of a same parent or sample. Additionally, the sequence reads may be processed by performing single strand consensus calling or double stranded consensus calling, thereby reducing or suppressing error.
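Quality filtering followed by consensus calling within a molecular barcode family can be sketched in Python as follows. This is a simplified illustration and not the error-correction procedure of the disclosure: it assumes equal-length, already-aligned reads, uses a mean Phred quality cutoff of 20 as an arbitrary example threshold, and calls each position by strict majority vote. The function names and example reads are hypothetical.

```python
from collections import Counter


def mean_quality(qualities):
    """Mean per-base Phred quality of one read."""
    return sum(qualities) / len(qualities)


def consensus_sequence(family, min_mean_quality=20.0):
    """Majority-vote consensus over reads sharing one molecular barcode.

    `family` is a list of (sequence, per-base qualities) tuples. Reads below
    the quality cutoff are dropped; returns None if no read passes.
    """
    kept = [seq for seq, quals in family
            if quals and mean_quality(quals) >= min_mean_quality]
    if not kept:
        return None
    if any(len(seq) != len(kept[0]) for seq in kept):
        raise ValueError("reads in a family must have equal length")
    consensus = []
    for column in zip(*kept):
        base, count = Counter(column).most_common(1)[0]
        # Require a strict majority; otherwise emit an ambiguous base.
        consensus.append(base if count * 2 > len(kept) else "N")
    return "".join(consensus)


family = [
    ("ACGT", [30, 30, 30, 30]),
    ("ACGT", [32, 32, 32, 32]),
    ("ACTT", [31, 31, 31, 31]),   # one amplification error at position 3
    ("GGGG", [5, 5, 5, 5]),       # low-quality read, filtered out
]
print(consensus_sequence(family))
```

In this example the low-quality read is removed by the filter, and the single discordant base is outvoted by the two reads that agree, which mirrors the idea of suppressing errors that are not corroborated by other reads from the same parent molecule.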
[0061] The methods as disclosed herein may comprise determining allele frequency or other cancer related metric. The methods may comprise determining a mutant allele frequency of a set of somatic mutations among a set of biomarkers. The mutant allele frequency may be used to determine a circulating tumor DNA (ctDNA) fraction of a cancer of a subject. A plasma tumor mutational burden (pTMB) of a cancer of the subject may be determined based at least in part on the set of mutant allele frequencies. Detection of microsatellite instability may also be used to determine the presence or absence of a cancer or cancer metric. Methylation states may be determined using methods described herein and may be used to identify a presence of a cancer or cancer parameter.
[0062] The methods as disclosed herein may comprise determining one or more expression levels of one or more genes. For example, libraries subjected to depletion may be sequenced (e.g., whole transcriptome sequencing) to generate sequencing reads. The sequencing reads may be processed to determine an expression level of a gene. The expression levels may be compared against a reference expression level. For example, a non-cancer control or reference sample may be analyzed and expression levels for genes of the non-cancer control may be obtained. The expression levels may be compared and a sample may be identified as having an upregulation or downregulation of gene expression of one or more genes. Based at least on the expression levels, the subject may be identified as having a cancer.
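The comparison of expression levels against a reference can be sketched in Python as follows. The normalization (counts per million), the pseudocount, the gene names, and the read counts are all assumptions chosen for illustration. The disclosure does not specify how expression levels are computed or compared.

```python
import math


def counts_per_million(counts):
    """Scale raw per-gene read counts to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_changes(sample_counts, reference_counts, pseudocount=1.0):
    """log2(sample CPM / reference CPM) for genes present in both inputs.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    sample_cpm = counts_per_million(sample_counts)
    reference_cpm = counts_per_million(reference_counts)
    shared = sample_cpm.keys() & reference_cpm.keys()
    return {
        gene: math.log2((sample_cpm[gene] + pseudocount)
                        / (reference_cpm[gene] + pseudocount))
        for gene in sorted(shared)
    }


# Made-up read counts for three placeholder genes.
sample = {"GENE_A": 300, "GENE_B": 100, "GENE_C": 600}
reference = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
changes = log2_fold_changes(sample, reference)
for gene, value in changes.items():
    print(f"{gene}: log2 fold change = {value:+.2f}")
```

A positive value suggests upregulation relative to the reference and a negative value suggests downregulation. Any threshold for calling a change meaningful would need to be chosen and validated separately.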
[0063] The expression levels may be the expression levels of genetic aberrations of one or more genes. For example, the expression level may be of a fusion gene or rearrangement. The expression levels may be the expression levels of one or more splice variants. Genes may be spliced differently or incorrectly and the splice variants may be associated with the presence of cancer. Detection of the presence or the expression level of the splice variant may indicate the presence of cancer. For example, splice variants in the androgen receptor (AR) may be indicative of cancer. The splice variant may be an AR-V form such as AR-V1, AR-V7, or AR-V12.
[0064] In various aspects, sets of biomarkers are processed and data corresponding to the biomarkers are generated. The sets of biomarkers may comprise quantitative measures from a set of cancer-associated genomic loci. The cancer-associated genomic loci may correspond to a set of genes.
[0065] The sets of biomarkers may correspond to a genetic aberration of a genetic locus. The genetic aberration may be a tumor associated alteration. The genetic aberration may be copy number alterations (CNAs), copy number losses (CNLs), copy number gains, single nucleotide variants (SNVs), insertions or deletions (indels), fusion genes, or rearrangements. The set of biomarkers may comprise splice variants. The set of biomarkers may be identified in a variety of nucleic acid types. For example, the tumor associated alteration may be identified in cfDNA or cfRNA. The tumor associated alteration may comprise changes in allelic expression or gene expression. Methods and systems disclosed herein may allow for gene expression profiling and identification of changes to the expression levels of genes.
[0066] In various aspects, the methods may comprise identifying the presence of a cancer or a cancer parameter. The methods may comprise determining a probability or a likelihood of the presence of cancer or a cancer parameter. For example, instead of a binary output indicating a presence or absence, an output may be generated that indicates a probability that the subject has cancer. This probability may be determined based on algorithms as described elsewhere herein. Similarly, a probability or likelihood of response to a particular treatment or a probability of relapse may be outputted.
[0067] The increased cfRNA transcriptional expression of drug resistance-related gene alterations or splicing variants may serve as a predictive biomarker, identifying the response or resistance to therapy. Specifically, in the case of prostate cancer, the increased cfRNA transcriptional expression of drug resistance-related AR mutations such as W742C/L and F877L or splicing variants such as AR-V7 or AR-V9 may serve as a predictive biomarker, identifying the response or resistance to anti-androgen therapy.
[0068] Compared to the use of cfDNA, blood ctRNA-based variant detection (including fusion detection) can be used to more effectively identify known and novel variants, especially fusions, in cancer. For instance, blood cfRNA-based detection of TMPRSS2-ERG may provide higher detection sensitivity in prostate cancer.
[0069] The increased ratio of blood-based cancer variants versus urine-based cancer variants may serve as a prognostic biomarker in GU cancers, indicating disease aggressiveness and guiding clinical treatment decision making. Specifically, in the case of muscle-invasive bladder cancer (MIBC), the increased level of blood-based cancer variants versus urine-based cancer variants may serve as a prognostic biomarker in patients with MIBC and provide evidence for clinical decision making. These cancer variants may include ctDNA, cfRNA, microRNA, and methylation, among others.
[0070] Together with cfDNA-based variant detection through genomics and epigenomics, cfRNA and/or microRNA can also be used either alone or in combination with genomic and epigenomic biomarkers for minimal residual disease (MRD) detection, therapy monitoring and early cancer detection.
[0071] In various aspects, the sets of biomarkers are processed using an algorithm. The algorithm may be a trained algorithm. The trained algorithms may use the sets of biomarkers as an input and generate an output regarding the presence or absence of a cancer. The output may be specific to a type of cancer or subtype of cancer. For example, the output may indicate the presence of a castrate-resistant prostate cancer. For example, the expression levels of one or more genes may be processed using an algorithm (e.g., a trained algorithm). The algorithm (e.g., a trained algorithm) may use the expression levels of one or more genes to generate an output regarding the presence or absence of a cancer. For example, a trained algorithm may be trained on expression levels of healthy samples, non-cancer samples, or cancer samples. The trained algorithm may then be able to analyze a sample and expression levels of one or more genes in the sample to determine that the subject has cancer.
[0072] The trained algorithm may be trained on multiple samples. For example, the trained algorithm may be trained using at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, or more independent training samples. The trained algorithm may be trained using no more than 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, or fewer, independent training samples. The training samples may be associated with a presence or an absence of the cancer.
The training samples may be associated with a relapse of cancer. The training samples may comprise healthy or non-cancer samples. The training samples may be associated with cancer that is resistant to a particular drug or treatment. An individual training sample may be positive for a particular cancer. An individual training sample may be negative for a particular cancer. By using training samples, the trained algorithm may be able to detect a cancer, determine a probability of recurrence or relapse of a cancer, or determine if a cancer that comprises a set of biomarkers may be resistant to a treatment. The training sample may be associated with additional clinical health data of a subject. For example, additional clinical health data may comprise the gender, weight, height, or levels of metabolites or antibodies in a subject.
Additional clinical health data may comprise indication of other diseases, disorders, or disease conditions.
[0073] The trained algorithms may be trained using multiple sets of training samples. The sets may comprise training samples as described elsewhere herein. For example, the training may be performed using a first set of independent training samples associated with a presence of the cancer and a second set of independent training samples associated with an absence of the cancer. Similarly, a first set may be associated with relapse and a second set may be associated with the absence of relapse.
[0074] The trained algorithm may also process additional clinical health data of the subject. For example, additional clinical health data may comprise the gender, weight, height, or levels of metabolites or antibodies in a subject. Additional clinical health data may comprise indication of other diseases, disorders, or disease conditions that the subject may suffer from. By using the additional clinical health data in conjunction with the biomarkers, the trained algorithm may output a presence or absence of cancer, a probability of relapse, or a resistance to drug treatment that may be different from the output of an algorithm that does not process additional clinical health data.
[0075] The trained algorithm may be an unsupervised machine learning algorithm. For example, the unsupervised machine learning algorithm may utilize cluster analysis to identify attributes of interest. The trained algorithm may be a supervised machine learning algorithm. For example, the algorithm may be provided with training data so as to generate an expected or desired output. The supervised learning algorithm may comprise a deep learning algorithm, a support vector machine (SVM), a neural network, or a Random Forest. Via the machine learning algorithm, the trained algorithm may be able to identify relationships of biomarkers to a particular cancer prognosis or diagnosis. Without the trained algorithm, it may otherwise be difficult to identify relationships of the biomarkers to accurately identify the presence of a cancer or other parameters associated with the cancer.
[0076] In various aspects, the systems and methods may comprise an accuracy, sensitivity, or specificity of detection of the cancer or a parameter of the cancer. For example, the methods or systems may comprise detecting the presence or the absence of cancer (or the presence of a parameter of the cancer, such as recurrence, relapse, or drug resistance) in the subject at an accuracy of at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%. The methods or systems may comprise detecting the presence or the absence of cancer (or the presence of a parameter of the cancer, such as recurrence, relapse, or drug resistance) in the subject at a sensitivity of at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%. The methods or systems may comprise detecting the presence or the absence of cancer (or the presence of a parameter of the cancer, such as recurrence, relapse, or drug resistance) in the subject at a specificity of at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%. The methods or systems may comprise detecting the presence or the absence of cancer (or the presence of a parameter of the cancer, such as recurrence, relapse, or drug resistance) in the subject at a positive predictive value of at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%. 
The methods or systems may comprise detecting the presence or the absence of cancer (or the presence of a parameter of the cancer, such as recurrence, relapse, or drug resistance) in the subject at a negative predictive value of at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%.
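By way of non-limiting illustration, the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value recited above may be computed from counts of true and false calls. The following is a minimal sketch only; the function name `detection_metrics` and the example counts are hypothetical and are not results of the present disclosure.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return standard detection metrics from confusion-matrix counts.

    tp/fn: cancer samples called positive/negative.
    tn/fp: non-cancer samples called negative/positive.
    """
    def ratio(num, den):
        # Return None instead of dividing by zero when a group is empty.
        return num / den if den else None

    total = tp + fp + tn + fn
    return {
        "accuracy": ratio(tp + tn, total),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=45, fp=10, tn=40, fn=5)
```

Each value returned is a fraction between 0 and 1 and may be compared against a threshold such as those listed above (e.g., at least about 0.90 for a sensitivity of at least about 90%).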
[0077] In various aspects as described herein, a clinical intervention or a therapy may be identified at least in part based on the identification of the presence of cancer, or the presence of a parameter of cancer. The clinical intervention may be a plurality of clinical interventions. The clinical intervention may be selected from a plurality of clinical interventions. The clinical intervention may be a surgical tumor removal (e.g., surgical resection), chemotherapy, radiotherapy, immunotherapy, adjuvant therapy, neoadjuvant therapy, androgen deprivation therapy, or a combination thereof. In some cases, the clinical interventions may be administered to the subject. After administration of the clinical intervention, a sample may be obtained or derived from the subject so as to monitor the cancer or cancer parameters. As such, the methods and systems disclosed herein may be performed iteratively such that monitoring of a cancer can be performed. Additionally, by performing the methods or systems iteratively, therapies or clinical interventions may be updated based at least in part on the results of the methods. The monitoring of the cancer may include an assessment as well as a difference in assessment from a previously generated assessment. The difference in an assessment of cancer in the subject among a plurality of time points (or samples) may be indicative of one or more clinical indications such as a diagnosis of the cancer, a prognosis of the cancer, or an efficacy or non-efficacy of a course of treatment for treating the cancer of the subject. The prognosis may comprise expected progression-free survival (PFS), overall survival (OS), or other metrics relating to the severity or survivability of a cancer.
[0078] Computer systems
[0079] The present disclosure provides computer systems that are programmed to implement methods of the disclosure. FIG. 10 shows a computer system 1001 that is programmed or otherwise configured to perform analysis or steps of the methods, for example determine a likelihood of the presence of a cancer based at least in part on a set of biomarkers of an individual or run an algorithm. The computer system 1001 can regulate various aspects of methods and systems of the present disclosure, such as, for example, perform an algorithm, input training data, analyze sets of biomarkers, or output a result for the user as to the presence or absence of cancer. The computer system 1001 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device. The electronic device can be a mobile electronic device.
[0080] The computer system 1001 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 1005, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 1001 also includes memory or memory location 1010 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 1015 (e.g., hard disk), communication interface 1020 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 1025, such as cache, other memory, data storage and/or electronic display adapters. The memory 1010, storage unit 1015, interface 1020 and peripheral devices 1025 are in communication with the CPU 1005 through a communication bus (solid lines), such as a motherboard. The storage unit 1015 can be a data storage unit (or data repository) for storing data. The computer system 1001 can be operatively coupled to a computer network (“network”) 1030 with the aid of the communication interface 1020. The network 1030 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 1030 in some cases is a telecommunication and/or data network. The network 1030 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 1030, in some cases with the aid of the computer system 1001, can implement a peer-to-peer network, which may enable devices coupled to the computer system 1001 to behave as a client or a server.
[0081] The CPU 1005 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 1010. The instructions can be directed to the CPU 1005, which can subsequently program or otherwise configure the CPU 1005 to implement methods of the present disclosure. Examples of operations performed by the CPU 1005 can include fetch, decode, execute, and writeback.
[0082] The CPU 1005 can be part of a circuit, such as an integrated circuit. One or more other components of the system 1001 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).
[0083] The storage unit 1015 can store files, such as drivers, libraries and saved programs. The storage unit 1015 can store user data, e.g., user preferences and user programs. The computer system 1001 in some cases can include one or more additional data storage units that are external to the computer system 1001, such as located on a remote server that is in communication with the computer system 1001 through an intranet or the Internet.
[0084] The computer system 1001 can communicate with one or more remote computer systems through the network 1030. For instance, the computer system 1001 can communicate with a remote computer system of a user (e.g., a medical professional or patient). Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PC’s (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 1001 via the network 1030.
[0085] Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1001, such as, for example, on the memory 1010 or electronic storage unit 1015. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 1005. In some cases, the code can be retrieved from the storage unit 1015 and stored on the memory 1010 for ready access by the processor 1005. In some situations, the electronic storage unit 1015 can be precluded, and machine-executable instructions are stored on memory 1010.
[0086] The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a precompiled or as-compiled fashion.
[0087] Aspects of the systems and methods provided herein, such as the computer system 1001, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
[0088] Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
[0089] The computer system 1001 can include or be in communication with an electronic display 1035 that comprises a user interface (UI) 1040 for providing, for example, an input of biomarkers or sequencing data, or a visual output relating to a detection, diagnosis, or prognosis. Examples of UI’s include, without limitation, a graphical user interface (GUI) and web-based user interface.
[0090] Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 1005. The algorithm can, for example, determine a presence or absence of a cancer or cancer parameter based on a set of input sequencing data from a sample derived from a subject.
EXAMPLES
[0091] Example 1: Sequencing of transcripts after use of depletion probes
[0092] Using methods and compositions of the present disclosure, sequencing of cfRNA was performed, demonstrating the advantages of the methods and compositions of the present disclosure. Samples of blood comprising cfRNA are subjected to library preparation. cfRNA was subjected to a reverse transcription to generate cDNA. Sequencing adapters are ligated to the cDNA and PCR using primers specific to the sequencing adapters is performed to amplify the cDNA. The resulting cDNA is split into four equal parts to generate four identical libraries. Each library is subjected to an exome panel and a different amount of depletion probes: no depletion probes, 1X depletion probes, 10X depletion probes, and 100X depletion probes. The depletion probes were able to bind HBB, HBA2, HBA1, HBD, B2M, and CD74. The libraries are then subjected to a pulldown and wash to remove unbound nucleic acids and nucleic acid bound to the depletion probes. The resulting libraries were then sequenced and analyzed. Table 1 shows the resulting transcript counts for the genes that were targeted by the depletion probes. Specifically, with increasing amounts of depletion probes, the TPM (transcripts per million) were shown to decrease, indicating the successful removal of transcripts from the final sequencing reads. Transcripts that were not targeted by depletion probes (e.g., ACTB, TMSB4X, and FTH1) were shown to maintain the number of transcripts and not be depleted.
[0093] Table 1: Transcripts per Million for samples with or without depletion
Figure imgf000028_0001
Figure imgf000029_0001
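By way of non-limiting illustration, transcripts per million (TPM) may be derived from raw read counts by normalizing each count by transcript length and rescaling so that the values of a library sum to one million. The sketch below assumes per-transcript read counts and transcript lengths in kilobases are available; the function name `tpm`, the placeholder transcript names, and the example values are hypothetical and are not data of the present disclosure.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM).

    counts: mapping of transcript name to read count.
    lengths_kb: mapping of transcript name to transcript length in kilobases.
    """
    # Reads per kilobase: normalize each count by transcript length.
    rpk = {name: counts[name] / lengths_kb[name] for name in counts}
    # Per-million scaling factor for the whole library.
    scale = sum(rpk.values()) / 1_000_000
    return {name: (value / scale if scale else 0.0) for name, value in rpk.items()}


# Hypothetical counts and lengths, for illustration only.
example_counts = {"TRANSCRIPT_A": 9000, "TRANSCRIPT_B": 600, "TRANSCRIPT_C": 400}
example_lengths_kb = {"TRANSCRIPT_A": 0.6, "TRANSCRIPT_B": 1.8, "TRANSCRIPT_C": 1.2}
example_tpm = tpm(example_counts, example_lengths_kb)
```

Because TPM is a relative measure within a library, removing abundant transcripts redistributes the per-million total among the remaining transcripts.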
[0094] FIGs. 2A-2C show plots of transcripts per million (TPM) for a sample without depletion and samples with depletion. Each gene is shown as a point with the y-axis indicating the TPM for a sample of 1X (FIG. 2A), 10X (FIG. 2B), or 100X (FIG. 2C) of depletion probes, with the x-axis indicating the TPM for a sample with no depletion probes. The plots show a linear relationship with a slope close to 1, demonstrating that the depletion probes generally do not affect the sample processing for transcripts that are not targeted with the depletion probes.
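By way of non-limiting illustration, the slope of such a plot may be estimated by an ordinary least-squares fit of the TPM values of a depleted library against those of the undepleted library for non-targeted genes. The sketch below is one possible approach and is not necessarily the analysis used to generate FIGs. 2A-2C; the function name `least_squares_slope` and the example values are hypothetical.

```python
def least_squares_slope(x, y):
    """Return the ordinary least-squares slope of y regressed on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of squared deviations of x and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    return sxy / sxx


# Hypothetical TPM values for non-targeted genes, for illustration only.
tpm_no_depletion = [10.0, 50.0, 200.0, 800.0]
tpm_with_depletion = [11.0, 49.0, 205.0, 790.0]
slope = least_squares_slope(tpm_no_depletion, tpm_with_depletion)
```

A slope close to 1 for the non-targeted genes would be consistent with the depletion probes leaving those transcripts substantially unaffected.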
[0095] Example 2: Depletion of transcripts from whole transcriptome libraries of cfRNA
[0096] Samples of plasma comprising cfRNA are subjected to library preparation. cfRNA was subjected to a reverse transcription to generate cDNA. Sequencing adapters were ligated to the cDNA and PCR using primers specific to the sequencing adapters was performed to amplify the cDNA. The library was then split into two identical libraries, with depletion probes added to one of the libraries. The depletion probes were able to bind HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, RPS21.
[0097] The libraries were then subjected to a pulldown and wash to remove unbound nucleic acids and nucleic acid bound to the depletion probes. The resulting libraries were then amplified and sequenced with NovaSeq X Plus using 2x150 bp paired end reads and analyzed. FIG. 3A shows the expression levels of genes depleted in the library with or without the depletion. As shown, the library with no depletion shows a large number of transcripts that correspond to the genes that are to be depleted. In contrast, the library that was subject to depletion shows little to no reads corresponding to the depletion probes. This indicates the successful removal of transcripts from the final sequencing reads.
[0098] This depletion allows for detection of 10-15% more distinct transcripts in the plasma samples. By depleting the abundant transcripts in the samples, and thereby reducing the sequencing throughput that is allocated to the abundant transcripts, the other transcripts are able to be sequenced. FIG. 3B shows this increase in transcripts. Three different samples were used to generate three different libraries. Each of these libraries was divided in half, with one half subjected to depletion. Each of the libraries demonstrated an increase in the number of unique transcripts for the half subjected to depletion.
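By way of non-limiting illustration, an increase in the number of unique transcripts may be expressed as a percentage gain in transcripts detected at or above a minimum TPM. The detection threshold `min_tpm`, the function names, and the example values below are assumptions made for this sketch only; the present disclosure does not specify a threshold, and the values are not data of the present disclosure.

```python
def detected_transcripts(tpm_by_transcript, min_tpm=1.0):
    """Return the set of transcripts at or above a detection threshold."""
    return {name for name, value in tpm_by_transcript.items() if value >= min_tpm}


def percent_gain(undepleted, depleted, min_tpm=1.0):
    """Percent increase in detected transcripts in the depleted half of a library."""
    before = len(detected_transcripts(undepleted, min_tpm))
    after = len(detected_transcripts(depleted, min_tpm))
    if before == 0:
        raise ValueError("no transcripts detected in the undepleted library")
    return 100.0 * (after - before) / before


# Hypothetical TPM values for two halves of one library, for illustration only.
undepleted_half = {"T1": 900.0, "T2": 5.0, "T3": 2.0, "T4": 1.5, "T5": 0.4}
depleted_half = {"T1": 20.0, "T2": 9.0, "T3": 4.0, "T4": 3.0, "T5": 1.2}
gain = percent_gain(undepleted_half, depleted_half)
```

In this toy example, a low-abundance transcript ("T5") crosses the assumed threshold only in the depleted half, which is the kind of effect described above.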
[0099] Moreover, there is minimal change in the overall gene expression profile of libraries with depletion compared to without. When comparing the WTS library that has not been subjected to depletion probes with the WTS library that has been subjected to depletion probes, there is a high concordance (Pearson R = 0.963). FIG. 4 shows a chart of this concordance. This indicates that the depletion probes remove unwanted transcripts without adverse effects on the library as a whole.
[00100] Example 3: cfRNA expression profiling of clinical samples
[00101] 20 plasma samples derived from patients with cancer were obtained. These samples were subjected to cfDNA sequencing assays to identify copy number variations or fusions. Separately, the cfRNA was obtained from these samples in order to conduct cfRNA profiling. The samples include multiple cancer types such as prostate cancer and other carcinomas. In parallel, 8 plasma samples from healthy subjects were also assayed for copy number variations and fusions, and a library for cfRNA sequencing was also prepared. cfRNA libraries were then subjected to depletion via depletion probes, amplified and then sequenced. FIG. 5 shows the exonic rates of the cfRNA WTS libraries, demonstrating greater than 90% exonic reads (i.e., reads mapped to exon regions) in both libraries from healthy and cancer samples. FIGS. 6A-6E show the numbers of transcripts detected for the various genes that were targeted with the depletion probes. Across every targeted gene there was a significantly lower number of reads (e.g., transcripts per million) in the libraries subjected to depletion vs libraries that were not subjected to depletion.
[00102] The resulting sequencing data obtained from the libraries subjected to depletion were analyzed and healthy samples and cancer samples were compared. Differential clustering of genes between healthy and cancer samples was observed, with characteristic cancer genes, such as KLK3, TP53, TMPRSS2, and DSC3, contributing significantly to the differential clustering. FIG. 7 shows data relating to differentially expressed genes in cancer samples (Padj < 0.01 as determined by DESeq2) as compared to healthy normal samples. It shows there are a significant number of genes that are downregulated or upregulated in the cancer sample as compared to the normal control. Principal component analysis also corroborated these results, showing a clear separation between normal and cancer samples (FIG. 8).
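By way of non-limiting illustration, an adjusted p-value (Padj) may be obtained from raw p-values by the Benjamini-Hochberg procedure, which is the default multiple-testing adjustment in DESeq2. The sketch below shows only that adjustment step and is not a reimplementation of DESeq2 or of its statistical model; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not data of the present disclosure.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for five genes, for illustration only.
raw_pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
padj = benjamini_hochberg(raw_pvalues)
# Genes passing the Padj < 0.01 cutoff used for FIG. 7.
significant = [i for i, p in enumerate(padj) if p < 0.01]
```

In this toy example only the first gene passes the cutoff after adjustment, even though two raw p-values are below 0.01.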
[00103] The sequencing reads from the libraries subjected to depletion were also analyzed for alternative splice variants. Different androgen receptor splice variants (referred to as AR-V forms) were analyzed for the normal and cancer samples. FIG. 9 shows a chart representing the expression of AR-V forms and full-length AR (AR-FL) in various samples. In normal samples, low levels or no expression of AR-V forms were observed. In contrast, the cancer samples showed higher and more dynamic expression of the AR-V forms. The sequencing reads were also analyzed for gene fusion RNA. For example, TMPRSS2::ERG RNA was identified in some cancer clinical samples.
[00104] As the samples were also subjected to cfDNA analysis and sequencing, genomic changes may be correlated with the expression profiles. Copy number status (e.g., gain or loss) was analyzed for each sample using cfDNA sequencing and was compared to the cfRNA WTS results from the depleted cfRNA libraries.
[00105] Table 2 shows results of select genes of this analysis. As shown in Table 2, copy number gain was correlated with an upregulation of genes, whereas copy number loss was correlated with a downregulation of genes.
[00106] Table 2. Correlation of CNV in cfDNA sequencing and differential expression analysis in cfRNA WTS
Figure imgf000032_0001
[00107] Overall, these assays demonstrate that the depletion probes can be used to generate cfRNA libraries that accurately measure expression profiles, identify splice variants and gene fusions and additionally can be used in conjunction with cfDNA sequencing assays. These assays allow for improved liquid biopsy-based cancer diagnostics.
[00108] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

WHAT IS CLAIMED IS:
1. A method of preparing a sequencing library, the method comprising:
(a) obtaining a biological sample derived from a subject, wherein the biological sample comprises nucleic acid molecules;
(b) contacting the biological sample with (i) a first set of probes comprising a pulldown moiety and (ii) a second set of probes without a pulldown moiety, to anneal the first set of probes and the second set of probes to one or more subsets of the nucleic acid molecules, thereby generating (A) a first set of annealed nucleic acids comprising probes from the first set of probes and a first subset of the nucleic acid molecules annealed thereto, and (B) a second set of annealed nucleic acids comprising probes from the second set of probes and a second subset of the nucleic acid molecules annealed thereto, wherein one or more probes of the first set of probes and one or more probes of the second set of probes are complementary to a same target; and
(c) separating the first set of annealed nucleic acids and the second set of annealed nucleic acids.
2. The method of claim 1, wherein the separating further comprises performing a pulldown reaction at least in part by contacting the pulldown moiety with one or more pulldown moiety binding agents, thereby selectively binding the first set of annealed nucleic acids.
3. The method of claim 2, wherein the pulldown moiety binding agents comprise streptavidin, or a functional derivative thereof.
4. The method of claim 3, wherein the pulldown moiety binding agents are attached to a support.
5. The method of claim 4, wherein the support is a bead.
6. The method of claim 5, wherein the bead is a magnetic bead.
7. The method of any one of claims 2-6, wherein the separating comprises removing the second set of annealed nucleic acids.
8. The method of claim 7, wherein the removing comprises subjecting the second set of annealed nucleic acids to a wash buffer.
9. The method of any one of claims 1-8, wherein the pulldown moiety comprises biotin, or a functional derivative thereof.
10. The method of any one of claims 1-9, further comprising removing the second set of annealed nucleic acids.
11. The method of any one of claims 1-10, further comprising, prior to (b), amplifying the nucleic acid molecules.
12. The method of any one of claims 1-11, further comprising, subsequent to (c), amplifying the first set of annealed nucleic acids, thereby generating amplified nucleic acids.
13. The method of claim 11 or 12, wherein the amplifying comprises universal amplification or targeted amplification.
14. The method of claim 11 or 12, wherein the amplifying comprises polymerase chain reaction (PCR).
15. The method of claim 14, wherein the PCR comprises digital PCR, droplet PCR, or digital droplet PCR.
16. The method of claim 11 or 12, wherein the amplifying comprises reverse transcription.
17. The method of any one of claims 12-16, further comprising subjecting the amplified nucleic acids, or derivatives thereof, to a sequencing reaction thereby generating a set of sequencing reads.
18. The method of claim 17, wherein the sequencing reaction comprises whole exome sequencing.
19. The method of claim 17 or 18, wherein the sequencing reaction comprises targeted sequencing.
20. The method of any of claims 17-19, further comprising analyzing the set of sequencing reads to determine a presence or an absence of a disease, disorder, or condition of the subject.
21. The method of claim 20, wherein the disease, disorder, or condition comprises cancer.
22. The method of claim 21, further comprising detecting the presence of the cancer of the subject.
23. The method of claim 22, further comprising, responsive to detecting the presence of the cancer of the subject, administering a cancer therapy to the subject.
24. The method of claim 23, wherein the cancer therapy is selected from the group consisting of a surgical tumor removal, a chemotherapy, a radiation therapy, a targeted therapy, an immunotherapy, and a combination thereof.
25. The method of any one of claims 17-24, further comprising analyzing the set of sequencing reads to determine one or more expression levels of one or more genes.
26. The method of claim 25, further comprising comparing the one or more expression levels to one or more reference expression levels.
27. The method of claim 25 or 26, further comprising comparing the one or more expression levels to one or more expression levels derived from a non-cancer control.
28. The method of any one of claims 25-27, wherein the one or more expression levels comprise expression levels of a splice variant.
29. The method of claim 28, wherein the splice variant comprises a splice variant of androgen receptor (AR).
30. The method of claim 29, wherein the splice variant of AR is an AR-V1, AR-V7, AR-V12, or combination thereof.
31. The method of any one of claims 25-30, wherein the one or more expression levels comprise expression levels of a gene fusion or rearrangement.
32. The method of any one of claims 1-31, further comprising, prior to (b), attaching sequencing adapters to the cell-free nucleic acids, or derivatives thereof.
33. The method of claim 32, wherein the attaching comprises ligation.
34. The method of any one of claims 1-33, wherein the first set of probes are complementary to one or more exomes.
35. The method of claim 34, wherein the second set of probes are complementary to a subset of the one or more exomes.
36. The method of any one of claims 1-35, wherein the second set of probes are complementary to one or more genes selected from the group consisting of HBA1, HBA2, HBB, HBD, B2M, CD74, and RN7SL.
37. The method of any one of claims 1-36, wherein the second set of probes are complementary to one or more genes selected from the group consisting of HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, and RPS21.
38. The method of any one of claims 1-37, wherein one or more probes of the first set of probes and one or more probes of the second set of probes comprise a same sequence.
39. The method of any one of claims 1-38, wherein a probe of the first set of probes and a probe of the second set of probes comprise different sequences and target a same exome.
40. The method of any one of claims 1-39, wherein the second set of probes comprises one or more probes configured to bind to transcripts that are deemed unwanted or uninformative.
41. The method of any one of claims 1-40, wherein the nucleic acid molecules comprise cell-free RNA (cfRNA).
42. The method of any one of claims 1-41, wherein the biological sample is selected from the group consisting of: a cell-free deoxyribonucleic acid (cfDNA) sample, a cell-free ribonucleic acid (cfRNA) sample, a plasma sample, a serum sample, a buffy coat sample, a peripheral blood mononuclear cell (PBMC) sample, a red blood cell sample, a urine sample, a urine cell pellet sample, a saliva sample, tissue biopsy, pleural fluid sample, peritoneal fluid sample, amniotic fluid sample, cerebrospinal fluid sample, bile sample, lymphatic fluid sample, sweat sample, tear sample, semen sample, or any derivative thereof, and any combination thereof.
43. The method of any one of claims 1-42, wherein the biological sample comprises the plasma sample.
44. The method of any one of claims 1-43, wherein the biological sample comprises a blood sample.
45. The method of any one of claims 1-44, wherein the biological sample is obtained or derived from the subject using an ethylenediaminetetraacetic acid (EDTA) collection tube, a cell-free RNA collection tube, or a cell-free deoxyribonucleic acid (DNA) collection tube, other blood collection tube, and CTC collection tubes.
46. The method of any one of claims 1-45, wherein the sample further comprise cell-free DNA (cfDNA) molecules.
47. The method of claim 46, further comprising amplifying the cfDNA molecules thereby generating amplified DNA molecules.
48. The method of claim 47, further comprising subjecting the amplified DNA molecules, or derivatives thereof, to a DNA sequencing reaction thereby generating a set of DNA sequencing reads.
49. The method of claim 48, wherein the DNA sequencing reaction comprises a whole genome sequencing reaction.
50. The method of claim 49, further comprising processing the DNA sequencing reads to determine a copy number parameter.
51. The method of claim 50, further comprising, based at least on the copy number parameter, identifying the subject as having a copy number gain or a copy number loss at loci as compared to a reference copy number.
52. A composition comprising:
(i) a first set of probes comprising a pulldown moiety, and
(ii) a second set of probes without a pulldown moiety, wherein the first set of probes are complementary to one or more human exomes, and wherein the second set of probes comprises one or more probes complementary to HBA1, HBA2, HBB, HBD, B2M, CD74, and RN7SL.
53. A composition comprising:
(i) a first set of probes comprising a pulldown moiety, and
(ii) a second set of probes without a pulldown moiety, wherein the first set of probes are complementary to one or more human exomes, and wherein the second set of probes comprises one or more probes complementary to HBB, HBA1, HBA2, HBD, B2M, CD74, ACTB, PF4, PPBP, RPS12, RPS11, RPS23, RPS20, RPL3, RPS18, RPS27, RPS24, and RPS21.
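
Claims 50 and 51 recite determining a copy number parameter from the DNA sequencing reads and comparing it with a reference copy number. The following is a minimal sketch of one way such a comparison could be made, assuming the copy number parameter is a per-locus log2 ratio of normalized read counts. The function name `copy_number_calls`, the pseudocount, and the gain and loss thresholds are illustrative assumptions and are not taken from the claims or the description.

```python
import math


def copy_number_calls(sample_counts, reference_counts,
                      gain_threshold=0.3, loss_threshold=-0.3):
    """Return {locus: (log2_ratio, status)} for each locus in the reference.

    sample_counts / reference_counts map a locus (for example a genomic bin)
    to a read count. Thresholds are hypothetical defaults for illustration.
    """
    sample_total = sum(sample_counts.values())
    reference_total = sum(reference_counts.values())
    if sample_total <= 0 or reference_total <= 0:
        raise ValueError("read counts must sum to a positive number")

    calls = {}
    for locus, ref_count in reference_counts.items():
        sample_count = sample_counts.get(locus, 0)
        # Normalize by total depth; the 0.5 pseudocount avoids log2(0).
        sample_frac = (sample_count + 0.5) / sample_total
        ref_frac = (ref_count + 0.5) / reference_total
        log2_ratio = math.log2(sample_frac / ref_frac)

        if log2_ratio >= gain_threshold:
            status = "gain"
        elif log2_ratio <= loss_threshold:
            status = "loss"
        else:
            status = "neutral"
        calls[locus] = (log2_ratio, status)
    return calls


if __name__ == "__main__":
    # Hypothetical read counts per bin, for illustration only.
    reference = {"bin1": 100, "bin2": 100, "bin3": 100, "bin4": 100}
    sample = {"bin1": 100, "bin2": 200, "bin3": 50, "bin4": 100}
    for locus, (ratio, status) in copy_number_calls(sample, reference).items():
        print(locus, round(ratio, 2), status)
```

In this sketch, a locus whose normalized coverage is well above the reference is reported as a gain and one well below as a loss. A practical pipeline would typically also correct for GC content and mappability and would segment adjacent loci, which this sketch omits.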
PCT/US2024/051531 2023-10-17 2024-10-16 Methods and compositions for generation of sequencing libraries Pending WO2025085495A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363591012P 2023-10-17 2023-10-17
US63/591,012 2023-10-17

Publications (1)

Publication Number Publication Date
WO2025085495A1 (en) 2025-04-24

Family

ID=95448882

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/051531 Pending WO2025085495A1 (en) 2023-10-17 2024-10-16 Methods and compositions for generation of sequencing libraries

Country Status (1)

Country Link
WO (1) WO2025085495A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150354000A1 (en) * 2012-12-28 2015-12-10 MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V. Method of analysis of composition of nucleic acid mixtures
WO2020141144A1 (en) * 2018-12-31 2020-07-09 Qiagen Gmbh Enrichment method for sequencing
US20220356203A1 (en) * 2015-07-28 2022-11-10 Caris Science, Inc. Therapeutic oligonucleotides

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24880490

Country of ref document: EP

Kind code of ref document: A1