
WO2025038512A1 - Methods for detecting pcr chimeras - Google Patents

Methods for detecting PCR chimeras

Info

Publication number
WO2025038512A1
WO2025038512A1 PCT/US2024/041863 US2024041863W WO2025038512A1 WO 2025038512 A1 WO2025038512 A1 WO 2025038512A1 US 2024041863 W US2024041863 W US 2024041863W WO 2025038512 A1 WO2025038512 A1 WO 2025038512A1
Authority
WO
WIPO (PCT)
Prior art keywords
pcr
pools
end barcode
chimeras
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/041863
Other languages
French (fr)
Inventor
Arthur Fridman
Karin VROOM
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Merck Sharp and Dohme LLC
Original Assignee
Merck Sharp and Dohme LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Merck Sharp and Dohme LLC

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing

Definitions

  • PCR Polymerase chain reaction
  • PCR chimeras generated by multiple amplification of a sample can contribute to inaccurate interpretation of sequencing data.
  • samples are inherently limited in nucleic acid template quantity, such samples may require the use of multiple rounds of PCR. Many rounds of PCR can generate PCR chimeras that do not represent actual sequences from the samples.
  • PCR chimera detection involves algorithms (e.g., PyroNoise, UChime) relying on one or more databases of chimera and/or chimera-free sequences.
  • the present disclosure is based, in part, on our finding that PCR chimeras can be identified without depending on any database.
  • methods for detecting PCR chimeras that include steps of splitting a solution of different nucleic acids (e.g., homologous nucleic acid molecules intended to be amplified) into multiple pools and attaching a unique pair of barcodes (unique sequence identifiers) to the nucleic acids.
  • the method comprises steps of performing one or more PCR amplifications on the sample, sequencing products of the PCR amplifications, and detecting the PCR chimeras by the presence of a pairing of a 5’-end barcode and 3’-end barcode that is different from that in each of the plurality of pools by the absence or reduction of the unique pair of the barcodes.
  • methods for detecting PCR chimeras that include steps of (i) preparing a sample by (a) splitting a solution comprising different nucleic acids into a plurality of pools, (b) attaching a 5'-end barcode and a 3'-end barcode to the nucleic acids, wherein each of the plurality of pools has a unique pair of the 5'-end barcode and the 3'-end barcode, and (c) mixing the plurality of pools, thereby making the sample; (ii) performing one or more PCR amplifications on the sample; (iii) sequencing products of the PCR amplifications; and (iv) detecting the PCR chimeras by analyzing sequence pairs of the 5'-end barcode and the 3'-end barcode of each of the products, wherein the sequence pairs of the 5'-end barcode and the 3'-end barcode of the PCR chimeras are different from that in each of the plurality of
  • methods for sequencing different nucleic acids that include steps of: (i) preparing a sample by (a) splitting a solution comprising the different nucleic acids into a plurality of pools, (b) attaching a 5'-end barcode and a 3'-end barcode to the nucleic acids, wherein each of the plurality of pools has a unique pair of the 5'-end barcode and the 3'-end barcode, and (c) mixing the plurality of pools, thereby making the sample; (ii) performing one or more PCR amplifications on the sample; (iii) sequencing products of the PCR amplifications; (iv) detecting PCR chimeras by analyzing sequence pairs of the 5'-end barcode and the 3'-end barcode of each of the products, wherein the sequence pairs of the 5'-end barcode and the 3'-end barcode of the PCR chimeras are different from that in each of the plurality of pools; and (v)
  • the step of (iv) detecting does not comprise comparing sequences of the products with sequences in any database.
  • the plurality of pools comprises no less than 3 pools. In some embodiments, the plurality of pools comprises 3, 4, 5, 6, 7, 8, 9 or 10 pools. In some embodiments, the plurality of pools comprises 4 pools. In some embodiments, the plurality of pools comprises 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 pools.
  • the solution comprises homologous molecules.
  • the homologous molecules represent a mixture of antibodies from a polyclonal antibody sample from an immunized subject.
  • the polyclonal antibody sample is splenocyte RNA or DNA.
  • the solution comprises B cell receptor repertoire.
  • the solution comprises T cell receptor repertoire.
  • the PCR amplifications comprise a one-step PCR. In some embodiments, the PCR amplifications comprise a two-step PCR.
  • the PCR amplifications utilize a polymerase selected from the group consisting of Platinum Pfx, GoTaq®, Q5®, and KAPA HiFi.
  • the step of (i) preparing the sample further comprises performing one or more PCR amplifications before the step of (a) splitting or (b) attaching.
  • the method detects about (N-1)/N of the PCR chimeras, where N is the number of the plurality of pools.
  • the nucleic acids in the solution are cDNAs transcribed from RNAs.
  • the RNAs are from a lysate.
  • the lysate is a lymphocyte lysate.
  • Figure 1 illustrates a schematic of synthetic templates used for sequencing library preparation.
  • Figure 2 shows DNA gel images of PCR products from both one step and two step PCR reactions.
  • LC represents light chain DNA template mixture
  • HC means heavy chain DNA template mixture. Bands of the expected sizes are seen for all samples between 500 and 600 base pairs (bp).
  • Figure 3 demonstrates quantification of PCR chimeras after both one round (PCR1) and two rounds of PCR (PCR2) using four different commercially available polymerases: Pfx.
  • H represents chimeras between two heavy chains.
  • K means chimeras between two kappa chains.
  • HK shows chimeras between a heavy chain and a kappa chain.
  • Figure 4 illustrates a process flow chart for unique dual barcode (UDB) indexing PCR reactions for detecting PCR chimeras.
  • UDB unique dual barcode
  • Figure 5 shows a DNA gel image of the round two PCR products for the heavy chain using the unique dual barcoding approach. Lane 1 contains a 100 bp DNA ladder as a reference.
  • Figure 6 shows Promega nuclear factor of activated T cells (NFAT) luciferase assay functional characterization of the original NG006.38E7 antibody compared to three variants isolated using the unique dual barcode NGS approach.
  • NFAT nuclear factor of activated T cells
  • the present disclosure provides methods for detecting PCR chimeras, that include steps of splitting a solution of different nucleic acids (e.g., homologous molecules) into multiple pools and attaching a unique pair of barcodes to the nucleic acids.
  • PCR chimeras contribute to the false perception of diversity that is often interpreted as evidence of hypermutation during immune response, or evidence of new microbial species in 16S rRNA analysis or new MHC haplotypes in MHC haplotyping.
  • the present invention provides methods for detecting PCR chimeras.
  • the methods comprise splitting a solution having different nucleic acids into multiple pools.
  • the methods comprise attaching unique dual barcodes (UDB) to the nucleic acids, wherein each of the pools has a unique pair of the 5'-end barcode and the 3'-end barcode.
  • UDB unique dual barcodes
  • PCR chimeras with a pairing of the 5'-end and 3'-end barcodes that is different from that in each of the plurality of pools can be recognized and discarded during data analysis.
  • PCR can refer to a reaction for the in vitro amplification of specific DNA or RNA sequences by the simultaneous primer extension of complementary strands of DNA or RNA.
  • PCR can encompass derivative forms of the reaction, including but not limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, digital PCR, and assembly PCR.
  • nucleic acid in its broadest sense, refers to any compound and/or substance that is or can be incorporated into an oligonucleotide chain.
  • a nucleic acid is a compound and/or substance that is or can be incorporated into an oligonucleotide chain via a phosphodiester linkage.
  • nucleic acid refers to individual nucleic acid residues (e.g., nucleotides and/or nucleosides); in some embodiments, “nucleic acid” refers to an oligonucleotide chain comprising individual nucleic acid residues.
  • a “nucleic acid” is or comprises RNA; in some embodiments, a “nucleic acid” is or comprises DNA. In some embodiments, a nucleic acid is, comprises, or consists of one or more natural nucleic acid residues. In some embodiments, a nucleic acid is, comprises, or consists of one or more nucleic acid analogs. In some embodiments, a nucleic acid analog differs from a nucleic acid in that it does not utilize a phosphodiester backbone.
  • a nucleic acid is, comprises, or consists of one or more "peptide nucleic acids", which are known in the art, have peptide bonds instead of phosphodiester bonds in the backbone, and are considered within the scope of the present disclosure.
  • a nucleic acid has one or more phosphorothioate and/or 5'-N-phosphoramidite linkages rather than phosphodiester bonds.
  • a nucleic acid is, comprises, or consists of one or more natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine).
  • a nucleic acid is, comprises, or consists of one or more nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C-5 propynyl-cytidine, C-5 propynyl-uridine, 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 2-aminoadenosine.
  • a nucleic acid comprises one or more modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose) as compared with those in natural nucleic acids.
  • a nucleic acid has a nucleotide sequence that encodes a functional gene product such as an RNA or protein.
  • a nucleic acid includes one or more introns.
  • nucleic acids are prepared by one or more of isolation from a natural source, enzymatic synthesis by polymerization based on a complementary template (in vivo or in vitro), reproduction in a recombinant cell or system, and chemical synthesis.
  • homologous as used herein means nucleic acids or polypeptides having about 100%, about 99%, about 98%, about 97%, about 96%, about 95%, about 94%, about 93%, about 92%, about 91%, about 90%, about 88%, about 85%, about 80%, about 75%, about 70%, about 65% or about 60% sequence identity to each other. In some embodiments, homologous molecules have between about 60% and about 100% sequence identity. In some embodiments, homologous molecules have between about 65% and about 100% sequence identity. In some embodiments, homologous molecules have between about 70% and about 100% sequence identity. In some embodiments, homologous molecules have between about 75% and about 100% sequence identity.
  • % sequence identity as it applies to homologous nucleic acid molecules is defined as the percentage of nucleotides in the candidate nucleic acid sequence that are identical with the residues in the nucleic acid sequence of a second sequence after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent identity. Methods and computer programs for the alignment are well known in the art. Identity depends on a calculation of percent identity but may differ in value due to gaps and penalties introduced in the calculation. Generally, variants of a particular polynucleotide have at least 60%, 65%, 70%, 75%, 80%, 85%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%.
  • sequence alignment programs and parameters known to those skilled in the art can be performed by aligning the two sequences for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second nucleic acid sequences for optimal alignment and non-identical sequences can be disregarded for comparison purposes).
  • the nucleotides at corresponding nucleotide positions are then compared. When a position in the first sequence is occupied by the same nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position.
  • the percent identity between the two sequences is a function of the number of identical positions shared by the sequences, accounting for the number of gaps and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
  • the comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm in an alignment tool (e.g., the Needleman-Wunsch algorithm in an online tool).
  • the method comprises preparing a sample by: (a) splitting a solution comprising different nucleic acids into a plurality of pools; (b) attaching a 5'-end barcode and a 3'-end barcode to the nucleic acids; and (c) mixing the plurality of pools, thereby making the sample.
  • each of the plurality of pools has a unique pair of the 5’-end barcode and the 3’-end barcode.
  • the method further comprises performing one or more PCR amplifications on the sample.
  • the method further comprises sequencing products of the PCR amplification.
  • the method further comprises detecting the PCR chimeras by analyzing sequences of the 5'-end barcode and the 3'-end barcode of each of the products.
  • the 5'-end barcode and the 3'-end barcode of the PCR chimeras originated from different pools.
  • the step of detecting does not comprise comparing sequences of the products with sequences in any database.
  • the 5'-end barcode and the 3'-end barcode of the PCR chimeras are different from that in each of the plurality of pools. For example, if recombination occurs between molecules originally from pools x and y, where y ≠ x, the chimeric molecule will contain one barcode from x and one barcode from y. For example, it may contain the left barcode from x and the right barcode from y.
  • statistically, (N-1)/N × 100% of the total number of PCR chimeras are created from molecules of different pools. Therefore, the majority of PCR chimeras involve molecules that originated from different pools, if N is greater than 2.
  • the plurality of pools comprises no less than 2 pools. In some embodiments, the plurality of pools comprises 2 to 100 pools. In some embodiments, the plurality of pools comprises 2 to 50 pools. In some embodiments, the plurality of pools comprises 2 to 40 pools. In some embodiments, the plurality of pools comprises 2 to 30 pools. In some embodiments, the plurality of pools comprises 2 to 20 pools. In some embodiments, the plurality of pools comprises 2, 3, 4, 5, 6, 7, 8, 9 or 10 pools. In some embodiments, the plurality of pools comprises 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 pools. In some embodiments, the plurality of pools comprises 2 pools. In some embodiments, the plurality of pools comprises 3 pools.
  • the plurality of pools comprises 4 pools. In some embodiments, the plurality of pools comprises 5 pools. In some embodiments, the plurality of pools comprises 6 pools. In some embodiments, the plurality of pools comprises 7 pools. In some embodiments, the plurality of pools comprises 8 pools. In some embodiments, the plurality of pools comprises 9 pools. In some embodiments, the plurality of pools comprises 10 pools. In some embodiments, the plurality of pools comprises 11 pools. In some embodiments, the plurality of pools comprises 12 pools. In some embodiments, the plurality of pools comprises 13 pools. In some embodiments, the plurality of pools comprises 14 pools. In some embodiments, the plurality of pools comprises 15 pools. In some embodiments, the plurality of pools comprises 16 pools. In some embodiments, the plurality of pools comprises 17 pools. In some embodiments, the plurality of pools comprises 18 pools. In some embodiments, the plurality of pools comprises 19 pools. In some embodiments, the plurality of pools comprises 20 pools.
  • provided methods are used for the traditional antibody discovery approaches (e.g., hybridoma fusion, B-cell culture, B-cell cloning, etc.).
  • the methods provide access to antigen specific antibody sequences generated by immunization in an animal, while minimizing known errors that can be introduced during the sequencing process.
  • the methods for sequencing nucleic acids further comprise removing sequences of detected PCR chimeras. In some embodiments, this step is performed during a data analysis stage.
  • the methods comprise performing one or more PCR amplifications before the step of splitting a solution or attaching barcodes.
  • the PCR amplifications comprise 2-10 cycles. In some embodiments, the PCR amplifications comprise 2 cycles. In some embodiments, the PCR amplifications comprise 3 cycles. In some embodiments, the PCR amplifications comprise 4 cycles. In some embodiments, the PCR amplifications comprise 5 cycles. In some embodiments, the PCR amplifications comprise 6 cycles. In some embodiments, the PCR amplifications comprise 7 cycles. In some embodiments, the PCR amplifications comprise 8 cycles. In some embodiments, the PCR amplifications comprise 9 cycles. In some embodiments, the PCR amplifications comprise 10 cycles.
  • the initial solution and/or the prepared sample comprise homologous molecules. In some embodiments, the initial solution and/or the prepared sample comprise the B cell receptor repertoire.
  • the initial solution and/or the prepared sample comprise cDNAs transcribed from RNAs.
  • the method further comprises conducting one or more cDNA synthesis reactions to produce one or more cDNA copies.
  • the RNAs are from a lysate. In some embodiments, the RNAs are from a lymphocyte lysate.
  • the methods disclosed here are used for identifying a plurality of antibody candidates (e.g., using one of the traditional antibody discovery approaches).
  • the method comprises searching for clonal relatives (e.g., antibodies with similar CDR sequences) of these sequences via sequencing of antigen specific B-cells sorted in bulk from the same immunized animal.
  • the clonal relatives of an antibody with known binding or functional behavior may have beneficial properties such as affinity differences, developability benefits, and different functional properties.
  • the method may increase the number of unique antibody sequences generated for a specific target and application from a given immunized animal.
  • barcodes (e.g., 5'-end barcode, 3'-end barcode) comprise a unique sequence and a common sequence.
  • each barcode has a unique sequence that can differentiate pools; each barcode also has a common sequence that serves as a primer binding site for PCR.
  • a barcode comprises 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 31, 40, or 50 nucleotides.
  • association or attachment of barcodes to nucleic acids comprises hybridization of a barcode’s target recognition region to a complementary portion of the target nucleic acid molecule. In some embodiments, association or attachment of barcodes to nucleic acids comprises ligation of a barcode’s target recognition region and a portion of the target nucleic acid molecule.
  • J region primers are used for PCR. In some embodiments, J region primers allow for the introduction of a unique dual barcode indexing tag.
  • the primer has a sequence selected from Tables 2 and 3. In some embodiments, the index primer has a sequence selected from Table 4.
  • the methods disclosed herein comprise performing a PCR amplification on the nucleic acids.
  • the amplicon is a double-stranded molecule (e.g., a double-stranded RNA molecule, a double-stranded DNA molecule, or a RNA molecule hybridized to a DNA molecule).
  • the labeled amplicon is a single-stranded molecule (e.g., DNA, RNA, or a combination thereof).
  • amplification comprises use of one or more non-natural nucleotides.
  • non-natural nucleotides comprise photolabile or triggerable nucleotides.
  • examples of non-natural nucleotides can include, but are not limited to, peptide nucleic acid (PNA), morpholino and locked nucleic acid (LNA), as well as glycol nucleic acid (GNA) and threose nucleic acid (TNA).
  • PNA peptide nucleic acid
  • LNA locked nucleic acid
  • GNA glycol nucleic acid
  • TNA threose nucleic acid
  • Non-natural nucleotides may be added to one or more cycles of an amplification reaction. In some embodiments, the addition of the non-natural nucleotides is used to identify products at specific cycles or time points in the amplification reaction.
  • amplification of the nucleic acids comprises exponential amplification. In some embodiments, amplification of the nucleic acids comprises linear amplification.
  • the method comprises repeatedly amplifying the nucleic acid to produce multiple amplicons.
  • the methods disclosed herein comprise performing at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 PCR amplifications.
  • amplification utilizes a polymerase selected from the group consisting of Platinum Pfx, GoTaq®, Q5®, and KAPA HiFi.
  • the methods provided herein detect the PCR chimeras by analyzing sequences of the 5'-end barcode and the 3'-end barcode of each of the products from the PCR amplifications. For example, a combination of the 5'-end and 3'-end barcodes is compared to a list of initial barcode combinations before the PCR amplifications. In some embodiments, the 5'-end barcode and the 3'-end barcode of the PCR chimeras are different from that in each of the plurality of pools.
  • a library of heavy and light chain PCR products was made for next generation sequencing (NGS) using four synthetic dual indexed/barcoded mouse antibody DNA samples/templates and four different polymerases (i.e., Platinum Pfx, GoTaq®, Q5®, and KAPA HiFi).
  • a schematic of the synthetic PCR templates is shown in Figure 1 and the specific sequences are listed in Table 1.
  • These PCR templates were designed as two sets of four homologous synthetic templates (~0.5 kb in size) flanked on the 5' and 3' ends with a unique pair of nucleotide barcodes (i.e., one set of four templates for the heavy chain and one set of four templates for the light chain).
  • DNA mixtures were made separately for each chain type: one for the heavy chain and one for the light chain. Specifically, these mixtures were prepared by resuspending the synthetic DNA constructs listed in Table 1 using a 1:1:1:1 equimolar ratio of the four homologous templates (for each chain type) in 50 µL of DNAse/RNAse water for a final concentration of 0.4 ng/µL. These samples were then used as DNA templates for both a one-step PCR protocol and a two-step PCR protocol using each of the four commercially available PCR polymerases mentioned above. The one-step PCR protocol included 35 cycles of amplification from 5 ng DNA template.
  • the two-step PCR protocol had Round 1 with 30 cycles of amplification from 0.5 ng of DNA template; and Round 2 with 35 cycles of amplification from 2 µL of round 1 PCR product.
  • a detailed schematic of the PCR protocol steps and primers and the specific PCR conditions for each PCR polymerase are listed below.
  • the primer sequences are listed in Table 2.
  • the resulting PCR products were then run on DNA gels (see Figure 2), the bands were excised, and then purified using a Qiagen gel purification kit (catalog# 28704).
  • the DNA concentrations of the purified PCR products were quantified using Qubit and the dsDNA High Sensitivity assay (Thermo Scientific catalog# Q32851).
  • This Example exemplifies a method for detecting PCR chimeras.
  • a hybridoma fusion antibody discovery campaign was performed from an immunized mouse.
  • a hybridoma clone NG006.38E7.D11 was isolated from this fusion and shown to produce a mouse IgG antibody with a desired protein binding profile for a specific target.
  • a PCR reaction was then set up to amplify an enriched portion of the immunized B-cell repertoire for both heavy and light chain sequences with a similar J region (IGHJ2*01 and IGKJ1*01).
  • the J region primers used for PCR also allowed for the introduction of a unique dual barcode indexing tag for PCR chimera tracking as shown in Figure 4.
  • RNAs were isolated from splenocytes harvested from the same immunized mouse used to isolate clone NG006.38E7.D11 using the Qiagen RNeasy kit (catalog # 74104). cDNAs were then prepared by mixing 500 ng of RNA, 0.2 µL 100 µM poly dT primer
  • second strand synthesis was performed by taking 5 µL of the cDNA reaction and mixing it with 0.75 µL 10 mM dNTPs, 5 µL 5x buffer, 0.5 µL KAPA polymerase, 1.25 µL of either 10 µM READ1S-N16-LH21 or 10 µM READ1S-N16-LH14 primer (two samples in total, one prepared with each primer, see Table 3 for primer sequences), and water up to 25 µL. The two samples were then incubated at 95 °C for 3 minutes and 68 °C for 15 minutes before cooling down to 4 °C.
  • the two purified second strand products (from cDNA generated using either the READ1S-N16-LH21 or the READ1S-N16-LH14 second strand synthesis primers) were each split into four first round PCR amplification reactions. Each first round PCR amplification reaction introduced a unique pair of dual barcode indices that were encoded in both the forward primer and the J-region reverse primers.
  • for the PCR reactions, 4 µL of the purified second strand synthesis product was mixed with 1.5 µL of 10 mM dNTPs, 10 µL 5x KAPA polymerase buffer, 1 µL KAPA polymerase, 1.5 µL 10 µM P5-N50x-R1 primer, 1.5 µL 10 µM READ1-D70x-J primer, and water up to 50 µL (see Table 3 for primer sequences and Table 4 for index sequences; the x in the primer name refers to the specific index sequence included in the individual primers).
  • These PCR mixtures were cycled using protocol 1 listed below.
  • the antibody combinations that were shown to retain target binding were then tested in a Jurkat cell NFAT luciferase functional assay and compared to a benchmark antibody (see Figure 6).
  • the specific assay used to evaluate the functional activity of the antibody clones was the commercially available Promega NFAT luciferase assay (cat# J1621) and the clones were compared to the commercially available benchmark antibody and the original NG006.38E7 antibody.
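
The definitions above state that the method detects about (N-1)/N of the PCR chimeras, where N is the number of pools. The following Python sketch is one way to check that figure. It is illustrative only and assumes, beyond what the source states, that the pools are of equal size and that the two parents of a chimera are drawn independently and uniformly from the mixed sample. The function names are hypothetical.

```python
import random
from fractions import Fraction


def expected_detectable_fraction(n_pools: int) -> Fraction:
    """Fraction of chimeras whose parents come from different pools: (N-1)/N."""
    if n_pools < 1:
        raise ValueError("n_pools must be at least 1")
    return Fraction(n_pools - 1, n_pools)


def simulated_detectable_fraction(n_pools: int,
                                  n_chimeras: int = 100_000,
                                  seed: int = 0) -> float:
    """Monte Carlo estimate under the equal-pool, random-pairing assumption."""
    rng = random.Random(seed)
    detectable = 0
    for _ in range(n_chimeras):
        # Pool of origin of the parent contributing each barcode.
        five_prime_pool = rng.randrange(n_pools)
        three_prime_pool = rng.randrange(n_pools)
        # A chimera is detectable only when its two barcodes come from
        # different pools, because the pairing then matches no pool.
        if five_prime_pool != three_prime_pool:
            detectable += 1
    return detectable / n_chimeras


if __name__ == "__main__":
    for n in (2, 4, 10, 20):
        exact = expected_detectable_fraction(n)
        print(n, exact, round(simulated_detectable_fraction(n), 3))
```

Under these assumptions, chimeras formed between two molecules of the same pool keep a valid barcode pair and go undetected, which is why the detectable fraction approaches but never reaches 100% as N grows.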

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present disclosure relates to methods for detecting PCR chimeras or sequencing nucleic acids. The present disclosure also relates to methods comprising preparing a sample for detecting PCR chimeras or sequencing nucleic acids.

Description

METHODS FOR DETECTING PCR CHIMERAS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Patent Application Serial No. 63/520,156 filed August 17, 2023, the entire contents of which are incorporated by reference herein.
REFERENCE TO SEQUENCE LISTING SUBMITTED ELECTRONICALLY
[0002] The contents of the electronic sequence listing (25815WOPCT-SEQLIST-26NOV2023.xml; Size: 61,558 bytes and Date of Creation November 26, 2023) are herein incorporated by reference in their entirety.
BACKGROUND
[0003] Polymerase chain reaction (PCR) is a method widely used to rapidly make many copies of a specific DNA sample, allowing scientists to take a very small sample of DNA and amplify it to a large enough amount to study in detail. A PCR chimera is an artifact resulting from PCR amplification. It typically occurs when the extension of an amplicon is aborted, and the prematurely terminated amplicon functions as a primer in the next PCR cycle. The prematurely terminated amplicon anneals to a different template and continues to extend, thereby creating a chimeric sequence sourced from two different parental sequences.
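
The template-switching mechanism described above can be sketched as a toy model in Python. This is a simplified illustration, not a simulation of polymerase behavior: it assumes two homologous templates of equal length that are already aligned position by position, and the function name and the `stop_position` parameter are hypothetical.

```python
def make_chimera(first_template: str, second_template: str,
                 stop_position: int) -> str:
    """Join an aborted copy of one template to the remainder of another.

    The copy of first_template terminates prematurely at stop_position,
    anneals to second_template in the next cycle, and is extended to the
    end of that second template.
    """
    if len(first_template) != len(second_template):
        raise ValueError("toy model assumes aligned templates of equal length")
    if not 0 < stop_position < len(first_template):
        raise ValueError("stop_position must fall inside the template")
    aborted_amplicon = first_template[:stop_position]
    # Extension continues along the second template from the same position.
    return aborted_amplicon + second_template[stop_position:]


if __name__ == "__main__":
    parent_a = "AAAAAAAA"
    parent_b = "CCCCCCCC"
    print(make_chimera(parent_a, parent_b, 3))  # AAACCCCC
```

The resulting sequence carries the 5' portion of one parent and the 3' portion of the other, so it matches neither parent over its full length.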
SUMMARY
[0004] The present disclosure extends an insight that PCR chimeras generated by multiple amplification of a sample can contribute to inaccurate interpretation of sequencing data. For example, when samples are inherently limited in nucleic acid template quantity, such samples may require the use of multiple rounds of PCR. Many rounds of PCR can generate PCR chimeras that do not represent actual sequences from the samples.
[0005] Various strategies currently employed for PCR chimera detection involve algorithms (e.g., PyroNoise, UChime) relying on one or more databases of chimera and/or chimera-free sequences. The present disclosure is based, in part, on our finding that PCR chimeras can be identified without depending on any database.
[0006] In one aspect, provided are methods for detecting PCR chimeras, that include steps of splitting a solution of different nucleic acids (e.g., homologous nucleic acid molecules intended to be amplified) into multiple pools and attaching a unique pair of barcodes (unique sequence identifiers) to the nucleic acids. In some embodiments, the method comprises steps of performing one or more PCR amplifications on the sample, sequencing products of the PCR amplifications, and detecting the PCR chimeras by the presence of a pairing of a 5’-end barcode and 3’-end barcode that is different from that in each of the plurality of pools by the absence or reduction of the unique pair of the barcodes.
[0007] In one aspect, provided are methods for detecting PCR chimeras that include steps of: (i) preparing a sample by (a) splitting a solution comprising different nucleic acids into a plurality of pools, (b) attaching a 5’-end barcode and a 3’-end barcode to the nucleic acids, wherein each of the plurality of pools has a unique pair of the 5’-end barcode and the 3’-end barcode, and (c) mixing the plurality of pools, thereby making the sample; (ii) performing one or more PCR amplifications on the sample; (iii) sequencing products of the PCR amplifications; and (iv) detecting the PCR chimeras by analyzing sequence pairs of the 5’-end barcode and the 3’-end barcode of each of the products, wherein the sequence pairs of the 5’-end barcode and the 3’-end barcode of the PCR chimeras are different from that in each of the plurality of pools.
[0008] In one aspect, provided are methods for sequencing different nucleic acids that include steps of: (i) preparing a sample by (a) splitting a solution comprising the different nucleic acids into a plurality of pools, (b) attaching a 5’-end barcode and a 3’-end barcode to the nucleic acids, wherein each of the plurality of pools has a unique pair of the 5’-end barcode and the 3’-end barcode, and (c) mixing the plurality of pools, thereby making the sample; (ii) performing one or more PCR amplifications on the sample; (iii) sequencing products of the PCR amplifications; (iv) detecting PCR chimeras by analyzing sequence pairs of the 5’-end barcode and the 3’-end barcode of each of the products, wherein the sequence pairs of the 5’-end barcode and the 3’-end barcode of the PCR chimeras are different from that in each of the plurality of pools; and (v) removing sequences of the PCR chimeras from sequences of the products.
[0009] In some embodiments, the step of (iv) detecting does not comprise comparing sequences of the products with sequences in any database.
[0010] In some embodiments, the plurality of pools comprises no less than 3 pools. In some embodiments, the plurality of pools comprises 3, 4, 5, 6, 7, 8, 9 or 10 pools. In some embodiments, the plurality of pools comprises 4 pools. In some embodiments, the plurality of pools comprises 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 pools.
[0011] In some embodiments, the solution comprises homologous molecules. In specific embodiments, the homologous molecules represent a mixture of antibodies from a polyclonal antibody sample from an immunized subject. In specific embodiments, the polyclonal antibody sample is splenocyte RNA or DNA. In some embodiments, the solution comprises B cell receptor repertoire. In some embodiments, the solution comprises T cell receptor repertoire.
[0012] In some embodiments, the PCR amplifications comprise a one-step PCR. In some embodiments, the PCR amplifications comprise a two-step PCR.
[0013] In some embodiments, the PCR amplifications utilize a polymerase selected from the group consisting of Platinum® Pfx, GoTaq®, Q5®, and KAPA HiFi.
[0014] In some embodiments, the step of (i) preparing the sample further comprises performing one or more PCR amplifications before the step of (a) splitting or (b) attaching.
[0015] In some embodiments, the method detects about (N-1)/N of the PCR chimeras, where N is the number of pools in the plurality of pools.
[0016] In some embodiments, the nucleic acids in the solution are cDNAs transcribed from RNAs. In some embodiments, the RNAs are from a lysate. In some embodiments, the lysate is a lymphocyte lysate.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] Figure 1 illustrates a schematic of synthetic templates used for sequencing library preparation.
[0018] Figure 2 shows DNA gel images of PCR products from both one step and two step PCR reactions. LC represents light chain DNA template mixture, and HC means heavy chain DNA template mixture. Bands of expected sizes are seen for all samples between 500 and 600 base pairs (bp).
[0019] Figure 3 demonstrates quantification of PCR chimeras after both one round (PCR1) and two rounds of PCR (PCR2) using four different commercially available polymerases: Pfx, KAPA, GoTaq, and Q5. H represents chimeras between two heavy chains. K means chimeras between two kappa chains. HK shows chimeras between a heavy chain and a kappa chain.
[0020] Figure 4 illustrates a process flow chart for unique dual barcode (UDB) indexing PCR reactions for detecting PCR chimeras.
[0021] Figure 5 shows a DNA gel image of the round two PCR products for the heavy chain using the unique dual barcoding approach. Lane 1 contains a 100 bp DNA ladder as a reference.
[0022] Figure 6 shows Promega nuclear factor of activated T cells (NFAT) luciferase assay functional characterization of the original NG006.38E7 antibody compared to three variants isolated using the unique dual barcode NGS approach.
DETAILED DESCRIPTIONS
[0023] In one aspect, the present disclosure provides methods for detecting PCR chimeras, that include steps of splitting a solution of different nucleic acids (e.g., homologous molecules) into multiple pools and attaching a unique pair of barcodes to the nucleic acids.
[0024] Traditional antibody discovery approaches (e.g., hybridoma fusion, B-cell culture/cloning) typically provide access to only a limited fraction of the sequences present in the B-cell repertoire of immunized animals. This is because of the low fusion efficiency of the hybridoma fusion reaction and/or the inability to activate and culture all antigen specific B-cells generated by the animal. Typically, these approaches allow for the isolation of tens or hundreds of antibody sequences from one experiment while thousands of antigen specific antibody sequences are generated by the animal.
[0025] Exponential amplification of a mixture of homologous molecules, such as T-cell receptor (TCR) or B-cell receptor (BCR) or 16S rRNA genes, can generate PCR chimeras. Together with PCR and sequencing errors, PCR chimeras contribute to the false perception of diversity that is often interpreted as evidence of hypermutation during immune response, or evidence of new microbial species in 16S rRNA analysis or new MHC haplotypes in MHC haplotyping. Software tools exist that attempt to identify PCR chimeras in 16S rRNA data and MHC data from a database of known haplotypes; however, these tools are not applicable to antibody repertoire sequencing.
[0026] In one aspect, the present invention provides methods for detecting PCR chimeras. In some embodiments, the methods comprise splitting a solution having different nucleic acids into multiple pools. In some embodiments, the methods comprise attaching unique dual barcodes (UDB) to the nucleic acids, wherein each of the pools has a unique pair of the 5’-end barcode and the 3’-end barcode. PCR chimeras with a pairing of the 5’-end and 3’-end barcodes that is different from that in each of the plurality of pools can be recognized and discarded during data analysis.
1. Definition
[0027] Certain technical and scientific terms are specifically defined below. Unless specifically defined elsewhere in this document, all other technical and scientific terms used herein have the meaning commonly understood by one of ordinary skill in the art to which this disclosure relates.
[0028] “About” when used to modify a numerically defined parameter means that the parameter is within 20%, within 15%, within 10%, within 9%, within 8%, within 7%, within 6%, within 5%, within 4%, within 3%, within 2%, within 1%, or less of the stated numerical value or range for that parameter; where appropriate, the stated parameter may be rounded to the nearest whole number.
[0029] As used herein, PCR can refer to a reaction for the in vitro amplification of specific DNA or RNA sequences by the simultaneous primer extension of complementary strands of DNA or RNA. As used herein, PCR can encompass derivative forms of the reaction, including but not limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, digital PCR, and assembly PCR.
[0030] As used herein, the term “nucleic acid,” in its broadest sense, refers to any compound and/or substance that is or can be incorporated into an oligonucleotide chain. In some embodiments, a nucleic acid is a compound and/or substance that is or can be incorporated into an oligonucleotide chain via a phosphodiester linkage. As will be clear from context, in some embodiments, “nucleic acid” refers to individual nucleic acid residues (e.g., nucleotides and/or nucleosides); in some embodiments, “nucleic acid” refers to an oligonucleotide chain comprising individual nucleic acid residues. In some embodiments, a “nucleic acid” is or comprises RNA; in some embodiments, a “nucleic acid” is or comprises DNA. In some embodiments, a nucleic acid is, comprises, or consists of one or more natural nucleic acid residues. In some embodiments, a nucleic acid is, comprises, or consists of one or more nucleic acid analogs. In some embodiments, a nucleic acid analog differs from a nucleic acid in that it does not utilize a phosphodiester backbone. For example, in some embodiments, a nucleic acid is, comprises, or consists of one or more “peptide nucleic acids”, which are known in the art, have peptide bonds instead of phosphodiester bonds in the backbone, and are considered within the scope of the present disclosure. Alternatively or additionally, in some embodiments, a nucleic acid has one or more phosphorothioate and/or 5'-N-phosphoramidite linkages rather than phosphodiester bonds. In some embodiments, a nucleic acid is, comprises, or consists of one or more natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine).
In some embodiments, a nucleic acid is, comprises, or consists of one or more nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C-5 propynyl-cytidine, C-5 propynyl-uridine, 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 2-aminoadenosine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, 2-thiocytidine, methylated bases, intercalated bases, and combinations thereof). In some embodiments, a nucleic acid comprises one or more modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose) as compared with those in natural nucleic acids. In some embodiments, a nucleic acid has a nucleotide sequence that encodes a functional gene product such as an RNA or protein. In some embodiments, a nucleic acid includes one or more introns. In some embodiments, nucleic acids are prepared by one or more of isolation from a natural source, enzymatic synthesis by polymerization based on a complementary template (in vivo or in vitro), reproduction in a recombinant cell or system, and chemical synthesis.
[0031] “Homologous” as used herein means nucleic acids or polypeptides having about 100%, about 99%, about 98%, about 97%, about 96%, about 95%, about 94%, about 93%, about 92%, about 91%, about 90%, about 88%, about 85%, about 80%, about 75%, about 70%, about 65% or about 60% sequence identity to each other. In some embodiments, homologous molecules have between about 60% and about 100% sequence identity. In some embodiments, homologous molecules have between about 65% and about 100% sequence identity. In some embodiments, homologous molecules have between about 70% and about 100% sequence identity. In some embodiments, homologous molecules have between about 75% and about 100% sequence identity.
[0032] “% sequence identity” as it applies to homologous nucleic acid molecules is defined as the percentage of nucleotides in the candidate nucleic acid sequence that are identical with the residues in the nucleic acid sequence of a second sequence after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent identity. Methods and computer programs for the alignment are well known in the art. Identity depends on a calculation of percent identity but may differ in value due to gaps and penalties introduced in the calculation. Generally, variants of a particular polynucleotide have at least 60%, 65%, 70%, 75%, 80%, 85%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% but less than 100% sequence identity to that particular reference polynucleotide as determined by sequence alignment programs and parameters known to those skilled in the art. Calculation of the percent identity of two polynucleic acid sequences, for example, can be performed by aligning the two sequences for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second nucleic acid sequences for optimal alignment and non-identical sequences can be disregarded for comparison purposes). The nucleotides at corresponding nucleotide positions are then compared. When a position in the first sequence is occupied by the same nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, accounting for the number of gaps and the length of each gap, which needs to be introduced for optimal alignment of the two sequences. The comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm in an alignment tool (e.g., the Needleman-Wunsch algorithm in an online tool).
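As an illustration of the percent identity calculation described above, the following Python sketch performs a basic Needleman-Wunsch global alignment and reports identical positions as a percentage of the alignment length. The scoring values (match +1, mismatch -1, linear gap -1), the choice of alignment length as the denominator, and the function names are assumptions made for this sketch; the disclosure does not prescribe particular parameters, and established alignment tools may use different conventions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return one optimal global alignment of a and b as two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Percent of aligned columns holding the same nucleotide in both sequences."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        raise ValueError("at least one sequence must be non-empty")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)
```

Because gapped columns count toward the alignment length here, each introduced gap lowers the reported identity, which is one way of "accounting for the number of gaps and the length of each gap".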
2. PCR Chimera detection
[0033] In some embodiments, provided are methods for detecting PCR chimeras or sequencing nucleic acids. In some embodiments, the method comprises preparing a sample by: (a) splitting a solution comprising different nucleic acids into a plurality of pools; (b) attaching a 5’-end barcode and a 3’-end barcode to the nucleic acids; and (c) mixing the plurality of pools, thereby making the sample. In some embodiments, each of the plurality of pools has a unique pair of the 5’-end barcode and the 3’-end barcode. In some embodiments, the method further comprises performing one or more PCR amplifications on the sample. In some embodiments, the method further comprises sequencing products of the PCR amplifications. In some embodiments, the method further comprises detecting the PCR chimeras by analyzing sequences of the 5’-end barcode and the 3’-end barcode of each of the products. In some embodiments, the 5’-end barcode and the 3’-end barcode of the PCR chimeras originated from different pools. In some embodiments, the step of detecting does not comprise comparing sequences of the products with sequences in any database.
[0034] In some embodiments, the 5’-end barcode and the 3’-end barcode of the PCR chimeras are different from that in each of the plurality of pools. For example, if recombination occurs between molecules originally from pools x and y, where y ≠ x, the chimeric molecule will contain one barcode from x and one barcode from y. For example, it may contain the left barcode from x and the right barcode from y.
When a sample is split into N pools, statistically (N-1)/N*100% of the total number of PCR chimeras are formed between molecules from different pools. Therefore, the majority of PCR chimeras involve molecules that originated from different pools, if N is greater than 2. The methods disclosed herein may detect about (N-1)/N*100% of the total number of PCR chimeras by analyzing the 5’-end barcode and the 3’-end barcode of each nucleic acid. For example, when N=4, the methods recognize 75% of the PCR chimeras.
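The (N-1)/N estimate can be checked numerically. The following Python sketch assumes that the two parent molecules of each chimera are drawn independently and uniformly from N equally sized pools; that assumption, together with the function names and the simulation size, is introduced here for illustration and is not stated in this form in the disclosure.

```python
import random


def expected_detectable_fraction(n_pools):
    """Closed-form fraction of chimeras whose parents come from different pools."""
    if n_pools < 1:
        raise ValueError("n_pools must be at least 1")
    return (n_pools - 1) / n_pools


def simulate_detectable_fraction(n_pools, n_chimeras=100_000, seed=0):
    """Monte Carlo estimate under the equal-pool, independent-parent assumption."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_chimeras):
        left_pool = rng.randrange(n_pools)   # pool supplying the 5'-end barcode
        right_pool = rng.randrange(n_pools)  # pool supplying the 3'-end barcode
        if left_pool != right_pool:
            detected += 1                    # mismatched pair, so detectable
    return detected / n_chimeras
```

For N=4 the closed form gives 0.75, matching the 75% stated above; chimeras formed between two molecules of the same pool keep a valid barcode pair and remain undetected by this approach.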
[0035] In some embodiments, the plurality of pools comprises no less than 2 pools. In some embodiments, the plurality of pools comprises 2 to 100 pools. In some embodiments, the plurality of pools comprises 2 to 50 pools. In some embodiments, the plurality of pools comprises 2 to 40 pools. In some embodiments, the plurality of pools comprises 2 to 30 pools. In some embodiments, the plurality of pools comprises 2 to 20 pools. In some embodiments, the plurality of pools comprises 2, 3, 4, 5, 6, 7, 8, 9 or 10 pools. In some embodiments, the plurality of pools comprises 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 pools. In some embodiments, the plurality of pools comprises 2 pools. In some embodiments, the plurality of pools comprises 3 pools. In some embodiments, the plurality of pools comprises 4 pools. In some embodiments, the plurality of pools comprises 5 pools. In some embodiments, the plurality of pools comprises 6 pools. In some embodiments, the plurality of pools comprises 7 pools. In some embodiments, the plurality of pools comprises 8 pools. In some embodiments, the plurality of pools comprises 9 pools. In some embodiments, the plurality of pools comprises 10 pools. In some embodiments, the plurality of pools comprises 11 pools. In some embodiments, the plurality of pools comprises 12 pools. In some embodiments, the plurality of pools comprises 13 pools. In some embodiments, the plurality of pools comprises 14 pools. In some embodiments, the plurality of pools comprises 15 pools. In some embodiments, the plurality of pools comprises 16 pools. In some embodiments, the plurality of pools comprises 17 pools. In some embodiments, the plurality of pools comprises 18 pools. In some embodiments, the plurality of pools comprises 19 pools. In some embodiments, the plurality of pools comprises 20 pools.
[0036] In some embodiments, provided methods are used for the traditional antibody discovery approaches (e.g., hybridoma fusion, B-cell culture, B-cell cloning, etc.). The methods provide access to antigen specific antibody sequences generated by immunization in an animal, while minimizing known errors that can be introduced during the sequencing process.
[0037] In some embodiments, the methods for sequencing nucleic acids further comprise removing sequences of detected PCR chimeras. In some embodiments, this step is performed during a data analysis stage.
[0038] In some embodiments, the methods comprise performing one or more PCR amplifications before the step of splitting a solution or attaching barcodes. In some embodiments, the PCR amplifications comprise 2-10 cycles. In some embodiments, the PCR amplifications comprise 2 cycles. In some embodiments, the PCR amplifications comprise 3 cycles. In some embodiments, the PCR amplifications comprise 4 cycles. In some embodiments, the PCR amplifications comprise 5 cycles. In some embodiments, the PCR amplifications comprise 6 cycles. In some embodiments, the PCR amplifications comprise 7 cycles. In some embodiments, the PCR amplifications comprise 8 cycles. In some embodiments, the PCR amplifications comprise 9 cycles. In some embodiments, the PCR amplifications comprise 10 cycles.
[0039] In some embodiments, the initial solution and/or the prepared sample comprise homologous molecules. In some embodiments, the initial solution and/or the prepared sample comprise the B cell receptor repertoire.
[0040] In some embodiments, the initial solution and/or the prepared sample comprise cDNAs transcribed from RNAs. In some embodiments, the method further comprises conducting one or more cDNA synthesis reactions to produce one or more cDNA copies. In some embodiments, the RNAs are from a lysate. In some embodiments, the RNAs are from a lymphocyte lysate.
[0041] In some embodiments, the methods disclosed here are used for identifying a plurality of antibody candidates (e.g., using one of the traditional antibody discovery approaches). In some embodiments, the method comprises searching for clonal relatives (e.g., antibodies with similar CDR sequences) of these sequences via sequencing of antigen specific B-cells sorted in bulk from the same immunized animal. The clonal relatives of an antibody with known binding or functional behavior may have beneficial properties such as affinity differences, developability benefits, and different functional properties. In some embodiments, the method may increase the number of unique antibody sequences generated for a specific target and application from a given immunized animal.
[0042] In some embodiments, barcodes (e.g., 5’-end barcode, 3’-end barcode) comprise a unique sequence and a common sequence. For example, while each barcode has a unique sequence that can differentiate pools, each barcode also has a common sequence that serves as a primer binding site for PCR.
[0043] In some embodiments, a barcode comprises 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 31, 40, or 50 nucleotides.
[0044] In some embodiments, association or attachment of barcodes to nucleic acids comprises hybridization of a barcode’s target recognition region to a complementary portion of the target nucleic acid molecule. In some embodiments, association or attachment of barcodes to nucleic acids comprises ligation of a barcode’s target recognition region and a portion of the target nucleic acid molecule.
[0045] In some embodiments, J region primers are used for PCR. In some embodiments, J region primers allow for the introduction of a unique dual barcode indexing tag.
[0046] In some embodiments, the primer has a sequence selected from Tables 2 and 3. In some embodiments, the index primer has a sequence selected from Table 4.
[0047] In some embodiments, the methods disclosed herein comprise performing a PCR amplification on the nucleic acids. In some embodiments, the amplicon is a double-stranded molecule (e.g., a double-stranded RNA molecule, a double-stranded DNA molecule, or an RNA molecule hybridized to a DNA molecule). In some embodiments, the labeled amplicon is a single-stranded molecule (e.g., DNA, RNA, or a combination thereof).
[0048] In some embodiments, amplification comprises use of one or more non-natural nucleotides. In some embodiments, non-natural nucleotides comprise photolabile or triggerable nucleotides. Examples of non-natural nucleotides can include, but are not limited to, peptide nucleic acid (PNA), morpholino and locked nucleic acid (LNA), as well as glycol nucleic acid (GNA) and threose nucleic acid (TNA). Non-natural nucleotides may be added to one or more cycles of an amplification reaction. In some embodiments, the addition of the non-natural nucleotides is used to identify products at specific cycles or time points in the amplification reaction.
[0049] In some embodiments, amplification of the nucleic acids comprises exponential amplification. In some embodiments, amplification of the nucleic acids comprises linear amplification.
[0050] In some embodiments, the method comprises repeatedly amplifying the nucleic acid to produce multiple amplicons. In some embodiments, the methods disclosed herein comprise performing at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 PCR amplifications.
[0051] In some embodiments, amplification utilizes a polymerase selected from the group consisting of Platinum® Pfx, GoTaq®, Q5®, and KAPA HiFi.
[0052] In some embodiments, the methods provided herein detect the PCR chimeras by analyzing sequences of the 5’-end barcode and the 3’-end barcode of each of the products from the PCR amplifications. For example, a combination of the 5’-end and 3’-end barcodes is compared to a list of initial barcode combinations before the PCR amplifications. In some embodiments, the 5’-end barcode and the 3’-end barcode of the PCR chimeras are different from that in each of the plurality of pools.
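The comparison of observed barcode combinations against the list of initial combinations can be sketched as follows in Python. The barcode strings, read labels, and the three-way split (kept, chimera, unassigned) are hypothetical choices made for this sketch; in particular, treating reads with an unrecognized barcode as "unassigned" rather than as chimeras is an assumption, not something specified in the disclosure.

```python
def classify_by_barcode_pair(reads, expected_pairs):
    """Sort reads by whether their (5'-end, 3'-end) barcode pair was assigned to a pool.

    reads: iterable of (five_prime_barcode, three_prime_barcode, read_label)
    expected_pairs: iterable of (five_prime_barcode, three_prime_barcode),
        one pair per pool, as assigned before the PCR amplifications
    """
    expected = set(expected_pairs)
    known_five = {five for five, _ in expected}
    known_three = {three for _, three in expected}
    result = {"kept": [], "chimera": [], "unassigned": []}
    for five, three, label in reads:
        if (five, three) in expected:
            result["kept"].append(label)        # pairing matches one pool
        elif five in known_five and three in known_three:
            result["chimera"].append(label)     # valid barcodes, wrong pairing
        else:
            result["unassigned"].append(label)  # barcode not recognized at all
    return result


# Hypothetical barcode pairs for four pools (not taken from the Tables).
EXPECTED_PAIRS = [("AAAC", "CCCA"), ("ACGT", "GGTT"), ("TTAG", "TCAG"), ("GATC", "AGCT")]
example_reads = [
    ("AAAC", "CCCA", "read_1"),  # pool 1 pairing
    ("AAAC", "GGTT", "read_2"),  # 5' barcode of pool 1 with 3' barcode of pool 2
    ("TTAG", "NNNN", "read_3"),  # unreadable 3' barcode
]
example_result = classify_by_barcode_pair(example_reads, EXPECTED_PAIRS)
```

No reference database of sequences is consulted: the only input besides the reads is the pool-to-barcode assignment made during sample preparation. Removing the chimeras then amounts to keeping only the "kept" reads.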
EXAMPLES
Example 1
[0053] This example demonstrates the propensity of PCR chimera generation after PCR amplifications.
[0054] A library of heavy and light chain PCR products was made for next generation sequencing (NGS) using four synthetic dual indexed/barcoded mouse antibody DNA samples/templates and four different polymerases (i.e., Platinum® Pfx, GoTaq®, Q5®, and KAPA HiFi).
[0055] A schematic of the synthetic PCR templates is shown in Figure 1 and the specific sequences are listed in Table 1. These PCR templates were designed as two sets of four homologous synthetic templates (~0.5 kb in size) flanked on the 5’ and 3’ ends with a unique pair of nucleotide barcodes (i.e., one set of four templates for the heavy chain and one set of four templates for the light chain).
[0056] Equimolar mixtures of each template were amplified using a panel of commercial polymerases: GoTaq®, KAPA HiFi, Platinum® Pfx, and Q5®. These synthetic templates were used in standard one or two step PCR reactions as detailed below. The PCR products were gel-purified, and then sequenced on an Illumina MiSeq. The NGS data was analyzed to detect PCR chimeras, i.e., unnatural combinations of dual barcodes from the original synthetic DNA templates.
Table 1. DNA sequences of synthetic DNA templates
Figure imgf000012_0001
Figure imgf000013_0001
Figure imgf000014_0001
Figure imgf000015_0001
PCR Reaction Protocol:
[0057] DNA mixtures were made separately for each chain type: one for the heavy chain and one for the light chain. Specifically, these mixtures were prepared by resuspending the synthetic DNA constructs listed in Table 1 using a 1:1:1:1 equimolar ratio of the four homologous templates (for each chain type) in 50 µL of DNase/RNase-free water for a final concentration of 0.4 ng/µL. These samples were then used as DNA templates for both a one-step PCR protocol and a two-step PCR protocol using each of the four commercially available PCR polymerases mentioned above. The one-step PCR protocol included 35 cycles of amplification from 5 ng DNA template. The two-step PCR protocol had Round 1 with 30 cycles of amplification from 0.5 ng of DNA template, and Round 2 with 35 cycles of amplification from 2 µL of round 1 PCR product. A detailed schematic of the PCR protocol steps and primers and the specific PCR conditions for each PCR polymerase are listed below. The primer sequences are listed in Table 2. The resulting PCR products were then run on DNA gels (see Figure 2), and the bands were excised and then purified using a Qiagen gel purification kit (catalog # 28704). The DNA concentrations of the purified PCR products were quantified using Qubit and the dsDNA High Sensitivity assay (Thermo Scientific catalog # Q32851).
PCR protocol summary
PCR Round 1
Forward primer:
• READ1
Reverse primers:
• AF_IgG1/2 nested + AF_IgG3 nested (to amplify heavy chain template)
• Tri_IgK nested (to amplify the light chain template)
PCR Round 2 or One-Step PCR
Forward primer:
• P5-i5-R1
Reverse primers:
• P7-i7-READ2-AF_IgG1/2a/2b + P7-READ2-AF_IgG3 (to amplify heavy chain template)
• P7-i7-READ2-AF_IgK (to amplify light chain template)
PCR conditions for each polymerase
GoTaq Polymerase (Promega, cat # M7422)
• 5 ng of DNA template mixture or 2 µL of Round 1 PCR product
• 0.75 µL 10 µM of forward primer
• 0.75 µL 10 µM of each reverse primer
• 12.5 µL of GoTaq 2x master mix
• Up to 25 µL with water
• PCR Cycles
1. 94 °C 5 min
2. 94 °C 30s
3. 55 °C 45s
4. 72 °C 1:30 min
5. Repeat steps 2-4 for a total of 30 or 35 cycles
6. 72 °C 10:00 min
Pfx (Thermo, cat # 11708013)
• 5 ng of DNA template mixture or 2 µL of Round 1 PCR product
• 0.75 µL 10 mM dNTPs
• 0.5 µL MgSO4
• 2.5 µL 10x Pfx buffer included with Pfx kit
• 5 µL enhancer solution included with Pfx kit
• 0.2 µL Pfx polymerase
• 0.75 µL 10 µM forward primer
• 0.75 µL 10 µM of each reverse primer
• Up to 25 µL of water
• PCR Cycles:
1. 94 °C for 5 min
2. 94 °C for 15s
3. 55 °C for 30s
4. 68 °C for 1.5 min
5. Repeat steps 2-4 for a total of 30 or 35 cycles
6. 68 °C for 4 min
KAPA (KAPA cat #KK2501)
• 5 ng of DNA template mixture or 2 µL of Round 1 PCR product
• 0.75 µL of 10 mM dNTPs
• 5 µL 5x buffer included with KAPA polymerase kit
• 0.5 µL KAPA polymerase
• 0.75 µL 10 µM forward primer
• 0.75 µL 10 µM of each reverse primer
• Up to 25 µL of water
• PCR Cycles:
1. 95 °C for 2 min
2. 98 °C for 10s
3. 65 °C for 20s
4. 72 °C for 50s
5. Repeat steps 2-4 for a total of 30 or 35 cycles
6. 72 °C for 1 min
Q5
• 5 ng of DNA template mixture or 2 µL of Round 1 PCR product
• 5 µL 5x buffer included with Q5 kit
• 0.5 µL of 10 mM dNTPs
• 0.25 µL Q5 polymerase
• 1.25 µL 10 µM forward primer
• 1.25 µL 10 µM of each reverse primer
• Up to 25 µL of water
• PCR Cycles:
1. 95 °C for 1 min
2. 98 °C for 10s
3. 68 °C for 20s
4. 72 °C for 50s
5. Repeat steps 2-4 for a total of 30 or 35 cycles
6. 72 °C for 1 min
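For comparing the four cycling programs listed above, the following Python sketch encodes the hold times from each program and sums them for a chosen cycle count. The data layout and function name are assumptions made for illustration, and the totals cover programmed hold times only; ramp times between temperatures depend on the thermocycler and are not included.

```python
# Hold times in seconds, transcribed from the cycling conditions listed above:
# initial denaturation, (denature, anneal, extend) per cycle, final extension.
CYCLING_PROGRAMS = {
    "GoTaq": {"initial": 300, "cycle": (30, 45, 90), "final": 600},
    "Pfx": {"initial": 300, "cycle": (15, 30, 90), "final": 240},
    "KAPA": {"initial": 120, "cycle": (10, 20, 50), "final": 60},
    "Q5": {"initial": 60, "cycle": (10, 20, 50), "final": 60},
}


def total_hold_seconds(polymerase, n_cycles):
    """Sum of programmed hold times for one run (ramp times excluded)."""
    program = CYCLING_PROGRAMS[polymerase]
    return program["initial"] + n_cycles * sum(program["cycle"]) + program["final"]


if __name__ == "__main__":
    for name in CYCLING_PROGRAMS:
        for cycles in (30, 35):
            minutes = total_hold_seconds(name, cycles) / 60
            print(f"{name}, {cycles} cycles: {minutes:.1f} min of hold time")
```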
Table 2. PCR Primers
Figure imgf000017_0001
NGS Results
[0058] The quantified NGS library preparations were sequenced using an Illumina MiSeq. The sequencing data was investigated for the prevalence of PCR chimeras. Bioinformatic analysis was used to detect unnatural pairings of the expected dual indices/barcodes from the original synthetic DNA templates. A summary of this analysis is shown in Figure 3. A large number of PCR chimeras was detected after both 1 and 2 rounds of PCR. After one round of PCR, the four different commercial polymerases resulted in varying levels of PCR chimeras with the following order: Pfx < KAPA < GoTaq < Q5. Following two rounds of PCR, greater than 50% of the antibody sequences generated by NGS using all four PCR polymerases were PCR chimeras.
Example 2
[0059] This Example exemplifies a method for detecting PCR chimeras. First, a hybridoma fusion antibody discovery campaign was performed from an immunized mouse. A hybridoma clone NG006.38E7.D11 was isolated from this fusion and shown to produce a mouse IgG antibody with a desired protein binding profile for a specific target. A PCR reaction was then set up to amplify an enriched portion of the immunized B-cell repertoire for both heavy and light chain sequences with a similar J region (IGHJ2*01 and IGKJ1*01). The J region primers used for PCR also allowed for the introduction of a unique dual barcode indexing tag for PCR chimera tracking as shown in Figure 4. Integrated into this process is enrichment using second strand cDNA synthesis primers and restriction enzyme digestion of undesired antibody sequences. The protocol for this approach is outlined below for just the heavy chain PCR amplification reactions; a parallel protocol was also performed for the light chain. The overall schematic for the process is outlined in Figure 4.
[0060] First, RNAs were isolated from splenocytes harvested from the same immunized mouse used to isolate clone NG006.38E7.D11 using the Qiagen RNeasy kit (catalog # 74104). cDNAs were then prepared by mixing 500 ng of RNA, 0.2 µL 100 µM poly dT primer (TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN; SEQ ID NO: 17), [Figure imgf000018_0001] (Thermo Fisher, cat # R0192), and nuclease free water added to bring the volume up to a total of 9.5 µL. These components were mixed and then incubated at 65 °C for 5 minutes and then placed on ice. After 2 minutes on ice, the sample was mixed with 4 µL 5x SuperScript™ IV (SSIV) first-strand buffer, 0.2 µL RNAseOUT (Thermo, cat # 10777019), 1 µL 100 mM dithiothreitol (DTT), 4 µL 5 M betaine (Sigma, cat # B0300), 0.12 µL 1 M MgCl2 (Sigma, cat # M1028-10X1ML), 0.2 µL 100 µM TSOa primer (see Table 3 for primer sequence for this template switching oligo), and 1 µL SSIV reverse transcriptase (Thermo, cat # 18090050). Once mixed, the reaction was incubated at 53 °C for 20 min, 80 °C for 10 min, and then placed at 4 °C.
[0061] Next, second strand synthesis was performed by taking 5 µL of the cDNA reaction and mixing it with 0.75 µL 10 mM dNTPs, 5 µL 5x buffer, 0.5 µL KAPA polymerase, 1.25 µL of either 10 µM READ1S-N16-LH21 or 10 µM READ1S-N16-LH14 primer (two samples in total, one prepared with each primer, see Table 3 for primer sequences), and water up to 25 µL. The two samples were then incubated at 95 °C for 3 minutes and 68 °C for 15 minutes before cooling down to 4 °C. After cooling, the products were purified using Agencourt Ampure XP beads (Beckman Coulter, cat. # A63881) with the bead to sample ratio of 0.8 and eluted in 40 µL of water.
Table 3. PCR Primer Sequences
Figure imgf000019_0001
Table 4. Index primer sequences introduced during PCR reactions for each sample.
Figure imgf000019_0002
Figure imgf000020_0001
[0062] The two purified second strand products (from cDNA generated using either the READ1S-N16-LH21 or the READ1S-N16-LH14 second strand synthesis primer) were each split into four first-round PCR amplification reactions. Each first-round PCR amplification reaction introduced a unique pair of dual barcode indices, encoded in the forward primer and the J-region reverse primer. For these PCR reactions, 4 µL of the purified second strand synthesis product was mixed with 1.5 µL of 10 mM dNTPs, 10 µL 5x KAPA polymerase buffer, 1 µL KAPA polymerase, 1.5 µL 10 µM P5-N50x-R1 primer, 1.5 µL 10 µM READ1-D70x-J primer, and water up to 50 µL (see Table 3 for primer sequences and Table 4 for index sequences; the x in the primer name refers to the specific index sequence included in the individual primer). These PCR mixtures were cycled using Protocol 1, listed below.
Protocol 1
• 95 °C for 2 min
• 20 repeated cycles of:
1. 98 °C for 10 s
2. 65 °C for 20 s
3. 72 °C for 1:30 min
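As an illustration of the first-round reaction setup in paragraph [0062], the following sketch computes the water volume and the final working concentrations implied by the listed volumes. It assumes the volumes are in microliters and the primer stocks are 10 µM. The dictionary keys and the helper name `final_conc` are illustrative only and do not appear in the protocol.

```python
# Volumes (µL) for one first-round PCR reaction, taken from paragraph [0062].
# Key names are illustrative labels, not identifiers from the protocol.
TOTAL_UL = 50.0
components_ul = {
    "second_strand_product": 4.0,
    "dNTPs_10mM": 1.5,
    "KAPA_buffer_5x": 10.0,
    "KAPA_polymerase": 1.0,
    "forward_index_primer_10uM": 1.5,
    "reverse_J_primer_10uM": 1.5,
}


def final_conc(stock_conc, volume_ul, total_ul=TOTAL_UL):
    """Dilution arithmetic: final = stock * (volume added / total volume)."""
    return stock_conc * volume_ul / total_ul


# Water makes up whatever volume the listed components do not fill.
water_ul = TOTAL_UL - sum(components_ul.values())

primer_final_uM = final_conc(10.0, components_ul["forward_index_primer_10uM"])
dntp_final_mM = final_conc(10.0, components_ul["dNTPs_10mM"])
buffer_final_x = final_conc(5.0, components_ul["KAPA_buffer_5x"])

print(f"water: {water_ul:.2f} µL")
print(f"each primer: {primer_final_uM:.2f} µM")
print(f"dNTPs: {dntp_final_mM:.2f} mM")
print(f"buffer: {buffer_final_x:.1f}x")
```

The same helper can be applied to the second-round reaction by substituting its volumes and a 25 µL total.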
[0063] The four separate aliquots for each sample were then pooled together and purified using Agencourt AMPure XP beads (Beckman Coulter, cat. #A63881) at a bead-to-sample ratio of 0.8 and eluted in 80 µL of water. Each purified sample was then split into two aliquots: one aliquot was treated with endonuclease and the other was left untreated. The endonuclease reaction was prepared by mixing 35 µL of purified DNA, 5 µL of 10x CutSmart buffer, 1 µL of DraIII (NEB cat# R3510S), 1 µL BccI (NEB cat# R0704S), 1 µL MluCI (NEB cat# R0538S), and water up to 50 µL, and was then incubated for 1 hour at 37 °C. Only the endonuclease-digested samples were bead purified again with Agencourt AMPure XP beads (Beckman Coulter, cat. #A63881) at a bead-to-sample ratio of 0.8 and eluted in 40 µL of water.
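Because each of the four first-round reactions carries its own pair of barcodes and the reactions are pooled before further amplification, a sequenced product whose two barcodes come from different reactions can be flagged as a PCR chimera. The sketch below shows one possible way to express that check. The barcode labels (`F1`, `R1`, and so on), the function name, and the separate "unknown" category for barcodes that match no pool are assumptions made for illustration; the actual index sequences are those of Table 4.

```python
from collections import Counter


def classify_barcode_pairs(read_pairs, pool_pairs):
    """Count reads by whether their (5' barcode, 3' barcode) pair matches a pool.

    read_pairs: iterable of (five_prime, three_prime) barcodes observed per read.
    pool_pairs: the barcode pair assigned to each pool before mixing.
    """
    valid = set(pool_pairs)
    known_five = {five for five, _ in valid}
    known_three = {three for _, three in valid}

    counts = Counter()
    for five, three in read_pairs:
        if (five, three) in valid:
            counts["expected"] += 1
        elif five in known_five and three in known_three:
            # Both barcodes are real, but they were never used together in one pool.
            counts["chimera"] += 1
        else:
            # At least one barcode matches no pool (for example, a sequencing error).
            counts["unknown_barcode"] += 1
    return counts


# Hypothetical barcode assignments for four pools and a few example reads.
pool_pairs = [("F1", "R1"), ("F2", "R2"), ("F3", "R3"), ("F4", "R4")]
reads = [("F1", "R1"), ("F2", "R2"), ("F1", "R3"), ("F9", "R1")]
counts = classify_barcode_pairs(reads, pool_pairs)
print(dict(counts))
```

A check of this kind relies only on the barcode pairs themselves and needs no reference database of sequences.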
[0064] All samples were then used as templates in a second PCR amplification reaction (four samples in total: endonuclease-treated or untreated round 1 PCR products originating from cDNA prepared using either the READ1S-N16-LH21 or the READ1S-N16-LH14 primer). These PCR reactions were prepared by mixing 17.25 µL of the round 1 PCR products, 0.75 µL of 10 mM dNTPs, 5 µL 5x KAPA polymerase buffer, 0.5 µL KAPA polymerase, 0.75 µL 10 µM P5os primer, 0.75 µL 10 µM P7-N7xx-READ2os primer, and water up to 25 µL in volume (see Table 1 for primer sequences). The PCR reaction mixtures were cycled using Protocol 2, listed below.
Protocol 2
• 95 °C for 2 min
• 30 repeated cycles of:
1. 98 °C for 10 s
2. 65 °C for 20 s
3. 72 °C for 1:30 min
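For planning purposes, the programmed hold times of Protocols 1 and 2 can be totaled as in the sketch below. The function name is illustrative, and the totals cover hold times only: instrument ramp time and the final hold are not specified in the protocols and are not included.

```python
def protocol_seconds(initial_denaturation_s, cycles, step_seconds):
    """Total programmed hold time: initial denaturation plus all cycle steps."""
    return initial_denaturation_s + cycles * sum(step_seconds)


# Each cycle: 98 °C for 10 s, 65 °C for 20 s, 72 °C for 1:30 min (90 s).
cycle_steps_s = (10, 20, 90)

protocol_1_s = protocol_seconds(2 * 60, 20, cycle_steps_s)
protocol_2_s = protocol_seconds(2 * 60, 30, cycle_steps_s)

print(f"Protocol 1: {protocol_1_s / 60:.0f} min of programmed holds")
print(f"Protocol 2: {protocol_2_s / 60:.0f} min of programmed holds")
print(f"Cycles across both rounds: {20 + 30}")
```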
[0065] The resulting PCR products were then run on a DNA gel. Both the endonuclease-digested and undigested UDB PCR products, from both second strand primer sequences, yielded similar PCR products of the expected size (~550 bp; see Figure 5). These 550 bp bands were gel purified using a Qiagen gel purification kit (cat# 28704). The concentration of the resulting DNA was then measured using Qubit and the dsDNA High Sensitivity assay (Thermo cat# Q32851). The concentration values were used to mix the samples together to generate an NGS sequencing library.
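The text does not state how the concentration values were used to combine the samples. One common approach, assumed here for illustration only, is equimolar pooling: mass concentrations are converted to molar concentrations using the fragment length and an average mass of about 660 g/mol per base pair, and each sample contributes the same number of moles. The sample names and concentration readings below are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair, a common approximation for dsDNA


def ng_per_ul_to_nM(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/µL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(concs_ng_per_ul, length_bp, fmol_each):
    """Volume (µL) of each sample that contributes fmol_each femtomoles.

    1 nM equals 1 fmol/µL, so volume = fmol / nM.
    """
    return {
        name: fmol_each / ng_per_ul_to_nM(conc, length_bp)
        for name, conc in concs_ng_per_ul.items()
    }


# Hypothetical Qubit readings (ng/µL) for two ~550 bp libraries.
readings = {"LH21_digested": 10.0, "LH21_undigested": 5.0}
volumes = equimolar_volumes(readings, length_bp=550, fmol_each=50.0)

for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} µL")
```

Under this scheme a sample at half the concentration contributes twice the volume.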
[0066] Our analysis identified 2910 unique heavy chain and 244 unique kappa chain sequences clonally related to the hybridoma NG006.38E7. The sequences were then prioritized for recombinant expression based on frequency, lack of glycosylation sites, and other properties. A final set of 15 heavy chain sequence variants and 2 light chain variants was selected for recombinant expression as pairs. Specifically, each of the 15 heavy chain variants was paired for recombinant expression with one light chain variant and with the original wild-type light chain from NG006.38E7. In addition, the original wild-type heavy chain of NG006.38E7 was paired with the second light chain variant.
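The pairing scheme in paragraph [0066] can be enumerated as in the sketch below. This reflects one reading of the paragraph, namely that each heavy chain variant was expressed with two light chains (one variant and the wild type) and that the wild-type heavy chain was expressed with the other light chain variant. The chain labels are placeholders, not identifiers from the study.

```python
# Placeholder labels for the selected chains; not identifiers from the study.
heavy_variants = [f"HC_var{i}" for i in range(1, 16)]  # 15 heavy chain variants
light_partners = ["LC_var1", "LC_wt"]  # one light chain variant plus wild type

# Each heavy chain variant is paired with both light chain partners.
pairs = [(heavy, light) for heavy in heavy_variants for light in light_partners]

# The wild-type heavy chain is paired with the second light chain variant.
pairs.append(("HC_wt", "LC_var2"))

print(f"{len(pairs)} heavy/light combinations under this reading")
```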
[0067] The purified antibodies generated using the different heavy/light chain sequence pairs were then tested by ELISA for binding to the original immunization target used to generate the hybridoma clone NG006.38E7. The ELISA results showed that some combinations of heavy and light chains retained binding to the intended target, while others abolished binding.
[0068] The antibody combinations that retained target binding were then tested in a Jurkat cell NFAT luciferase functional assay and compared to a benchmark antibody (see Figure 6). The functional activity of the antibody clones was evaluated using the commercially available Promega NFAT luciferase assay (cat# J1621), and the clones were compared to the commercially available benchmark antibody and the original NG006.38E7 antibody.

Claims

1. A method of detecting PCR chimeras, the method comprising the steps of:
(i) preparing a sample by:
(a) splitting a solution comprising different nucleic acids into a plurality of pools;
(b) attaching a 5’-end barcode and a 3’-end barcode to the nucleic acids; wherein each of the plurality of pools has a unique pair of the 5’-end barcode and the 3’-end barcode,
(c) mixing the plurality of pools, thereby making the sample,
(ii) performing one or more PCR amplifications on the sample;
(iii) sequencing products of the PCR amplifications; and
(iv) detecting the PCR chimeras by analyzing sequence pairs of the 5’-end barcode and the 3’-end barcode of each of the products, wherein the sequence pairs of the 5’-end barcode and the 3’-end barcode of the PCR chimeras are different from that in each of the plurality of pools.
2. A method of sequencing different nucleic acids, the method comprising the steps of:
(i) preparing a sample by:
(a) splitting a solution comprising the different nucleic acids into a plurality of pools;
(b) attaching a 5’-end barcode and a 3’-end barcode to the nucleic acids; wherein each of the plurality of pools has a unique pair of the 5’-end barcode and the 3’-end barcode,
(c) mixing the plurality of pools, thereby making the sample,
(ii) performing one or more PCR amplifications on the sample;
(iii) sequencing products of the PCR amplifications;
(iv) detecting PCR chimeras by analyzing sequence pairs of the 5’-end barcode and the 3’-end barcode of each of the products, wherein the sequence pairs of the 5’-end barcode and the 3’-end barcode of the PCR chimeras are different from that in each of the plurality of pools; and
(v) removing sequences of the PCR chimeras from sequences of the products.
3. The method of claim 1 or 2, wherein the step of (iv) detecting does not comprise comparing sequences of the products with sequences in any database.
4. The method of any one of claims 1-3, wherein the plurality of pools comprises no less than 3 pools.
5. The method of claim 4, wherein the plurality of pools comprises 3, 4, 5, 6, 7, 8, 9 or 10 pools.
6. The method of claim 4, wherein the plurality of pools comprises 4 pools.
7. The method of claim 4, wherein the plurality of pools comprises 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 pools.
8. The method of any one of claims 1-7, wherein the solution comprises homologous molecules.
9. The method of any one of claims 1-8, wherein the solution comprises a mixture of antibodies from a polyclonal antibody sample from an immunized subject.
10. The method of any one of claims 1-9, wherein the polyclonal antibody sample is splenocyte RNA or DNA.
11. The method of any one of claims 1-8, wherein the solution comprises B cell receptor repertoire.
12. The method of any one of claims 1-8, wherein the solution comprises T cell receptor repertoire.
13. The method of any one of claims 1-12, wherein the PCR amplifications comprise a one-step PCR.
14. The method of any one of claims 1-13, wherein the PCR amplifications comprise a two-step PCR.
15. The method of any one of claims 1-14, wherein the PCR amplifications utilize a polymerase selected from the group consisting of Platinum®Pfx, GoTaq®, Q5®, and KAPA HiFi.
16. The method of any one of claims 1-15, wherein the step of (i) preparing the sample further comprises performing one or more PCR amplifications before the step of (a) splitting or (b) attaching.
17. The method of any one of claims 1-16, wherein the method detects about (N-1)/N of the PCR chimeras, where N is the number of the plurality of pools.
18. The method of any one of claims 1-17, wherein the nucleic acids in the solution are cDNAs transcribed from RNAs.
19. The method of claim 18, wherein the RNAs are from a lysate.
20. The method of claim 19, wherein the lysate is a lymphocyte lysate.
21. The method of any one of claims 1-20, wherein the PCR amplifications use a J region primer.
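The (N-1)/N figure recited in claim 17 can be motivated as follows. A chimera formed from two parent molecules of the same pool still carries that pool's valid barcode pair and therefore goes undetected, while a chimera formed from parents in different pools carries a mismatched pair. Assuming, for illustration, that the pools contribute equal numbers of molecules and that the two parents of a chimera are drawn independently, the chance that both come from the same pool is 1/N. The sketch below encodes this reasoning; the function names and the extension to unequal pool shares are illustrative additions, not taken from the claims.

```python
from fractions import Fraction


def detectable_fraction(n_pools):
    """Expected detectable share of chimeras with n equally sized pools."""
    if n_pools < 1:
        raise ValueError("n_pools must be at least 1")
    return Fraction(n_pools - 1, n_pools)


def detectable_fraction_from_shares(shares):
    """Same idea for unequal pools: 1 minus the chance both parents share a pool."""
    total = sum(shares)
    return 1 - sum((share / total) ** 2 for share in shares)


for n in (3, 4, 10, 20):
    print(f"N = {n}: {detectable_fraction(n)} of chimeras detectable")
```

Under these assumptions, unequal pool sizes lower the detectable share relative to (N-1)/N, which is one reason to keep the pools balanced.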

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363520156P 2023-08-17 2023-08-17
US63/520,156 2023-08-17

Publications (1)

Publication Number Publication Date
WO2025038512A1 (en) 2025-02-20

Family

ID=94633094

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/041863 Pending WO2025038512A1 (en) 2023-08-17 2024-08-12 Methods for detecting pcr chimeras

Country Status (1)

Country Link
WO (1) WO2025038512A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200392479A1 (en) * 2017-10-23 2020-12-17 The Broad Institute, Inc. Single cell cellular component enrichment from barcoded sequencing libraries
US20220034902A1 (en) * 2018-12-11 2022-02-03 Merck Sharp & Dohme Corp. Method for identifying high affinity monoclonal antibody heavy and light chain pairs from high throughput screens of b-cell and hybridoma libraries
WO2022173714A2 (en) * 2021-02-12 2022-08-18 Merck Sharp & Dohme Llc Antibodies that bind metapneumovirus, antigenic metapneumovirus proteins, and uses thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHENG JENNY, HOD ELDAD A., VLAD GEORGE, CHAVEZ ALEJANDRO: "Quantifying protein abundance on single cells using split-pool sequencing on DNA-barcoded antibodies for diagnostic applications", SCIENTIFIC REPORTS, NATURE PUBLISHING GROUP, US, vol. 12, no. 1, US , XP093282787, ISSN: 2045-2322, DOI: 10.1038/s41598-022-04842-7 *

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 24854747; Country of ref document: EP; Kind code of ref document: A1)