WO2025012226A1 - Method, components and software for detecting a systematic error in a protein detection system - Google Patents
- Publication number
- WO2025012226A1 (PCT/EP2024/069234)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pattern
- protein
- predetermined
- values
- array
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/68—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
- G01N33/6803—General methods of protein analysis not limited to specific proteins or families of proteins
- G01N33/6845—Methods of identifying protein-protein interactions in protein mixtures
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Definitions
- The present invention relates to quality control (QC) of data produced by a protein detection system, and in particular to methods, components and software for detecting systematic errors in a protein detection system.
- QC: quality control
- Modern proteomics methods require the ability to detect a large number of different proteins (or protein complexes) in a small sample volume. To achieve this, multiplex analysis may be performed. Common methods for multiplex detection of proteins in a sample include dual-recognition immunoassays. Dual-recognition immunoassays build on a concept developed by Ulf Landegren and coworkers and described in Fredriksson et al., Nature Biotechnology, vol. 20, 2002, pp. 473-477 and WO 01/61037.
- Dual-recognition immunoassay methods include the Proximity Extension Assay (PEA), commercially available from Olink Proteomics AB (Uppsala, Sweden).
- PEA: Proximity Extension Assay
- WO 03/044231, WO 2004/094456, WO 2005/123963, WO 2006/137932, WO 2013/113699, WO 2021/191442, WO 2021/191448, WO 2021/191449, WO 2021/191450, and WO 2022/112300
- Lundberg et al., Molecular & Cellular Proteomics 10: 10.1074/mcp.M110.004978, 1-10, 2011; and Wik et al., 2021, Mol Cell Proteomics 20, 100168 (https://doi.org/10.1016/j.mcpro.2021.100168), all incorporated herein by reference in their entirety.
- PLA: Proximity Ligation Assay
- RCA: Rolling Circle Amplification
- Duolink®
- A PLA-based method for multiplex detection of proteins is described in WO 2021/113290.
- PEA and PLA are dual-recognition assays, which rely on the principle of “proximity probing”.
- In proximity probing, an analyte is detected by the binding of multiple (i.e., two or more, generally two or three) probes, which, when brought into proximity by binding to the analyte (hence “proximity probes”), allow a signal to be generated.
- Each proximity probe comprises a nucleic acid domain (or moiety) linked to the analyte-binding domain (or moiety) of the probe, and generation of the signal involves an interaction between the nucleic acid moieties and/or a further functional moiety carried by the other probe(s).
- Signal generation depends on an interaction between the probes (more particularly between the nucleic acid or other functional moieties/domains they carry) and hence only occurs when the necessary probes have bound to the analyte, which improves the specificity of the detection system.
- In PEA, the nucleic acid moieties linked to the analyte-binding domains of a probe pair hybridise to one another when the probes are in close proximity (i.e., when bound to the same target molecule), and are then extended using a nucleic acid polymerase.
- The extension product forms a reporter nucleic acid, detection of which demonstrates the presence of a particular analyte (the analyte bound by the relevant probe pair) in a sample of interest.
- In PLA, the nucleic acid moieties linked to the analyte-binding domains of a probe pair come into proximity when the probes of the probe pair bind their target, and may be ligated together; alternatively, they may together template the ligation of separately added oligonucleotides that hybridise to the nucleic acid domains when these are in proximity.
- The ligation product is then amplified and acts as a reporter nucleic acid.
- Multiplex analyte detection using PEA or PLA may be achieved by including one or more unique barcode sequences in the nucleic acid moiety of each probe.
- Oligonucleotides comprising a barcode sequence unique to a specific sample may further be added to the respective sample and incorporated into all reporter molecules generated from that sample.
- A reporter nucleic acid molecule corresponding to a particular analyte, and optionally a particular sample, may then be identified by the barcode sequences it contains.
- The methods of the present invention are particularly useful in multiplex PEA and PLA methods.
- Panels of proximity assays are commercially available from Olink Proteomics AB (Uppsala, Sweden) under the trademarks Olink® Target, Olink® Focus, Olink® Explore, and Olink® Flex. These are panels of up to 92 assays (Olink® Target, Olink® Focus and Olink® Flex) or of up to approximately 3,000 assays split over eight different panels (Olink® Explore).
- A panel may further be divided into a number, typically four, of “abundance blocks”, and the samples may be diluted based on their predicted abundance before being incubated with the assay probes of the respective abundance block.
- Each panel generally includes assays for proteins that have known functions within certain biological or physiological areas, pathways or organs in the body, such as inflammation, organ-specific proteins, cardiovascular, neurology etc. It is also possible for a user to select a specific combination of protein assays that are of particular interest to create a tailor-made panel.
- Proximity-based assays produce reporter molecules carrying specific barcode combinations, and the number of reporter molecules with a given barcode combination correlates with the amount of the corresponding protein. This number can be quantified either by quantitative Polymerase Chain Reaction (qPCR) or by sequencing, preferably Next Generation Sequencing (NGS). qPCR produces a Ct value that corresponds to the amount of protein in the sample, whereas NGS produces an actual number of reporter molecules, called “counts”, which correlates with the amount of protein in the sample.
- qPCR: quantitative Polymerase Chain Reaction
- NGS: Next Generation Sequencing
- Olink uses an arbitrary relative quantification unit called NPX that may be calculated with software adapted for use with the above panels.
- A measure of relative protein quantity, such as NPX, can be calculated from counts as described in Wik et al., cited above.
- The internal controls used for QC are an incubation control and an amplification control.
- The incubation control comprises PEA probes measuring a fixed concentration of non-human green fluorescent protein (GFP), added to each sample.
- The amplification control consists of a synthetic double-stranded DNA template and is used in QC to monitor the PCR steps of the protocol.
- External controls used for QC comprise a negative control (buffer only) run in triplicate, and a biological sample control, run in duplicate.
- The QC assessment is performed at two levels: run QC and sample QC.
- In run QC, each abundance block for each panel and sample plate should fulfil the following criteria: (i) the mean absolute deviation (MAD) of the internal controls may not exceed a certain threshold of relative protein quantity (such as 0.3 NPX), and (ii) a deviation at the sample QC level is allowed for at most one out of six samples.
- In addition, the median of at least 90% of the assays in plate and negative control samples must lie within the accepted range of predefined values set during validation.
- In sample QC, the performance of each sample is assessed individually using the internal controls, which should be within a predefined range of relative protein quantity (such as ±0.3 NPX) of the median level across the abundance block.
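The sample-level criterion above can be sketched as follows. This is a minimal illustration, not the applicant's software: the function name `sample_qc_flags`, the dictionary input (sample ID mapped to the internal-control NPX within one abundance block) and the default tolerance of 0.3 NPX (the example value given above) are assumptions.

```python
from statistics import median


def sample_qc_flags(control_npx: dict[str, float], tolerance: float = 0.3) -> dict[str, bool]:
    """Flag each sample as passing (True) or failing (False) sample QC.

    control_npx maps sample ID -> internal-control NPX for one abundance block
    (hypothetical input layout). A sample passes if its value lies within
    +/- tolerance NPX of the median across the block.
    """
    block_median = median(control_npx.values())
    return {
        sample: abs(value - block_median) <= tolerance
        for sample, value in control_npx.items()
    }


# Example: the block median is about 10.05, so S4 (about 0.55 NPX above it) fails.
flags = sample_qc_flags({"S1": 10.0, "S2": 10.1, "S3": 9.95, "S4": 10.6})
print(flags)  # {'S1': True, 'S2': True, 'S3': True, 'S4': False}
```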
- CV: coefficient of variation
- The current QC evaluations may not always automatically detect quality issues arising from malfunctioning equipment or, more frequently, from human error. Such quality issues are often categorized as systematic effects or systematic errors. Examples of such systematic errors include instrument calibration issues and reagent composition issues.
- According to a first aspect, there is provided a computer-implemented method for detecting a systematic error in a protein detection system configured to determine an amount of each protein of a plurality of proteins in a plurality of samples, wherein each sample is provided to the protein detection system in an array of reaction containers at a random position of the array, the method comprising: for at least some proteins of the plurality of proteins: receiving, from the protein detection system, an amount of the protein in each sample; determining a pattern of the array for the protein by thresholding the determined amount for each sample, such that the determined amount of the protein for each sample is represented by one of a plurality of predetermined discrete values in the pattern; comparing the pattern with a plurality of predetermined patterns, wherein each predetermined pattern corresponds to a systematic error of a plurality of predetermined systematic errors of the protein detection system, and determining a similarity value for each of the predetermined patterns;
- The method further comprises, when the data structure of candidate systematic errors indicates a first threshold number of instances of a first systematic error of the plurality of systematic errors, indicating to a user of the protein detection system that the array exhibits the first systematic error.
- In the context of the present specification, an array of reaction containers should be understood as a set of individual vessels or wells, each of which can hold a separate reaction mixture.
- An example of such an array is a microtiter plate (microplate).
- These plates come in various formats, with the number of wells typically being 6, 12, 24, 48, 96, 384, or even 1536.
- Each well in the plate acts as a separate reaction container, and the grid format allows many reactions to be conducted simultaneously, for example to determine an amount of each protein of a plurality of proteins (for example 1500-3000 proteins) in a plurality of samples (reaction mixtures).
- Using an array of reaction containers may be particularly useful for high-throughput screening, large-scale studies, or when sample volume is limited.
- The amounts of protein for a sample may be received from the protein detection system as a measure of relative protein quantity for the samples in the array, such as NPX.
- Relative protein quantity can be calculated from counts as described in Wik et al., cited above.
- Because each sample is placed at a random position in the array, the expected pattern of the measured amount of a protein across the samples in the array is a random pattern.
- However, technical errors in the workflow, such as instrument calibration issues and reagent composition issues, may cause this pattern to deviate from a random pattern and instead resemble a predefined pattern associated with a technical error in the workflow.
- For each protein, a pattern of the protein measurements (i.e., the amount of the currently analysed protein for each of the samples in the array) may be determined.
- Ideally, the pattern thus represents only the underlying biological concentration of the protein and the randomized positions of the samples on the array.
- This pattern may be automatically compared with a plurality of predetermined patterns, each corresponding to one of a plurality of predetermined systematic errors of the protein detection system.
- The method thus uses a library of patterns, e.g., represented as integer matrices, specific to systematic effects that may occur in the analysis workflow. Each library pattern represents one systematic effect arising from a problem, error or malfunction in the workflow.
- If the pattern is sufficiently similar to a predetermined pattern, the systematic error corresponding to that predetermined pattern is marked as a candidate systematic error for the entire analysis of the array. This process is repeated for a plurality of proteins until either a threshold number of proteins have yielded a pattern sufficiently similar to a certain systematic error, which means that the array exhibits this systematic error, or all patterns for all proteins measurable in the samples have been analysed, which means that the array does not exhibit any systematic error.
- If the array exhibits a systematic error, this is indicated to a user of the protein detection system, who may then take any suitable action, e.g., verifying the systematic error, remedying it, excluding the array, or rerunning the lab workflow for the array.
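The protein-by-protein loop with early stopping described above might look like this in Python. It is a sketch under stated assumptions: patterns arrive lazily (so proteins that are never reached cost nothing), the similarity function is passed in as a parameter, and `similarity_cutoff` stands in for the unspecified criterion for "sufficiently similar". All names are hypothetical. A `collections.Counter` plays the role of the data structure of candidate systematic errors.

```python
from collections import Counter
from typing import Callable, Iterable, Optional

Pattern = list[list[int]]  # -1/0/+1 matrix, one entry per well


def detect_systematic_error(
    patterns: Iterable[Pattern],
    library: dict[str, Pattern],
    similarity: Callable[[Pattern, Pattern], float],
    similarity_cutoff: float,
    first_threshold: int,
) -> tuple[Optional[str], Counter]:
    """Scan per-protein patterns until one systematic error reaches first_threshold.

    Returns (error_name, candidates) if an error is detected, else (None, candidates).
    """
    candidates: Counter = Counter()  # candidate systematic error -> number of proteins
    for pattern in patterns:  # one pattern per protein, analysed protein by protein
        for error_name, reference in library.items():
            if similarity(pattern, reference) >= similarity_cutoff:
                candidates[error_name] += 1
                if candidates[error_name] >= first_threshold:
                    # Early stop: the remaining proteins need not be analysed.
                    return error_name, candidates
    return None, candidates  # all proteins analysed, no error reached the threshold


def exact_match(pattern: Pattern, reference: Pattern) -> float:
    """Toy similarity for illustration only: 1.0 on identical patterns."""
    return 1.0 if pattern == reference else 0.0


library = {"vertical": [[1, -1], [1, -1]]}
result = detect_systematic_error(
    [[[1, -1], [1, -1]]] * 3, library, exact_match, similarity_cutoff=0.9, first_threshold=2
)
print(result)  # ('vertical', Counter({'vertical': 2}))
```

Passing the patterns as an iterable (for example a generator that computes each protein's pattern on demand) is what lets the early return actually save work.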
- In the context of the present specification, a data structure of candidate systematic errors should be understood as any suitable data structure for keeping track of which systematic errors are candidate systematic errors for the array (i.e., possible systematic errors, determined based on the pattern similarities as discussed herein) and for how many proteins each systematic error has been assessed as a candidate.
- The data structure may for example be implemented as a table with one row per systematic error of the plurality of systematic errors, each row holding a count of the proteins whose patterns have been determined to be sufficiently similar to the predetermined pattern corresponding to that systematic error.
- The present method thus provides an efficient and automatic way of analysing an array of samples for systematic errors. For example, by implementing the first threshold, computational resources and time may be saved, since not all proteins of the plurality of proteins need to be analysed before indicating to a user of the protein detection system that the array exhibits the first systematic error. Moreover, thresholding the determined amount of the protein for each sample, so that each amount is represented by one of a plurality of predetermined discrete values in the pattern, provides a low-complexity way of determining a pattern from measured protein amounts. A further advantage may be that, using historical data, it may be possible to determine a library of predetermined patterns, each corresponding to a certain systematic error.
- In some embodiments, the first threshold number is between 5% and 15% of the number of proteins.
- With a first threshold in this range, the array may reliably be indicated as exhibiting a certain systematic error. Moreover, such a range may provide an advantageous balance between the certainty (reliability) of the assessment that the array exhibits a certain systematic error and savings in computational resources and/or analysis time.
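As a worked example, assume an array in which P = 3,000 proteins are measured (the upper end of the 1500-3000 range mentioned above). The 5%-15% range for the first threshold number N₁ then corresponds to:

$$N_1 \in [0.05\,P,\ 0.15\,P] = [150,\ 450] \quad \text{for } P = 3000$$

so a warning could be raised once 150 to 450 proteins have matched the same systematic error, without analysing all 3,000.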
- In some embodiments, the step of thresholding comprises thresholding the determined amount of the protein for each sample such that each determined amount is represented by one of -1, 0 and +1 in the pattern, wherein each predetermined pattern consists of values selected from -1 and +1.
- In this way, determining a similarity value between the pattern and a predetermined pattern may be simplified. Moreover, relatively few possible values in the pattern may still suffice to define predetermined patterns for systematic errors arising from the workflow of a protein detection system.
- In some embodiments, the step of thresholding comprises calculating the median of the determined amounts of the protein across the samples, wherein an amount exceeding the median by more than a threshold amount is set to +1, an amount falling below the median by more than the threshold amount is set to -1, and an amount within the threshold amount of the median is set to 0.
- The threshold amount may be calculated based on the median, such that the threshold is larger for a larger median (generally larger amounts of the measured protein in the array) than for a lower median (generally lower amounts of the measured protein in the array).
- Alternatively, a fixed threshold may be used, e.g., 0.45 NPX, 0.5 NPX or 0.63 NPX.
- The amounts are thus “normalized” before being represented by 1, 0 or -1 in the pattern.
- The average may be used instead of the median.
- The standard deviation may be used as the cutoff for assigning 1, 0 and -1.
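A minimal sketch of the median-based thresholding, assuming a fixed threshold of 0.5 NPX (one of the example values above) rather than a median-dependent one, and one protein's plate given as a list of rows of NPX values with no missing wells. The name `threshold_pattern` is hypothetical.

```python
from statistics import median


def threshold_pattern(amounts: list[list[float]], threshold: float = 0.5) -> list[list[int]]:
    """Code each well as +1, 0 or -1 relative to the plate median of one protein.

    amounts: NPX values for one protein, as a list of plate rows.
    threshold: fixed NPX threshold (assumption; it may instead depend on the median).
    """
    plate_median = median(v for row in amounts for v in row)

    def code(value: float) -> int:
        if value - plate_median > threshold:
            return 1   # exceeds the median by more than the threshold
        if plate_median - value > threshold:
            return -1  # falls below the median by more than the threshold
        return 0       # within the threshold of the median

    return [[code(v) for v in row] for row in amounts]


print(threshold_pattern([[1.0, 2.0, 3.0], [2.0, 2.0, 0.0]]))
# [[-1, 0, 1], [0, 0, -1]]
```

A median-dependent threshold could be obtained by computing `threshold` from `plate_median` inside the function; the text does not specify the scaling, so none is assumed here.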
- In some embodiments, the method further comprises, for a determined pattern, counting the number of samples represented by -1 or 1, wherein the step of comparing the pattern with the plurality of predetermined patterns is only performed for patterns in which the number of samples represented by -1 or 1 meets or exceeds a second threshold number.
- In this way, computational resources may be saved.
- Moreover, the reliability of the pattern comparison and its result may be increased, since a sufficiently large number of protein measurements (i.e., amounts of a certain protein in the samples of the array) must result in a 1 or -1 for the pattern comparison step to be performed.
- In some embodiments, the second threshold number is between 40% and 66% of the number of samples.
- For example, for a 96-well array, the second threshold may correspond to 40 of the 96 samples being represented by -1 or 1 for a certain protein before the pattern comparison for that protein is performed. If fewer samples than the threshold number are represented by -1 or 1, further analysis (e.g., pattern matching) of the protein may be skipped, so that the protein does not give rise to any candidate systematic errors.
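The second-threshold gate can be sketched as below, assuming a pattern already coded as -1/0/+1 and an integer threshold (40 for a 96-well plate, as in the example above). Function names are hypothetical; an integer count avoids floating-point rounding when comparing against a fraction of the sample count.

```python
def count_nonzero(pattern: list[list[int]]) -> int:
    """Number of wells coded as -1 or +1."""
    return sum(1 for row in pattern for v in row if v != 0)


def passes_second_threshold(pattern: list[list[int]], min_nonzero: int = 40) -> bool:
    """True if enough wells are -1/+1 for the pattern comparison to be performed.

    min_nonzero=40 mirrors the 40-of-96 example for a 96-well plate.
    """
    return count_nonzero(pattern) >= min_nonzero


# 8 x 12 plate with exactly 40 non-zero wells, filled row by row.
plate = [[1 if r * 12 + c < 40 else 0 for c in range(12)] for r in range(8)]
print(count_nonzero(plate), passes_second_threshold(plate))  # 40 True
```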
- In some embodiments, the array of reaction containers consists of columns and rows, and at least a subset of the predetermined patterns have the same number of columns and rows as the array, wherein the step of calculating the similarity value comprises, for each predetermined pattern of the subset, an elementwise multiplication between the pattern and the predetermined pattern to determine a similarity matrix, such that each element in the similarity matrix has the value 1, 0 or -1, and wherein the similarity value indicates a difference between the count of elements in the similarity matrix having the value 1 and the count of elements having the value -1.
- An elementwise multiplication between a matrix with values 1, 0 or -1 (i.e., the pattern) and a matrix with values -1 and 1 (i.e., the predetermined pattern) produces a result matrix (the similarity matrix) with the value 1 at indexes where the pattern and the predetermined pattern have the same value (i.e., 1×1 or -1×-1), the value -1 at indexes where they have opposite values (i.e., 1×-1 or -1×1), and zero otherwise (at indexes where the pattern has the value 0).
- The similarity value may thus advantageously indicate the difference between the count of elements in the similarity matrix having the value 1 and the count of elements having the value -1, thereby comparing the number of matching values with the number of opposite values in the two patterns.
- The zeros in the similarity matrix may advantageously be ignored when calculating the similarity value, to reduce noise as discussed above.
- In some embodiments, the similarity value is calculated by dividing the absolute value of the difference by the count of elements in the similarity matrix having the value 1 or -1. Consequently, a pattern that is (sufficiently similar to) an inverted version of a predetermined pattern also gives a high similarity score, which advantageously reduces the number of predetermined patterns needed to represent possible systematic errors. Generally, a predetermined pattern and its inverted version may both represent a certain systematic effect.
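The similarity computation described above (elementwise product, then the absolute difference between the counts of +1 and -1, divided by the number of non-zero products) can be sketched in plain Python as follows. Returning 0.0 when every product is zero is an assumption; the text does not say how that case is handled. The function name is hypothetical.

```python
def similarity_value(pattern: list[list[int]], reference: list[list[int]]) -> float:
    """|#(+1) - #(-1)| / #(non-zero) of the elementwise product pattern * reference.

    pattern holds -1/0/+1; reference holds -1/+1 and must have the same shape.
    An inverted reference scores as high as the reference itself.
    """
    if len(pattern) != len(reference) or any(
        len(p_row) != len(r_row) for p_row, r_row in zip(pattern, reference)
    ):
        raise ValueError("pattern and reference must have the same shape")
    # Flattened similarity matrix: +1 = same sign, -1 = opposite sign, 0 = ignored.
    products = [p * r for p_row, r_row in zip(pattern, reference) for p, r in zip(p_row, r_row)]
    n_same = products.count(1)
    n_opposite = products.count(-1)
    n_nonzero = n_same + n_opposite
    if n_nonzero == 0:
        return 0.0  # assumption: an all-zero pattern matches nothing
    return abs(n_same - n_opposite) / n_nonzero


stripes = [[1, -1, 1, -1], [1, -1, 1, -1]]
print(similarity_value(stripes, stripes))                                 # 1.0
print(similarity_value([[-v for v in row] for row in stripes], stripes))  # 1.0
print(similarity_value([[1, 0, 1, 1], [0, 0, 0, 0]], stripes))            # 0.3333333333333333
```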
- In some embodiments, a predetermined pattern from the subset of predetermined patterns is one of: a vertically striped pattern with alternating columns having values 1 and -1; a horizontally striped pattern with alternating rows having values 1 and -1; a diagonally striped pattern with alternating diagonals having values 1 and -1; or a regional pattern consisting of two regions, wherein one region has 1 as values and the other region has -1 as values.
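For a 96-well plate (8 rows × 12 columns), the four pattern families could be generated as below. The stripe width, the use of anti-diagonals, and the left/right split of the regional pattern are assumptions for illustration only; note that with width 1 the diagonal stripes coincide with a checkerboard. The library labels are hypothetical.

```python
def vertical_stripes(rows: int = 8, cols: int = 12, width: int = 1) -> list[list[int]]:
    """Alternating columns of +1 and -1 (stripe width given in wells)."""
    return [[1 if (c // width) % 2 == 0 else -1 for c in range(cols)] for _ in range(rows)]


def horizontal_stripes(rows: int = 8, cols: int = 12, width: int = 1) -> list[list[int]]:
    """Alternating rows of +1 and -1."""
    return [[1 if (r // width) % 2 == 0 else -1 for _ in range(cols)] for r in range(rows)]


def diagonal_stripes(rows: int = 8, cols: int = 12, width: int = 1) -> list[list[int]]:
    """Alternating anti-diagonals (constant r + c) of +1 and -1."""
    return [[1 if ((r + c) // width) % 2 == 0 else -1 for c in range(cols)] for r in range(rows)]


def regional(rows: int = 8, cols: int = 12, split_col: int = 6) -> list[list[int]]:
    """Two regions: +1 left of split_col, -1 from split_col onwards (assumed split)."""
    return [[1 if c < split_col else -1 for c in range(cols)] for _ in range(rows)]


# Hypothetical library, one entry per systematic error to be detected.
PATTERN_LIBRARY = {
    "vertical_stripes": vertical_stripes(),
    "horizontal_stripes": horizontal_stripes(),
    "diagonal_stripes": diagonal_stripes(),
    "left_right_regions": regional(),
}
```

Because the similarity value uses the absolute difference, each of these patterns also covers its inverted version, so only one polarity needs to be stored.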
- In some embodiments, the array of reaction containers consists of columns and rows, and a first predetermined pattern among the plurality of patterns indicates a row in the pattern having at least a third threshold number of 1s as values, or at least the third threshold number of -1s as values, wherein the step of calculating the similarity value between the pattern and the first predetermined pattern comprises counting the number of rows in the pattern having at least the third threshold number of 1s, or at least the third threshold number of -1s, as values.
- In some embodiments, the array of reaction containers consists of columns and rows, and a second predetermined pattern among the plurality of patterns indicates a column in the pattern having at least a fourth threshold number of 1s as values, or at least the fourth threshold number of -1s as values, wherein the step of calculating the similarity value between the pattern and the second predetermined pattern comprises counting the number of columns in the pattern having at least the fourth threshold number of 1s, or at least the fourth threshold number of -1s, as values.
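A sketch of the row- and column-based similarity values, assuming a -1/0/+1 pattern stored as a list of rows. The third and fourth threshold numbers are left as parameters (`min_count`) because the text gives no values; the function names and example thresholds are hypothetical.

```python
def count_extreme_rows(pattern: list[list[int]], min_count: int) -> int:
    """Rows with at least min_count +1s, or at least min_count -1s (third threshold)."""
    return sum(1 for row in pattern if row.count(1) >= min_count or row.count(-1) >= min_count)


def count_extreme_columns(pattern: list[list[int]], min_count: int) -> int:
    """Columns with at least min_count +1s, or at least min_count -1s (fourth threshold)."""
    columns = zip(*pattern)  # transpose: each column becomes a tuple
    return sum(1 for col in columns if col.count(1) >= min_count or col.count(-1) >= min_count)


# 8 x 12 example: row 0 is all +1, and column 3 is -1 in rows 1-7.
plate = [[0] * 12 for _ in range(8)]
plate[0] = [1] * 12
for r in range(1, 8):
    plate[r][3] = -1
print(count_extreme_rows(plate, min_count=10), count_extreme_columns(plate, min_count=7))  # 1 1
```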
- According to a second aspect, there is provided a protein detection system configured to determine an amount of each protein of a plurality of proteins in a plurality of samples, wherein each sample is provided to the protein detection system in an array of reaction containers at a random position of the array.
- The protein detection system comprises an error detection component comprising: one or more processors; and one or more non-transitory computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the system to perform actions comprising: for at least some proteins of the plurality of proteins: receiving, from the protein detection system, an amount of the protein in each sample; determining a pattern of the array for the protein by thresholding the determined amount for each sample, such that the determined amount of the protein for each sample is represented by one of a plurality of predetermined discrete values in the pattern; comparing the pattern with a plurality of predetermined patterns, wherein each predetermined pattern corresponds to a systematic error of a plurality of predetermined systematic errors of the protein detection system, and
- The instructions may further cause the system to perform the action of, when the data structure of candidate systematic errors indicates a first threshold number of instances of a first systematic error of the plurality of systematic errors, indicating to a user of the protein detection system that the array exhibits the first systematic error.
- In some embodiments, the protein detection system is configured to determine an amount of each protein of a plurality of proteins in a plurality of samples using an immunoassay-based technology. It should be noted that the QC techniques described herein may be used for other suitable technologies within proteomics, such as other affinity-based methods.
- According to a third aspect, the above object is achieved by a non-transitory computer-readable storage medium having stored thereon instructions for implementing the method according to the first aspect when executed on a device having processing capabilities.
- The second and third aspects may generally have the same features and advantages as the first aspect. It is further noted that the disclosure relates to all possible combinations of features unless explicitly stated otherwise. Further features and advantages of the invention will become apparent from the following description of preferred embodiments of the invention, given by way of example only, with reference to the accompanying drawings.
- Figure 1 shows a protein detection system according to embodiments;
- Figure 2 shows a plot of protein amounts for an array of reaction containers according to embodiments;
- Figure 3 shows four examples of predetermined patterns, each corresponding to a systematic error, according to embodiments;
- Figure 4 shows a comparison between a pattern of the array for a protein and a predetermined pattern from Figure 3, resulting in a similarity matrix that can be used to calculate a similarity score, according to embodiments;
- Figure 5 shows a flow chart of a method for detecting a systematic error in a protein detection system, according to embodiments.
- Quality control is critical in all areas of laboratory research, including immunoassay based technologies.
- Multiplexed immunoassays are a group of high-throughput techniques for analysing multiple proteins in small sample volumes, often used in proteomics studies. They involve multiple steps, as exemplified above. Each of these steps may introduce variability or errors that could affect the final results. Careful QC is therefore necessary to ensure that the assay is working as expected. For example, given the high-throughput nature of multiplexed immunoassays, small systematic errors can affect a large number of measurements, potentially with a significant impact on the final results.
- The present disclosure describes the present invention primarily with reference to its application in the Proximity Extension Assay (PEA), but the invention is equally applicable to other types of multiplexed protein detection assays performed in an arrayed set of reaction containers, such as the multiplexed Proximity Ligation Assay (PLA).
- The present disclosure uses the applicant’s unit NPX as a measure of relative protein quantity. While this is the present standard unit for relative protein quantification, other measures of relative protein quantity may be used, as appreciated by the skilled person.
- Figure 1 shows by way of example a protein detection system 102, for example using immunoassay based technologies to determine an amount of each protein of a plurality of proteins in a plurality of samples, the plurality of samples provided to the protein detection system 102 using an array of reaction containers.
- The protein detection system 102 comprises a protein counting component 104, which may be configured to measure a concentration (count) of each protein of the plurality of proteins for all samples provided in the array. The counts for a protein thus correlate with the amount of that protein in the samples.
- The protein detection system 102 may further comprise an NPX calculation unit 106 configured to determine NPX values (an arbitrary relative quantification unit) for the proteins for each sample provided in the array.
- In other embodiments, the NPX calculation unit is not part of the protein detection system, and QC is performed directly on the counts from the protein counting component 104.
- The protein detection system 102 further comprises an error detection component 108.
- The purpose of the error detection component 108 is to perform QC on data produced by the protein counting component 104 and/or the NPX calculation unit 106. In particular, the error detection component 108 may serve to identify systematic errors that may have influenced the protein counts of the samples. This may be done by analysing the amounts of each protein separately, i.e., protein by protein from the plurality of proteins, as output by the multiplex detection of proteins in a sample.
- If the error detection component 108 identifies that the array may exhibit a certain systematic error, this may be indicated to a user/operator of the protein detection system 102 as a QC warning using an error indication component 110.
- The error indication component 110 may for example log the systematic error in a QC log file, show the QC warning on a display of the protein detection system, indicate the QC warning using audio or light (using speakers, light emitters, etc.), or use any other suitable means.
- The relative protein quantity (e.g., NPX values) or protein count for each protein (assay) can be presented for all samples in an array in a plot, herein referred to as a plate plot.
- Figure 2 shows by way of example a plate plot representing a measured concentration of a protein over a 96-well plate, meaning that the array of reaction containers consists of 12 columns and 8 rows.
- This type of pattern will vary for every array and protein.
- The darkness of each square indicates the NPX (darker means a higher NPX value), and the position of the square indicates the sample.
- Systematic errors in the workflow may affect the protein counts and result in systematically erroneous NPX values in the plate plot.
- This disclosure provides an automatic method for detecting if such systematic effects have occurred.
- The inventors have realized that the pattern occurring in plate plots influenced by some technical errors may be derivable from the technical error (i.e., a certain pattern may indicate a certain systematic error) and deviates from a random pattern such as the one seen in Figure 2.
- Figure 5 shows a computer-implemented method 500 for detecting a systematic error in a protein detection system (e.g., the protein detection system 102 in Figure 1). Parts of the method 500 may be performed protein by protein until it can be determined whether or not the array exhibits any of a set of predetermined systematic errors. In other words, if enough assays/proteins are affected, the array may receive a systematic effects warning corresponding to that effect.
- the method 500 comprises receiving S502, from the protein detection system (e.g., from the protein counting component 104 or from the NPX calculation unit 106 in figure 1), an amount of the protein in each sample.
- the amount may be received in the form of an NPX value, calculated as set out in Wik et al., as described above.
- the NPX values may be visualized in a plate plot as shown in figure 2.
- the received amounts are thresholded.
- the thresholded amounts may be used to determine S504 a pattern of the array for the protein, such that the determined amount of the protein for each sample is represented by one of a plurality of predetermined discrete values in the pattern.
- the predetermined discrete values are -1, 0 and +1, such that each determined amount of the protein is represented by one of: -1, 0, and +1 in the pattern, but other scales and granularity of the predetermined discrete values may be possible.
- the thresholding comprises calculating a median of the determined amounts of the protein over all samples in the array.
- the amounts determined for a protein are thus median centred.
- the median-centred NPX, used specifically for systematic-effects calculations, is denoted below by $\mathrm{mcNPX}_{a,s}$.
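- $\mathrm{plate\_median}_{a} = \operatorname{median}_{s \in \mathrm{SAMPLE}}\left(\mathrm{ExtNPX}_{a,s}\right)$
- $\mathrm{mcNPX}_{a,s} = \mathrm{ExtNPX}_{a,s} - \mathrm{plate\_median}_{a}, \quad a \in \mathrm{ASSAY},\ s \in \mathrm{SAMPLE}$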
- The NPX value for a given sample and protein (assay) is denoted $\mathrm{ExtNPX}_{a,s}$. Within a plate, $\mathrm{ExtNPX}_{a,s}$ has the same properties as NPX and is in effect the same as NPX. Here, $a$ ranges over all assays (proteins) and $s$ over all samples on the plate.
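The median centring above can be sketched as follows. This is a minimal illustration, not the patented implementation; it assumes the ExtNPX values for one assay arrive as a rows-by-columns list of lists with every well holding a sample, and the function name `median_centre` is hypothetical.

```python
from statistics import median


def median_centre(ext_npx):
    """Return mcNPX for one assay: each ExtNPX value minus the plate median.

    ext_npx is a list of rows, each row a list of ExtNPX values (assumed layout).
    """
    # plate_median_a: median over all samples s on the plate for this assay
    plate_median = median(v for row in ext_npx for v in row)
    # mcNPX_{a,s} = ExtNPX_{a,s} - plate_median_a, keeping the plate layout
    return [[v - plate_median for v in row] for row in ext_npx]


example = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(median_centre(example))  # [[-2.5, -1.5, -0.5], [0.5, 1.5, 2.5]]
```

Keeping the row/column layout intact matters because the later steps compare the result position by position against plate-shaped patterns.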
- If the amount of the protein in a sample exceeds the median by more than a threshold amount, the corresponding thresholded value is set to +1. If the amount falls below the median by more than the threshold amount, the corresponding thresholded value is set to -1. In all other cases, i.e., when the amount is within the threshold amount of the median, the corresponding thresholded value is set to 0.
- SYS_EFFECT_SIZE denotes the threshold amount
- the threshold amount SYS_EFFECT_SIZE may for example be set to 0.5 NPX, 0.3 NPX, etc. Any other suitable threshold may be used depending on the requirements of the implementation of the techniques described herein.
- the pattern of the array for the amounts of a certain protein may thus be determined S504.
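A minimal sketch of the thresholding step S504, assuming the input is the median-centred values from the previous step (so "above the median" becomes "greater than zero"). A value exactly at the threshold is treated as "within the threshold amount" and mapped to 0, which is one reading of "more than the threshold amount"; `threshold_pattern` is a hypothetical name.

```python
SYS_EFFECT_SIZE = 0.5  # threshold amount in NPX; 0.5 is one of the example values in the text


def threshold_pattern(mc_npx, effect_size=SYS_EFFECT_SIZE):
    """Map median-centred values to the discrete levels +1, -1 and 0."""

    def level(value):
        if value > effect_size:      # above the median by more than the threshold
            return 1
        if value < -effect_size:     # below the median by more than the threshold
            return -1
        return 0                     # within the threshold amount of the median

    return [[level(v) for v in row] for row in mc_npx]


print(threshold_pattern([[-2.5, -0.5, 0.0], [0.5, 0.6, -0.7]]))
# [[-1, 0, 0], [0, 1, -1]]
```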
- Figure 4 shows an example of such a pattern 402, where black squares represent the thresholded value 1, white squares represent the thresholded value -1, and grey squares represent the thresholded value 0.
- the method comprises counting the number of samples represented by -1 or 1 (samples where a systematic effect/error may be suspected). If this number (denoted #samples below) does not meet or exceed a second threshold number (such as 38, 40, 45 or 52 for an array of 96 reaction containers all holding a sample, denoted N_SAMPLES below), it may be determined that the protein as such does not seem to exhibit a systematic effect in the array, and the method may proceed by analysing the next protein.
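The pre-check on the number of deviating samples can be sketched as below. This is an illustration under assumptions: the pattern is a list of rows of -1/0/+1 values, the default of 40 is just one of the example second-threshold values for a 96-well array, and both function names are hypothetical.

```python
def deviating_sample_count(pattern):
    """#samples: number of wells represented by -1 or +1 in the pattern."""
    return sum(1 for row in pattern for v in row if v != 0)


def worth_matching(pattern, second_threshold=40):
    """True if the protein should go on to pattern matching.

    If the count does not meet or exceed the second threshold, the protein is
    treated as showing no systematic effect and the next protein is analysed.
    """
    return deviating_sample_count(pattern) >= second_threshold
```

For example, a pattern with only a handful of non-zero wells returns `False` and is skipped without any mask comparisons.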
- Figure 3 shows a plurality of predetermined patterns 302, 304, 306, 308.
- Each pattern 302, 304, 306, 308 corresponds to a systematic error of a plurality of predetermined systematic errors of the protein detection system. It should be noted that the patterns in figure 3 are shown by way of example, and that more patterns, a combination of the patterns in figure 3, or fewer patterns may be implemented. All patterns 302, 304, 306, 308 in figure 3 correspond to arrays of reaction containers with 12 columns and 8 rows. For protein detection systems using arrays of other sizes, the size of the patterns needs to be changed accordingly. In the patterns 302, 304, 306, 308, black represents a value of 1, and white represents a value of -1.
- Figure 3 shows a horizontally striped pattern 302 with alternating rows having values 1 and -1.
- a horizontally striped pattern may comprise alternating sets of rows having values 1 and -1, such as 2 rows with value 1, then 2 rows with value -1, etc.
- Figure 3 shows a vertically striped pattern 304 with alternating columns having values 1 and -1.
- a vertically striped pattern 304 may comprise alternating sets of columns having values 1 and -1, such as 2 columns with value 1, then 2 columns with value -1, etc.
- Figure 3 further shows a diagonally striped pattern 306 with alternating diagonals having values 1 and -1.
- a diagonally striped pattern 306 may comprise alternating sets of diagonals having values 1 and -1, such as 2 diagonals with value 1, then 2 diagonals with value -1, etc.
- figure 3 shows a regional pattern 308, wherein the pattern consists of two regions, wherein one region of the pattern has 1 as values and the other region of the pattern has -1 as values.
- a regional pattern may be divided into the two regions in a vertical direction (as shown in figure 3), a horizontal direction, or a diagonal direction.
- Each of the patterns 302, 304, 306, 308 thus corresponds to a systematic error of a plurality of predetermined systematic errors of the protein detection system. It should be noted that it is possible for a single effect to be represented by several patterns, and that a single array can exhibit more than one systematic error. For example, a systematic effect resulting in a diagonally striped pattern may result in any of four different patterns due to the shifts in the diagonal. A protein (assay) may be considered to display a diagonal effect if any of the four patterns is determined to be similar enough to the pattern determined for the protein.
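The four families of predetermined patterns can be generated programmatically. The sketch below is one possible construction for an 8 × 12 plate, not the masks used in the source: the stripe width, the diagonal `shift`, and the column at which the regional pattern splits are assumed parameters, and all function names are hypothetical. Varying `shift` yields the shifted diagonal variants mentioned above.

```python
N_ROWS, N_COLS = 8, 12  # 96-well plate, as for the patterns in figure 3


def horizontal_stripes(width=1, rows=N_ROWS, cols=N_COLS):
    """Alternating sets of `width` rows with values 1 and -1."""
    return [[1 if (r // width) % 2 == 0 else -1 for _ in range(cols)] for r in range(rows)]


def vertical_stripes(width=1, rows=N_ROWS, cols=N_COLS):
    """Alternating sets of `width` columns with values 1 and -1."""
    return [[1 if (c // width) % 2 == 0 else -1 for c in range(cols)] for _ in range(rows)]


def diagonal_stripes(width=1, shift=0, rows=N_ROWS, cols=N_COLS):
    """Alternating sets of `width` anti-diagonals (constant r + c), offset by `shift`."""
    return [[1 if ((r + c + shift) // width) % 2 == 0 else -1 for c in range(cols)]
            for r in range(rows)]


def regional_split_columns(split=N_COLS // 2, rows=N_ROWS, cols=N_COLS):
    """Two regions divided vertically: 1 left of `split`, -1 from `split` onwards."""
    return [[1 if c < split else -1 for c in range(cols)] for _ in range(rows)]
```

Because the similarity value uses an absolute difference, a mask and its sign-inverted copy give the same score, so only one of each pair needs to be stored.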
- Figure 4 shows by way of example how a similarity between a predetermined pattern 302, 304, 306, 308 and a pattern 402 of a protein may be determined. This is done by comparing S506 the pattern 402 of the protein with the predetermined patterns 302, 304, 306, 308 and determining a similarity value for each of the predetermined patterns.
- the array of reaction containers analysed by the protein detection system consists of 12 columns and 8 rows.
- the predetermined patterns 302, 304, 306, 308 each have the same number of columns and rows as the array of reaction containers.
- Comparison S506 may be accomplished by an elementwise multiplication between the pattern and the predetermined pattern to determine a similarity matrix 404.
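- the similarity matrix is denoted by $\mathrm{mNPX}_{a,s}$ and the predetermined pattern by $\mathrm{mask}$:
- $\mathrm{mNPX}_{a,s} = \mathrm{stdNPX}_{a,s} \times \mathrm{mask}_{s}$, where $\mathrm{stdNPX}_{a,s}$ is the thresholded pattern value and the product is taken elementwise.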
- This yields a result matrix (similarity matrix 404) with the value 1 at indexes where the pattern 402 and the predetermined pattern 302 have the same value (i.e., 1x1 or -1x-1), the value -1 at indexes where they have opposite values (i.e., 1x-1 or -1x1), and zero otherwise (at indexes where the pattern 402 has the value 0).
- the similarity matrix 404 is visualized using black to represent values being 1, white to represent values being -1, and grey to represent values being 0.
- the similarity value may then be determined based on a difference between a count of elements in the similarity matrix 404 having the value 1 and a count of elements in the similarity matrix 404 having the value -1.
- the similarity value is calculated by dividing an absolute value of the difference with a count of elements in the similarity matrix having the value 1 or -1.
- the similarity value is then compared to a pattern-specific similarity threshold. If the similarity value does not meet or exceed the similarity threshold, it is determined that the amounts of the protein, as determined by the protein detection system for the samples in the array, do not indicate the systematic error that the predetermined pattern 302 of the comparison represents.
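The comparison S506 and the threshold test can be sketched as follows, computing the similarity value as $|n_{+1} - n_{-1}| / (n_{+1} + n_{-1})$ over the elementwise product. This is a minimal illustration, not the source's implementation; the function names are hypothetical, `similarity_threshold` stands in for the pattern-specific threshold (no values are given in the text), and returning 0.0 when no element is ±1 is an assumption to avoid division by zero.

```python
def similarity_matrix(pattern, mask):
    """Elementwise product of the protein pattern and a predetermined pattern (mask)."""
    return [[p * m for p, m in zip(prow, mrow)] for prow, mrow in zip(pattern, mask)]


def effects_ratio(pattern, mask):
    """|count(+1) - count(-1)| / count(+1 or -1) over the similarity matrix."""
    sim = similarity_matrix(pattern, mask)
    n_pos = sum(1 for row in sim for v in row if v == 1)
    n_neg = sum(1 for row in sim for v in row if v == -1)
    if n_pos + n_neg == 0:
        return 0.0  # assumption: no informative wells means no similarity
    return abs(n_pos - n_neg) / (n_pos + n_neg)


def indicates_error(pattern, mask, similarity_threshold):
    """True if the similarity value meets or exceeds the pattern-specific threshold."""
    return effects_ratio(pattern, mask) >= similarity_threshold
```

A 0 in either the pattern or the mask contributes to neither count, so masked-out wells drop out of both numerator and denominator without any special handling.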
- a predetermined pattern may include the value 0 as well, to indicate areas of the array that, for whatever reason, are not of interest or would obscure the effect being sought.
- a zero in the predetermined pattern results in a zero in the similarity matrix, regardless of the values the array (and thus the pattern representing the array) has at the corresponding positions.
- the above equation for calculating the similarity value $\mathrm{effects\_ratio}_{a,s}$ still applies.
- a predetermined pattern used to identify a systematic error does not necessarily represent a full plate array.
- a predetermined pattern may indicate a row in the pattern having only 1 or only -1 as values, or alternatively at least a threshold number (such as 7, 8 or 10 for an array with 12 columns) of 1s or -1s.
- the step of calculating the similarity value between the pattern and such a row-based predetermined pattern comprises counting the number of rows in the pattern having only 1 or only -1 as values, or the number of rows in the pattern having at least the threshold number of 1s or -1s.
- a predetermined pattern may indicate a column in the pattern having only 1 or only -1 as values, or alternatively at least a threshold number (such as 6, 7 or 8 for an array with 8 rows) of 1s or -1s.
- the step of calculating the similarity value between the pattern and such a column-based predetermined pattern comprises counting the number of columns in the pattern having only 1 or only -1 as values, or the number of columns in the pattern having at least the threshold number of 1s or -1s.
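The row- and column-based counting can be sketched with a single helper. This is a hedged illustration: `count_uniform_lines` is a hypothetical name, `min_count=None` is taken to mean "only 1 or only -1" (every element of the line), and how the resulting count is turned into a decision is not specified in the text, so the sketch stops at the count.

```python
def count_uniform_lines(pattern, by="row", min_count=None):
    """Count rows (or columns) dominated by 1s or by -1s.

    A line counts if it holds at least `min_count` 1s, or at least `min_count`
    -1s. With min_count=None the whole line must be 1s, or the whole line -1s.
    """
    lines = pattern if by == "row" else [list(col) for col in zip(*pattern)]
    count = 0
    for line in lines:
        needed = len(line) if min_count is None else min_count
        if line.count(1) >= needed or line.count(-1) >= needed:
            count += 1
    return count


example = [[1, 1, 1], [1, 0, 1], [-1, -1, -1]]
print(count_uniform_lines(example))               # 2 (rows 0 and 2)
print(count_uniform_lines(example, min_count=2))  # 3
```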
- if the similarity value meets or exceeds the pattern-specific similarity threshold (S508), the systematic error corresponding to the predetermined pattern may be indicated S510 in a data structure of candidate systematic errors.
- the data structure of candidate systematic errors can be seen as a log of all systematic errors that have been indicated for proteins as described above.
- the data structure may be represented by a table or a list, or any other way of keeping track of how many proteins for which the amounts measured by the protein detection system indicate a certain systematic error.
- This data structure is used to determine if it should be indicated to a user/operator of the protein detection system that the array exhibits a systematic error.
- the decision is taken based on a threshold, for example whether 7%, 10%, 12% or 15% of the proteins have been identified as indicating a certain systematic error (e.g., the similarity value between the pattern of the protein and the predetermined patterns corresponding to that systematic error has met or exceeded the pattern-specific similarity threshold for at least the threshold number of proteins).
- the data structure may be checked regularly, for example after each protein has been analysed, or after every second, third, etc. protein, so that the analysis of the remaining proteins may be aborted as soon as it is determined that the array exhibits a systematic error. In other embodiments, all proteins are analysed, and the data structure is examined at the end for systematic errors meeting or exceeding the threshold.
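The data structure of candidate systematic errors and the periodic check with early abort can be sketched with a `collections.Counter`. This is one possible design, not the source's: it assumes each analysed protein yields the set of error names whose similarity threshold was met, that `n_proteins` is the total number of proteins on the array, and that the array is flagged when an error's share of proteins reaches `fraction` (10% here, one of the example values). `scan_proteins` is a hypothetical name.

```python
from collections import Counter


def scan_proteins(per_protein_errors, n_proteins, fraction=0.10, check_every=1):
    """Accumulate candidate systematic errors protein by protein.

    per_protein_errors yields one set of error names per analysed protein.
    Returns (flagged_errors, proteins_analysed).
    """
    candidates = Counter()  # error name -> number of proteins indicating it

    def flagged():
        return sorted(e for e, n in candidates.items() if n / n_proteins >= fraction)

    analysed = 0
    for analysed, errors in enumerate(per_protein_errors, start=1):
        candidates.update(errors)  # one count per error indicated for this protein
        if analysed % check_every == 0 and flagged():
            return flagged(), analysed  # abort early: the array is already flagged
    return flagged(), analysed  # full pass: final check at the end
```

Setting `check_every` larger than the number of proteins reproduces the "analyse everything, check at the end" embodiment.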
- the method 500 shown in figure 5 may be implemented by the protein detection system shown in figure 1, or in a separate device connected to a protein detection system.
- the method 500 is implemented in the cloud, e.g., as a SaaS (Software as a Service) solution.
- the device/devices implementing the method 500 and other functionality described herein may comprise circuitry which is configured to implement the method, and some/all of the components 104, 106, 108, 110 and, more specifically, their functionality.
- the features described herein can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor.
- Suitable processors for the execution of a program of instructions include, by way of example, both general and special purpose microprocessors, and the sole processor or one of multiple processors or cores, of any kind of computer.
- the processors can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
- a priority order of the predetermined systematic errors may be used.
- An example of such a priority order is that full-plate (mask) patterns (as shown in figure 3) may take precedence over row/column effects. If the amounts of a certain protein are determined to show a pattern corresponding to a mask, no analysis of row/column effects need be done for the amounts of that protein.
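The priority order can be expressed by running the row/column analysis only when no full-plate mask matched. A minimal sketch, assuming the two analyses are supplied as hypothetical callables that return the names of detected errors for one protein; passing callables means the row/column check is genuinely skipped, not just discarded.

```python
def errors_for_protein(check_masks, check_rows_columns):
    """Apply the priority order: full-plate mask patterns before row/column effects.

    check_masks and check_rows_columns are hypothetical callables returning the
    names of the systematic errors they detect for one protein.
    """
    mask_hits = set(check_masks())
    if mask_hits:
        return mask_hits  # a mask matched, so row/column analysis is skipped
    return set(check_rows_columns())
```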
Abstract
The present invention relates to quality control (QC) of data produced by a protein detection system, and in particular to methods, components and software for detecting systematic errors in a protein detection system. The methods comprise receiving (S502) an amount of the protein in each sample; determining (S504) a pattern of the array for the protein by thresholding the amount determined for each sample, such that the determined amount of the protein for each sample is represented by one of a plurality of predetermined discrete values in the pattern; comparing (S506) the pattern with a plurality of predetermined patterns, each predetermined pattern corresponding to a systematic error in the protein detection system, and determining a similarity value for each of the predetermined patterns; and, when the similarity value exceeds (S508) a pattern-specific similarity threshold, indicating (S510) the systematic error in a data structure of candidate systematic errors.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23185080.1 | 2023-07-12 | | |
| EP23185080 | 2023-07-12 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025012226A1 (fr) | 2025-01-16 |
Family ID: 87280193
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2024/069234 Pending WO2025012226A1 (fr) | 2023-07-12 | 2024-07-08 | Procédé, composants et logiciel pour détecter une erreur systématique dans un système de détection de protéines |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025012226A1 (fr) |
Patent Citations (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2001061037A1 (fr) | 2000-02-18 | 2001-08-23 | Ulf Landegren | Procedes et trousses de detection de proximite |
| WO2003044231A1 (fr) | 2001-11-23 | 2003-05-30 | Simon Fredriksson | Procede et kit pour le sondage de proximite au moyen de sondes de proximite polyvalentes |
| WO2004094456A2 (fr) | 2003-04-18 | 2004-11-04 | Becton, Dickinson And Company | Amplification immunologique |
| WO2005123963A2 (fr) | 2004-06-14 | 2005-12-29 | The Board Of Trustees Of The Leland Stanford Junior University | Procedes et compositions destines a la detection d'analytes au moyen de sondes de proximite |
| WO2006137932A2 (fr) | 2004-11-03 | 2006-12-28 | Leucadia Technologies, Inc. | Detection homogene de substance a analyser |
| US20090218401A1 (en) * | 2006-05-11 | 2009-09-03 | Singular Id Pte Ltd | Method of identifying an object, an identification tag, an object adapted to be identified, and related device and system |
| WO2013113699A2 (fr) | 2012-01-30 | 2013-08-08 | Olink Ab | Dosage d'extension par sonde de proximité avec une adn polymérase hyperthermophile |
| WO2021003470A1 (fr) * | 2019-07-03 | 2021-01-07 | Nautilus Biotechnology, Inc. | Approches de décodage pour l'identification de protéines et de peptides |
| WO2021113290A1 (fr) | 2019-12-03 | 2021-06-10 | Alamar Biosciences, Inc. | Dosage immunologique-sandwich lié à un acide nucléique (nulisa) |
| WO2021191448A1 (fr) | 2020-03-27 | 2021-09-30 | Olink Proteomics Ab | Procédé de détection d'analytes |
| WO2021191442A1 (fr) | 2020-03-27 | 2021-09-30 | Olink Proteomics Ab | Procédé de détection d'analytes d'abondance variable |
| WO2021191449A1 (fr) | 2020-03-27 | 2021-09-30 | Olink Proteomics Ab | Procédé de détection d'analytes |
| WO2021191450A1 (fr) | 2020-03-27 | 2021-09-30 | Olink Proteomics Ab | Commandes pour dosages de détection de proximité |
| WO2022112300A1 (fr) | 2020-11-25 | 2022-06-02 | Olink Proteomics Ab | Procédé de détection d'analyte utilisant des concatémères |
Non-Patent Citations (8)
| Title |
|---|
| ASSARSSON ET AL., PLOS ONE, vol. 9, no. 4, 2014, page e95192 |
| CARAUS IURIE ET AL: "Detecting and removing multiplicative spatial bias in high-throughput screening technologies", vol. 33, no. 20, 15 October 2017 (2017-10-15), GB, pages 3258 - 3267, XP093109227, ISSN: 1367-4803, Retrieved from the Internet <URL:https://academic.oup.com/bioinformatics/article-pdf/33/20/3258/49042915/bioinformatics_33_20_3258.pdf> DOI: 10.1093/bioinformatics/btx327 * |
| FREDRIKSSON ET AL., NATURE BIOTECHNOLOGY, vol. 20, 2002, pages 473 - 477 |
| KEVORKOV DMYTRO ET AL: "Statistical Analysis of Systematic Errors in High-Throughput Screening", vol. 10, no. 6, 1 September 2005 (2005-09-01), pages 557 - 567, XP093109431, ISSN: 2472-5552, Retrieved from the Internet <URL:https://doi.org/10.1177/1087057105276989> DOI: 10.1177/1087057105276989 * |
| LECLERCQ MICKAEL ET AL: "High-Throughput Screening well correction", 1 January 2009 (2009-01-01), XP093109220, Retrieved from the Internet <URL:http://www.info2.uqam.ca/~makarenkov_v/BIF7002/Rapport_Makarenkov_2009/#tag0> [retrieved on 20231205] * |
| LUNDBERG ET AL., MOLECULAR & CELLULAR PROTEOMICS, 2011, pages 1 - 10 |
| MAZOURE BOGDAN ET AL: "Identification and correction of spatial bias are essential for obtaining quality data in high-throughput screening technologies", SCIENTIFIC REPORTS, vol. 7, no. 1, 1 December 2017 (2017-12-01), US, XP093109223, ISSN: 2045-2322, DOI: 10.1038/s41598-017-11940-4 * |
| WIK ET AL., MOL CELL PROTEOMICS, vol. 20, 2021, pages 100168, Retrieved from the Internet <URL:https://doi.org/10.1016/j.mcpro.2021.100168> |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24748266; Country of ref document: EP; Kind code of ref document: A1 |