WO2024089143A1 - Determining hplc method parameters using machine learning - Google Patents
- Publication number
- WO2024089143A1 (application PCT/EP2023/079867)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- column
- hplc
- parameters
- hplc method
- compounds
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N30/00—Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
- G01N30/02—Column chromatography
- G01N30/86—Signal analysis
- G01N30/8693—Models, e.g. prediction of retention times, method development and validation
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N30/00—Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
- G01N30/02—Column chromatography
- G01N30/86—Signal analysis
- G01N30/8658—Optimising operation parameters
- G01N30/8662—Expert systems; optimising a large number of parameters
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
Definitions
- the present invention relates to computer-implemented methods for predicting HPLC retention time for one or more compounds using machine learning models.
- the present invention relates to the use of machine learning models for identifying a set of HPLC method parameters suitable for separating two or more compounds in a composition, and to related systems and devices.
- HPLC High-Performance Liquid Chromatography
- the present invention provides machine learning models for predicting HPLC retention time for one or more compounds, and for identifying a set of HPLC method parameters suitable for separating two or more compounds in a composition.
- a first aspect provides a method of predicting a HPLC retention time for one or more compounds, the method comprising: (a) obtaining the values of one or more structural and/or physicochemical properties of said compounds, and one or more HPLC method parameters; and (b) using a machine learning model to predict a retention time for each of said compounds when subjected to HPLC using the one or more HPLC method parameters, wherein the machine learning model has been trained using a training dataset comprising molecular structural properties and/or physicochemical properties for one or more compounds, and for each compound, one or more sets of chromatographic data comprising a retention time for the respective compound and associated HPLC method parameters.
- the present inventors have identified that it was possible to train a machine learning model to predict chromatographic data including the retention time for respective compounds when separated using associated HPLC method parameters provided as input to the machine learning model, with high prediction accuracy.
- the inventors further recognised that these highly accurate, HPLC-method-specific predictions could be used to identify HPLC method parameters suitable for separating two or more compounds in a composition.
- a method of identifying a set of HPLC method parameters suitable for separating two or more compounds in a composition comprising said compounds comprising: (a) performing the method of the first aspect for a plurality of sets of HPLC method parameters; (b) calculating one or more separation performance metrics using the results of step (a); and (c) identifying one or more set(s) of HPLC method parameters by applying one or more criteria on said separation performance metrics.
- a method for providing a tool for predicting a HPLC retention time for one or more compounds comprising: (a) obtaining a training dataset comprising the values of one or more structural and/or physicochemical properties for one or more compounds, and for each compound, one or more sets of chromatographic data comprising a retention time for the respective compound and associated HPLC method parameters; and (b) training a machine learning model to predict a retention time for a compound when subjected to HPLC using the one or more HPLC method parameters, using as input the values of the one or more structural and/or physicochemical properties for the compound, and values of said HPLC method parameters.
- Figure 1 shows a schematic representation of an HPLC System.
- Figure 2 depicts a flow chart diagram of an example method of predicting a HPLC retention time (A) and identifying a set of HPLC method parameters suitable for separating two or more compounds (B). Steps illustrated in boxes with dashed outlines are optional.
- Figure 3 shows the gradient of SAM-0200368 with the time points marked as tp_1, tp_2 and tp_3.
- Figure 4 is a star schema representing the consolidated data.
- Figure 5 shows definitions and basic operations of a Genetic Algorithm according to an implementation of the methods of the disclosure.
- Flow is flow rate
- Temp is temperature
- pH is pH
- Tp_1 is timepoint 1
- Tp_2 is timepoint 2
- Tp_3 is timepoint 3
- Ep_1 is elution power 1
- Ep_2 is elution power 2
- Ep_3 is elution power 3
- Length is column length
- inner_diameter is the column inner diameter
- Particle size is the column particle size
- Pore_size is the column pore size
- H is hydrophobicity
- S is steric interaction
- A is hydrogen-bond acidity
- B is hydrogen-bond basicity
- c is ion-exchange capacity.
- Figure 6 shows basic operations of a Genetic Algorithm according to an implementation of the methods of the disclosure.
- Figure 7 shows prediction error plots of the evaluated machine learning models prior to feature selection using random train-test-split according to an implementation of the methods of the disclosure.
- Figure 8 shows comparison of the two splitting methods. Left: random train-test-split (80/20), right: unified train-test-split (80/20).
- y_train and y_test relate to the training and test data, respectively. Using the "unified" sampling approach, an 80/20 ratio between train and test samples was maintained over the entire retention time range.
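One way to read the "unified" sampling described above is as a stratified split over retention-time bins. The sketch below assumes equal-width bins; the function name, the bin count and the seed are illustrative choices and are not taken from the source.

```python
import random
from collections import defaultdict

def unified_train_test_split(rts, test_fraction=0.2, n_bins=10, seed=0):
    """Return (train_idx, test_idx) with roughly the same test fraction in every RT bin."""
    rng = random.Random(seed)
    lo, hi = min(rts), max(rts)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all RTs are identical
    bins = defaultdict(list)
    for i, rt in enumerate(rts):
        # Clamp so that the maximum RT falls into the last bin.
        b = min(int((rt - lo) / width), n_bins - 1)
        bins[b].append(i)
    train_idx, test_idx = [], []
    for b in sorted(bins):
        members = bins[b][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

With a purely random split, sparsely populated retention-time regions can end up with no test samples at all; splitting per bin keeps the ratio approximately constant across the range.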
- Figure 9 shows the top 20 of the most important features of the XGB model using unified train-test-split.
- the numbers are the F score values, also illustrated by the length of the bar.
- Figure 10 shows evaluation of the hyper-parameters for the XGB model after performing unified train-test-split and feature selection, using different learning rates. All models reached good performance. Using a learning_rate of 0.1 and 0.05, the model reached good performance with only a few hundred estimators. The plotted score is the mean test score of the 10-fold CV.
- Figure 11 shows prediction error and residual plot of the final XGB model.
- the residual plot shows random error over the entire RT range indicating a good fit of the model.
- the model performs best between 2 min and 20 min, likely due to the number of observations in this range.
- Figure 12 shows prediction error and residual plot using molecular fingerprints and XGB.
- the histograms show the distribution of the test set as well as the distribution of the predicted values.
- Figure 13 shows comparison of KMeans clustering with 30 and 8 clusters using both ion-exchange parameters (c). PCA was applied before clustering. The "x" denotes the cluster centers.
- Figure 14 shows comparison of different eps values for the clustering of analytical columns using DBSCAN and c_28.
- Figure 15 shows comparison of measured and predicted RT using the SAM method parameters and the XGB model specifically trained for the optimizer.
- the numbering of the peaks is defined by the RT from Empower data (i.e. measured RT, red peaks).
- Figure 16 shows comparison of the predicted and measured results for the molecules in an example elution (labelled SAM-0114188) using the method parameters from the optimiser.
- Figure 17 shows comparison of the predicted and measured results for the molecules in an example elution (labelled SAM-0113392) using the recommended method parameters by the optimiser.
- Figure 18 shows a screenshot of a user interface for accessing the methods of the disclosure.
Detailed description
- a composition as described herein may be a pharmaceutical composition which additionally comprises a pharmaceutically acceptable carrier, diluent or excipient.
- the pharmaceutical composition may optionally comprise one or more further pharmaceutically active polypeptides and/or compounds.
- Such a formulation may, for example, be in a form suitable for intravenous infusion.
- a compound as described herein may be a small molecule (e.g. a small molecule inhibitor, activator, cofactor, etc.) or a large molecule (e.g. a biologic, therapeutic protein or peptide such as an antibody or compound derived therefrom, a nucleic acid, etc.).
- a compound may be an organic compound.
- a compound may be a pharmaceutically active agent (also referred to as a drug or therapeutic agent), or a degradation product thereof.
- a computer system includes the hardware, software and data storage devices for embodying a system or carrying out a method according to the above described embodiments.
- a computer system may comprise a processing unit such as a central processing unit (CPU) and/or graphics processing unit (GPU), input means, output means and data storage, which may be embodied as one or more connected computing devices.
- the computer system has a display or comprises a computing device that has a display to provide a visual output display.
- the data storage may comprise RAM, disk drives or other computer readable media.
- the computer system may include a plurality of computing devices connected by a network and able to communicate with each other over that network. It is explicitly envisaged that the computer system may consist of or comprise a cloud computer.
- the methods described herein may be provided as computer programs or as computer program products or computer readable media carrying a computer program which is arranged, when run on a computer, to perform the method(s) described herein.
- computer readable media includes, without limitation, any non-transitory medium or media which can be read and accessed directly by a computer or computer system.
- the media can include, but are not limited to, magnetic storage media such as floppy discs, hard disc storage media and magnetic tape; optical storage media such as optical discs or CD-ROMs; electrical storage media such as memory, including RAM, ROM and flash memory; and hybrids and combinations of the above such as magnetic/optical storage media.
- HPLC-MS HPLC coupled to mass spectrometry
- HPLC is also meant to encompass ultra-high performance liquid chromatography, and HPLC performed on its own or as part of an analytical or separation process, e.g. as part of LC-MS.
- FIG. 1 shows a schematic representation of an HPLC System.
- the eluent flow from the pump carries the sample through the HPLC system. Once the sample enters the column, it is separated into its different components. The separation mode is mainly determined by the column.
- RP-LC reversed-phase
- NP-LC normal-phase
- HILIC hydrophilic interaction
- IPC ion-pair
- SEC size-exclusion
- affinity chromatography
- the HPLC may be RP-LC.
- the stationary phase (referred to as the column) is non-polar.
- the column is a stainless steel tube filled with spherical particles made of porous, modified silica.
- the most common modification in RP-LC is C-18. These C-18 groups are attached to the surface of each particle providing the column with its non-polar character.
- the mobile phase consists of a polar mixture of an organic solvent (common examples include acetonitrile or methanol) and water. Different migration (“speed of travel”) of the sample molecules (referred to as analyte) causes the separation in the column.
- the migration of an analyte is determined by an equilibrium process between molecules of the same analyte present in the mobile phase and in the stationary phase at any time.
- RT retention time
- the sample solvent does not bind to the column due to its polarity and leaves the column first. After being separated on the column, the different molecules leave the column and enter a detector.
- the most common type is the ultraviolet absorption detector (LC-UV).
- a UV detector measures the fraction of light that is transmitted through the flow cell containing the sample as a function of time.
- the relation between the fraction of the transmitted light and the concentration of an analyte in a sample can be described using the Beer-Lambert law.
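As a brief illustration of the relation mentioned above, the Beer-Lambert law can be written as A = -log10(T) = ε · l · c, where T is the transmitted fraction, ε the molar absorptivity, l the path length of the flow cell and c the concentration. The following sketch uses hypothetical function names and illustrative numbers only.

```python
import math

def absorbance_from_transmittance(transmitted_fraction):
    """A = -log10(T), where T is the fraction of light transmitted (0 < T <= 1)."""
    if not 0 < transmitted_fraction <= 1:
        raise ValueError("transmitted fraction must be in (0, 1]")
    return -math.log10(transmitted_fraction)

def concentration_from_absorbance(absorbance, molar_absorptivity, path_length_cm):
    """Beer-Lambert law A = epsilon * l * c, solved for c (mol/L)."""
    return absorbance / (molar_absorptivity * path_length_cm)

# Illustrative values: 10 % of the light is transmitted, so A is 1.
a = absorbance_from_transmittance(0.10)
c = concentration_from_absorbance(a, molar_absorptivity=10_000.0, path_length_cm=1.0)
```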
- the output of an LC-UV experiment is a chromatogram, where the absorbance is plotted against the retention time (RT).
- the present invention provides a method of predicting a HPLC retention time for one or more compounds, the method comprising: (a) obtaining the values of one or more structural and/or physicochemical properties of said compounds, and one or more HPLC method parameters; and (b) using a machine learning model to predict a retention time for each of said compounds when subjected to HPLC using the one or more HPLC method parameters, wherein the machine learning model has been trained using a training dataset comprising molecular structural properties and/or physicochemical properties for one or more compounds, and for each compound, one or more sets of chromatographic data comprising a retention time for the respective compound and associated HPLC method parameters.
- the methods of the present aspect may have any of the features described in relation to any other aspect.
- the step of “obtaining” can comprise calculating the properties, receiving them from a user interface, or from a computing device or database.
- the HPLC method parameters can comprise parameters related to the column (e.g. dimensions, particle size), the eluent (e.g. pH of the eluents), or the chromatography procedure (e.g. flow rate).
- a schematic presentation is provided in Figure 2A.
- values of one or more structural and/or physicochemical properties for one or more compounds are obtained. This may comprise obtaining the identity of the compounds, for example from a user, at optional step 102. This may comprise optional step 104 of determining the value of the structural / physicochemical properties. Alternatively, these may be received from a user, computing device or database.
- the properties may comprise a molecular fingerprint and/or one or more molecular descriptors.
- the structural molecular descriptors and/or molecular fingerprints are 2D molecular descriptors or molecular fingerprints.
- the molecular descriptors are selected from a group consisting of ABCIndex, AcidBase, AdjacencyMatrix, Aromatic, AtomCount, Autocorrelation, BCUT, BalabanJ, BaryszMatrix, BertzCT, BondCount, CarbonTypes, Chi, Constitutional, DetourMatrix, DistanceMatrix, EState, EccentricConnectivityIndex, ExtendedTopochemicalAtom, FragmentComplexity, Framework, HydrogenBond, InformationContent, KappaShapeIndex, Lipinski, McGowanVolume, MoeType, MolecularDistanceEdge, MolecularId, PathCount, Polarizability, RingCount, RotatableBond, SLogP, TopoPSA, TopologicalCharge, TopologicalIndex, VdwVolumeABC, VertexAdjacencyInformation, WalkCount, Weight, WienerIndex, and ZagrebIndex.
- said parameters are calculated by the Mordred package.
- the molecular descriptors are selected from a group consisting of SLogP, ATSC5v, ATSC8d, ATSC3Z, ABC, VSA_EState4, ATSC5dv, and ATSC6i.
- the values of one or more HPLC method parameters are obtained. These may be received from a user, computing device or database. For example, these may be received from an optimisation algorithm executed on a processor.
- sets of HPLC method parameters are generated by a genetic algorithm.
- the genetic algorithm is run until one or more stopping criteria apply, optionally wherein the stopping criteria are selected from: i) a predetermined number of generations has been reached, and ii) the difference between the separation performance metrics associated with one or more sets of HPLC method parameters of a current iteration and the separation performance metrics associated with one or more sets of HPLC method parameters of a previous iteration is below a threshold.
- the genetic algorithm is initialised with a set of HPLC method parameters that is selected randomly, selected from a predetermined set or range for each HPLC method parameter, and/or provided by a user.
- one or more sets of HPLC method parameters are selected (e.g. by a genetic algorithm) from respective predetermined sets and/or ranges.
- the machine learning model is used to predict a retention time for each of the compounds when subjected to HPLC using the one or more HPLC method parameters.
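The model input described above combines compound properties with HPLC method parameters. A minimal sketch of assembling one input row is shown below, assuming a fixed column order shared between training and prediction; the helper name and the example values are hypothetical.

```python
def build_feature_row(compound_properties, method_parameters, feature_order):
    """Concatenate compound properties and HPLC method parameters in a fixed column order."""
    merged = {**compound_properties, **method_parameters}
    missing = [name for name in feature_order if name not in merged]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(merged[name]) for name in feature_order]

# Illustrative values only; the names follow the abbreviations used in the figures.
row = build_feature_row(
    {"SLogP": 2.1},
    {"flow": 0.2, "temp": 40, "pH": 3},
    feature_order=["SLogP", "flow", "temp", "pH"],
)
```

Such a row could then be passed to whichever trained regressor is in use; the same compound paired with a different parameter set yields a different row and therefore a different predicted retention time.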
- the machine learning model further predicts a metric indicative of peak width.
- the one or more sets of chromatographic data in the training dataset further comprise a metric indicative of the width of the peak for the respective compound.
- the predicted values are presented to a user, e.g. through a user interface, to a computing device or database.
- the machine learning model is trained using one or more sets of chromatographic data that further comprise a metric indicative of the width of the peak for the respective compound.
- the one or more compounds are individually selected from: a pharmaceutically active agent, and a degradation product thereof.
- the HPLC method parameters are selected from a group consisting of: HPLC type, flow rate, temperature, pH, column length, column inner diameter, column particle size, column pore size, column steric interaction, column hydrogen bond acidity, column hydrogen bond basicity, column ion exchange capacity (c_28), column ion exchange capacity (c_70), and one or more metrics defining the elution phase gradient.
- the HPLC method parameters comprise one or more of the parameters selected from a group consisting of: column length, column inner diameter, column particle size, column pore size, column steric interaction, column hydrogen bond acidity, column hydrogen bond basicity, column ion exchange capacity (c_28), column ion exchange capacity (c_70), and one or more metrics defining the elution phase gradient, and optionally one or more parameters selected from a group consisting of: HPLC type, flow rate, temperature, pH.
- the HPLC method parameters comprise HPLC type, flow rate, temperature, pH, and one or more of the parameters selected from a group consisting of: column length, column inner diameter, column particle size, column pore size, column steric interaction, column hydrogen bond acidity, column hydrogen bond basicity, column ion exchange capacity (c_28), column ion exchange capacity (c_70), and one or more metrics defining the elution phase gradient.
- the HPLC method parameters comprise one or more of the parameters selected from a group consisting of: column length, column inner diameter, column particle size, column pore size, column steric interaction, column hydrogen bond acidity, column hydrogen bond basicity, column ion exchange capacity (c_28), column ion exchange capacity (c_70), and optionally one or more parameters selected from a group consisting of: HPLC type, flow rate, temperature, pH, and one or more metrics defining the elution phase gradient.
- the HPLC method parameters comprise i) HPLC type, flow rate, temperature, one or more metrics defining the elution phase gradient, pH, and ii) one or more of the parameters selected from a group consisting of: column length, column inner diameter, column particle size, column pore size, column steric interaction, column hydrogen bond acidity, column hydrogen bond basicity, column ion exchange capacity (c_28), column ion exchange capacity (c_70).
- the HPLC type can be e.g. normal phase (NP) chromatography, reverse phase (RP) chromatography, size exclusion chromatography, ion-exchange chromatography, hydrophilic interaction chromatography (HILIC), and affinity chromatography.
- NP normal phase
- RP reverse phase
- HPLC type is RP or NP chromatography.
- HPLC type may be RP chromatography.
- the one or more metrics defining the elution phase gradient comprise:
- metrics defining the time-dependent change in the proportion of a plurality of mobile phases during elution, optionally wherein the one or more metrics defining the time-dependent change in the proportion of a plurality of mobile phases during elution comprise the value of one or more time points corresponding to changes in the rate of change of the proportion of the mobile phases, and/or the value of the rate of change of the proportion of the mobile phases at one or more time points during the elution, and/or metrics describing a function representing the change in proportion of the mobile phases in one or more of (e.g. each of) a plurality of regions in the gradient; and/or
- the mobile phase elution powers at two or more (e.g. three) different time points, optionally wherein the mobile phase comprises a plurality of mobile phases and the elution power at a time point is obtained as a sum of the elution powers of each of the plurality of mobile phases weighted by the respective proportion of each mobile phase.
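The proportion-weighted sum described above can be sketched as follows. The per-solvent elution power values in the usage line are placeholders chosen for illustration and are not taken from the source.

```python
def mixture_elution_power(elution_powers, proportions):
    """Proportion-weighted sum of the elution powers of the individual mobile phases."""
    if len(elution_powers) != len(proportions):
        raise ValueError("one proportion is required per mobile phase")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(e * p for e, p in zip(elution_powers, proportions))

# Two hypothetical mobile phases mixed 50:50 at a given time point.
ep = mixture_elution_power([1.0, 2.0], [0.5, 0.5])
```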
- the one or more physicochemical properties comprise one or more molecular descriptors and/or wherein the one or more molecular structural properties comprise a molecular fingerprint and/or one or more structural molecular descriptor.
- the one or more physicochemical properties and/or molecular structural properties are calculated by PaDEL-Descriptor, BlueDesc, ChemoPy, PyDPI, Rcpi, Cinfony, or Dragon software, preferably by the Mordred package. Any package available in the art for calculating physicochemical properties and/or molecular structural properties of molecules, for example starting from molecular formulae, 2D or 3D structures, may be used.
- the structural molecular descriptors and/or molecular fingerprints are 2D molecular descriptors or molecular fingerprints.
- the molecular descriptors are selected from a group consisting of ABCIndex, AcidBase, AdjacencyMatrix, Aromatic, AtomCount, Autocorrelation, BCUT, BalabanJ, BaryszMatrix, BertzCT, BondCount, CarbonTypes, Chi, Constitutional, DetourMatrix, DistanceMatrix, EState, EccentricConnectivityIndex, ExtendedTopochemicalAtom, FragmentComplexity, Framework, HydrogenBond, InformationContent, KappaShapeIndex, Lipinski, McGowanVolume, MoeType, MolecularDistanceEdge, MolecularId, PathCount, Polarizability, RingCount, RotatableBond, SLogP, TopoPSA, TopologicalCharge, TopologicalIndex, VdwVolumeABC, VertexAdjacencyInformation, WalkCount, Weight, WienerIndex, and ZagrebIndex.
- the one or more structural and/or physicochemical properties comprise molecular descriptors selected from a group consisting of SLogP, ATSC5v, ATSC8d, ATSC3Z, ABC, VSA_EState4, ATSC5dv, and ATSC6i. In embodiments, said parameters are calculated by the Mordred package.
- the elution phase gradient (in the training data or in the HPLC method for which a retention time is predicted) is a fixed gradient.
- the HPLC methods represented in the training data and/or for which a RT is predicted have parameters selected from: a flow rate of 0.2 millilitre per minute, and an elution power of about 1.05, 1.50 and 1.95 (for the mobile phase or components thereof, for example determined as a summarised value for all components of the mobile phase at the respective time point) at 3 time points, such as time points of 0, 15 and 30 minutes.
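To make the three-point gradient description concrete, the sketch below evaluates the elution power at an arbitrary time from the (time point, elution power) pairs quoted above. It assumes linear interpolation between consecutive time points and a constant value outside them; that interpolation rule is an assumption, not something the source states.

```python
def elution_power_at(t, time_points=(0.0, 15.0, 30.0), elution_powers=(1.05, 1.50, 1.95)):
    """Piecewise-linear elution power at time t (minutes); constant outside the gradient."""
    if t <= time_points[0]:
        return elution_powers[0]
    points = list(zip(time_points, elution_powers))
    for (t0, e0), (t1, e1) in zip(points, points[1:]):
        if t <= t1:
            # Linear interpolation within the segment [t0, t1].
            return e0 + (e1 - e0) * (t - t0) / (t1 - t0)
    return elution_powers[-1]
```

Under this assumption, the midpoint of the first segment (7.5 minutes) lies halfway between 1.05 and 1.50.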
- the HPLC method parameters for which retention time is predicted at step (b) are constrained to be within respective predetermined ranges and/or selected from respective predetermined sets of values, and/or wherein the training dataset comprises sets of chromatographic data obtained using HPLC method parameters within respective ranges and/or respective sets of values, and the HPLC method parameters for which retention time is predicted at step (b) are within said ranges and/or selected from said sets of values.
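A constraint of this kind can be checked before a parameter set is passed to the model. The sketch below is one possible form, using a tuple for an inclusive range and a set for permitted discrete values; the parameter names and limits are hypothetical.

```python
def check_method_parameters(params, allowed):
    """Return the names of parameters that fall outside their allowed range or value set."""
    violations = []
    for name, value in params.items():
        rule = allowed.get(name)
        if rule is None:
            continue  # no constraint registered for this parameter
        if isinstance(rule, tuple):  # (low, high) inclusive range
            low, high = rule
            ok = low <= value <= high
        else:  # explicit set of permitted values
            ok = value in rule
        if not ok:
            violations.append(name)
    return violations

# Hypothetical limits, e.g. derived from the span of the training data.
ALLOWED = {"flow": (0.1, 1.0), "temp": (20, 60), "length": {50, 100, 150}}
bad = check_method_parameters({"flow": 0.2, "temp": 80, "length": 100}, ALLOWED)
```

Restricting predictions to the region covered by the training data matters because tree-based regressors generally do not extrapolate beyond the values they have seen.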
- the elution phase gradient is a non-fixed gradient.
- the machine learning model is an Extreme Gradient Boosted model (e.g. extreme gradient-boosted trees), a Gradient Boosted model (e.g. gradient boosted trees), a Random Forest model, a Lasso regression model, or a Support Vector Machine, preferably an Extreme Gradient Boosted model.
- the machine learning model is trained to minimise differences between predicted retention times and corresponding retention times in training data.
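The source does not name the error measure used during training. As an illustration, two common ways of quantifying the difference between predicted and measured retention times are the root-mean-square error and the mean absolute error; both are shown below as assumptions rather than as the authors' choice.

```python
import math

def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured retention times."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted))

def mae(predicted, measured):
    """Mean absolute error between predicted and measured retention times."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)
```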
- the machine learning model has been trained using a dataset comprising data for at least 100, at least 200, at least 300, at least 400, at least 500, or at least 600 compounds.
- the dataset comprises at least 2, 3, 4, or 5 sets of chromatographic data comprising a retention time for the respective compound and associated HPLC method parameters.
- the training data comprises a plurality of data points (e.g. at least 100, 200, 500, 1000, 1500), each data point comprising molecular/physicochemical properties for a compound, corresponding retention time and HPLC method parameters.
- the data comprises multiple data points for the same compound with the same HPLC method, multiple data points for the same compound with different HPLC methods, and multiple compounds with the same HPLC methods. It is advantageous for the training dataset to comprise data for different compounds, such as at least 100, at least 200, at least 300, at least 400, at least 500, or at least 600 compounds.
- the training data comprises data points for different HPLC methods, such as at least 10, at least 20, at least 50, at least 100, at least 150, or at least 200 different sets of HPLC method parameters (where sets of HPLC method parameters are different if at least one method parameter differs).
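The rule that two sets of method parameters count as different if at least one parameter differs can be applied directly when auditing a dataset. A small sketch follows, assuming each data point stores its method parameters as a dictionary under a hypothetical "method" key.

```python
def count_distinct_methods(data_points):
    """Count parameter sets that differ in at least one method parameter."""
    # Sorting the items makes the key independent of dictionary insertion order.
    return len({tuple(sorted(dp["method"].items())) for dp in data_points})

# Illustrative data points: the first two share the same method.
points = [
    {"compound": "A", "method": {"flow": 0.2, "pH": 3.0}},
    {"compound": "B", "method": {"pH": 3.0, "flow": 0.2}},
    {"compound": "A", "method": {"flow": 0.2, "pH": 7.0}},
]
n_methods = count_distinct_methods(points)
```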
- the training dataset comprises a metric indicative of the width of the peak corresponding to the respective compound (i.e. the peak associated with the retention time for the respective compound).
- the HPLC is Reversed-phase chromatography, Normal-phase chromatography, Size-exclusion chromatography, Ion-exchange chromatography, or Hydrophilic interaction liquid chromatography, optionally wherein the HPLC is Reversed-phase chromatography or Normal-phase chromatography.
- the machine learning model has been trained using a set of predictive features selected from a larger set through a feature selection process, and wherein the larger set of predictive features are each selected from: molecular structural properties or physicochemical properties (e.g. molecular descriptors or molecular fingerprints) and HPLC method parameters.
- the present invention provides a computer-implemented method of identifying a set of HPLC method parameters suitable for separating two or more compounds in a composition comprising said compounds, the method comprising: (a) performing the method of any embodiment of the first aspect for a plurality of sets of HPLC method parameters; (b) calculating one or more separation performance metrics using the results of step (a); and (c) identifying one or more set(s) of HPLC method parameters by applying one or more criteria on said separation performance metrics.
- a schematic presentation is provided in Figure 2B.
- values of one or more sets of HPLC method parameters are obtained. This may comprise obtaining values of one or more HPLC method parameters from an optimisation algorithm.
- retention times (and optionally a metric indicative of peak width) are predicted by the method of the first aspect (e.g. as illustrated in Figure 2A).
- one or more separation performance metrics are calculated.
- one or more set(s) of HPLC method parameters are identified by applying one or more criteria on said separation performance metrics.
- the results are output, e.g. to a user through a user interface, or to a computing device or database.
- the methods of the present aspect may have any of the features described in relation to any other aspect.
- the separation performance metrics comprise one or more of i) the average retention time difference between adjacent peaks corresponding to predicted retention times for the two or more compounds, ii) the gradient duration, and iii) a predicted metric indicative of peak width, optionally comprising both i) and ii).
- the separation performance metrics comprise i) the average retention time difference between adjacent peaks corresponding to predicted retention times for the two or more compounds, wherein the criteria comprise maximising said average retention time difference; ii) the gradient duration, wherein the criteria comprise minimising said gradient duration; and/or iii) predicted peak widths, wherein the criteria comprise minimising said peak widths.
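Metrics i) and ii) above can be sketched in a few lines. How the two are combined into a single score is not specified in the source; the weighted difference below, including the weight value, is a hypothetical choice for illustration.

```python
def mean_adjacent_rt_difference(predicted_rts):
    """Average retention time difference between adjacent peaks (peaks sorted by RT)."""
    rts = sorted(predicted_rts)
    if len(rts) < 2:
        raise ValueError("at least two compounds are required")
    gaps = [b - a for a, b in zip(rts, rts[1:])]
    return sum(gaps) / len(gaps)

def separation_score(predicted_rts, gradient_duration, duration_weight=0.1):
    """Higher is better: reward peak spacing, penalise long gradients (weight is hypothetical)."""
    return mean_adjacent_rt_difference(predicted_rts) - duration_weight * gradient_duration
```

Note that the average adjacent difference equals (last RT - first RT) / (n - 1), so it does not by itself penalise two co-eluting peaks in the middle of the chromatogram; tracking the smallest gap or the predicted peak widths alongside it may be worthwhile.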
- the one or more sets of HPLC method parameters in step (a) are generated by an optimisation algorithm, optionally a genetic algorithm.
- step (a) comprises running the genetic algorithm until one or more stopping criteria apply, optionally wherein the stopping criteria are selected from: i) a predetermined number of generations has been reached, and ii) the difference between the separation performance metrics associated with one or more sets of HPLC method parameters of a current iteration and the separation performance metrics associated with one or more sets of HPLC method parameters of a previous iteration is below a threshold.
- the genetic algorithm is initialised with a set of HPLC method parameters that is selected randomly, selected from a predetermined set or range for each HPLC method parameter, and/or provided by a user.
- one or more sets of HPLC method parameters are selected (e.g. by a genetic algorithm) from respective predetermined sets and/or ranges.
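The genetic algorithm loop with both stopping criteria can be sketched as follows. Everything concrete here is an assumption for illustration: the parameter ranges, the crossover and mutation operators, the population size, and in particular `toy_fitness`, which stands in for a score computed from the retention times predicted by the trained model.

```python
import random

# Hypothetical predetermined ranges for three HPLC method parameters.
PARAM_RANGES = {"flow": (0.1, 1.0), "temp": (20.0, 60.0), "pH": (2.0, 8.0)}

def random_individual(rng):
    """Random initialisation within the predetermined ranges."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

def crossover(a, b, rng):
    """Uniform crossover: each parameter is taken from one of the two parents."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}

def mutate(individual, rng, rate=0.2):
    """Gaussian mutation, clamped so values stay inside the allowed ranges."""
    out = dict(individual)
    for k, (lo, hi) in PARAM_RANGES.items():
        if rng.random() < rate:
            out[k] = min(hi, max(lo, out[k] + rng.gauss(0, 0.1 * (hi - lo))))
    return out

def run_ga(fitness, pop_size=20, max_generations=50, tol=1e-6, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    previous_best = None
    # Stopping criterion i): a predetermined number of generations.
    for generation in range(1, max_generations + 1):
        ranked = sorted(population, key=fitness, reverse=True)
        best_score = fitness(ranked[0])
        # Stopping criterion ii): change versus the previous iteration below a threshold.
        if previous_best is not None and abs(best_score - previous_best) < tol:
            break
        previous_best = best_score
        parents = ranked[: pop_size // 2]  # the best half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return ranked[0], generation

def toy_fitness(ind):
    """Placeholder objective; a real run would score model-predicted retention times."""
    return -((ind["flow"] - 0.5) ** 2 + ((ind["temp"] - 40) / 40) ** 2 + ((ind["pH"] - 5) / 6) ** 2)

best, generations = run_ga(toy_fitness)
```

Because the best individuals are carried over unchanged, a single generation without improvement already triggers criterion ii) in this sketch; requiring several stagnant generations in a row would be a more forgiving variant.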
- the method comprises presenting to a user one or more (e.g. 5) sets of HPLC method parameters that satisfy the one or more criteria on said separation performance metrics.
- Presenting to the user can be e.g. through a user interface, to a computing device or database.
- a method for providing a tool for predicting a HPLC retention time for one or more compounds comprising: (a) obtaining a training dataset comprising the values of one or more structural and/or physicochemical properties for one or more compounds, and for each compound, one or more sets of chromatographic data comprising a retention time for the respective compound and associated HPLC method parameters; and (b) training a machine learning model to predict a retention time for a compound when subjected to HPLC using the one or more HPLC method parameters, using as input the values of the one or more structural and/or physicochemical properties for the compound, and values of said HPLC method parameters.
- the methods of the present aspect may have any of the features described in relation to any other aspect.
- a computer program product comprising computer readable instructions which, when executed by one or more processors, cause the one or more processors to carry out the method of any embodiment of any method described herein.
- a non-transitory computer-readable medium having stored thereon computer readable instructions which, when executed by one or more processors, cause the one or more processors to carry out the method of any embodiment of any method described herein.
- a system comprising: at least one processor; and at least one non-transitory computer readable medium containing instructions that, when executed by the at least one
- This work aims to evaluate the use of ML to improve the development process of HPLC methods.
- a recommendation tool is developed that suggests suitable parameters for a given sample mixture. These parameters should serve as a starting point for fine-tuning the exact values of the various HPLC parameters by a trained analyst.
- the tool makes use of a supervised ML model to predict the RT of the analytes in the sample.
- the model is trained using the following input data: i) Molecules and their molecular descriptors, ii) HPLC method parameters, iii) HPLC column parameters from the hydrophobic-subtraction model, and iv) Chromatographic data.
- the tool uses the RT predictions to evaluate different combinations of method parameters to maximise separation between the analytes.
- a single-page web application is implemented to provide easy access to a user.
- a further aim of this work is to reduce the number of different HPLC columns which would be needed during the HPLC method development process.
- an unsupervised clustering algorithm is used to group columns with similar properties.
- Chromatographic data were collected from the Empower Chromatography Data System, a chromatography data software package. The data were downloaded in JavaScript Object Notation (JSON) format. As every injection can be processed multiple times by the analyst, multiple results per injection exist. It was assumed that the peaks were properly integrated and labelled in the chromatogram with the latest result_id. From these results, the RT and label of every integrated peak and other relevant information such as the method were extracted and stored in a CSV file for further processing.
- a variable, elution_power, was introduced. The higher the elution power of the mobile phase composition, the shorter the expected retention time of the analytes.
- to calculate the “elution power” of a mobile phase, each component and its proportion in a mobile phase were extracted from the SAM documents. Using this information, the mol% of each component in the mobile phase was calculated. To get the elution power as shown in Table 1, only the main solvents were taken into account, i.e. water, acetonitrile, methanol, and isopropyl alcohol. The mol% of each solvent was multiplied by its elution strength to give the elution power. See Table 2 for the elution strengths of the solvents.
- Table 1 Example for the calculation of the elution power for each component in SAM-0200368. Gradient Information
- time_point_1, time_point_2 and time_point_3 were introduced to describe the mobile phase at three different time points. For each instrument method, these time points were manually extracted together with the mobile phase composition at each time point.
- Figure 3 shows the position of the time points for SAM-0200368. At each time point, the elution power was calculated by multiplying the portion of each mobile phase by its elution power and summing up these values.
- in Table 3, the time points and elution powers for SAM-0200368 are depicted. These six variables were used as features for the machine learning model to represent the gradient and mobile phases.
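The elution power calculation described above can be sketched as follows. This is a minimal illustration only: the elution strength values are hypothetical placeholders (the values of Table 2 are not reproduced in this text), and the function names `phase_elution_power` and `elution_power_at` are assumptions introduced for the example.

```python
# Hypothetical placeholder strengths; NOT the values from Table 2 of the source.
ELUTION_STRENGTH = {
    "water": 0.0,
    "acetonitrile": 3.0,
    "methanol": 2.0,
    "isopropyl alcohol": 4.0,
}


def phase_elution_power(mol_percent):
    """Elution power of one mobile phase: sum of mol% x elution strength.

    Only the main solvents listed in ELUTION_STRENGTH are taken into account;
    additives (e.g. acids or buffers) are ignored.
    """
    return sum(
        pct * ELUTION_STRENGTH[solvent]
        for solvent, pct in mol_percent.items()
        if solvent in ELUTION_STRENGTH
    )


def elution_power_at(proportions, phase_powers):
    """Elution power at one time point: phase powers weighted by proportion."""
    return sum(proportions[name] * phase_powers[name] for name in proportions)


# Two illustrative mobile phases, given as mol% of each main solvent.
phase_powers = {
    "A": phase_elution_power({"water": 100.0}),
    "B": phase_elution_power({"acetonitrile": 100.0}),
}

# (time point in min, proportion of each mobile phase) for two time points.
gradient = [
    (0.0, {"A": 0.95, "B": 0.05}),
    (30.0, {"A": 0.05, "B": 0.95}),
]

# One (time point, elution power) pair per time point, usable as model features.
features = [(t, elution_power_at(p, phase_powers)) for t, p in gradient]
```

With three time points instead of two, the same loop would yield the six gradient features mentioned above.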
- the Analytical Column Database Compilation contains information on every HPLC and GC column in PTDC. Using the REST API of ACDC, information such as column ID, manufacturer, type of stationary phase, diameters, particle size, product number, USP code, pore size, and surface area were downloaded and stored locally in a CSV file. Accessing the USP Database
- the parameters of the hydrophobic-subtraction model for more than 750 stationary phases were downloaded from the United States Pharmacopeia (USP) webpage using a web scraper as no REST API was available. Manual download of the data was not an option as the data is constantly updated.
- the XMLHttpRequest (XHR) requests sent by the browser while loading the data of the requested webpage were intercepted. These XHR requests were mimicked using a Python script.
- the response sent by the webpage was stored locally in a CSV file.
- the information fetched from ACDC and the USP webpage were merged to yield a dataset containing stationary phase information for the columns used in the SAM methods including features such as column dimensions, particle sizes, pore sizes, PQRI parameters as well as a number describing how often a certain column was used in a SAM. Fuzzy string matching was used to merge the data sources based on the stationary phase name.
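A fuzzy merge on the stationary phase name could look roughly like the sketch below. The source does not state which matching library or threshold was used; the standard-library `difflib.SequenceMatcher`, the 0.8 threshold, the normalisation step and the example names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case the name and collapse hyphens and repeated whitespace."""
    return " ".join(name.lower().replace("-", " ").split())


def best_match(name, candidates, threshold=0.8):
    """Return the candidate most similar to `name`, or None below the threshold."""
    scored = [
        (SequenceMatcher(None, normalise(name), normalise(c)).ratio(), c)
        for c in candidates
    ]
    score, match = max(scored)
    return match if score >= threshold else None


# Hypothetical stationary phase names from the two data sources.
usp_names = ["Symmetry C18", "Kinetex Biphenyl", "Acquity UPLC CSH C18"]

matched = best_match("Symmetry C-18", usp_names)
unmatched = best_match("Unknown Phase X", usp_names)
```

Names that fall below the threshold are returned as `None`, so they can be reviewed manually instead of being merged incorrectly.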
- Integrated Roche Chemistry Information (IRCI) database is a web application holding the molecular structures of all registered molecules within Roche. Other databases of molecular structures may be used such as e.g. ChEMBL (https://www.ebi.ac.uk/chembl).
- the IRCI REST API was used to download mol files for every peak in the cleaned Empower dataset (approx. 600 molecules).
- the mol files were transformed to an RDKit-specific mol file to calculate molecular descriptors using the Mordred package (Moriwaki et al., 2018). Fingerprints were generated using RDKit.
- R² correlation coefficient
- MAE Mean Absolute Error
- MAPE Mean Absolute Percentage Error
- the dataset had to be split into a training set and a test set.
- the train_test_split method from Scikit-learn was used.
- the results of the models using random train-test-split showed that prediction performance decreased with higher retention time and that the random state defined in the method had an impact on the prediction score. Therefore, a unified train-test-split using an 80/20 split was implemented.
- the dataset was first sorted by the retention time and then every 5th observation was put into the test set. In addition, it was decided to limit the retention time to 30 min, as there were too few observations with RT > 30 min to achieve a train-test ratio of 5:1.
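The unified train-test-split can be sketched in a few lines. This is an illustrative reading of the description, not the original implementation: the record layout, the function name `unified_split`, and treating the 30 min limit as inclusive are assumptions.

```python
def unified_split(records, rt_key="rt", max_rt=30.0, every=5):
    """Sort by retention time and send every `every`-th observation to the test set.

    Observations with a retention time above `max_rt` are dropped first
    (the limit is treated as inclusive here, which is an assumption).
    """
    kept = sorted(
        (r for r in records if r[rt_key] <= max_rt),
        key=lambda r: r[rt_key],
    )
    test = kept[every - 1::every]
    train = [r for i, r in enumerate(kept) if (i + 1) % every != 0]
    return train, test


# Hypothetical observations with retention times of 1 to 40 min.
records = [{"rt": float(t)} for t in range(1, 41)]
train, test = unified_split(records)
```

Because the split follows the sorted retention times, the train and test sets cover the retention time range evenly and the result no longer depends on a random state.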
- MinMaxScaler provided by Scikit-learn was used to scale the input features. Usually, each feature is scaled individually, but as the time point and elution power features are in relation to each other (time points 1-3 and elution power 1-3), a special procedure was applied. For both the time points and the elution powers, a NumPy (Harris et al., 2020) array was created holding all values from the individual time points to fit the scaler. This way, all features have the same min and max values. After fitting the scaler, each feature was then transformed individually. Compared to linear models and SVM, tree-based models do not require scaling of the input data, as each feature is processed individually (Muller & Guido, 2016).
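The shared scaling procedure can be illustrated without any library. The sketch below re-implements the min-max formula in plain Python instead of calling Scikit-learn's MinMaxScaler, so it shows the idea rather than the code actually used; the function names and the example values are assumptions.

```python
def fit_shared_minmax(columns):
    """Pool the values of all related features and return their common (min, max)."""
    pooled = [value for column in columns for value in column]
    return min(pooled), max(pooled)


def transform(column, lo, hi):
    """Scale one feature with the shared min and max: (x - min) / (max - min)."""
    return [(value - lo) / (hi - lo) for value in column]


# Hypothetical time point features (in min) for three instrument methods.
time_point_1 = [0.0, 0.0, 2.0]
time_point_2 = [10.0, 15.0, 20.0]
time_point_3 = [20.0, 30.0, 40.0]

# Fit once on the pooled values, then transform each feature individually.
lo, hi = fit_shared_minmax([time_point_1, time_point_2, time_point_3])
scaled = [transform(col, lo, hi) for col in (time_point_1, time_point_2, time_point_3)]
```

Scaling with a shared range keeps the ordering between the three time points intact, which per-feature scaling would not guarantee.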
- a dummy model acts as a baseline to judge the results of other models. If a model achieves a higher score than the dummy regressor, one can say that the predictions are better than predicting random values (Muller & Guido, 2016). A dummy regressor predicts for every observation the same output. In this case, the mean retention time from the observations in the test set (11.6 min) was predicted.
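The baseline comparison can be sketched as follows. The retention times and model predictions are made-up numbers, and the helper names are assumptions; the sketch only shows how a constant mean prediction gives a reference MAE against which a model is judged.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def dummy_predictions(reference_rts, n):
    """Predict the mean of the reference retention times for all n observations."""
    mean_rt = sum(reference_rts) / len(reference_rts)
    return [mean_rt] * n


# Hypothetical retention times in minutes.
reference_rts = [2.0, 4.0, 6.0, 8.0]
measured_rts = [3.0, 5.0, 9.0]
model_rts = [3.5, 5.5, 8.0]  # predictions of some trained model

baseline_mae = mean_absolute_error(
    measured_rts, dummy_predictions(reference_rts, len(measured_rts))
)
model_mae = mean_absolute_error(measured_rts, model_rts)
beats_baseline = model_mae < baseline_mae
```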
- clustering was evaluated to group columns with similar properties based on the hydrophobic-subtraction model.
- the goal of this approach was to reduce the variety of columns in the labs. This can be achieved by the optimiser predicting a group of columns (a cluster) in which all columns share similar properties.
- KMeans and DBSCAN were evaluated.
- as the ion-exchange capacity (c in the hydrophobic-subtraction model) is pH dependent, two clusterings had to be performed.
- PCA Principal Component Analysis
- for KMeans, the number of clusters has to be defined by the user. This was done iteratively by changing the number of clusters and plotting the results after doing PCA of the dataset. It is important to find the ideal number of clusters. If the number is too high, no reduction in columns will be achieved, while a small number leads to clusters containing columns which do not produce similar outputs. Using the defined number of clusters, the clustering was also applied to the dataset without performing PCA beforehand.
- for DBSCAN, the number of clusters is determined by the hyper-parameters eps and min_samples.
- eps defines the maximum distance between data points to be considered as “in the neighbourhood” of each other.
- min_samples is the number of observations that have to be in the same neighbourhood to form a cluster. If the number of observations in the same neighbourhood is smaller than min_samples, these observations are declared as noise (Muller & Guido, 2016).
- min_samples was fixed at 0. The following eps values were tested: [0.04, 0.06, 0.08, 0.1].
- the aim of the recommendation tool is to find suitable HPLC instrument parameters for the separation of a set of predefined molecules.
- the tool uses the ML model for retention time prediction to find the method parameter combination that maximizes the separation between the analytes. In total, there are 18 method parameters to be defined.
- a GA can be implemented by providing a fitness function to evaluate candidate solutions, defining the gene space (search space) and setting parameters to guide the optimisation process. Parameters to be defined are, among others, the number of generations, number of parents mating, solutions per generation as well as the mutation rate and crossover behaviour.
- Figure 5 shows the important elements of a genetic algorithm.
- a solution also called a chromosome, consists of genes. A single gene describes one feature, e.g. flow rate or temperature. Therefore, one solution holds all the information to create an instrument method.
- solutions which reached a high fitness value are selected as parents, and their genes are passed on to new solutions.
- a mutation happens at randomly selected genes.
- Figure 5 shows the general procedure of the optimisation process using GA.
- the first step in the process is to define molecules to be separated by providing Roche numbers. Using the Roche number, the mol file is downloaded from IRCI and molecular descriptors are calculated for each molecule.
- the GA produces the first generation of solutions by randomly selecting a value from the gene space for each gene to create a solution. The generated descriptors and the method parameters are merged to give a solution and the retention time for each analyte is predicted. Using the predicted retention times and Equation 1, the fitness value of the current solution is evaluated.
- the fitness function considers the average retention time difference (Δi) between adjacent peaks (n) and the gradient duration (t). The higher the fitness, the better the expected separation between all analytes.
- the gradient duration in the fitness function is added as a penalty term to limit run times.
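Equation 1 itself is not reproduced in this text, so the sketch below only illustrates the two ingredients named above: the average retention time difference between adjacent peaks and a penalty on the gradient duration. The way they are combined (a plain difference), the `penalty_weight` parameter and its default value are assumptions and may differ from the actual Equation 1.

```python
def fitness(predicted_rts, gradient_duration, penalty_weight=0.1):
    """Illustrative fitness: mean gap between adjacent peaks minus a duration penalty.

    The functional form and `penalty_weight` are assumptions; the source's
    Equation 1 may combine the two terms differently.
    """
    rts = sorted(predicted_rts)
    penalty = penalty_weight * gradient_duration
    if len(rts) < 2:
        return -penalty  # a single peak has no adjacent peak to separate from
    gaps = [later - earlier for earlier, later in zip(rts, rts[1:])]
    return sum(gaps) / len(gaps) - penalty


# Hypothetical predicted retention times (min) for three analytes.
rts = [4.0, 10.0, 7.0]
long_run = fitness(rts, gradient_duration=20.0)
short_run = fitness(rts, gradient_duration=10.0)
```

For the same predicted separation, the shorter gradient scores higher, which is the intended effect of the penalty term.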
- the GA creates new candidate solutions through crossover between the best solutions of the first generation and random mutation.
- the process of calculating the fitness for all solutions of the second generation is repeated. If one of the new solutions reaches a higher fitness than the best solution from the first generation, this solution is stored as the best solution. The process is repeated until the predefined number of generations is reached.
- the best_solution is then used to set up the instrument method.
- the optimizer was set to run for 100 generations using 20 solutions each. To create a new generation, the 10 least fit solutions of the previous generation were replaced by 10 new solutions created through crossover and mutation of the 10 fittest solutions. This behaviour was achieved by using steady-state selection as parent selection type in the parameters (M. Mitchell, 1996).
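The optimisation loop described above can be sketched in plain Python. This is a simplified stand-in, not the implementation used in the source: the gene space is reduced to four genes with partly hypothetical values, single-point crossover is an assumption, and `toy_fitness` replaces the fitness that would be computed from predicted retention times.

```python
import random

# Reduced, partly hypothetical gene space: one list of allowed values per gene.
GENE_SPACE = [
    [0.75, 1.0, 1.25],      # flow rate
    [25, 30, 35, 40, 45],   # temperature (hypothetical values)
    [2.0, 3.0, 6.0, 7.0],   # pH (hypothetical values)
    list(range(43)),        # column index, 0 to 42
]


def toy_fitness(solution):
    """Stand-in for the RT-based fitness: rewards closeness to an arbitrary target."""
    target = [1.0, 40, 3.0, 6]
    return -sum(abs(gene - t) for gene, t in zip(solution, target))


def run_ga(fitness, gene_space, generations=100, pop_size=20, n_parents=10, seed=0):
    """Steady-state GA: the fittest half is kept, the rest is replaced by offspring."""
    rng = random.Random(seed)
    population = [[rng.choice(values) for values in gene_space] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:n_parents]
        children = []
        while len(children) < pop_size - n_parents:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(gene_space))  # single-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(len(gene_space))       # mutate exactly one gene
            child[i] = rng.choice(gene_space[i])
            children.append(child)
        population = parents + children
        history.append(max(fitness(s) for s in population))
    return max(population, key=fitness), history


best_solution, history = run_ga(toy_fitness, GENE_SPACE)
```

Because the fittest solutions are carried over unchanged, the best fitness recorded in `history` never decreases from one generation to the next.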
- the mutation type was set to random, and the number of genes to mutate was set to one. Even though 18 method parameters had to be selected, the number of genes was set to 14. This is due to allowing only certain combinations of H, S, A, B, and c parameters, namely the cluster centers. If this constraint did not exist, the optimizer could select parameters that lead to non-existing columns.
- the gene space for the column was set between 0 and 42 which corresponded to the number of unique stationary phases available in the data set. This approach allowed for the evaluation of the optimizer by comparing the predicted retention times and the measured retention times, as every number in the gene space corresponded to exactly one stationary phase. For future use, the tool could also suggest other columns which are in the same cluster as the predicted one. The analyst would then be able to choose a column already available in his or her lab. Based on the selected cluster number, the parameters H, S, A, B, and c were added to the solution. The c value (c_28, c_70) was selected depending on the pH of the solution (c_28 denotes a pH of less than 7; c_70 denotes a pH of 7.0; see https://www.usp.org/resources/pqri-approach-column-equiv-tool). It was decided not to apply PCA prior to clustering so that the predicted parameters corresponded to exactly one column.
- the gene space for the column dimensions was fixed at 150 mm x 4.6 mm. The particle size was set to 3.0 µm, which sits between the common 2.7 µm and 3.5 µm particles.
- the gene space for flow rate was set to [0.75, 1.0, 1.25] to avoid overpressure in the HPLC system in combination with the defined column dimensions.
- Two options to define the gene space of the gradient (time point and elution power) were defined. The gradient can be fixed. This gradient runs from 5 %B to 95 %B in 30 min. If the gradient is not fixed, no restrictions are in place and the GA recommends the gradient.
- time point 1 < time point 2 < time point 3: The run time can only increase over time.
- the web application was implemented using Plotly Dash (“Plotly Technologies Inc. Collaborative data science”, 2015). It allows the analyst to enter multiple analytes by providing information enabling identification of a compound, e.g. the Roche number. In addition, the analyst can choose between the two gradient options.
- the tool runs the optimisation process multiple times (e.g. five times), as the result of the GA (and other metaheuristics) can vary. This variation is caused by the randomly generated first generation as well as the crossover and mutation behaviour of the algorithm.
- the trained analyst chooses the run with the highest fitness value overall or based on available columns or predicted RT (e.g. shorter RT).
- the linear lasso model achieved a prediction score (R²) of 0.663 using the following hyper-parameters: alpha: 0.001, tolerance: 1e-06, max_iterations: 500.
- SVR achieved an R² of 0.743 and an MAE of 2.4 min using C: 1000 and gamma: 0.01.
- the tree-based model with the lowest prediction performance was random forest with an R² of 0.770 using bootstrap: True, max_depth: 16, min_samples_leaf: 1, num_estimators: 175.
- GB and XGB showed similar results with an R² of 0.802 and 0.811, respectively.
- for GB, a learning rate of 0.1 was used together with a max_depth of 5, min_samples_leaf of 100 and 1500 estimators.
- for XGB, a learning rate of 0.05, max_depth of 4, max_leaves of 0, min_child_weight of 4 and 750 estimators were used.
- the dummy regressor, which was used to define a baseline score, achieved an MAE of 5.9 min by always predicting the mean retention time of the training data. This baseline score was outperformed by every evaluated model.
- the results of the evaluated models are summarized in Table 4 and the prediction error plots are depicted in Figure 7.
- the performance of the XGB model was further increased by evaluating the “unified train-test-split” and applying feature selection.
- Figure 8 shows the histograms of both sampling methods before limiting the retention time. Using this sampling approach, R² was increased from 0.811 to 0.830, while MAE was reduced from 1.9 min to 1.6 min. The same hyper-parameter settings were used as described in subsection 3.3.1.
- Figure 9 shows the 20 most important features of the XGB model when using unified train-test-split. Apart from the instrument method features, the most important features are:
- ATSC Autocorrelation of a Topological Structure, also known as centered Moreau-Broto autocorrelation. ATSC measures the distribution of atomic properties on a molecular graph (D. R. Todeschini & Consonni, 2020).
- VSA_EState Hybrid of van der Waals surface area and electrotopological state (Guha & Willighagen, 2012).
- Table 5 shows model performance using different numbers of features. The highest score was achieved using the 65 most important features. As the required features particle_size and inner_diameter were not in the top 65, these two features were added manually. Using feature selection, the number of features was reduced from 640 to 67 while increasing the prediction score. The following hyper-parameters were used for the evaluation: gamma: 0, learning_rate: 0.05, max_depth: 4, max_leaves: 0, min_child_weight: 4, n_estimators: 750.
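The selection step, keeping the top-ranked features and adding required ones manually, can be sketched as below. The importance values are invented for illustration and the function name `select_features` is an assumption; in practice the importances would come from the trained model.

```python
def select_features(importances, top_k, required=()):
    """Keep the `top_k` most important features and append any required ones."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    selected = ranked[:top_k]
    selected += [name for name in required if name not in selected]
    return selected


# Hypothetical feature importances (not taken from Figure 9 or Table 5).
importances = {
    "SLogP": 0.30,
    "elution_power_1": 0.25,
    "ATSC5v": 0.20,
    "particle_size": 0.01,
    "inner_diameter": 0.005,
}

selected = select_features(
    importances, top_k=3, required=("particle_size", "inner_diameter")
)
```

With `top_k=65` and the same two required features, this procedure would yield the 67 features described above.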
- n_estimators: [1, 10, 20, 50, 100, 200, 500, 750, 1000, 1250, 1500].
- Figure 10 shows the results with varying n_estimators, max_depth and learning_rate values. The best R² was achieved using the following parameters: n_estimators: 750, learning_rate: 0.05, max_depth: 4, max_leaves: 0, min_child_weight: 4, and gamma: 0. All other parameters used the default value. Through unified train-test-split and feature selection followed by hyper-parameter tuning, R² was increased from 0.811 to 0.827, MAE was decreased from 1.9 min to 1.6 min and MAPE was reduced from 26.5% to 21.5%. In Table 6, the mean test score of the 10-fold CV as a function of the number of estimators is shown. Using more than 750 estimators, a small decrease in prediction score was observed, which is an indication of over-fitting. Figure 11 shows the prediction error and residuals plot of the final model.
- Figure 13 shows the results of the KMeans clustering after PCA was applied.
- the pH dependence of the ion-exchange parameter “c” is visible by comparing the distribution of the clusters in both plots.
- min_samples was fixed at 0 as every column should form a cluster when using DBSCAN.
- Figure 14 shows the formed cluster using DBSCAN in combination with PCA using different values for eps. It was decided to use the KMeans algorithm for the optimizer as KMeans can also be used to predict the cluster for new observations (Pedregosa et al., 2011).
- the first test was the prediction of the RT for every molecule described in one of the two SAM documents.
- the model for the optimiser was trained without a test set, but every Roche number / method_id combination from the SAMs was dropped before training the model.
- the measured RT (from the Empower data) for each molecule using the specified method and the predicted RT using the same method parameters are depicted.
- the overall accuracy of the predictions is good, with an MAE of 0.10 min (SAM-0114188) and 1.09 min (SAM-0113392).
- the elution order of the predicted RT corresponded to the measured results for SAM-0114188.
- a change in elution order was observed for peak no. 2 and peak 3 in SAM-0113392 ( Figure 15b).
- Table 7 shows the output of the optimizer for the molecules used in SAM-0114188 while not using the fixed gradient.
- the optimizer was run five times, and the solutions of all five runs had the same fitness.
- the method parameters differed slightly between each solution but were in the same range (e.g. suggested pH is acidic in every solution).
- the suggested method parameters of the 5th run (index 4) were evaluated in the laboratory.
- Cluster 6 corresponded to a Waters Symmetry C18 column.
- Mobile phase A consisted of water + 0.1% TFA and mobile phase B of acetonitrile + 0.1% TFA.
- Table 8 shows the output of the optimizer for SAM-0114188 using the fixed gradient.
- Cluster 6 corresponded to a Waters Symmetry C18 column.
- Mobile phase A was water + 0.1% TFA and mobile phase B acetonitrile + 0.1% TFA.
- Table 9 shows the optimiser output for SAM-0113392 without using gradient restrictions.
- the displayed solution is the one tested in the lab.
- Cluster 1 corresponded to a Phenomenex Kinetex Biphenyl column.
- the following mobile phases were used to achieve a pH of 3: A: water + 0.01% TFA, B: acetonitrile + 0.01% TFA.
- the comparison between the RT predicted by the optimiser and the measured RT is depicted in Figure 17a. While peak 0 eluted too early compared to the predicted RT, peaks 2, 3 and 4 eluted too late without showing good separation among each other.
- Table 10 shows the optimiser output for SAM-0113392 using the fixed gradient.
- the displayed solution is the one tested in the lab.
- Cluster 11 corresponded to a Waters Acquity UPLC CSH C18.
- Mobile phase A consisted of 10 mM ammonium acetate in water, adjusted to pH 6 using acetic acid.
- Mobile phase B consisted of acetonitrile + 0.01% acetic acid.
- the comparison between the RT predicted by the optimiser and the measured RT is shown in Figure 17b. Peak 1 eluted significantly earlier than predicted, whereas peaks 3 and 4 changed the elution order. Compared to the results without gradient restrictions, the measured RT are more accurate and good separation between all peaks was observed.
- Figure 18 shows the graphical user interface of the web application prototype.
- the web app was created using Plotly Dash (“Plotly Technologies Inc. Collaborative data science”, 2015) and allows a user to define up to five molecules described as Roche numbers.
- the application allows the user to choose between the fixed gradient, where only the flow rate, temperature, pH and column are optimised and the unrestricted option, where every parameter is optimised except the column dimensions.
- the optimisation process can be started by clicking the start button. Once the optimisation is run, the solutions appear on the results card. From the produced solutions, a trained analyst chooses the solutions with the highest fitness. If the suggested gradient does not seem reasonable, e.g. too flat, the analyst may choose another solution. Often, many solutions will have the same fitness value. In this case, the solution can be chosen also based on available columns.
- a dataset of available HPLC data was used to train an ML model, which in turn was used to optimise analytical method development using a GA.
- the developed tool can be used to suggest HPLC method parameter settings for a set of pre-defined analytes, which can serve as a starting point for the analytical chemist.
- an XGB model was used to predict the RTs of the analytes.
- a web application prototype was implemented.
- the tool was able to find suitable settings for all evaluated cases.
- the approach of predicting RT considering method parameters has not yet been developed in other studies. Addition of more data in the training dataset is expected to increase the RT prediction accuracy of the ML model.
- once the tool is deployed, it has the potential to increase efficiency by reducing the need for a costly and time-consuming one-factor-at-a-time approach in the development process of new HPLC methods.
- a computer-implemented method of predicting a HPLC retention time for one or more compounds comprising:
- the method of embodiment 1 is disclosed, wherein the machine learning model further predicts a metric indicative of peak width, and the one or more sets of chromatographic data in the training dataset further comprise a metric indicative of the width of the peak for the respective compound.
- the method of embodiment 1 or 2 is disclosed, wherein the one or more compounds are individually selected from: a pharmaceutically active agent, and a degradation product thereof.
- the method of any of embodiments 1-3 is disclosed, wherein the HPLC method parameters are selected from a group consisting of: HPLC type, flow rate, temperature, pH, column length, column inner diameter, column particle size, column pore size, column steric interaction, column hydrogen bond acidity, column hydrogen bond basicity, column ion exchange capacity (c_28), column ion exchange capacity (c_70), and one or more metrics defining the elution phase gradient.
- the method of any of embodiments 1-4 is disclosed, wherein the HPLC method parameters comprise one or more of the parameters selected from a group consisting of: column length, column inner diameter, column particle size, column pore size, column steric interaction, column hydrogen bond acidity, column hydrogen bond basicity, column ion exchange capacity (c_28), column ion exchange capacity (c_70), and one or more metrics defining the elution phase gradient, and optionally one or more parameters selected from a group consisting of: HPLC type, flow rate, temperature, pH.
- the method of any of embodiments 1-5 is disclosed, wherein the HPLC method parameters comprise HPLC type, flow rate, temperature, pH, and one or more of the parameters selected from a group consisting of: column length, column inner diameter, column particle size, column pore size, column steric interaction, column hydrogen bond acidity, column hydrogen bond basicity, column ion exchange capacity (c_28), column ion exchange capacity (c_70), and one or more metrics defining the elution phase gradient.
- the method of any of embodiments 1-6 is disclosed, wherein the one or more metrics defining the elution phase gradient comprise:
- metrics defining the time-dependent change in the proportion of a plurality of mobile phases during elution optionally wherein the one or more metrics defining the time-dependent change in the proportion of a plurality of mobile phase during elution comprise the value of one or more time points corresponding to changes in the rate of change of the proportion of the mobile phases, and/or the value of the rate of change of the proportion of the mobile phases at one or more time points during the elution, and/or metrics describing a function representing the change in proportion of the mobile phases in one or more of (e.g. each of) a plurality of regions in the gradient; and/or
- the mobile phase elution powers at two or more (e.g. three) different time points, optionally wherein the mobile phase comprises a plurality of mobile phases and the elution power at a time point is obtained as a sum of the elution powers of each of the plurality of mobile phases weighted by the respective proportion of each mobile phase.
- the method of any of embodiments 1-7 is disclosed, wherein the one or more physicochemical properties comprise one or more molecular descriptors and/or wherein the one or more molecular structural properties comprise a molecular fingerprint and/or one or more structural molecular descriptors.
- the method of embodiment 8 is disclosed, wherein the structural molecular descriptors and/or molecular fingerprints are 2D molecular descriptors or molecular fingerprints.
- the method of any of embodiments 1-9 is disclosed, wherein the one or more structural and/or physicochemical properties comprise molecular descriptors selected from a group consisting of SLogP, ATSC5v, ATSC8d, ATSC3Z, ABC, VSA_EState4, ATSC5dv, and ATSC6L
- the method of any of embodiments 1 to 11 is disclosed, wherein the HPLC method parameters for which retention time is predicted at step (b) are constrained to be within respective predetermined ranges and/or selected from respective predetermined sets of values, and/or wherein the training dataset comprises sets of chromatographic data obtained using HPLC method parameters within respective ranges and/or respective sets of values, and the HPLC method parameters for which retention time is predicted at step (b) are within said ranges and/or selected from said sets of values.
- the method of any of embodiments 1-13 is disclosed, wherein the machine learning model is an Extreme Gradient Boosted model (e.g. extreme gradient-boosted trees), a Gradient Boosted model (e.g. gradient boosted trees), a Random Forest model, a Lasso regression model, or a Support Vector Machine, preferably an Extreme Gradient Boosted model.
- the method of any of embodiments 1 -14 is disclosed, wherein the machine learning model is trained to minimise differences between predicted retention times and corresponding retention times in training data.
- the method of any of embodiments 1 -15 is disclosed, wherein the machine learning model has been trained using a dataset comprising data for at least 100, at least 200, at least 300, at least 400, at least 500, or at least 600 compounds.
- the method of embodiment 16 is disclosed, wherein the dataset comprises at least 2, 3, 4, or 5 sets of chromatographic data comprising a retention time for the respective compound and associated HPLC method parameters.
- the method of any of embodiments 1-17 is disclosed, wherein the HPLC is Reversed-phase chromatography, Normal-phase chromatography, Size-exclusion chromatography, Ion-exchange chromatography, or Hydrophilic interaction liquid chromatography, optionally wherein the HPLC is Reversed-phase chromatography or Normal-phase chromatography.
- the method of any of embodiments 1-18 is disclosed, wherein the machine learning model has been trained using a set of predictive features selected from a larger set through a feature selection process, and wherein the larger set of predictive features are each selected from: molecular structural properties or physicochemical properties (e.g. molecular descriptors or molecular fingerprints) and HPLC method parameters.
- a computer-implemented method of identifying a set of HPLC method parameters suitable for separating two or more compounds in a composition comprising said compounds is disclosed, the method comprising:
- step (b) calculating one or more separation performance metrics using the results of step (a);
- the separation performance metrics comprise one or more of i) the average retention time difference between adjacent peaks corresponding to predicted retention times for the two or more compounds, ii) the gradient duration, and iii) a predicted metric indicative of peak width, optionally comprising both i) and ii).
- the separation performance metrics comprise i) the average retention time difference between adjacent peaks corresponding to predicted retention times for the two or more compounds, wherein the criteria comprise maximising said average retention time difference; ii) the gradient duration, wherein the criteria comprise minimising said gradient duration; and/or iii) predicted peak widths, wherein the criteria comprise minimising said peak widths.
- the method of any of embodiments 20-22 is disclosed, wherein the one or more sets of HPLC method parameters in step (a) are generated by an optimisation algorithm, optionally a genetic algorithm.
- step (a) comprises running the genetic algorithm until one or more stopping criteria apply, optionally wherein the stopping criteria are selected from: i) a predetermined number of generations has been reached, and ii) the difference between the separation performance metrics associated with one or more sets of HPLC method parameters of a current iteration and the separation performance metrics associated with one or more sets of HPLC method parameters of a previous iteration is below a threshold.
- the method of embodiment 23 or 24 is disclosed, wherein the genetic algorithm is initialised with a set of HPLC method parameters selected randomly, wherein the genetic algorithm is initialised with a set of HPLC method parameters selected from a predetermined set or range for each HPLC method parameter, and/or wherein the genetic algorithm is initialised with a set of HPLC method parameters provided by a user.
- the method of any of embodiments 1-25 is disclosed, wherein one or more sets of HPLC method parameters are selected (e.g. by a genetic algorithm) from respective predetermined sets and/or ranges.
- the method of any of embodiments 20-26 comprises presenting to a user one or more (e.g. 5) sets of HPLC method parameters that satisfy the one or more criteria on said separation performance metrics.
- a computer program product comprising computer readable instructions which, when executed by one or more processors, cause the one or more processors to carry out the method of any of embodiments 1-27.
- a non-transitory computer-readable medium having stored thereon computer readable instructions which, when executed by one or more processors, cause the one or more processors to carry out the method of any of embodiments 1-27.
- a system comprising: at least one processor; and at least one non-transitory computer readable medium containing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of any of embodiments 1 to 27.
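The separation-metric and genetic-algorithm embodiments above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: `toy_predict_rt` is a hypothetical stand-in for the trained retention time model, the parameter names (`gradient_min`, `start_organic`), their ranges, and the weighting in `fitness` are all assumptions chosen only to make the example runnable. The sketch scores each candidate set of HPLC method parameters by the average retention time difference between adjacent peaks (to be maximised) and the gradient duration (to be minimised), evolves candidates drawn from predetermined sets and ranges, applies the two stopping criteria named above, and returns five candidates.

```python
import random
from statistics import mean

# Hypothetical search space; these values are illustrative, not from the patent.
GRADIENT_CHOICES = (5, 10, 15, 20, 30)  # gradient duration in minutes
ORGANIC_RANGE = (0.05, 0.50)            # starting organic solvent fraction


def mean_adjacent_gap(retention_times):
    """Metric i): average retention time difference between adjacent peaks."""
    ordered = sorted(retention_times)
    return mean([later - earlier for earlier, later in zip(ordered, ordered[1:])])


def toy_predict_rt(hydrophobicity, params):
    """Stand-in for the trained model (an assumption, not the disclosed model)."""
    return params["gradient_min"] * hydrophobicity * (1.0 - params["start_organic"])


def fitness(compounds, params, duration_weight=0.1):
    """Reward wide peak spacing (metric i) and penalise long gradients (metric ii)."""
    times = [toy_predict_rt(h, params) for h in compounds]
    return mean_adjacent_gap(times) - duration_weight * params["gradient_min"]


def random_params(rng):
    """Draw one candidate from the predetermined set and range."""
    return {"gradient_min": rng.choice(GRADIENT_CHOICES),
            "start_organic": rng.uniform(*ORGANIC_RANGE)}


def crossover(a, b, rng):
    """Take each parameter from one of the two parents at random."""
    return {key: rng.choice((a[key], b[key])) for key in a}


def mutate(params, rng):
    """Perturb one parameter while keeping it inside its allowed set or range."""
    child = dict(params)
    if rng.random() < 0.5:
        child["gradient_min"] = rng.choice(GRADIENT_CHOICES)
    else:
        shifted = child["start_organic"] + rng.gauss(0.0, 0.05)
        child["start_organic"] = min(max(shifted, ORGANIC_RANGE[0]), ORGANIC_RANGE[1])
    return child


def evolve(compounds, rng, pop_size=20, max_generations=50, tol=1e-9, top_n=5):
    """Run a small genetic algorithm and return the top_n parameter sets."""
    def rank(candidates):
        return sorted(candidates, key=lambda p: fitness(compounds, p), reverse=True)

    population = rank([random_params(rng) for _ in range(pop_size)])
    previous_best = None
    for _ in range(max_generations):  # stopping criterion i): generation limit
        best = fitness(compounds, population[0])
        if previous_best is not None and abs(best - previous_best) < tol:
            break  # stopping criterion ii): improvement below threshold
        previous_best = best
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = rank(parents + children)
    return population[:top_n]


# Three hypothetical compounds described by a single made-up hydrophobicity value.
top_candidates = evolve([0.2, 0.5, 0.9], random.Random(0))
```

Because the best parents are carried over unchanged, the threshold criterion can trigger after a single generation without improvement; a practical run might instead require several stagnant generations before stopping. `mean_adjacent_gap` needs at least two retention times, which matches the "two or more compounds" condition of the embodiment.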
Landscapes
- Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Chemical & Material Sciences (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Analytical Chemistry (AREA)
- Biochemistry (AREA)
- Immunology (AREA)
- Pathology (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Crystallography & Structural Chemistry (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- Treatment Of Liquids With Adsorbents In General (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202380075548.XA CN120112790A (en) | 2022-10-28 | 2023-10-26 | Using machine learning to determine HPLC method parameters |
| KR1020257013554A KR20250099131A (en) | 2022-10-28 | 2023-10-26 | Determining HPLC method parameters using machine learning |
| EP23797783.0A EP4609189A1 (en) | 2022-10-28 | 2023-10-26 | Determining hplc method parameters using machine learning |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP22204521.3 | 2022-10-28 | | |
| EP22204521 | 2022-10-28 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024089143A1 (en) | 2024-05-02 |
Family
ID=84044662
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2023/079867 Ceased WO2024089143A1 (en) | 2022-10-28 | 2023-10-26 | Determining hplc method parameters using machine learning |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP4609189A1 (en) |
| KR (1) | KR20250099131A (en) |
| CN (1) | CN120112790A (en) |
| WO (1) | WO2024089143A1 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118243842A (en) * | 2024-05-28 | 2024-06-25 | 武汉智化科技有限公司 | Liquid Chromatography Retention Time Prediction Method under Different Chromatographic Conditions |
| CN121007997A (en) * | 2025-10-23 | 2025-11-25 | 杭州凯莱谱质造科技有限公司 | A preliminary separation method and system for liquid chromatography |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030102265A1 (en) * | 2001-12-03 | 2003-06-05 | Thierry Gandelheid | Method of separating compound(s) from mixture(s) |
| WO2022145590A1 (en) * | 2020-12-31 | 2022-07-07 | ㈜베르티스 | Apparatus and method for predicting retention time in chromatographic analysis of analyte |
2023
- 2023-10-26 WO PCT/EP2023/079867 patent/WO2024089143A1/en not_active Ceased
- 2023-10-26 KR KR1020257013554A patent/KR20250099131A/en active Pending
- 2023-10-26 CN CN202380075548.XA patent/CN120112790A/en active Pending
- 2023-10-26 EP EP23797783.0A patent/EP4609189A1/en active Pending
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030102265A1 (en) * | 2001-12-03 | 2003-06-05 | Thierry Gandelheid | Method of separating compound(s) from mixture(s) |
| WO2022145590A1 (en) * | 2020-12-31 | 2022-07-07 | ㈜베르티스 | Apparatus and method for predicting retention time in chromatographic analysis of analyte |
Non-Patent Citations (41)
| Title |
|---|
| "Collaborative data science", 2015, PLOTLY TECHNOLOGIES INC |
| BIANCHI, L.DORIGO, M.GAMBARDELLA, L. M.GUTJAHR, W. J.: "A survey on metaheuristics for stochastic combinatorial optimization", NATURAL COMPUTING, vol. 8, no. 2, 2008, pages 239 - 287, XP019685072 |
| BLUM, C.ROLI, A.: "Metaheuristics in combinatorial optimization: Overview and conceptual comparison", ACM COMPUTING SURVEYS (CSUR), vol. 35, no. 3, 2003, pages 268 - 308 |
| BOUWMEESTER, R.MARTENS, L.DEGROEVE, S: "Comprehensive and Empirical Evaluation of Machine Learning Algorithms for Small Molecule LC Retention Time Prediction", ANALYTICAL CHEMISTRY, vol. 91, no. 5, 2019, pages 3694 - 3703 |
| CHEN, T.GUESTRIN, C.: "XGBoost: A Scalable Tree Boosting System", PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, pages 785 - 794, XP058631191, DOI: 10.1145/2939672.2939785 |
| DONG, M. W.GUILLARME, D.: "Newer developments in HPLC impacting pharmaceutical analysis: A brief review", AMERICAN PHARMACEUTICAL REVIEW, vol. 16, 2013, pages 36 - 43 |
| ESTRADA, E.: "Atom-bond connectivity and the energetic of branched alkanes", CHEMICAL PHYSICS LETTERS, vol. 463, no. 4-6, 2008, pages 422 - 425, XP025466960, DOI: 10.1016/j.cplett.2008.08.074 |
| EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, vol. 205, no. 2, pages 486 - 487 |
| GAD, A. F.: "Pygad: An intuitive genetic algorithm python library", CORR, ABS/2106.06158, 2021 |
| GUHA, RWILLIGHAGEN, E.: "A Survey of Quantitative Descriptions of Molecular Structure", CURRENT TOPICS IN MEDICINAL CHEMISTRY, vol. 12, no. 18, 2012, pages 1946 - 1956, XP093073545, DOI: 10.2174/156802612804910278 |
| HADDAD, P. R.TARAJI, M.SZUCS, R.: "Prediction of Analyte Retention Time in Liquid Chromatography", ANALYTICAL CHEMISTRY, vol. 93, no. 1, 2021, pages 228 - 256 |
| HARRIS, C. R.MILLMAN, K. J.VAN DER WALT, S. J.GOMMERS, R.VIRTANEN, P.COURNAPEAU, D.WIESER, E.TAYLOR, J.BERG, S.SMITH, N. J.: "Array programming with NumPy", NATURE, vol. 585, no. 7825, 2020, pages 357 - 362, XP037247883, DOI: 10.1038/s41586-020-2649-2 |
| HEBERGER, K.: "Quantitative structure-(chromatographic) retention relationships", JOURNAL OF CHROMATOGRAPHY A, vol. 1158, no. 1-2, 2007, pages 273 - 305, XP022144818, DOI: 10.1016/j.chroma.2007.03.108 |
| HEWITT, E. F.LUKULAY, P.GALUSHKO, S.: "Implementation of a rapid and automated high performance liquid chromatography method development strategy for pharmaceutical drug candidates", JOURNAL OF CHROMATOGRAPHY A, vol. 1107, no. 1-2, 2006, pages 79 - 87, XP024968157, DOI: 10.1016/j.chroma.2005.12.042 |
| KALISZAN, R.: "Correlation between the retention indices and the connectivity indices of alcohols and methyl esters with complex cyclic structure", CHROMATOGRAPHIA, vol. 10, no. 9, 1977, pages 529 - 531 |
| KALISZAN, R.FOKS, H.: "The relationship between the RM values and the connectivity indices for pyrazine carbothioamide derivatives", CHROMATOGRAPHIA, vol. 10, no. 7, 1977, pages 346 - 349 |
| KENSERT, A.BOUWMEESTER, R.EFTHYMIADIS, K.BROECK, P. V.DESMET, G.CABOOTER, D.: "Graph Convolutional Networks for Improved Prediction and Interpretability of Chromatographic Retention Data", ANALYTICAL CHEMISTRY, vol. 93, no. 47, 2021, pages 15633 - 15641 |
| MCKINNEY, W., DATA STRUCTURES FOR STATISTICAL COMPUTING IN PYTHON, 2010, pages 56 - 61 |
| MERT OZUPEK NAZLI ET AL: "Modelling of multilinear gradient retention time of bio-sweetener rebaudioside A in HPLC analysis", ANALYTICAL BIOCHEMISTRY, ACADEMIC PRESS, AMSTERDAM, NL, vol. 627, 20 May 2021 (2021-05-20), XP086635681, ISSN: 0003-2697, [retrieved on 20210520], DOI: 10.1016/J.AB.2021.114248 * |
| MITCHELL, M.: "An introduction to genetic algorithms", vol. 32, 1996, MIT PRESS |
| MITCHELL, T. M.: "Machine Learning", 1997, MCGRAW-HILL |
| MOLNAR, I.: "Computerized design of separation strategies by reversed-phase liquid chromatography: development of DryLab software", JOURNAL OF CHROMATOGRAPHY A, vol. 965, no. 1-2, 2002, pages 175 - 194, XP004373703, DOI: 10.1016/S0021-9673(02)00731-8 |
| MORGAN, H. L.: "The Generation of a Unique Machine Description for Chemical Structures-A Technique Developed at Chemical Abstracts Service", JOURNAL OF CHEMICAL DOCUMENTATION, vol. 5, no. 2, 1965, pages 107 - 113, XP008078624 |
| MORIWAKI, H.TIAN, Y.-S.KAWASHITA, N.TAKAGI, T.: "Mordred: a molecular descriptor calculator", JOURNAL OF CHEMINFORMATICS, vol. 10, no. 1, 2018, pages 4, XP093095095, DOI: 10.1186/s13321-018-0258-y |
| MULLER, A.GUIDO, S.: "Introduction to Machine Learning with Python: A Guide for Data Scientists", O'REILLY MEDIA. PANDAS DEVELOPMENT TEAM, T., 2016 |
| PEDREGOSA, F.VAROQUAUX, G.GRAMFORT, A.MICHEL, V.THIRION, B.GRISEL, O.BLONDEL, M.PRETTENHOFER, P.WEISS, R.DUBOURG, V.: "Scikit-learn: Machine Learning in Python", JOURNAL OF MACHINE LEARNING RESEARCH, vol. 12, 2011, pages 2825 - 2830 |
| PROBST, D.REYMOND, J.-L.: "A probabilistic molecular fingerprint for big data settings", JOURNAL OF CHEMINFORMATICS, vol. 10, no. 1, 2018, pages 66 |
| ROGERS, D.HAHN, M.: "Extended-Connectivity Fingerprints", JOURNAL OF CHEMICAL INFORMATION AND MODELING, vol. 50, no. 5, 2010, pages 742 - 754, XP055315446, DOI: 10.1021/ci100050t |
| ROSSUM, G. V, PYTHON, 2021 |
| SNYDER, L. R.: "A New Look at the Selectivity of RPC Columns", ANALYTICAL CHEMISTRY, vol. 79, no. 9, 2007, pages 3254 - 3262 |
| SNYDER, L. R.KIRKLAND, J. J.DOLAN, J. W.: "Metaheuristics. From Design to Implementation, El-Ghazali Talbi", 2009, JOHN WILEY & SONS |
| SNYMAN, J. A.WILKE, D. N.: "Introduction. In Practical mathematical optimization: Basic optimization theory and gradient-based algorithms", 2018, SPRINGER INTERNATIONAL PUBLISHING, pages: 3 - 40 |
| SZUCS, R.BROWN, R.BRUNELLI, C.HEATON, J. C.HRADSKI, J.: "Structure Driven Prediction of Chromatographic Retention Times: Applications to Pharmaceutical Analysis", INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, vol. 22, no. 8, 2021, pages 3848 |
| TODESCHINI, R.CONSONNI, V., HANDBOOK OF MOLECULAR DESCRIPTORS. METHODS AND PRINCIPLES IN MEDICINAL CHEMISTRY, 2020 |
| TODESCHINI, R.CONSONNI, V., MOLECULAR DESCRIPTORS FOR CHEMOINFORMATICS. METHODS AND PRINCIPLES IN MEDICINAL CHEMISTRY, 2020 |
| TOMASZ BĄCZEK ET AL: "Predictions of peptides' retention times in reversed-phase liquid chromatography as a new supportive tool to improve protein identification in proteomics", PROTEOMICS, vol. 9, no. 4, 1 February 2009 (2009-02-01), pages 835 - 847, XP055139166, ISSN: 1615-9853, DOI: 10.1002/pmic.200800544 * |
| TOME, T.ZIGART, N.CASAR, Z.OBREZA, A.: "Development and Optimization of Liquid Chromatography Analytical Methods by Using AQbD Principles: Overview and Recent Advances", ORGANIC PROCESS RESEARCH & DEVELOPMENT, vol. 23, no. 9, 2019, pages 1784 - 1802 |
| TRAPPE, W: "Die Trennung von biologischen Fettstoffen aus ihren natürlichen Gemischen durch Anwendung von Adsorptionssäulen. II. Mitteilung: Abtrennung der phosphor- und stickstofffreien Lipoidfraktionen", BIOCHEM., vol. 305, 1940, pages 150 - 154 |
| USMAN A G ET AL: "Hybrid data-intelligence algorithms for the simulation of thymoquinone in HPLC method development", IRANIAN CHEMICAL SOCIETY. JOURNAL, IRANIAN CHEMICAL SOCIETY, IR, vol. 18, no. 7, 1 January 2021 (2021-01-01), pages 1537 - 1549, XP037476600, ISSN: 1735-207X, [retrieved on 20210101], DOI: 10.1007/S13738-020-02124-5 * |
| WILDMAN, S. A.CRIPPEN, G. M.: "Prediction of Physicochemical Parameters by Atomic Contributions", JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, vol. 39, no. 5, 1999, pages 868 - 873, XP001029973, DOI: 10.1021/ci990307l |
| WILSON, N.NELSON, M.DOLAN, J.SNYDER, L.WOLCOTT, R.CARR, P.: "Column selectivity in reversed-phase liquid chromatography I. A general quantitative relationship", JOURNAL OF CHROMATOGRAPHY A, vol. 961, no. 2, 2002, pages 171 - 193, XP004370621, DOI: 10.1016/S0021-9673(02)00659-3 |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118243842A (en) * | 2024-05-28 | 2024-06-25 | 武汉智化科技有限公司 | Liquid Chromatography Retention Time Prediction Method under Different Chromatographic Conditions |
| CN118243842B (en) * | 2024-05-28 | 2024-08-27 | 武汉智化科技有限公司 | Liquid chromatography retention time prediction method under different chromatographic conditions |
| US12362044B2 (en) | 2024-05-28 | 2025-07-15 | Wuhan Zhihua Technology Co., Ltd. | Method for predicting retention time of liquid chromatography under different chromatographic conditions |
| CN121007997A (en) * | 2025-10-23 | 2025-11-25 | 杭州凯莱谱质造科技有限公司 | A preliminary separation method and system for liquid chromatography |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20250099131A (en) | 2025-07-01 |
| EP4609189A1 (en) | 2025-09-03 |
| CN120112790A (en) | 2025-06-06 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Melnikov et al. | Deep learning for the precise peak detection in high-resolution LC–MS data | |
| Collins et al. | Current challenges and recent developments in mass spectrometry–based metabolomics | |
| Mahieu et al. | Systems-level annotation of a metabolomics data set reduces 25 000 features to fewer than 1000 unique metabolites | |
| Kuhl et al. | CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets | |
| Matyushin et al. | Deep learning driven GC-MS library search and its application for metabolomics | |
| Bludau et al. | Complex-centric proteome profiling by SEC-SWATH-MS for the parallel detection of hundreds of protein complexes | |
| Neuweger et al. | MeltDB: a software platform for the analysis and integration of metabolomics experiment data | |
| Goodacre et al. | Metabolomics by numbers: acquiring and understanding global metabolite data | |
| Kensert et al. | Graph convolutional networks for improved prediction and interpretability of chromatographic retention data | |
| Menikarachchi et al. | MolFind: a software package enabling HPLC/MS-based identification of unknown chemical structures | |
| Wishart | Computational approaches to metabolomics | |
| Peironcely et al. | Automated pipeline for de novo metabolite identification using mass-spectrometry-based metabolomics | |
| WO2024089143A1 (en) | Determining hplc method parameters using machine learning | |
| Montenegro-Burke et al. | Data streaming for metabolomics: accelerating data processing and analysis from days to minutes | |
| Picache et al. | Chemical class prediction of unknown biomolecules using ion mobility-mass spectrometry and machine learning: supervised inference of feature taxonomy from ensemble randomization | |
| Woldegebriel et al. | Artificial neural network for probabilistic feature recognition in liquid chromatography coupled to high-resolution mass spectrometry | |
| Tebani et al. | Advances in metabolome information retrieval: turning chemistry into biology. Part II: biological information recovery | |
| He et al. | Comparative evaluation of proteome discoverer and FragPipe for the TMT-based proteome quantification | |
| Nuka et al. | AI-Driven Drug Discovery: Transforming Neurological and Neurodegenerative Disease Treatment Through Bioinformatics and Genomic Research | |
| Shi et al. | MS based foodomics: An edge tool integrated metabolomics and proteomics for food science | |
| Nash et al. | Characterization of electrospray ionization complexity in untargeted metabolomic studies | |
| Fine et al. | Structure Based Machine Learning Prediction of Retention Times for LC Method Development of Pharmaceuticals | |
| Szucs et al. | Impact of structural similarity on the accuracy of retention time prediction | |
| Kamedulska et al. | Toward the general mechanistic model of liquid chromatographic retention | |
| Hoffmann et al. | Nontargeted identification of tracer incorporation in high-resolution mass spectrometry |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23797783; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2025522639; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 2025522639; Country of ref document: JP |
| | WWE | Wipo information: entry into national phase | Ref document number: CN202380075548X; Country of ref document: CN. Ref document number: 202380075548.X; Country of ref document: CN |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023797783; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2023797783; Country of ref document: EP; Effective date: 20250528 |
| | WWP | Wipo information: published in national office | Ref document number: 202380075548.X; Country of ref document: CN |
| | WWP | Wipo information: published in national office | Ref document number: 1020257013554; Country of ref document: KR |
| | WWP | Wipo information: published in national office | Ref document number: 2023797783; Country of ref document: EP |