Disclosure of Invention
It is an object of the present disclosure to provide a method and apparatus for antibody humanization based on dynamic programming that may address one or more of the above-mentioned problems of the prior art.
According to one aspect of the present disclosure, there is provided a method of dynamic programming-based antibody humanization, comprising the steps of:
S1: obtaining a human antibody gene sequence;
S2: numbering the heterologous antibody gene sequence;
S3: aligning the heterologous antibody gene sequence with the human antibody gene sequence based on a dynamic programming algorithm;
S4: performing amino acid substitution according to the alignment result to obtain a humanized antibody gene sequence.
In some embodiments, in step S1, the human antibody gene sequences are V, D, J and C gene sequences of a human antibody.
In some embodiments, in step S2, numbering the heterologous antibody gene sequence uses the Kabat numbering scheme.
In some embodiments, in step S3, aligning the heterologous antibody gene sequence with the human antibody gene sequence comprises:
initializing a scoring matrix, wherein the rows and columns of the matrix correspond to the base sequences of the heterologous antibody gene sequence and the human antibody gene sequence, respectively;
calculating the score for each entry in the scoring matrix;
backtracking according to the values in the scoring matrix to obtain a backtracking path;
and obtaining the corresponding base sequence according to the backtracking path, thereby obtaining a locally optimal matching sequence.
In some embodiments, in step S4, performing the amino acid substitution comprises, based on the alignment result, replacing the surface-accessible amino acid residues in the framework regions of the heterologous antibody gene sequence that differ significantly from those of the human antibody, without replacing the amino acid residues of the complementarity determining regions of the heterologous antibody gene sequence that affect the conformation of the complementarity determining regions of the antibody.
In accordance with another aspect of the present disclosure, there is provided an apparatus for dynamic programming-based antibody humanization, comprising:
a human antibody gene sequence acquisition module, configured to acquire a human antibody gene sequence;
a numbering module, configured to number the heterologous antibody gene sequence;
a comparison module, configured to align the heterologous antibody gene sequence with the human antibody gene sequence based on a dynamic programming algorithm;
and an amino acid substitution module, configured to perform amino acid substitution according to the alignment result to obtain the humanized antibody gene sequence.
In some embodiments, in the human antibody gene sequence acquisition module, the human antibody gene sequences are the V, D, J and C gene sequences of a human antibody.
In some embodiments, in the numbering module, the heterologous antibody gene sequence is numbered using the Kabat numbering scheme.
In some embodiments, in the comparison module, aligning the heterologous antibody gene sequence with the human antibody gene sequence comprises:
initializing a scoring matrix, wherein the rows and columns of the matrix correspond to the base sequences of the heterologous antibody gene sequence and the human antibody gene sequence, respectively;
calculating the score for each entry in the scoring matrix;
backtracking according to the values in the scoring matrix to obtain a backtracking path;
and obtaining the corresponding base sequence according to the backtracking path, thereby obtaining a locally optimal matching sequence.
In some embodiments, in the amino acid substitution module, performing the amino acid substitution comprises, based on the alignment result, replacing the surface-accessible amino acid residues in the framework regions of the heterologous antibody gene sequence that differ significantly from those of the human antibody, without replacing the amino acid residues of the complementarity determining regions of the heterologous antibody gene sequence that affect the conformation of the complementarity determining regions of the antibody.
The beneficial effects of the present disclosure are as follows:
according to the method and apparatus for antibody humanization based on dynamic programming, the surface amino acid residues of a heterologous antibody are humanized by dynamic programming, which reduces the heterologous character of the antibody while taking antibody activity into account; humanization is fast, sequences can be humanized in batches, and the humanized antibody sequences have good affinity and specificity.
In addition, in the technical solutions of the present disclosure, the technical solutions can be implemented by adopting conventional means in the art, unless otherwise specified.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, embodiments of the present disclosure. All other embodiments that a person skilled in the art can derive from the embodiments disclosed herein without creative effort fall within the protection scope of the present disclosure.
Example 1:
Fig. 1 shows a method for antibody humanization based on dynamic programming according to an embodiment of the present disclosure, which includes the following steps:
S1: obtaining a human antibody gene sequence;
Specifically, FASTA files of the V, D, J and C gene sequences of human antibodies can be downloaded from the IMGT database.
S2: numbering the heterologous antibody gene sequence;
Specifically, the heterologous antibody gene sequence is numbered using the Kabat numbering scheme.
S3: aligning the heterologous antibody gene sequence with the human antibody gene sequence based on a dynamic programming algorithm;
S4: performing amino acid substitution according to the alignment result to obtain a humanized antibody gene sequence.
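As an illustration of step S1, the following is a minimal sketch of reading downloaded FASTA records into memory. The function name `parse_fasta` and the two placeholder records are assumptions made for this example; they are not part of the disclosure, and real IMGT headers carry more fields than shown here.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Placeholder input (hypothetical headers and sequences, for illustration only).
example = """>germline_A
EVQLVESGGG
LVQPGRSLRL
>germline_B
QVQLQESG
"""
germlines = parse_fasta(example.splitlines())
print(germlines["germline_A"])
```

In practice the same function could be applied to an open file handle, since it only needs an iterable of lines.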
In an alternative embodiment, referring to Fig. 2, aligning the heterologous antibody gene sequence with the human antibody gene sequence in step S3 comprises:
S3.1: initializing a scoring matrix whose rows and columns correspond to the base sequences of the heterologous antibody gene sequence and the human antibody gene sequence, respectively.
Specifically, let the heterologous antibody gene sequence be $SEQ1 = a_1 a_2 a_3 \ldots a_n$ and the human antibody gene sequence be $SEQ2 = b_1 b_2 b_3 \ldots b_m$; row $i$ of the matrix then represents amino acid $a_i$, and column $j$ represents amino acid $b_j$.
S3.2: the score for each entry in the scoring matrix is calculated.
Specifically, let the score of each entry in the scoring matrix be $\mathrm{MATRIX}_{i,j}$; then

$$\mathrm{MATRIX}_{i,j} = \max\{\, \mathrm{MATRIX}_{i-1,j-1} + \mathrm{SCORE}(a_i, b_j),\; \mathrm{MATRIX}_{i-1,j} - \mathrm{PENALTY},\; \mathrm{MATRIX}_{i,j-1} - \mathrm{PENALTY} \,\}$$

where $\mathrm{SCORE}(a_i, b_j)$ is the similarity score of amino acid $a_i$ and amino acid $b_j$: when $a_i = b_j$, $\mathrm{SCORE}(a_i, b_j) = e$; when $a_i \neq b_j$, $\mathrm{SCORE}(a_i, b_j) = -e$, where $e$ is a positive integer; and $\mathrm{PENALTY}$ is the gap penalty.
S3.3: backtracking according to the values in the scoring matrix to obtain a backtracking path.
Specifically, backtracking starts from the cell with the largest score in the scoring matrix. If $a_i = b_j$, backtrack to the upper-left cell; if $a_i \neq b_j$, backtrack to whichever of the upper-left, upper and left cells has the largest value, and if several of these cells share the largest value, the priority order is upper-left, upper, left.
S3.4: obtaining the corresponding base sequence according to the backtracking path to obtain the alignment result, namely the locally optimal matching sequence.
Specifically, if backtracking goes to the upper-left cell, $a_i$ is added to the matching amino acid sequence SEQ1' and $b_j$ is added to the matching amino acid sequence SEQ2'; if backtracking goes to the upper cell, $a_i$ is added to SEQ1' and "-" is added to SEQ2'; if backtracking goes to the left cell, "-" is added to SEQ1' and $b_j$ is added to SEQ2'.
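Steps S3.1 to S3.4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the disclosure: the function name `align`, the boundary initialization with cumulative gap penalties, the choice of the first maximal cell when several cells share the largest score, and stopping the backtracking at the first row or column are all assumptions. A diagonal move is taken to emit both residues, an upward move to emit $a_i$ against a gap, and a leftward move to emit a gap against $b_j$.

```python
from typing import List, Tuple


def align(seq1: str, seq2: str, e: int = 1, penalty: int = 1) -> Tuple[str, str]:
    """Align seq1 (rows, a_i) with seq2 (columns, b_j); return (SEQ1', SEQ2')."""
    n, m = len(seq1), len(seq2)

    # S3.1: initialize the matrix; row 0 and column 0 stand for a leading gap.
    matrix: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        matrix[i][0] = -i * penalty  # assumed boundary condition
    for j in range(1, m + 1):
        matrix[0][j] = -j * penalty  # assumed boundary condition

    # S3.2: fill each entry from its upper-left, upper and left neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score = e if seq1[i - 1] == seq2[j - 1] else -e
            matrix[i][j] = max(
                matrix[i - 1][j - 1] + score,
                matrix[i - 1][j] - penalty,
                matrix[i][j - 1] - penalty,
            )

    # S3.3: start from the largest score (first one found in row-major order).
    i, j = 0, 0
    for r in range(n + 1):
        for c in range(m + 1):
            if matrix[r][c] > matrix[i][j]:
                i, j = r, c

    out1: List[str] = []
    out2: List[str] = []
    while i > 0 and j > 0:
        a, b = seq1[i - 1], seq2[j - 1]
        if a == b:
            move = "diag"
        else:
            # max() keeps the first maximal item, which yields the priority
            # order upper-left, upper, left on ties.
            candidates = [
                (matrix[i - 1][j - 1], "diag"),
                (matrix[i - 1][j], "up"),
                (matrix[i][j - 1], "left"),
            ]
            move = max(candidates, key=lambda item: item[0])[1]

        # S3.4: extend the two matching sequences according to the move.
        if move == "diag":
            out1.append(a)
            out2.append(b)
            i, j = i - 1, j - 1
        elif move == "up":
            out1.append(a)
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(b)
            j -= 1

    return "".join(reversed(out1)), "".join(reversed(out2))


aligned_1, aligned_2 = align("ACGT", "AGT")
print(aligned_1)
print(aligned_2)
```

Because the backtracking starts at the largest score rather than at the bottom-right corner, trailing residues after that cell are not included in the returned sequences.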
In an alternative embodiment, in step S4, the amino acid substitution comprises, based on the alignment result, replacing the surface-accessible amino acid residues in the framework regions of the heterologous antibody gene sequence that differ significantly from those of the human antibody, without replacing the amino acid residues of the complementarity determining regions (CDRs) of the heterologous antibody gene sequence that affect the conformation of the complementarity determining regions of the antibody. In this way, antibody activity is maintained while the heterologous character of the antibody is reduced.
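The substitution rule of step S4 can be illustrated with a simplified sketch. The function `substitute_framework` and its `protected` parameter are hypothetical names introduced here: `protected` stands for the positions that must not be changed (CDR residues and residues affecting CDR conformation), given as 1-based positions in the ungapped heterologous sequence rather than Kabat numbers. The sketch replaces every differing unprotected residue and does not model surface accessibility or the degree of difference, which the disclosure also takes into account.

```python
from typing import List, Set


def substitute_framework(aligned_query: str, aligned_human: str,
                         protected: Set[int]) -> str:
    """Replace unprotected query residues that differ from the human residue."""
    if len(aligned_query) != len(aligned_human):
        raise ValueError("aligned sequences must have the same length")

    result: List[str] = []
    position = 0  # 1-based position in the ungapped query sequence
    for q, h in zip(aligned_query, aligned_human):
        if q == "-":
            continue  # human-only column: there is no query residue here
        position += 1
        if h == "-" or position in protected or q == h:
            result.append(q)  # keep the original residue
        else:
            result.append(h)  # take the human residue
    return "".join(result)


# Position 5 is treated as protected purely for illustration.
print(substitute_framework("EVRLLES", "EVQLVES", protected={5}))
```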
The method for humanizing the gene sequence of the heterologous antibody is specifically described by taking the heavy chain of the murine PD-1 antibody J43 as an example.
The IGH amino acid sequences are downloaded from the IMGT database.
The heavy chain amino acid sequence of the murine PD-1 antibody J43 is numbered and aligned with the downloaded IGH amino acid sequences, wherein the amino acid sequence of the murine PD-1 antibody J43 heavy chain is as follows:
EVRLLESGGGLVKPEGSLKLSCVASGFTFSDYFMSWVRQAPGKGLEWVAHIYTKSYNYATYYSGSVKGRFTISRDDSRSMVYLQMNNLRTEDTATYYCTRDGSGYPSLDFWGQGTQVTVSS;
Using a two-dimensional table, the murine PD-1 antibody J43 heavy chain amino acid sequence is laid out along the first row and the downloaded IGH amino acid sequence along the first column. When calculating the matrix, a "-" representing a gap is added before each of the two sequences, and the cell at the top-left corner is taken as the starting point and set to 0.
A scoring matrix is initialized and a basic scoring strategy is set, which defines the score of two amino acids under each alignment condition (including gaps): if the two amino acids are the same, i.e. a match, the score is +1; if the two amino acids are different, i.e. a mismatch, the score is -1; if a gap is opened in either sequence, the score is -1.
The scoring matrix is then calculated. Let $\mathrm{MATRIX}_{i,j}$ denote the score of the cell in row $i$ and column $j$; then

$$\mathrm{MATRIX}_{i,j} = \max\{\, \mathrm{MATRIX}_{i-1,j-1} + \mathrm{SCORE}(a_i, b_j),\; \mathrm{MATRIX}_{i-1,j} - \mathrm{PENALTY},\; \mathrm{MATRIX}_{i,j-1} - \mathrm{PENALTY} \,\}$$

where $\mathrm{SCORE}(a_i, b_j)$ is the similarity score of amino acid $a_i$ and amino acid $b_j$: when $a_i = b_j$, $\mathrm{SCORE}(a_i, b_j) = e$; when $a_i \neq b_j$, $\mathrm{SCORE}(a_i, b_j) = -e$, where $e$ is a positive integer; and $\mathrm{PENALTY}$ is the gap penalty. In this embodiment, $e = 1$, i.e. when $a_i = b_j$, $\mathrm{SCORE}(a_i, b_j) = 1$, and when $a_i \neq b_j$, $\mathrm{SCORE}(a_i, b_j) = -1$.
The calculated matrix and part of the calculation results are shown in Table 1.

| | - | E | V | R | L | L | E | S | … |
|---|---|---|---|---|---|---|---|---|---|
| - | 0 | -1 | -2 | -3 | -4 | -5 | -6 | -7 | |
| E | -1 | 1 | 0 | -1 | -2 | -3 | -4 | -5 | |
| V | -2 | 0 | 2 | 1 | 0 | -1 | -2 | -3 | |
| Q | -3 | -1 | 1 | 1 | 0 | -1 | -2 | -3 | |
| L | -4 | -2 | 0 | 0 | 2 | 1 | 0 | -1 | |
| V | -5 | -3 | -1 | -1 | 1 | 1 | 0 | -1 | |
| E | -6 | -4 | -2 | -2 | 0 | 0 | 2 | 1 | |
| S | -7 | -5 | -3 | -3 | -1 | -1 | 1 | 3 | |
| … | | | | | | | | | |

Table 1
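The excerpt in Table 1 can be reproduced with a short sketch, assuming that the first row and first column are initialized with cumulative gap penalties (0, -1, -2, …) and that the gap penalty is 1, as the tabulated values suggest. The function name `fill_matrix` is hypothetical; the row and column prefixes are the first seven residues shown in the table.

```python
from typing import List


def fill_matrix(row_seq: str, col_seq: str, e: int = 1,
                penalty: int = 1) -> List[List[int]]:
    """Fill the scoring matrix with match +e, mismatch -e, gap -penalty."""
    n, m = len(row_seq), len(col_seq)
    matrix = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        matrix[i][0] = -i * penalty  # leading gaps in the column sequence
    for j in range(1, m + 1):
        matrix[0][j] = -j * penalty  # leading gaps in the row sequence
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score = e if row_seq[i - 1] == col_seq[j - 1] else -e
            matrix[i][j] = max(
                matrix[i - 1][j - 1] + score,  # upper-left: match or mismatch
                matrix[i - 1][j] - penalty,    # upper: gap
                matrix[i][j - 1] - penalty,    # left: gap
            )
    return matrix


# Rows: first residues of the IGH sequence; columns: first residues of J43.
table = fill_matrix("EVQLVES", "EVRLLES")
for label, row in zip("-EVQLVES", table):
    print(label, " ".join(f"{value:3d}" for value in row))
```

For example, the cell in row V and column V is max(1 + 1, 0 - 1, 0 - 1) = 2, which agrees with the table.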
Backtracking is then performed according to the values in the scoring matrix to obtain a backtracking path.
According to the backtracking path, the corresponding base sequence, namely the alignment result, is obtained; see Fig. 3.
Amino acid substitution is then performed to complete the antibody humanization. Referring to Fig. 3, the amino acid residues of the framework regions FR1, FR2, FR3 and FR4 are replaced, and the resulting sequence is:
EVQLVESGGGLVQPGRSLRLSCVASGFTFSDYFMSWVRQAPGKGLEWVAHIYTKSYNYATYYAASVKGRFTISRDDSKSIAYLQMNSLKTEDTAVYYCTRDGSGYPSLDFWGQGSQVTVSS.
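To see which residues were changed in this example, the murine J43 heavy chain sequence given above and the resulting sequence can be compared position by position, since both have the same length. The sketch below is illustrative only; `list_substitutions` is a hypothetical helper, and the positions it reports are simple 1-based sequence positions, not Kabat numbers.

```python
from typing import List, Tuple

# Murine J43 heavy chain sequence as given in the text.
murine = (
    "EVRLLESGGGLVKPEGSLKLSCVASGFTFSDYFMSWVRQAPGKGL"
    "EWVAHIYTKSYNYATYYSGSVKGRFTISRDDSRSMVYLQMNNLRTEDT"
    "ATYYCTRDGSGYPSLDFWGQGTQVTVSS"
)
# Sequence obtained after amino acid substitution, as given in the text.
humanized = (
    "EVQLVESGGGLVQPGRSLRLSCVASGFTFSDYFMSWVRQAPGKGL"
    "EWVAHIYTKSYNYATYYAASVKGRFTISRDDSKSIAYLQMNSLKTEDT"
    "AVYYCTRDGSGYPSLDFWGQGSQVTVSS"
)


def list_substitutions(original: str, modified: str) -> List[Tuple[int, str, str]]:
    """Return (position, original residue, new residue) for each difference."""
    if len(original) != len(modified):
        raise ValueError("sequences must have the same length")
    return [
        (position, old, new)
        for position, (old, new) in enumerate(zip(original, modified), start=1)
        if old != new
    ]


substitutions = list_substitutions(murine, humanized)
for position, old, new in substitutions:
    print(f"{old}{position}{new}")
```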
The beneficial effects of the present disclosure are as follows:
according to the antibody humanization method based on dynamic programming, the surface amino acid residues of a heterologous antibody are humanized by dynamic programming, which reduces the heterologous character of the antibody while taking antibody activity into account; humanization is fast, sequences can be humanized in batches, and the humanized antibody sequences have good affinity and specificity.
Example 2:
The present disclosure also provides an apparatus for antibody humanization based on dynamic programming, comprising:
a human antibody gene sequence acquisition module, configured to acquire a human antibody gene sequence;
a numbering module, configured to number the heterologous antibody gene sequence;
a comparison module, configured to align the heterologous antibody gene sequence with the human antibody gene sequence based on a dynamic programming algorithm;
and an amino acid substitution module, configured to perform amino acid substitution according to the alignment result to obtain the humanized antibody gene sequence.
In an alternative embodiment, in the human antibody gene sequence acquisition module, the human antibody gene sequences are the V, D, J and C gene sequences of a human antibody.
In an alternative embodiment, in the numbering module, the heterologous antibody gene sequence is numbered using the Kabat numbering scheme.
In an alternative embodiment, in the comparison module, aligning the heterologous antibody gene sequence with the human antibody gene sequence comprises:
initializing a scoring matrix, wherein the rows and columns of the matrix correspond to the base sequences of the heterologous antibody gene sequence and the human antibody gene sequence, respectively;
calculating the score for each entry in the scoring matrix;
backtracking according to the values in the scoring matrix to obtain a backtracking path;
and obtaining the corresponding base sequence according to the backtracking path, thereby obtaining a locally optimal matching sequence.
In an alternative embodiment, in the amino acid substitution module, the amino acid substitution comprises, based on the alignment result, replacing the surface-accessible amino acid residues in the framework regions of the heterologous antibody gene sequence that differ significantly from those of the human antibody, without replacing the amino acid residues of the complementarity determining regions (CDRs) of the heterologous antibody gene sequence that affect the conformation of the complementarity determining regions of the antibody. In this way, antibody activity is maintained while the heterologous character of the antibody is reduced.
The beneficial effects of the present disclosure are as follows:
according to the apparatus for antibody humanization based on dynamic programming, the surface amino acid residues of a heterologous antibody are humanized by dynamic programming, which reduces the heterologous character of the antibody while taking antibody activity into account; humanization is fast, sequences can be humanized in batches, and the humanized antibody sequences have good affinity and specificity.
It should be noted that, when the apparatus provided in the foregoing embodiment implements its functions, the division into the functional modules described above is only an example. In practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided above belong to the same concept; for the specific implementation process of the apparatus, refer to the method embodiments, which are not repeated here.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solutions of the present disclosure, not to limit them; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present disclosure.