WO2022205775A1 - Method and device for determining immunity index of individual, electronic device, and machine-readable storage medium - Google Patents
- Publication number
- WO2022205775A1 (PCT/CN2021/117149)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- immune
- index
- individual
- sequencing
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Definitions
- the present invention relates to the field of biomedicine, and in particular, the present invention relates to a method, a device, an electronic device and a machine-readable storage medium for determining an individual's immunity index.
- Immunity is the body's own defense mechanism. It is the body's ability to identify and eliminate any foreign matter that invades it (viruses, bacteria, etc.), to deal with its own aged, damaged, dead, or degenerated cells, and to recognize and deal with mutated cells and virus-infected cells. In short, it is the physiological response by which the body identifies and excludes what is "non-self".
- the body's immunity is maintained by the immune system, which is the best doctor in the world and one that every person is born with.
- the immune system consists of two cooperating subsystems that provide innate and adaptive immunity.
- Innate immunity refers to a non-specific defense mechanism that protects the body from toxins or foreign substances (called antigens).
- the rapid response of the innate immune system also activates the adaptive system, which is the body's antigen-specific response to itself.
- the adaptive immune system consists of two main types of lymphocytes, called B cells and T cells. These lymphocytes have unique antigen receptors, each of which recognizes only one antigen, and this range of specificity is encoded by a fixed number of gene segments. Through a mechanism called V(D)J recombination, these genetic regions undergo irreversible somatic DNA recombination during cell development, resulting in the formation of mature lymphocytes with a single specificity.
- the immune repertoire refers to all the unique genetic rearrangements of T cell receptors (TCRs) and B cell receptors (BCRs) within the adaptive immune system.
- immune repertoire NGS detection provides technical support for evaluating the body's adaptive immune system in healthy or diseased states.
- Immunoglobulin and complement are the main effector components of humoral immunity. In certain diseases (such as infections, autoimmune diseases, immunodeficiency diseases, etc.), the concentrations of these indicators rise or fall relative to the reference value, so they have clinical value for assessing immunity and diagnosing disease.
- the five immunoassays target humoral immunity and cannot assess cellular immunity well.
- when evaluating humoral immunity, only the overall levels of IgG, IgA, IgM, and complement C3 and C4 can be detected, and in-depth analysis at the molecular sequence level cannot be performed.
- Lymphocyte subset analysis uses flow cytometry and PCR technology to analyze the number and relative proportion of each leukocyte subset in peripheral blood.
- flow cytometry or PCR technology is used to monitor the relative and absolute counts of immune cells in peripheral blood and their changes, and to assess immune status in disease states (such as tumors, infectious diseases, immune diseases, etc.), assisting in diagnosis, in tracking disease progression, and in deciding the timing of medication.
- the most commonly detected subsets include T cells (CD3), B cells (CD19), NK cells (CD16+56), helper T cells (CD3+CD4+), and suppressor T cells (CD3+CD8+).
- there are many types of lymphocyte subsets, and a comprehensive analysis would require an unacceptable volume of peripheral blood, cost, and time. It is difficult to obtain a comprehensive picture of the immune system by analyzing only a few lymphocyte subsets. In addition, lymphocyte subsets have different normal reference ranges at different ages, and the results are affected by many factors, making clinical interpretation relatively difficult.
- an object of the present invention aims to solve one of the technical problems in the related art at least to a certain extent.
- an object of the present invention is to carry out high-sensitivity detection of an individual's adaptive immune system at the molecular sequence level by means of immune repertoire sequencing, and to use the immune age (IA) to assess the individual's health status and thereby achieve early health risk prediction.
- the present invention proposes a method for determining an individual immunity index.
- the method includes: (1) acquiring nucleic acid sequencing data of the individual to be tested; (2) determining the V/J sequences and CDR sequences contained in the nucleic acid sample by aligning the sequencing results with reference sequences; (3) determining statistical features based on the V/J sequences and CDR sequences contained in the nucleic acid sample, the statistical features including at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types, and immune cell homogeneity index; (4) determining an immune age value of the individual based on the statistical features; and (5) determining an immunity index of the individual based on the immune age value.
- the method of the present invention can be carried out by sequencing a small amount of sample, so as to achieve high-sensitivity detection of the individual's adaptive immune system at the molecular level and to enable non-invasive early diagnosis, evaluation of curative effect, tracking of disease condition, prediction of relapse, and comprehensive immune assessment.
- PCR technology can be used to amplify the genes contained in lymphocytes in peripheral blood, which requires only a small blood sample. Subsequent processing of the sample is simple, with no need for imprecise manual blood cell observation techniques or for complicated immunolabeling and flow cytometry analysis.
- immune evaluation by immune repertoire sequencing can not only improve the sensitivity of detection, but also realize functions such as early diagnosis, evaluation of curative effect, tracking of illness, prediction of recurrence, and comprehensive evaluation of immunity.
- the present invention provides a device for determining an individual immunity index.
- the device includes: a sequencing data acquisition unit for acquiring nucleic acid sequencing data of an individual to be tested; a sequencing result analysis unit for determining the V/J sequences and CDR sequences contained in the nucleic acid sample by aligning the sequencing results with reference sequences; a statistical unit for determining statistical features based on the V/J sequences and CDR sequences contained in the nucleic acid sample, the statistical features including at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types, and immune cell homogeneity index; an immune age determination unit for determining an immune age value of the individual based on the statistical features; and an immunity index determination unit for determining an immunity index of the individual based on the immune age value.
- the present invention provides an electronic device which, according to an embodiment of the present invention, comprises a processor and a memory. The memory stores machine-executable instructions executable by the processor, and the processor executes the machine-executable instructions to implement the aforementioned method of determining an immunity index of an individual.
- the present invention provides a machine-readable storage medium.
- the machine-readable storage medium stores machine-executable instructions which, when called and executed by a processor, cause the processor to implement the method of determining an individual's immunity index as described in any preceding item.
- FIG. 1 is a schematic flowchart of a method for determining an individual immunity index according to an embodiment of the present invention.
- FIG. 2 is a partial schematic flowchart of a method for determining an individual immunity index according to an embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of a device for determining an individual immunity index according to an embodiment of the present invention.
- FIG. 4 is a partial schematic structural diagram of a device for determining an individual immunity index according to an embodiment of the present invention.
- FIG. 5 shows the predicted immunity index for different age groups in Example 2 of the present invention.
- FIG. 6 is a distribution diagram of the relationship between the immunity index and individual age in Example 2 of the present invention.
- Embodiments of the present invention are described in detail below.
- the embodiments described below are exemplary, only for explaining the present invention, and should not be construed as limiting the present invention. If no specific technique or condition is indicated in the examples, the technique or condition described in the literature in the field or the product specification is used.
- the reagents or instruments used without the manufacturer's indication are conventional products that can be obtained from the market.
- the present invention proposes a method for determining the immunity index of an individual. Referring to FIG. 1, according to an embodiment of the present invention, the method includes:
- nucleic acid sequencing data from the individual to be tested is first acquired for subsequent analysis.
- these nucleic acid sequencing data may contain the genetic information of immune cells. For example, according to embodiments of the present invention, blood samples containing immune cells or tissue samples containing immune cells may be used. Tissue samples as described herein should be understood in a broad sense and can include at least a part of an organ, such as the non-encapsulated diffuse lymphoid tissue and lymph nodes contained in the submucosa of the intestinal tract, respiratory tract, urogenital tract, etc.
- nucleic acid sequencing data can be obtained by high-throughput sequencing.
- second- or third-generation sequencing platforms may be used, including but not limited to high-throughput sequencing platforms such as MGISEQ-T7, MGISEQ-2000, MGISEQ-200, BGISEQ-500, BGISEQ-50, MGISP-960, and MGISP-100.
- the sequencing process includes:
- For blood or tissue samples, DNA or RNA is extracted. For each sample, a starting amount of DNA or RNA is taken, primers for a given chain (TCR or BCR) are added, and multiplex PCR amplification is performed. PCR is carried out in two rounds: the first round is a PCR reaction with VJ-specific primers (which carry part of the sequencing adapter), and the second round is an ordinary PCR that adds the sequencing adapters for library construction. Afterwards, multiple samples are pooled for sequencing, yielding data for each sample. According to the embodiment of the present invention, a tag sequence may also be introduced in the second round of PCR so that sample batches can be distinguished.
- acquiring nucleic acid sequencing data may further include:
- a nucleic acid sample of the individual to be tested is obtained, and the nucleic acid sample includes at least one of DNA molecules and RNA molecules.
- Those skilled in the art can use commercially available kits and follow the manufacturer's instructions for extraction of DNA molecules or RNA molecules. It can be understood by those skilled in the art that, after obtaining RNA molecules, reverse transcription can be easily used to obtain cDNA molecules.
- VJ-specific primers can be used to perform a first amplification process, so as to obtain a first amplification product.
- for the V gene and the J gene, the immune cell-specific sequences contained in the nucleic acid sample obtained in step S110 may be amplified with VJ-specific primers.
- VJ-specific primers refer to specific primers that can amplify V and J genes. It is worth noting that, for most loci, V and J genes are grouped into families according to their degree of homology. These VJ-specific primers can be used to analyze the combinatorial diversity of V-J rearrangements at at least one locus selected from TRA, TRB, TRG, TRD, IgH, IgK, IgL, and the like.
- the VJ-specific primer used in the present invention has the following nucleotide sequence:
- the VJ-specific primer contains a portion of the sequence of the sequencing adapter. Therefore, it is convenient to introduce sequencing adapters into the amplification products through the second amplification process.
- a second amplification process is performed on the first amplification product to obtain a second amplification product, wherein the second amplification product carries a sequencing adapter.
- the second amplification process can be performed using the common sequence in the first amplification product, and the primers used can be designed to introduce the sequencing adapters.
- the obtained second amplification product constitutes a sequencing library that can be used for sequencing.
- the second amplification product is sequenced to obtain sequencing results.
- the sequencing library (second amplification product) can be sequenced using a sequencing platform.
- nucleic acid sequencing data can be obtained by high-throughput sequencing.
- second- or third-generation sequencing platforms may be used, including but not limited to high-throughput sequencing platforms such as MGISEQ-T7, MGISEQ-2000, MGISEQ-200, BGISEQ-500, BGISEQ-50, MGISP-960, and MGISP-100. Paired-end sequencing is preferred, as it can improve the efficiency of subsequent analysis.
- the V/J sequence and the CDR sequence contained in the nucleic acid sample are determined by aligning the sequencing result with the reference sequence.
- software such as SOAPnuke (v1.5.3) can be used to filter adapter-contaminated sequences, low-quality bases, and low-quality reads out of the raw sequencing data.
- the FASTQ file is then converted into a FASTA file with a self-developed program for sequence assembly; finally, if the sequencing mode is paired-end sequencing, COPE (v1.5.3) and the self-developed program are used to assemble the sequences.
- blastall (v2.2.25) can be used to align the preprocessed FASTA sequences to the V(D)J reference gene sequences, after which the self-developed program performs re-alignment and selects the best alignment result. That is, different methods are used to score the non-CDR3 and CDR3 regions, the hit with the highest score is selected, and the origin of each sequenced read is determined by alignment with the CDR, V, and J reference sequences, so as to determine the CDR sequence and the V/J sequence.
- the structure of immune molecules is analyzed. This part mainly includes two functions: error correction and region determination. First, the errors introduced in PCR and sequencing were corrected by self-developed programs, and then the CDR regions were determined using the rules of V/J gene reference sequences and conserved amino acids and the established computational methods.
- the CDR sequence can be determined by a common method.
- the CDR sequence is at least one of CDR1, CDR2 and CDR3 sequences, preferably a CDR3 sequence. Because CDR3 has the greatest variation, it directly determines the antigen-binding specificity of TCR.
- the CDR3 of the TCR is encoded by three genes, V, D, and J. During lymphocyte maturation, rearrangement of the V, D, and J genes produces a variety of recombined sequence fragments which, together with single-base changes (SNPs) and insertion/deletion (indel) mutations, create the diversity of T cells.
- V/J refers to at least a portion of the result of a V(D)J rearrangement for a particular cell, which may be a V gene sequence, a J gene sequence, or a combination of a V gene sequence and a J gene sequence; a D gene sequence may also be sandwiched between the V gene sequence and the J gene sequence.
- Statistical features are determined based on the V/J sequences and CDR sequences contained in the nucleic acid sample, and the statistical features include at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types, and immune cell homogeneity index.
- At least one of the V/J gene usage diversity index and the immune cell diversity index is a Shannon index.
- the type of immune cells is determined based on the CDR3 sequence.
- the immune cell homogeneity index is the Gini index.
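The statements above treat each distinct CDR3 sequence as one immune cell type. A minimal sketch of how the per-type frequencies, the number of types, and the V-J usage frequencies might be tabulated is given below. The function names and the toy sequences are hypothetical and are not taken from the patent, which relies on a self-developed program.

```python
from collections import Counter


def clonotype_frequencies(cdr3_sequences):
    """Map each distinct CDR3 sequence to its relative frequency.

    Assumption: one distinct CDR3 sequence = one immune cell type.
    """
    counts = Counter(cdr3_sequences)
    total = sum(counts.values())
    return {cdr3: count / total for cdr3, count in counts.items()}


def vj_usage_frequencies(vj_pairs):
    """Map each (V gene, J gene) pair to its relative frequency."""
    counts = Counter(vj_pairs)
    total = sum(counts.values())
    return {pair: count / total for pair, count in counts.items()}


# Toy input, invented for illustration only.
cdr3_reads = ["CASSLGQETQYF", "CASSLGQETQYF", "CASSPDRGNTEAFF", "CASSQETQYF"]
vj_reads = [("TRBV7-9", "TRBJ2-5"), ("TRBV7-9", "TRBJ2-5"),
            ("TRBV5-1", "TRBJ1-1"), ("TRBV7-9", "TRBJ2-5")]

freqs = clonotype_frequencies(cdr3_reads)
uniq_number = len(freqs)  # number of immune cell types
vj_freqs = vj_usage_frequencies(vj_reads)
```

These frequency tables are the inputs that diversity and homogeneity indices are computed from.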
- the immune repertoire feature data is counted, and the statistical features mainly include the following:
- V/J gene usage diversity i.e. Shannon_index(V-J);
- Immune cell homogeneity i.e. Clone_Gini.
- Shannon_index represents the Shannon index
- the calculation formula is as follows:
- CDR3 is taken as an example
- S represents the total number of unique CDR3s
- p(i) represents the frequency of CDR3s.
- Uniq_number represents the unique sequence number.
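The formula itself is not reproduced in this text. The standard Shannon index, written with the symbols defined above, is $H = -\sum_{i=1}^{S} p(i)\,\ln p(i)$. The sketch below computes it from a list of CDR3 sequences (or V-J pairs). The use of the natural logarithm is an assumption, since the logarithm base used in the patent cannot be recovered here, and `shannon_index` is a hypothetical helper name.

```python
import math
from collections import Counter


def shannon_index(items):
    """Standard Shannon index H = -sum(p_i * ln(p_i)) over distinct items.

    Assumption: natural logarithm; the source does not state the base.
    """
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Two equally frequent clonotypes give H = ln 2; a single clonotype gives 0.
even_repertoire = ["CDR3_A", "CDR3_A", "CDR3_B", "CDR3_B"]
single_clone = ["CDR3_A"] * 5
h_even = shannon_index(even_repertoire)
h_single = shannon_index(single_clone)
```

Passing (V gene, J gene) tuples instead of CDR3 strings gives the V-J usage diversity in the same way.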
- Clone_Gini represents the Gini index, and the calculation formula is as follows:
- x refers to the frequency of each immune cell type
- n refers to the number of immune cell types.
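The Gini formula is likewise missing from this text. A minimal sketch using one common closed form of the Gini coefficient, $G = \frac{2\sum_{i=1}^{n} i\,x_{(i)}}{n\sum_{i=1}^{n} x_{(i)}} - \frac{n+1}{n}$ with the frequencies $x_{(i)}$ sorted in ascending order, is shown below. Whether the patent uses exactly this form is an assumption, and `gini_index` is a hypothetical name.

```python
def gini_index(frequencies):
    """Gini coefficient of clonotype frequencies (0 = perfectly even).

    Assumption: the common sorted-rank closed form; the source formula
    did not survive extraction.
    """
    xs = sorted(frequencies)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    rank_weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2.0 * rank_weighted) / (n * total) - (n + 1) / n


g_even = gini_index([0.25, 0.25, 0.25, 0.25])    # all types equally frequent
g_dominated = gini_index([0.0, 0.0, 0.0, 1.0])   # one type dominates
```

A value near 0 indicates an even repertoire, while a value approaching (n - 1)/n indicates that a single clone dominates.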
- the immune age value of the individual is determined.
- the immune age value is determined based on at least one statistical feature using a maximum a posteriori probability estimate.
- step S400 further includes: (4-1) using a predetermined distribution of immune age prediction coefficients, determining, based on each statistical feature, the immune age prediction coefficient corresponding to that feature (the prior distribution of the parameters is determined mainly by the characteristics of the selected features; a continuous feature is generally taken to be normally distributed when a large amount of data is available); and (4-2) determining the immune age of the individual according to the formula $IA = \beta_0 + \sum_{i=1}^{n} \beta_i x_i$, where IA represents the immune age of the individual, i is the index of a statistical feature, n is the number of statistical features, $\beta_i$ is the immune age prediction coefficient corresponding to the i-th statistical feature, $x_i$ is the value of the i-th statistical feature, and $\beta_0$ is the bias term of the prediction model.
- MAP: maximum a posteriori probability estimation.
- ¬A means "not A"
- Biochemical indicators mainly include conventional indicators, such as a comprehensive biochemistry panel, routine blood tests, and so on.
- for the training data, the prior distribution of the parameters is determined mainly by the characteristics of the selected features. If a selected feature is continuous, it is generally taken to be normally distributed when a large amount of data is available. If it is discrete, it is simply multiplied by its weight according to the formula below.
- the members of the training set mainly include indicators obtained from immune repertoire analysis (V/J gene usage diversity, immune diversity, number of immune cell types, immune cell homogeneity) and some biochemical indicators (comprehensive biochemistry panel, routine blood tests, etc.).
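To illustrate how a maximum a posteriori (MAP) estimate can yield the coefficients of a linear immune age model, the sketch below assumes Gaussian noise and a zero-mean normal prior on each feature coefficient, in which case the MAP estimate coincides with ridge regression. The prior, the `penalty` parameter (noise variance divided by prior variance), the function names, and the synthetic data are all assumptions made for illustration. The patent does not disclose its fitting procedure at this level of detail.

```python
import numpy as np


def fit_map_linear(features, ages, penalty=1.0):
    """MAP fit of age = bias + features @ coefs.

    Assumptions: Gaussian noise, zero-mean normal prior on each
    coefficient, flat prior on the bias. This equals ridge regression.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(ages, dtype=float)
    n_samples, n_features = X.shape
    design = np.hstack([np.ones((n_samples, 1)), X])
    regularizer = penalty * np.eye(n_features + 1)
    regularizer[0, 0] = 0.0  # do not shrink the bias term
    solution = np.linalg.solve(design.T @ design + regularizer, design.T @ y)
    return solution[0], solution[1:]


def predict_immune_age(bias, coefs, feature_values):
    """Weighted sum of the feature values plus the bias term."""
    return float(bias + np.dot(coefs, feature_values))


# Synthetic, noise-free training data invented for this sketch.
train_x = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
train_y = [10.0 + 2.0 * a + 3.0 * b for a, b in train_x]
bias, coefs = fit_map_linear(train_x, train_y, penalty=1e-9)
example_ia = predict_immune_age(bias, coefs, [2.0, 2.0])
```

A larger `penalty` corresponds to a tighter prior and pulls the coefficients toward zero, while a value near zero approaches ordinary least squares.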
- the immunity index of the individual is determined based on the immune age value.
- the immunity index is determined by the following formula:
- IA represents the immune age value determined in step S400
- IAmax represents the upper limit of IA in the predetermined group
- IAmin represents the lower limit of IA in the predetermined group.
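The index formula is not reproduced in this text, so its exact form cannot be confirmed. The sketch below assumes a min-max scaling of IA between IAmin and IAmax, inverted so that a lower immune age gives a higher index. Both the scaling and the inversion are assumptions made for illustration, not the patent's stated formula, and `immunity_index` is a hypothetical name.

```python
def immunity_index(ia, ia_min, ia_max):
    """Scale an immune age into [0, 1], with 1 for the lowest immune age.

    Assumption: inverted min-max scaling. The patent's own formula is
    missing from the extracted text and may differ.
    """
    if ia_max <= ia_min:
        raise ValueError("ia_max must be greater than ia_min")
    clipped = min(max(ia, ia_min), ia_max)  # keep IA inside the population range
    return (ia_max - clipped) / (ia_max - ia_min)


# Hypothetical population bounds of 20 and 80 years.
index_young = immunity_index(20.0, 20.0, 80.0)
index_mid = immunity_index(50.0, 20.0, 80.0)
index_old = immunity_index(80.0, 20.0, 80.0)
```

Under this assumed form the index falls as immune age rises.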
- by determining the immunity index of the individual, the technical solution can achieve high-sensitivity detection of the individual's adaptive immune system at the molecular level, and can enable non-invasive early diagnosis, evaluation of curative effect, disease tracking, recurrence prediction, and comprehensive immune assessment.
- the method of the present invention can be carried out by sequencing a small amount of sample, so as to achieve high-sensitivity detection of the individual's adaptive immune system at the molecular level and to enable non-invasive early diagnosis, evaluation of curative effect, tracking of disease condition, prediction of relapse, and comprehensive immune assessment.
- PCR technology can be used to amplify the genes contained in lymphocytes in peripheral blood, which requires only a small blood sample. Subsequent processing of the sample is simple, with no need for imprecise manual blood cell observation techniques or for complicated immunolabeling and flow cytometry analysis.
- immune evaluation by immune repertoire sequencing can not only improve the sensitivity of detection, but also realize functions such as early diagnosis, evaluation of curative effect, tracking of disease condition, prediction of recurrence, and comprehensive evaluation of immunity.
- the present invention provides a device for determining an individual immunity index.
- the device includes:
- the sequencing data acquisition unit 100 is used to acquire nucleic acid sequencing data of the individual to be tested; the sequencing result analysis unit 200 is used to determine the V/J sequences and CDR sequences contained in the nucleic acid sample by aligning the sequencing results with reference sequences; the statistical unit 300 is used to determine statistical features based on the V/J sequences and CDR3 sequences contained in the nucleic acid sample, the statistical features including at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types, and immune cell homogeneity index; the immune age determination unit 400 is used to determine the immune age value of the individual based on the statistical features; and the immunity index determination unit 500 is used to determine the immunity index of the individual based on the immune age value.
- the sequencing data acquisition unit further includes a nucleic acid sample acquisition module 110, a first amplification module 120, a second amplification module 130, and a sequencing module 140.
- the nucleic acid sample acquisition module 110 is used to acquire a nucleic acid sample of the individual to be tested, the nucleic acid sample including at least one of DNA molecules and RNA molecules;
- the first amplification module 120 is used to perform the first amplification process with VJ-specific primers to obtain the first amplification product;
- the second amplification module 130 is used to perform the second amplification process on the first amplification product to obtain the second amplification product, wherein the second amplification product carries a sequencing adapter;
- the sequencing module 140 is used to sequence the second amplification product so as to obtain a sequencing result.
- the nucleic acid sample is obtained from an individual's blood or tissue sample.
- the VJ-specific primer contains a portion of the sequence of the sequencing adapter.
- the CDR sequence is at least one of CDR1, CDR2 and CDR3 sequences, preferably a CDR3 sequence.
- At least one of the V/J gene usage diversity index and the immune cell diversity index is a Shannon index.
- the type of immune cells is determined based on the CDR3 sequence.
- the immune cell homogeneity index is the Gini index.
- the immune age determination unit is adapted to determine the immune age value based on the at least one statistical feature using a maximum a posteriori probability estimate.
- the immune age determination unit is configured to: determine, using a predetermined distribution of immune age prediction coefficients and based on each of the statistical features, the immune age prediction coefficient corresponding to each statistical feature; and determine the immune age of the individual according to the formula $IA = \beta_0 + \sum_{i=1}^{n} \beta_i x_i$, where IA represents the immune age of the individual, i is the index of a statistical feature, n is the number of statistical features, $\beta_i$ is the immune age prediction coefficient corresponding to the i-th statistical feature, $x_i$ is the value of the i-th statistical feature, and $\beta_0$ is the bias term of the prediction model.
- the immunity index is determined by the following formula:
- IA represents the immune age value determined in the immune age determination unit
- IAmax represents the upper limit of IA in the predetermined population
- IAmin represents the lower limit of IA in the predetermined population.
- the present invention provides an electronic device which, according to an embodiment of the present invention, comprises a processor and a memory. The memory stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the preceding method of determining an individual's immunity index.
- the present invention provides a machine-readable storage medium.
- the machine-readable storage medium stores machine-executable instructions which, when called and executed by a processor, cause the processor to implement any of the preceding methods of determining an immunity index of an individual.
- primers with sequencing adapters are used to further amplify and build a library, and the sequencing library is subjected to high-throughput sequencing.
- sequencing data is analyzed as follows:
- SOAPnuke (v1.5.3) was used to filter adapter-contaminated sequences, low-quality bases, and low-quality reads out of the raw sequencing data. Reads were filtered according to the average quality value of the bases in the read and the number of N bases the read contained: a read was removed if it met either or both of the conditions "the base quality value of the read is less than or equal to 20" and "the number of N bases is greater than or equal to 5".
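The two filtering conditions can be illustrated with a small stand-alone check. This is not SOAPnuke's implementation or interface. It is a sketch that assumes "base quality value of the read" means the mean Phred quality across the read, and `keep_read` with its parameters is a hypothetical helper.

```python
def keep_read(sequence, qualities, quality_cutoff=20, n_base_cutoff=5):
    """Return True if a read passes both filters described above.

    Assumptions: `qualities` holds one Phred score per base, and the
    quality criterion is applied to the mean score of the read.
    """
    if not sequence or not qualities:
        return False
    mean_quality = sum(qualities) / len(qualities)
    low_quality = mean_quality <= quality_cutoff
    too_many_n = sequence.upper().count("N") >= n_base_cutoff
    return not (low_quality or too_many_n)


# Toy reads invented for illustration.
good = keep_read("ACGTACGT", [30] * 8)
borderline_quality = keep_read("ACGTACGT", [20] * 8)
n_rich = keep_read("ANNNNNGT", [30] * 8)
```

A read is dropped as soon as either condition holds, matching the "one or both" wording of the filtering rule.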
- V/J gene usage diversity i.e. Shannon_index(V-J);
- Immune cell homogeneity i.e. Clone_Gini.
- Shannon_index represents the Shannon index
- the calculation formula is as follows:
- CDR3 is taken as an example
- S represents the total number of unique CDR3s
- p(i) represents the frequency of CDR3s.
- Uniq_number represents the unique sequence number.
- Clone_Gini represents the Gini index, and the calculation formula is as follows:
- x refers to the frequency of each immune cell type
- n refers to the number of immune cell types.
- MAP: maximum a posteriori probability estimation
- the specific model is as follows:
- IA represents the immune age of the predicted sample
- IA max and IA min represent the upper and lower bounds in the population distribution, respectively.
- the immunity index showed a downward trend with increasing age. Although the sample size of the age group above 50 is small and the downward trend of the immunity index in FIG. 6 is therefore not obvious, the downward trend shown in FIG. 5 is more evident. The results of this example therefore show that the immunity index can be used as an indicator for evaluating health status.
Landscapes
- Medical Informatics (AREA)
- Engineering & Computer Science (AREA)
- Public Health (AREA)
- Health & Medical Sciences (AREA)
- Pathology (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Apparatus Associated With Microorganisms And Enzymes (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Description
The present invention relates to the field of biomedicine, and in particular to a method, a device, an electronic device, and a machine-readable storage medium for determining an individual's immunity index.
Immunity is the body's own defense mechanism. It is the body's ability to identify and eliminate any foreign matter that invades it (viruses, bacteria, etc.), to deal with its own aged, damaged, dead, or degenerated cells, and to recognize and deal with mutated cells and virus-infected cells. In short, it is the physiological response by which the body identifies and excludes what is "non-self". The body's immunity is maintained by the immune system, which is the best doctor in the world and one that every person is born with.
The immune system consists of two cooperating subsystems that provide innate and adaptive immunity. Innate immunity refers to a non-specific defense mechanism that protects the body from toxins or foreign substances (called antigens). The rapid response of the innate immune system also activates the adaptive system, which is the body's antigen-specific response to itself.
The adaptive immune system consists of two main types of lymphocytes, called B cells and T cells. These lymphocytes have unique antigen receptors, each of which recognizes only one antigen, and this range of specificity is encoded by a fixed number of gene segments. Through a mechanism called V(D)J recombination, these genetic regions undergo irreversible somatic DNA recombination during cell development, resulting in the formation of mature lymphocytes with a single specificity. The immune repertoire refers to all the unique genetic rearrangements of T cell receptors (TCRs) and B cell receptors (BCRs) within the adaptive immune system.
With the development of precision medicine and immunotherapy, the application scenarios of immune repertoires are becoming more and more extensive. They include biomarker mining, detection of autoimmune and infectious diseases, assessment of immune rejection and tolerance, tumor immune assessment, immune reconstitution, and drug and vaccine assessment. Immune repertoire NGS detection therefore provides technical support for evaluating the body's adaptive immune system in healthy or diseased states.
The main methods currently on the market for analyzing immune function are:
1) The five-item immune panel, which measures the levels of immunoglobulins and complement in the blood. Immunoglobulin G (IgG), immunoglobulin A (IgA), immunoglobulin M (IgM) and complement C3 and C4 are measured by methods such as single immunodiffusion, enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), immunofixation electrophoresis and immunoturbidimetry. Immunoglobulins and complement are the main effector components of humoral immunity. In certain diseases (such as infections, autoimmune diseases and immunodeficiency diseases) their concentrations rise or fall relative to the reference values, which gives them clinical value for evaluating immunity and diagnosing disease. However, the five-item panel targets humoral immunity and cannot assess cellular immunity well. Even for humoral immunity, it can only measure the overall levels of IgG, IgA, IgM and complement C3 and C4, and cannot provide in-depth analysis at the molecular sequence level.
2) Routine blood test, which uses cell counting to analyze the number of leukocytes in peripheral blood; an increased leukocyte count indicates an inflammatory reaction in the body. The leukocytes in peripheral blood are classified and counted under the microscope. A total leukocyte count above the upper reference limit is called leukocytosis, and a count below the lower reference limit is called leukopenia. Increases and decreases are driven mainly by the number of neutrophils, although changes in the number of lymphocytes and other cells also alter the total count. Anything from physiological changes to malignant tumors can cause an abnormal total leukocyte count, and doctors can use routine blood test results in clinical diagnosis. However, a routine blood test gives only a rough picture of the overall level of cellular immunity; it cannot distinguish immunity against specific diseases, nor can it determine the classification and diversity of immune cells at the gene level.
3) Lymphocyte subset analysis, which uses flow cytometry and PCR to analyze the number and relative proportion of each leukocyte subset in peripheral blood. The relative counts, absolute counts and changes of immune cells in peripheral blood are monitored by flow cytometry or PCR in order to analyze immune status in disease (such as tumors, infectious diseases and immune diseases), and thereby to assist diagnosis, track disease progression and decide when to administer medication. The most commonly measured subsets include T cells (CD3), B cells (CD19), NK cells (CD16+56), helper T cells (CD3+CD4+) and suppressor T cells (CD3+CD8+). However, lymphocyte subsets are numerous, and a comprehensive analysis would require an unacceptable volume of peripheral blood, cost and time, while analyzing only a few subsets makes it difficult to obtain a full picture of the immune system. In addition, lymphocyte subsets have different normal reference ranges at different ages and the results are affected by many factors, which makes clinical interpretation relatively difficult.
SUMMARY OF THE INVENTION
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, an object of the present invention is to detect an individual's adaptive immune system with high sensitivity at the molecular sequence level by immune repertoire sequencing, to evaluate the individual's immunity through a comprehensive analysis of multiple indicators of the immunoglobulin genes and TCR genes (such as diversity and homogeneity), and to assess the individual's state of health through the immune age (Immune Age, IA), thereby enabling early prediction of health risks.
In a first aspect, the present invention provides a method for determining an individual's immunity index. According to an embodiment of the present invention, the method includes: (1) acquiring nucleic acid sequencing data of the individual to be tested; (2) determining the V/J sequences and CDR sequences contained in the nucleic acid sample by aligning the sequencing results with reference sequences; (3) determining statistical features based on the V/J sequences and CDR sequences contained in the nucleic acid sample, the statistical features including at least one selected from the following: a V/J gene usage diversity index, an immune cell diversity index, the number of immune cell types, and an immune cell homogeneity index; (4) determining an immune age value of the individual based on the statistical features; and (5) determining the immunity index of the individual based on the immune age value.
According to embodiments of the present invention, sequencing allows the method to be carried out with a small amount of sample, so that the individual's adaptive immune system is detected with high sensitivity at the molecular level, and non-invasive early diagnosis, evaluation of treatment efficacy, disease tracking, prediction of recurrence and comprehensive assessment of immunity become possible. For example, PCR can be used to amplify the genes contained in lymphocytes in peripheral blood; this requires little blood, makes downstream sample handling simple, and needs neither imprecise manual blood-cell observation nor complicated immunolabeling and flow cytometry. For myeloma testing, only peripheral blood has to be drawn and no bone marrow aspiration is needed, which reduces harm to the patient. In summary, according to embodiments of the present invention, immune assessment by immune repertoire sequencing not only improves detection sensitivity but also enables early diagnosis, evaluation of treatment efficacy, disease tracking, prediction of recurrence and comprehensive assessment of immunity.
In a second aspect, the present invention provides a device for determining an individual's immunity index. According to an embodiment of the present invention, the device includes: a sequencing data acquisition unit for acquiring nucleic acid sequencing data of the individual to be tested; a sequencing result analysis unit for determining the V/J sequences and CDR sequences contained in the nucleic acid sample by aligning the sequencing results with reference sequences; a statistics unit for determining statistical features based on the V/J sequences and CDR sequences contained in the nucleic acid sample, the statistical features including at least one selected from the following: a V/J gene usage diversity index, an immune cell diversity index, the number of immune cell types, and an immune cell homogeneity index; an immune age determination unit for determining an immune age value of the individual based on the statistical features; and an immunity index determination unit for determining the immunity index of the individual based on the immune age value.
The device according to embodiments of the present invention can effectively carry out the method for determining an individual's immunity index described above. The features and advantages described above therefore apply equally to the device and are not repeated here.
In a third aspect, the present invention provides an electronic device which, according to an embodiment of the present invention, includes a processor and a memory, the memory storing machine-executable instructions that can be executed by the processor, and the processor executing the machine-executable instructions to implement the method for determining an individual's immunity index described above.
In a fourth aspect, the present invention provides a machine-readable storage medium. According to an embodiment of the present invention, the machine-readable storage medium stores machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the method for determining an individual's immunity index according to any of the foregoing.
FIG. 1 is a schematic flowchart of a method for determining an individual's immunity index according to an embodiment of the present invention;
FIG. 2 is a partial schematic flowchart of a method for determining an individual's immunity index according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device for determining an individual's immunity index according to an embodiment of the present invention;
FIG. 4 is a partial schematic structural diagram of a device for determining an individual's immunity index according to an embodiment of the present invention;
FIG. 5 shows the predicted immunity index for different age groups in Example 2 of the present invention;
FIG. 6 is a distribution plot of the relationship between the immunity index and individual age in Example 2 of the present invention.
Embodiments of the present invention are described in detail below. The embodiments described below are exemplary; they serve only to explain the present invention and should not be construed as limiting it. Where no specific technique or condition is stated in an embodiment, the techniques or conditions described in the literature of the field, or in the product instructions, are followed. Reagents and instruments for which no manufacturer is named are conventional, commercially available products.
In a first aspect, the present invention provides a method for determining an individual's immunity index. Referring to FIG. 1, according to an embodiment of the present invention, the method includes:
S100 Acquiring nucleic acid sequencing data
According to an embodiment of the present invention, in this step nucleic acid sequencing data from the individual to be tested are first acquired for use in the subsequent analysis. Those skilled in the art will understand that these data may contain the genetic information of immune cells. For example, according to embodiments of the present invention, the data may come from a blood sample containing immune cells or from a tissue sample containing immune cells (the term tissue sample is to be understood broadly and may include at least part of an organ), such as the non-encapsulated diffuse lymphoid tissue and lymphoid nodules found beneath the mucosa of the intestinal, respiratory and urogenital tracts.
According to embodiments of the present invention, the nucleic acid sequencing data can be obtained by high-throughput sequencing, for example on second- or third-generation sequencing platforms, including but not limited to high-throughput platforms such as MGISEQ-T7, MGISEQ-2000, MGISEQ-200, BGISEQ-500, BGISEQ-50, MGISP-960 and MGISP-100.
After obtaining the nucleic acid, those skilled in the art can perform sequencing according to the operating manual of the sequencing platform in order to obtain the nucleic acid sequencing data. Briefly, according to one embodiment of the present invention, the sequencing process is as follows:
DNA or RNA is extracted from the blood or tissue sample. For each sample, a starting amount of DNA or RNA is taken, primers for a given chain (TCR or BCR) are added, and multiplex PCR amplification is performed. Two rounds of PCR are carried out in total: the first round is a PCR reaction with VJ-specific primers (carrying part of the sequencing adapter), and the second round is an ordinary library-construction PCR using the sequencing adapters. Multiple samples are then pooled and sequenced together, and the data for each sample are obtained. According to embodiments of the present invention, tag sequences can also be introduced in the second round of PCR so that sample batches can be distinguished.
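The pooling step above relies on tag sequences to tell samples apart again after sequencing. The sketch below shows one simple way such demultiplexing could work, assuming the tag sits at the very start of each read and must match exactly; the tag sequences, sample names and the `demultiplex` helper are all hypothetical and are not taken from the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str], tags: Dict[str, str]) -> Dict[str, List[str]]:
    """Group reads by a leading tag sequence (exact match, hypothetical layout).

    `tags` maps tag sequence -> sample name. The tag is trimmed from each
    assigned read; reads with no matching tag go to "undetermined".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        for tag, sample in tags.items():
            if read.startswith(tag):
                groups[sample].append(read[len(tag):])
                break
        else:
            # No tag matched this read.
            groups["undetermined"].append(read)
    return dict(groups)


# Hypothetical tags and reads, for illustration only.
tags = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTTTGACC", "TGCAGGGTTA", "ACGTCCCAAA", "NNNNAAAA"]
by_sample = demultiplex(reads, tags)
```

Real pipelines usually tolerate a small number of mismatches in the tag and take the tag position from the library design; both are left out here to keep the sketch short.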
Referring to FIG. 2, according to a specific embodiment of the present invention, acquiring the nucleic acid sequencing data may further include:
S110 Obtaining a nucleic acid sample
In this step, a nucleic acid sample of the individual to be tested is obtained; the nucleic acid sample includes at least one of DNA molecules and RNA molecules. Those skilled in the art can extract the DNA or RNA molecules using a commercially available kit according to the manufacturer's instructions. Those skilled in the art will understand that, once RNA molecules have been obtained, cDNA molecules can readily be obtained by reverse transcription.
S120 First amplification
After the nucleic acid sample has been obtained, a first amplification can be performed with VJ-specific primers to obtain a first amplification product.
It should be noted that the VJ-specific primers can be used to amplify the sequences specific to immune cells, namely the V genes and J genes, contained in the nucleic acid sample obtained in step S110.
Herein, VJ-specific primers are specific primers capable of amplifying V genes and J genes. It is worth noting that, at most loci, V and J genes cluster into families according to their degree of homology. These VJ-specific primers can be used to analyze the combinatorial diversity of V-J rearrangements at at least one locus selected from TRA, TRB, TRG, TRD, IgH, IgK, IgL and the like.
According to an embodiment of the present invention, the VJ-specific primers used in the present invention have the following nucleotide sequences:
In addition, according to an embodiment of the present invention, the VJ-specific primers contain part of the sequencing adapter sequence. This makes it convenient to introduce the sequencing adapters into the amplification product in the subsequent second amplification.
S130 Second amplification
A second amplification is performed on the first amplification product to obtain a second amplification product, which carries the sequencing adapters.
The second amplification can be performed using the common sequence in the first amplification product, with primers designed to introduce the sequencing adapters. The resulting second amplification product thus constitutes a sequencing library that can be used for sequencing.
Those skilled in the art will of course understand that, to improve sequencing efficiency or to facilitate analysis, other conventional treatments, such as hybridization probe selection, can also be applied to the second amplification product. These are not described further here.
S140 Sequencing
The second amplification product is sequenced to obtain the sequencing results.
According to an embodiment of the present invention, once the sequencing library (the second amplification product) has been constructed, it can be sequenced on a sequencing platform. According to embodiments of the present invention, the nucleic acid sequencing data can be obtained by high-throughput sequencing, for example on second- or third-generation sequencing platforms, including but not limited to high-throughput platforms such as MGISEQ-T7, MGISEQ-2000, MGISEQ-200, BGISEQ-500, BGISEQ-50, MGISP-960 and MGISP-100. Paired-end sequencing is preferred, as it can improve the efficiency of the subsequent analysis.
S200 Determining V/J sequences and CDR sequences by sequence alignment
After the sequencing data have been obtained, according to an embodiment of the present invention, the V/J sequences and CDR sequences contained in the nucleic acid sample are determined by aligning the sequencing results with reference sequences.
According to an embodiment of the present invention, before alignment, software such as SOAPnuke (v1.5.3) can be used to filter adapter-contaminated sequences and low-quality bases and sequences out of the raw sequencing data.
The FASTQ files are then converted to FASTA files with an in-house program in preparation for sequence assembly; if the sequencing mode is paired-end, the reads are merged using COPE (v1.5.3) and an in-house program. Next, blastall (v2.2.25) can be used to align the preprocessed FASTA sequences to the V(D)J reference gene sequences, after which an in-house program performs re-alignment and selects the best alignment: scores for the non-CDR3 and CDR3 regions are computed by different methods, the highest-scoring best hit is chosen, and each read is assigned by comparison with the CDR, V and J reference sequences, so that the CDR sequences and V/J sequences are determined.
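The FASTQ-to-FASTA conversion mentioned above is done by an in-house program that the source does not describe. As a rough illustration of what such a step involves, the following sketch converts standard four-line FASTQ records to FASTA; the function names are hypothetical and the code makes no claim to match the authors' program.

```python
from typing import Iterable, Iterator, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        sep = next(it, None)   # the '+' separator line
        qual = next(it, None)
        if seq is None or sep is None or qual is None:
            raise ValueError(f"truncated FASTQ record: {header!r}")
        seq, qual = seq.rstrip("\n"), qual.rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def fastq_to_fasta(lines: Iterable[str]) -> str:
    """Drop the quality strings and emit '>id' / sequence pairs."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq, _ in fastq_records(lines))


# Two made-up reads, for illustration only.
example = ["@read1\n", "ACGT\n", "+\n", "IIII\n",
           "@read2\n", "GGCA\n", "+\n", "FFFF\n"]
fasta_text = fastq_to_fasta(example)
```

Quality filtering (done earlier with SOAPnuke in the text) and paired-end merging (done with COPE) are separate steps and are not covered by this sketch.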
After the V gene and J gene sequences have been obtained, the structure of the immune molecules is analyzed. This part comprises two main functions: error correction and region determination. First, an in-house program is used to correct errors introduced during PCR and sequencing; second, the CDR regions are determined from the V/J gene reference sequences, the regularities of the conserved amino acids and an established calculation method.
According to embodiments of the present invention, the CDR sequences can be determined by commonly used methods. According to an embodiment of the present invention, the CDR sequence is at least one of the CDR1, CDR2 and CDR3 sequences, preferably the CDR3 sequence, because CDR3 is the most variable and directly determines the antigen-binding specificity of the TCR. The CDR3 of the TCR is encoded by the V, D and J genes. During lymphocyte maturation, rearrangement of the V, D and J genes produces a variety of recombined sequence fragments, which, together with SNP and indel mutations of the DNA bases, give rise to the diversity of T cells.
As used herein, the term "V/J" refers to at least part of the result of the V(D)J rearrangement carried by a particular cell. It may be a V gene sequence, a J gene sequence, or a combination of a V gene sequence and a J gene sequence, possibly with a D gene sequence between the V gene sequence and the J gene sequence.
S300 Determining statistical features
Statistical features are determined based on the V/J sequences and CDR sequences contained in the nucleic acid sample. The statistical features include at least one selected from the following: a V/J gene usage diversity index, an immune cell diversity index, the number of immune cell types, and an immune cell homogeneity index.
According to an embodiment of the present invention, at least one of the V/J gene usage diversity index and the immune cell diversity index is a Shannon index. According to an embodiment of the present invention, the type of an immune cell is determined based on its CDR3 sequence.
According to an embodiment of the present invention, the immune cell homogeneity index is a Gini index.
According to an embodiment of the present invention, statistics are computed on the immune repertoire feature data. The statistical features mainly include the following:
V/J gene usage diversity, i.e. Shannon_index(V-J);
Immune diversity, i.e. Shannon_index(CDR3_aa);
Number of immune cell types, i.e. Uniq_number(CDR3_aa);
Immune cell homogeneity, i.e. Clone_Gini.
Among the above indicators, Shannon_index denotes the Shannon index, calculated as follows:
$$H = -\sum_{i=1}^{S} p(i)\,\ln p(i)$$
where, taking CDR3 as an example, S is the total number of unique CDR3 sequences and p(i) is the frequency of the i-th unique CDR3.
Uniq_number denotes the number of unique sequences.
Clone_Gini denotes the Gini index, calculated as follows:
$$G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \lvert x_i - x_j \rvert}{2n\sum_{i=1}^{n} x_i}$$
where x is the frequency with which each immune cell type occurs and n is the number of immune cell types.
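The three kinds of statistic named above (Shannon index, unique count, Gini index) can be computed from a list of CDR3 amino-acid sequences, or from V-J pair labels in the case of Shannon_index(V-J). The sketch below is a minimal illustration under stated assumptions: the Shannon index uses the natural logarithm, the Gini index uses the standard sorted-rank form of the mean-absolute-difference definition, and the CDR3 strings and helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence


def frequencies(items: Sequence[str]) -> List[float]:
    """Relative frequency of each distinct item (e.g. each unique CDR3)."""
    counts = Counter(items)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon_index(freqs: Iterable[float]) -> float:
    """Shannon index -sum(p * ln p); natural log is an assumption."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def gini_index(freqs: Iterable[float]) -> float:
    """Gini index of the frequencies: 0 means perfectly even clone sizes."""
    xs = sorted(freqs)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-rank form, equal to sum_i sum_j |x_i - x_j| / (2 * n * sum(x)).
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


# Hypothetical CDR3 amino-acid sequences, one entry per read.
cdr3_reads = ["CASSL", "CASSL", "CASSP", "CASRG"]
freqs = frequencies(cdr3_reads)

shannon_cdr3 = shannon_index(freqs)   # Shannon_index(CDR3_aa)
uniq_number = len(set(cdr3_reads))    # Uniq_number(CDR3_aa)
clone_gini = gini_index(freqs)        # Clone_Gini
```

For Shannon_index(V-J), the same `frequencies` and `shannon_index` helpers could be applied to labels such as `"TRBV5-1|TRBJ2-7"` built from each read's assigned V and J genes.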
S400 Determining the immune age value
In this step, the immune age value of the individual is determined based on the statistical features.
According to an embodiment of the present invention, the immune age value is determined from at least one statistical feature using maximum a posteriori probability estimation.
According to an embodiment of the present invention, step S400 further includes: (4-1) using a predetermined distribution of immune age prediction coefficients (the prior distribution of the parameters is determined mainly from the characteristics of the selected features; if a selected feature is continuous, it is generally taken to be normally distributed when the amount of data is large), determining, for each of the statistical features, the corresponding immune age prediction coefficient; and (4-2) determining the immune age of the individual according to the formula
$$IA = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$$
where IA is the immune age of the individual, i is the index of a statistical feature, n is the number of statistical features, θi is the immune age prediction coefficient corresponding to the i-th statistical feature, xi is the value of the i-th statistical feature, and θ0 is the bias term of the prediction model.
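Step (4-2) can be illustrated with a few lines of code, assuming the formula is the bias term plus a coefficient-weighted sum of the feature values, as the symbol definitions describe. The function name and all numbers below are made up for illustration and are not taken from the source.

```python
from typing import Sequence


def immune_age(features: Sequence[float],
               coefficients: Sequence[float],
               bias: float) -> float:
    """IA = theta_0 + sum_i theta_i * x_i (linear form assumed from the text)."""
    if len(features) != len(coefficients):
        raise ValueError("need exactly one coefficient per feature")
    return bias + sum(t * x for t, x in zip(coefficients, features))


# Made-up feature values and coefficients, for illustration only.
x = [2.0, 1.0, 3.0]          # e.g. three statistical features
theta = [1.5, -2.0, 0.5]     # hypothetical prediction coefficients
ia = immune_age(x, theta, bias=10.0)
```

In practice the coefficients would come from step (4-1), and the features may need to be put on comparable scales before the weighted sum is meaningful; the source does not say whether any such scaling is applied.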
For ease of understanding, the principle of maximum a posteriori probability estimation is explained below:
According to an embodiment of the present invention, IA is calculated from the above feature indices, combined with biochemical indicators, using a MAP (maximum a posteriori probability estimate) model, so as to perform a comprehensive assessment of immunity and a prediction of health risk. The principle is as follows:
The theoretical basis of MAP is the Bayesian model. Bayes' formula is as follows:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Expanding event B by the law of total probability gives:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid {\sim}A)\,P({\sim}A)}$$
where ~A denotes "not A".
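A short numerical example may help make the expansion of P(B) concrete. The probabilities below are invented purely for illustration and have no connection to the data in the source.

```python
def posterior(p_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """P(A|B) via Bayes' formula, with P(B) expanded by total probability."""
    p_not_a = 1.0 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    return p_b_given_a * p_a / p_b


# Invented numbers: P(A) = 0.01, P(B|A) = 0.9, P(B|~A) = 0.05.
p = posterior(0.01, 0.9, 0.05)
# Numerator: 0.9 * 0.01 = 0.009; denominator: 0.009 + 0.05 * 0.99 = 0.0585.
```

Even with a high P(B|A), the posterior stays modest here because the prior P(A) is small, which is the effect a prior has in MAP estimation as well.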
The biochemical indicators are mainly routine indicators, such as a comprehensive biochemistry panel and a routine blood test.
The principle of MAP is as follows:
Maximum a posteriori estimation predicts the value of the parameter θ given an observed index x. Let f be the sampling distribution of x, so that f(x|θ) is the probability of observing x when the parameter is θ. Let g be the prior distribution of θ (which can be obtained from the training data). Then, by Bayes' formula:
$$f(\theta \mid x) = \frac{f(x \mid \theta)\,g(\theta)}{\int_{\Theta} f(x \mid \vartheta)\,g(\vartheta)\,d\vartheta}$$
Here the prior distribution of the parameters is determined from the training data mainly according to the characteristics of the selected features: if a selected feature is continuous, it is generally taken to be normally distributed when the amount of data is large; if it is discrete, the weighted product in the formula below is applied directly. The training set consists mainly of indicators obtained from immune repertoire analysis (V/J gene usage diversity, immune diversity, number of immune cell types, immune cell homogeneity) and some biochemical indicators (comprehensive biochemistry panel, routine blood test, etc.).
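where Θ is the parameter space of θ. Because the parameter space is continuous, the denominator is computed as an integral; since it does not depend on θ, it follows that:
$$\hat{\theta}_{MAP}(x) = \arg\max_{\theta} \frac{f(x \mid \theta)\,g(\theta)}{\int_{\Theta} f(x \mid \vartheta)\,g(\vartheta)\,d\vartheta} = \arg\max_{\theta} f(x \mid \theta)\,g(\theta)$$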
where $\hat{\theta}_{MAP}$ is the parameter that maximizes the function f(x|θ)g(θ), that is, the coefficient used to predict Immune Age (IA). If the observation is n-dimensional (that is, x = (x_1, x_2, …, x_n)), then
$$\hat{\theta}_{MAP}(x) = \arg\max_{\theta}\; g(\theta) \prod_{i=1}^{n} f(x_i \mid \theta)$$
The prediction formula for IA is as follows:
$$IA = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$$
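The MAP principle described above can be tried out on a deliberately simple toy problem. The sketch assumes independent observations drawn from a normal distribution with unknown mean θ and known standard deviation, and a normal prior on θ, in line with the text's remark that continuous features are taken to be normally distributed. This is not the patent's IA model; every name and number is hypothetical. The closed-form MAP estimate is compared with a brute-force maximization of the prior times the product of likelihoods.

```python
from typing import Sequence


def log_posterior(theta: float, xs: Sequence[float],
                  sigma: float, mu0: float, tau: float) -> float:
    """Log of g(theta) * prod_i f(x_i | theta), additive constants dropped."""
    log_prior = -((theta - mu0) ** 2) / (2 * tau ** 2)
    log_lik = -sum((x - theta) ** 2 for x in xs) / (2 * sigma ** 2)
    return log_prior + log_lik


def map_closed_form(xs: Sequence[float], sigma: float,
                    mu0: float, tau: float) -> float:
    """Precision-weighted average of the prior mean and the data."""
    precision = 1 / tau ** 2 + len(xs) / sigma ** 2
    return (mu0 / tau ** 2 + sum(xs) / sigma ** 2) / precision


def map_grid(xs: Sequence[float], sigma: float, mu0: float, tau: float,
             lo: float, hi: float, steps: int = 10001) -> float:
    """Arg max of the log posterior over an evenly spaced grid."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_posterior(t, xs, sigma, mu0, tau))


# Made-up observations; prior N(0, 2^2), noise standard deviation 1.
xs = [4.8, 5.1, 5.3, 4.9]
theta_cf = map_closed_form(xs, sigma=1.0, mu0=0.0, tau=2.0)
theta_grid = map_grid(xs, sigma=1.0, mu0=0.0, tau=2.0, lo=0.0, hi=10.0)
```

The two estimates agree to within the grid spacing, and both are pulled slightly from the sample mean toward the prior mean, which is the characteristic difference between MAP and maximum likelihood.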
S500 Determining the immunity index
In this step, the immunity index of the individual is determined based on the immune age value.
According to an embodiment of the present invention, the immunity index is determined by the following formula:
where IA is the immune age value determined in step S400, IAmax is the predetermined upper limit of IA in the population, and IAmin is the predetermined lower limit of IA in the population.
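The formula referred to above did not survive in this text, so the following is only one normalization that is consistent with the three quantities defined: min-max scaling of IA between IAmin and IAmax, inverted so that a higher immune age gives a lower index. The direction, the 0 to 1 range, the clipping of out-of-range values and the function name are all assumptions, not the source's definition.

```python
def immunity_index(ia: float, ia_min: float, ia_max: float) -> float:
    """Hypothetical index: 1.0 at IAmin, 0.0 at IAmax, linear in between.

    The inverted min-max form and the clipping are assumptions.
    """
    if ia_max <= ia_min:
        raise ValueError("ia_max must be greater than ia_min")
    clipped = min(max(ia, ia_min), ia_max)  # keep IA inside the population range
    return (ia_max - clipped) / (ia_max - ia_min)


# Made-up population limits and immune age values.
idx_mid = immunity_index(40.0, ia_min=20.0, ia_max=80.0)
idx_low = immunity_index(20.0, ia_min=20.0, ia_max=80.0)
idx_out = immunity_index(100.0, ia_min=20.0, ia_max=80.0)
```

If the original formula instead scales the index to a different range or runs in the opposite direction, only the return expression would need to change.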
Once the individual's immunity index has been determined, this technical solution enables highly sensitive detection of the individual's adaptive immune system at the molecular level, and supports non-invasive early diagnosis, evaluation of therapeutic efficacy, disease tracking, recurrence prediction, and comprehensive immunity assessment.
According to embodiments of the present invention, the sequencing-based method can be carried out with a small amount of sample, enabling highly sensitive detection of the individual's adaptive immune system at the molecular level, as well as non-invasive early diagnosis, efficacy evaluation, disease tracking, recurrence prediction, and comprehensive immunity assessment. For example, PCR can be used to amplify genes contained in peripheral blood lymphocytes; this requires little blood, keeps downstream sample processing simple, and needs neither imprecise manual blood-cell observation nor operationally complex immunolabeling and flow cytometry analysis. For myeloma testing, only peripheral blood needs to be drawn and no bone marrow puncture is required, which reduces harm to the patient. In summary, immune assessment by immune repertoire sequencing not only improves detection sensitivity but also supports early diagnosis, efficacy evaluation, disease tracking, recurrence prediction, and comprehensive immunity assessment.
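In a second aspect, the present invention provides a device for determining an individual's immunity index. According to an embodiment of the present invention, referring to FIG. 3, the device includes:
a sequencing data acquisition unit 100, a sequencing result analysis unit 200, a statistics unit 300, an immune age determination unit 400, and an immunity index determination unit 500. The sequencing data acquisition unit 100 is configured to acquire nucleic acid sequencing data of the individual to be tested; the sequencing result analysis unit 200 is configured to determine the V/J sequences and CDR sequences contained in the nucleic acid sample by aligning the sequencing results with reference sequences; the statistics unit 300 is configured to determine statistical features based on the V/J sequences and CDR3 sequences contained in the nucleic acid sample, the statistical features including at least one selected from: a V/J gene usage diversity index, an immune cell diversity index, the number of immune cell types, and an immune cell homogeneity index; the immune age determination unit 400 is configured to determine the individual's immune age value based on the statistical features; and the immunity index determination unit 500 is configured to determine the individual's immunity index based on the immune age value.
The device according to embodiments of the present invention can effectively carry out the method for determining an individual's immunity described above. The features and advantages described above therefore apply equally to the device and are not repeated here.
According to an embodiment of the present invention, referring to FIG. 4, the sequencing data acquisition unit further includes a nucleic acid sample acquisition module 110, a first amplification module 120, a second amplification module 130, and a sequencing module 140. The nucleic acid sample acquisition module 110 is configured to obtain a nucleic acid sample from the individual to be tested, the nucleic acid sample including at least one of DNA molecules and RNA molecules; the first amplification module 120 is configured to perform a first amplification using VJ-specific primers to obtain a first amplification product; the second amplification module 130 is configured to perform a second amplification on the first amplification product to obtain a second amplification product, wherein the second amplification product carries sequencing adapters; and the sequencing module 140 is configured to sequence the second amplification product to obtain sequencing results. The nucleic acid sample is obtained from a blood or tissue sample of the individual.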
According to an embodiment of the present invention, the VJ-specific primers contain part of the sequence of the sequencing adapter.
According to an embodiment of the present invention, the CDR sequence is at least one of the CDR1, CDR2, and CDR3 sequences, preferably the CDR3 sequence.
According to an embodiment of the present invention, at least one of the V/J gene usage diversity index and the immune cell diversity index is a Shannon index.
According to an embodiment of the present invention, the immune cell types are determined based on the CDR3 sequences.
According to an embodiment of the present invention, the immune cell homogeneity index is the Gini index.
According to an embodiment of the present invention, the immune age determination unit is adapted to determine the immune age value from at least one statistical feature using maximum a posteriori probability estimation.
According to an embodiment of the present invention, the immune age determination unit is configured to: determine, for each statistical feature, the corresponding immune age prediction coefficient using a predetermined distribution of immune age prediction coefficients; and determine the individual's immune age according to the formula $IA = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$, where IA denotes the individual's immune age, i denotes the index of a statistical feature, n denotes the number of statistical features, θi denotes the immune age prediction coefficient corresponding to the i-th statistical feature, xi denotes the value of the i-th statistical feature, and θ0 denotes the bias term of the prediction model.
According to an embodiment of the present invention, the immunity index is determined by a formula in terms of IA, IAmax, and IAmin, where IA denotes the immune age value determined by the immune age determination unit, IAmax denotes the upper limit of IA in a predetermined population, and IAmin denotes the lower limit of IA in that population.
In a third aspect, the present invention provides an electronic device. According to an embodiment of the present invention, the electronic device includes a processor and a memory; the memory stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the method for determining an individual's immunity index described above.
In a fourth aspect, the present invention provides a machine-readable storage medium. According to an embodiment of the present invention, the machine-readable storage medium stores machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement any of the methods for determining an individual's immunity index described above.
Example 1:
1. Sequencing data acquisition
Peripheral blood (5 mL) was collected from each of 1000 volunteers. DNA was extracted from the peripheral blood samples with a DNA extraction kit and amplified with V gene-specific and J gene-specific primers carrying partial sequencing adapters, yielding V gene and J gene samples carrying partial sequencing adapters.
The amplified samples were further amplified with primers carrying sequencing adapters to construct libraries, and the sequencing libraries were subjected to high-throughput sequencing.
2. Sequencing data analysis
After the data came off the sequencer, the sequencing data were analyzed as follows:
(1) SOAPnuke (v1.5.3) was used to filter the raw sequencing data for adapter-contaminated sequences and for low-quality bases and reads. Reads were filtered on two criteria, the average base quality of the read and the number of N bases it contains: a read was discarded if its base quality value was less than or equal to 20, if it contained 5 or more N bases, or both;
(2) the FASTQ files were converted to FASTA files;
(3) blastall (v2.2.25) was used to align the preprocessed FASTA sequences to the V(D)J reference gene sequences, followed by realignment, and the best alignment result was selected;
(4) the aligned sequence data were subjected to structural analysis (error correction and region determination): a BGI structural analysis program was used to correct errors introduced during PCR and sequencing, and the CDR3 region was then determined from the V/J gene reference sequences, the pattern of conserved amino acids, and an established computational method.
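Steps (1) and (2) above can be sketched in a few lines. This is an illustration of the two quoted thresholds only, not a reimplementation of SOAPnuke: adapter removal is omitted, the quality threshold is read as a mean Phred score per read, Phred+33 encoding is assumed, and all function names are hypothetical.

```python
def read_fastq(text):
    """Yield (name, sequence, quality) from FASTQ text with 4-line records."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain 4 lines per record")
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def keep_read(seq, qual, quality_cutoff=20, n_limit=5):
    """Discard a read if mean quality <= 20 or it has 5 or more N bases."""
    if mean_phred(qual) <= quality_cutoff:
        return False
    if seq.upper().count("N") >= n_limit:
        return False
    return True


def fastq_to_filtered_fasta(text):
    """Return FASTA text containing only the reads that pass keep_read."""
    records = []
    for name, seq, qual in read_fastq(text):
        if keep_read(seq, qual):
            records.append(f">{name}\n{seq}\n")
    return "".join(records)
```

The FASTA output drops the quality line entirely, which is why the quality filter has to run before the conversion in step (2).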
3. Index statistics and prediction
Statistics were computed on the immune repertoire feature data, and immunity was predicted and analyzed with an in-house model.
The statistical features mainly include the following:
V/J gene usage diversity, i.e., Shannon_index(V-J);
Immune diversity, i.e., Shannon_index(CDR3_aa);
Number of immune cell types, i.e., Uniq_number(CDR3_aa);
Immune cell homogeneity, i.e., Clone_Gini.
Among the above indices, Shannon_index denotes the Shannon index, calculated as follows:
$$ \text{Shannon\_index} = -\sum_{i=1}^{S} p(i)\, \log p(i) $$
where, taking CDR3 as an example, S denotes the total number of unique CDR3 sequences and p(i) denotes the frequency of the i-th CDR3 sequence.
Uniq_number denotes the number of unique sequences.
Clone_Gini denotes the Gini index, which is calculated from x, the frequency of each immune cell type, and n, the number of immune cell types.
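The three kinds of index above can be computed directly from a list of CDR3 sequences. The sketch below is illustrative only: the function names are hypothetical, the natural logarithm is assumed for the Shannon index, and the Gini index is assumed to be the standard Gini coefficient of the clone frequencies (0 when all clones are equally frequent), since the exact expression is not given in this text.

```python
import math
from collections import Counter


def clone_frequencies(cdr3_seqs):
    """Relative frequency of each unique CDR3 sequence."""
    counts = Counter(cdr3_seqs)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def shannon_index(freqs):
    """Shannon index -sum(p * log p); natural log is an assumption."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def unique_number(cdr3_seqs):
    """Number of unique sequences (Uniq_number)."""
    return len(set(cdr3_seqs))


def gini_index(freqs):
    """Standard Gini coefficient of the frequencies (assumed form)."""
    xs = sorted(freqs)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("at least one non-zero frequency is required")
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)
```

A repertoire dominated by a few expanded clones gives a low Shannon index and a high Gini coefficient, while an even repertoire gives the opposite.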
Based on the above feature indices, combined with routine blood and biochemical indices, a MAP (maximum a posteriori probability estimate) model was used to calculate IA, enabling comprehensive immunity assessment and prediction of health risk.
$$ \hat{\theta}_{MAP}(x) = \arg\max_{\theta} f(x \mid \theta)\, g(\theta) $$
where $\hat{\theta}$ is the parameter value that maximizes the function f(x|θ)g(θ), that is, the coefficient used to predict the Immune Age (IA). If the observation is n-dimensional (i.e., x = (x_1, x_2, …, x_n)), then
$$ \hat{\theta} = \arg\max_{\theta}\; g(\theta) \prod_{i=1}^{n} f(x_i \mid \theta) $$
The prediction formula for IA is as follows:
$$ IA = \theta_0 + \sum_{i=1}^{n} \theta_i x_i $$
Determining immunity:
Based on the predicted IA, combined with the characteristics of the population distribution, the individual's Immune Index (II) is finally determined by a model in terms of IA, IA_max, and IA_min, where IA denotes the immune age of the predicted sample, and IA_max and IA_min denote the upper and lower limits of the population distribution, respectively.
Example 2:
1. Sequencing data acquisition
Peripheral blood (5 mL) was collected from each of 439 volunteers. DNA was extracted from the peripheral blood samples with a DNA extraction kit and amplified with V gene-specific and J gene-specific primers carrying partial sequencing adapters, yielding V gene and J gene samples carrying partial sequencing adapters.
The amplified samples were further amplified with primers carrying sequencing adapters to construct libraries, and the sequencing libraries were subjected to high-throughput sequencing.
2. Sequencing data analysis
After the data came off the sequencer, the sequencing data were analyzed as follows:
(1) SOAPnuke (v1.5.3) was used to filter the raw sequencing data for adapter-contaminated sequences and for low-quality bases and reads. Reads were filtered on two criteria, the average base quality of the read and the number of N bases it contains: a read was discarded if its base quality value was less than or equal to 20, if it contained 5 or more N bases, or both;
(2) the FASTQ files were converted to FASTA files;
(3) blastall (v2.2.25) was used to align the preprocessed FASTA sequences to the V(D)J reference gene sequences, followed by realignment, and the best alignment result was selected;
(4) the aligned sequence data were subjected to structural analysis (error correction and region determination): a BGI structural analysis program was used to correct errors introduced during PCR and sequencing, and the CDR3 region was then determined from the V/J gene reference sequences, the pattern of conserved amino acids, and an established computational method.
3. Index statistics
Statistics were computed on the immune repertoire feature data. The statistical features mainly include the following three:
Immune diversity, i.e., Shannon_index(CDR3_aa);
Number of immune cell types, i.e., Uniq_number(CDR3_aa);
Sequence diversity, i.e., Uniq_number(seq_aa).
Among the above indices, Shannon_index denotes the Shannon index, calculated as follows:
$$ \text{Shannon\_index} = -\sum_{i=1}^{S} p(i)\, \log p(i) $$
where, taking CDR3 as an example, S denotes the total number of unique CDR3 sequences and p(i) denotes the frequency of the i-th CDR3 sequence.
Uniq_number denotes the number of unique sequences.
4. Preprocessing
Three samples containing missing values were removed.
5. Model training
The remaining 436 samples were divided into three groups by age (20-30 years, 30-50 years, and >50 years). From each group, 75% of the samples were randomly drawn and pooled to form the training set; the remaining 111 samples served as the test set.
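The age-stratified split described above can be sketched as follows. This is a minimal illustration under stated assumptions: samples are represented as hypothetical `(sample_id, age)` pairs, the overlapping group boundaries in the text (30 and 50 appear in two groups) are resolved by assigning the boundary age to the younger group, and the per-group training size is obtained by rounding.

```python
import random


def age_group(age):
    """Assign an age to one of the three groups (boundary handling assumed)."""
    if age <= 30:
        return "20-30"
    if age <= 50:
        return "30-50"
    return ">50"


def stratified_split(samples, train_fraction=0.75, seed=0):
    """Split (sample_id, age) pairs into train and test sets per age group."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    groups = {}
    for sample in samples:
        groups.setdefault(age_group(sample[1]), []).append(sample)
    train, test = [], []
    for members in groups.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test
```

Sampling within each group keeps the age composition of the training and test sets similar, which matters here because the oldest group is small.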
Using the training set and the three immune repertoire feature indices above, a MAP (maximum a posteriori probability estimate) model was used to calculate IA, enabling comprehensive immunity assessment and prediction of health risk. The model parameters were trained as follows:
$$ \hat{\theta}_{MAP}(x) = \arg\max_{\theta} f(x \mid \theta)\, g(\theta) $$
where $\hat{\theta}$ is the parameter value that maximizes the function f(x|θ)g(θ), that is, the coefficient used to predict the Immune Age (IA). Here the observation is 3-dimensional (i.e., x = (x_1, x_2, x_3)), so
$$ \hat{\theta} = \arg\max_{\theta}\; g(\theta) \prod_{i=1}^{3} f(x_i \mid \theta) $$
With the trained parameters, the prediction formula for IA is:
$$ IA = \theta_0 + \sum_{i=1}^{3} \theta_i x_i $$
Based on the predicted IA, combined with the characteristics of the population distribution, the individual's Immune Index (II) is finally determined by a formula in terms of IA, IA_max, and IA_min, where IA denotes the immune age of the predicted sample, and IA_max and IA_min denote the upper and lower limits of the population distribution, respectively.
6. II prediction results
As FIGS. 5 and 6 show, the immunity index tends to decline with increasing age. Because the group older than 50 contains few samples, the downward trend in FIG. 6 is not very pronounced, but the downward trend in FIG. 5 is clear. The results of this example therefore indicate that the immunity index can serve as an indicator for assessing health.
Although embodiments of the present invention have been shown and described above, it should be understood that these embodiments are exemplary and are not to be construed as limiting the invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to them within the scope of the invention.
Claims (23)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202180065823.0A CN116391237A (en) | 2021-03-30 | 2021-09-08 | Method, device, electronic device and machine-readable storage medium for determining individual immunity index |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110342463.6 | 2021-03-30 | ||
| CN202110342463 | 2021-03-30 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022205775A1 true WO2022205775A1 (en) | 2022-10-06 |
Family
ID=83455556
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2021/117149 Ceased WO2022205775A1 (en) | 2021-03-30 | 2021-09-08 | Method and device for determining immunity index of individual, electronic device, and machine-readable storage medium |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN116391237A (en) |
| WO (1) | WO2022205775A1 (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100062473A1 (en) * | 2006-06-15 | 2010-03-11 | Katsuiku Hirokawa | Immunity evaluation method, immunity evaluation apparatus, immunity evaluation program and data recording medium having the immunity evaluation program stored therein |
| US20140235478A1 (en) * | 2013-02-04 | 2014-08-21 | The Board Of Trustees Of The Leland Stanford Junior University | Measurement and Comparison of Immune Diversity by High-Throughput Sequencing |
| US20180356403A1 (en) * | 2017-06-09 | 2018-12-13 | The Regents Of The University Of California | Use of Immune Repertoire Diversity For Predicting Transplant Rejection |
| WO2019215740A1 (en) * | 2018-05-07 | 2019-11-14 | Technion Research & Development Foundation Limited | Immune age and use thereof |
| WO2020178816A1 (en) * | 2019-03-04 | 2020-09-10 | The National Institute for Biotechnology in the Negev Ltd. | Kits, compositions and methods for evaluating immune system status |
| CN112331344A (en) * | 2020-11-12 | 2021-02-05 | 深圳泛因医学有限公司 | Immune state evaluation method and application |
-
2021
- 2021-09-08 WO PCT/CN2021/117149 patent/WO2022205775A1/en not_active Ceased
- 2021-09-08 CN CN202180065823.0A patent/CN116391237A/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| CN116391237A (en) | 2023-07-04 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21934414; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14/02/2024) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21934414; Country of ref document: EP; Kind code of ref document: A1 |