WO2008000186A1 - Method for identifying a novel gene and resulting novel genes - Google Patents

Method for identifying a novel gene and resulting novel genes

Info

Publication number
WO2008000186A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
software
protein
sequences
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2007/070153
Other languages
English (en)
Chinese (zh)
Other versions
WO2008000186A8 (fr)
Inventor
Zailin Yu
Zhihua Zheng
Y. Tom Tang
Genny Yan Yu Fu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING BIOWAY-FORTUNE RESEARCH CENTER FOR GENE DRUGS Ltd
Tianjin Sinobiotech Ltd
Fortunerock Inc
Original Assignee
BEIJING BIOWAY-FORTUNE RESEARCH CENTER FOR GENE DRUGS Ltd
Tianjin Sinobiotech Ltd
Fortunerock Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING BIOWAY-FORTUNE RESEARCH CENTER FOR GENE DRUGS Ltd, Tianjin Sinobiotech Ltd, Fortunerock Inc
Priority to CNA2007800202904A (CN101460625A)
Publication of WO2008000186A1
Publication of WO2008000186A8

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/775 Apolipopeptides
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P9/00 Drugs for disorders of the cardiovascular system
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/92 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving lipids, e.g. cholesterol, lipoproteins, or their receptors
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • the present invention relates to the creation of new biological computer analysis methods and pathways for obtaining new functional genes.
  • the results demonstrate that this analytical method allows for the acquisition of new gene sequences that are consistent with the published human genome chromosomal DNA sequences.
  • This method can be used to analyze and acquire new genes that have biological functions and are related to the diagnosis and treatment of human health and disease, especially genes that are genetic drugs or drug targets.
  • Gene drugs are bioactive substance products based on functional genes, or the products of genes, found in genomics research. They are made using technologies such as biology, molecular biology, biochemistry, and bioengineering, with the quality of intermediate and finished products controlled by corresponding analytical techniques, and they are clinically useful for the treatment, prevention, and diagnosis of certain diseases.
  • Recombinant protein drugs, vaccines, DNA drugs, RNA drugs, and gene therapy drugs are all genetic drugs.
  • A gene drug target is a functional gene or gene product (functional protein) found in genomics research against which an antagonist or inhibitor is obtained using biology, chemistry, physics, molecular biology, biochemistry, bioengineering, and other related technologies.
  • For example, a specific antibody can be obtained that causes the functional protein to lose biological activity through antigen-antibody binding, or a small-molecule compound that inhibits the biological activity of the gene product can be found by screening; the antibody or small-molecule compound then serves as a drug for the treatment and diagnosis of human diseases.
  • The "traditional" discovery steps for gene drugs and drug target genes start from the symptoms of a disease and look for differences in physiological and biochemical indicators between healthy people and patients. For example, human growth hormone was identified because patients were relatively short compared with healthy people, and various analyses showed that they had an endogenous deficiency in growth hormone secretion.
  • In the early stage, human growth hormone was extracted from human urine and then injected into the patient.
  • The isolated and purified natural protein is sequenced, the DNA sequence is deduced from the protein sequence and synthesized for detection (as a DNA probe), and the gene fragment found in this way is used to obtain the complete sequence.
  • a foreign system such as E. coli
  • Recombinant proteins are prepared and purified, and genetically engineered drugs are established through preclinical testing (animal testing) and clinical trials. This process can be referred to as a "traditional" or "classical" genetic drug discovery program.
  • the present invention provides a method of discovering a novel gene, the method comprising the steps of:
  • step 2) performing secretory signal peptide analysis on the protein sequences obtained in step 1), obtaining protein sequences that contain a secreted signal peptide and protein sequences that do not;
  • step 3) performing transmembrane region analysis on the protein sequences obtained in step 1), obtaining protein sequences that contain a transmembrane region and protein sequences that do not;
  • step 4) combining the results obtained in steps 2) and 3) to divide the sequences roughly into three categories: sequences containing a secreted signal peptide and no transmembrane region; sequences containing a secreted signal peptide and a transmembrane region; and sequences having 5 to 8 transmembrane regions;
  • the matching condition is: the sequence similarity is 15% to 95%, preferably 20% to 90%, more preferably more than 25% and less than 90%, and the mutation points are required to be distributed as uniformly as possible throughout the matched sequence;
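The classification in steps 2) to 4) and the similarity window of the matching condition can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's program: the function names are hypothetical, the signal peptide and transmembrane calls are taken as already computed inputs, the third category is read here as "no secreted signal peptide and 5 to 8 transmembrane regions", and the requirement that mutation points be evenly distributed is not modelled.

```python
def classify(has_signal_peptide, tm_regions):
    """Assign a protein to one of the three rough categories.

    has_signal_peptide: result of a signal peptide prediction (assumed given).
    tm_regions: number of predicted transmembrane regions (assumed given).
    """
    if has_signal_peptide and tm_regions == 0:
        return "signal peptide, no transmembrane region"
    if has_signal_peptide:
        return "signal peptide, with transmembrane region"
    # Assumption: the third category has no signal peptide.
    if 5 <= tm_regions <= 8:
        return "5 to 8 transmembrane regions"
    return "unclassified"


def passes_matching_condition(similarity_percent, low=15.0, high=95.0):
    """Check the similarity window; the defaults are the broadest stated range."""
    return low <= similarity_percent <= high
```

Passing `low=20.0, high=90.0` applies the narrower preferred range from the text.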
  • the method of discovering novel genes of the present invention is accomplished by a computer system platform, the computer system platform comprising:
  • the following software for sequence editing: software for converting sequences in fasta format to sequences in tabular format; software for converting sequences in GenBank format to sequences in fasta format; a reverse-complement program for DNA sequences; a translation program for DNA sequences; software for acquiring CDS sequences from GenBank-format sequence files; software that combines two simple sequence fragments and filters out the duplicated parts between them;
  • software that deletes any sequence in a database; software that inserts sequences into a sequence database; software that acquires certain sequences from large databases in batches or individually; software for acquiring DNA and protein sequences from temporary, unindexed databases; software for directly acquiring sequence data from GenBank via the network; software for directly acquiring sequence data from a local database over the local network; a program for indexing databases in Fasta format; a program for indexing databases in GenBank format; software for acquiring fragment sequences from genomic sequences; software that facilitates acquiring a fragment of a sequence;
  • the computer system platform is preferably based on the Linux operating system.
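As an illustration of the fasta-to-table conversion listed among the sequence editing tools, the following Python sketch reads FASTA text and emits one tab-separated row per sequence. The function names and the two-column layout (identifier, sequence) are assumptions; the source does not specify the table format.

```python
def fasta_to_table(fasta_text):
    """Return a list of (identifier, sequence) rows from FASTA text."""
    rows = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                rows.append((name, "".join(chunks)))
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        rows.append((name, "".join(chunks)))
    return rows


def to_tsv(rows):
    """Render rows as tab-separated text, one sequence per line."""
    return "\n".join(f"{name}\t{seq}" for name, seq in rows)
```

One sequence per line makes the data easy to filter with standard Linux text tools, which fits the platform's stated preference for Linux.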
  • the bioinformatics processing program and the established system platform technology of the present invention can be used to discover new genes and analyze their products, so that humans can more clearly understand the relationship between gene expression and diseases, and improve the level of disease treatment.
  • The present invention employs a "reverse" procedure, opposite to the "traditional" process described above, to perform functional genomics studies of gene drugs, with the aim of greatly accelerating the screening of novel gene drugs.
  • Compared with the conventional "traditional" gene drug search method, the method is simpler, requires less computer equipment, is easier to operate and master, and can shorten the time needed to obtain results by several years.
  • The invention first compiles a novel computer software processing system specifically to screen gene drugs and drug target genes.
  • The self-written programs use the published human genomic DNA sequence and run as a series of programs on a Linux system platform to predict new protein (gene) coding sequences (ORFs).
  • This software system combines information on disease types, disease occurrence, formation mechanisms, and genetics with bioinformatics predictions of secreted peptides, signal peptides, and transmembrane regions; various existing functional genomics and computational tools are integrated and augmented with the new self-written software to achieve predictive screening and splicing of novel genes.
  • The possible ORF sequences predicted by the computer are then examined in functional genomics studies, using high-throughput screening methods to screen for gene drugs at the cellular and animal levels.
  • the splicing, cloning and amplification of genes are accomplished using molecular biology techniques.
  • high-throughput screening methods such as quantitative PCR and gene chip technology
  • the computer bioinformatics discovery and analysis technology platform of the present invention can be used for new gene discovery in humans, and can also be used, but not limited to, for genetic discovery and analysis purposes of animals, plants, and microorganisms.
  • The computer bioinformatics analysis programs and platform technology of the present invention combine existing published human genome research materials and information with programs designed and operated by the present invention to analyze large amounts of data and libraries and to obtain newly predicted genes from them.
  • This is designed to overcome the shortcomings, in technology and in time, of obtaining new genes with traditional techniques.
  • The programming involved in the present invention has the following advantages: 1) rapid analysis and acquisition of possible new genes; 2) simple and efficient operating procedures; 3) the acquired new genes have biological functions and potential clinical application as gene drugs and gene drug targets.
  • LDL low density lipoprotein
  • HDL high density lipoprotein
  • The protein in lipoproteins is called apolipoprotein.
  • Lipoprotein combines with cholesterol to form lipoprotein cholesterol, which transports cholesterol into and out of the cell.
  • A reduction in high-density lipoprotein cholesterol may indicate a predisposition to coronary heart disease.
  • An increase in low-density lipoprotein cholesterol may indicate coronary heart disease and cerebrovascular disease caused by atherosclerosis.
  • the key step in the reverse transport of cholesterol is to transfer cholesterol from the cell to the extracellular lipoprotein.
  • An important component of various lipoproteins is apolipoprotein.
  • Apolipoproteins are responsible for transporting different lipoproteins to various parts of the body.
  • Apolipoproteins are proteins located on the surface of lipoproteins and are composed of amino acids in a certain order. They are present in the various types of lipoproteins in a variety of forms and in different ratios.
  • Various lipoproteins also have different functions and different metabolic pathways due to the different types of apolipoproteins they contain.
  • AI-BP apoA-I binding protein
  • The gene encoding AI-BP, APOA1BP, is located on chromosome 1q21 and consists of 6 exons and 5 introns (2.5 kb). Northern blot analysis confirmed that APOA1BP mRNA is ubiquitously expressed in kidney, heart, liver, and thyroid gland, with high expression in the adrenal gland and testis.
  • The AI-BP protein was not found in normal human serum, but there was a high level of AI-BP in serum samples from patients with septic syndrome. In healthy humans, AI-BP protein is present in significant amounts in cerebrospinal fluid and urine.
  • The present invention also discloses for the first time two novel genes, similar to apolipoprotein-related proteins, obtained by the procedures and methods of the present invention; both are located on human chromosome 19. These two genes differ from the currently known apolipoprotein-interacting protein gene in that they: 1) are located on a different chromosome; 2) have no secreted peptide; 3) encode proteins whose amino acid sequence is only 40.0% homologous to that of the known gene.
  • The present invention describes the acquisition and local installation of various known or published biological information materials and libraries. The libraries and materials obtained include, but are not limited to, the latest databases required for biological information analysis, downloaded from the NCBI remote databases. These include the human expressed sequence tag database, the non-redundant protein sequence database, the nucleotide database, the patented protein sequence database, the human chromosomal sequence database, etc. All of these downloaded databases are formatted on the local computer, that is, converted into a sequence-format database recognized by the local programs.
  • All of the databases and libraries applied in the present invention come from publicly available data; they are validated and digitally processed on local computers so that they are ready to be retrieved locally and can be used together with the programs of the present invention.
  • the biological information analysis programs used in the present invention mainly include, but are not limited to, the following:
  • the software used for sequence alignment is: blastall: NCBI (National Center for Biotechnology Information) blast package, which performs rough alignment of gene sequences; WU-blast: Washington University's blast package, which performs well in the retrieval and analysis of new genes; Fasta: EMBL (European Molecular Biology Laboratory) sequence alignment software package; clustalw: multiple sequence alignment analysis software; sim4: software for aligning expressed sequences with chromosomal genome sequences;
  • the software used for database editing is:
  • pressdb: WU-blast program-specific nucleotide sequence database formatting software; im-index: mainly used to index the sequence database so that large databases remain operable; setdb: WU-blast program-specific protein sequence database formatting software;
  • the software used for sequence splicing is:
  • Cap4/Phrap: sequence splicing software from the University of Washington Genomics Research Center; merger: simple sequence splicing software;
  • Tmpred: predicts transmembrane regions of protein sequences
  • Signalp: predicts signal peptides of protein sequences
  • remap: sequence cleavage site analysis software
  • restrict: sequence cleavage information statistics software
  • showorf: DNA sequence translation software
  • pepinfo: graphical display of the content of the various amino acids in protein sequences
  • pepstats: statistics on the content of the various amino acids in protein sequences; also yields molecular weight, isoelectric point, charge, and light absorption at 280 nm
  • pepwheel: graphically shows the helical wheel of all amino acid residues in the protein sequence
  • Proparam: mainly used to comprehensively determine the hydrophilicity/hydrophobicity of the protein
  • Tmap: graphical display of the transmembrane regions of the protein
  • ps_scan: protein active site/domain analysis software.
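To make the kind of statistics reported by these protein analysis tools concrete, here is a small Python sketch that computes amino acid composition and a mean hydropathy score. It does not reproduce the algorithms of pepstats or Proparam; the function names are hypothetical, and the use of the Kyte-Doolittle hydropathy scale is an assumption chosen for illustration.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(protein):
    """Percentage of each amino acid present in the sequence."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


def mean_hydropathy(protein):
    """Average hydropathy over residues; higher means more hydrophobic."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return sum(values) / len(values)
```

A clearly negative mean suggests an overall hydrophilic protein, which is the type of conclusion the hydrophobicity analysis in the figures is meant to support.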
  • the invention provides an independent bio-computer program for the prediction of new genes and analysis of the results of the examples.
  • the present invention also includes all programs that are run on a local computer and that form the new gene discovery and analysis technology system platform of the present invention.
  • sequence format conversion software that converts sequences in fasta format into table-format sequences; gb2fasta: sequence format conversion software, converting sequences in GenBank format to sequences in fasta format; tt-comp-dna: sequence editing software, DNA sequence reverse-complement program; translate: sequence editing software, DNA sequence translation program; gb2cds: sequence editing software, obtains the CDS sequences from a sequence file in GenBank format; tt-zip-2: sequence editing software, mainly used to merge two simple sequence fragments and filter out the repeated parts between them;
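The reverse-complement and translation operations named above are standard and can be sketched in a few lines of Python. The code below is an illustrative stand-in, not the source of tt-comp-dna or translate; it assumes the standard genetic code, translates in the first reading frame only, and marks codons containing unknown bases as `X`.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement every base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate frame 1; '*' marks a stop codon, 'X' an unknown codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

Translating both a sequence and its reverse complement in all three offsets gives the six reading frames that an ORF search would scan.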
  • the software used for database operations is:
  • im-del: database editing software that can delete any sequence in the database; im-insert: database editing software that adds sequences to the sequence database; im-retrieve: database editing software that obtains sequences from a large database in batches or individually; tt_get: software for DNA and protein sequence acquisition on temporary, unindexed databases; rfetch: database operation software that directly obtains sequence data from GenBank via the network; lfetch: database operation software that directly obtains sequence data from the local database over the local network; biofaseqindex: database editing software, a program for indexing databases in Fasta format; biogbseqindex: database editing software, a program for indexing databases in GenBank format; tt_subseq_genome: software for acquiring fragment sequences from genomic sequences; tt-sub-seq: sequence editing software that facilitates acquiring a fragment of a sequence;
  • the software used for plotting alignment results is: drawBlast: blast result plotting program that gives a rough graphical comparison of blast results;
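Indexing is what makes retrieval from a large sequence database practical: instead of scanning the whole file, a program looks up where a record starts and reads only that record. The Python sketch below shows the idea for a FASTA file using byte offsets. It is an assumption about how such an index could work, not a description of biofaseqindex or im-retrieve, and the function names are hypothetical.

```python
import io


def build_index(handle):
    """Map each sequence identifier to the byte offset of its header line."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            parts = line[1:].split()
            if parts:
                index[parts[0].decode()] = offset
    return index


def fetch(handle, index, seq_id):
    """Jump straight to one record and return its sequence."""
    handle.seek(index[seq_id])
    handle.readline()  # skip the header line
    chunks = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break
        chunks.append(line.strip())
    return b"".join(chunks).decode()


# Usage with an in-memory stand-in for a database file opened in binary mode.
database = io.BytesIO(b">a\nATG\nCC\n>b\nGGG\n")
offsets = build_index(database)
```

The index is built once and can be stored next to the database, so later lookups cost a single seek regardless of database size.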
  • the software used for data analysis is:
  • tt_tmpred_p: data parser software, dedicated to parsing the analysis result data generated by tt-tmpred; parser-bx: parser software for the output of programs such as blastn, blastp, and blastx; parser-fasta: parser software for the output of the fasty comparison program; ps_signalp: data parser software, parsing the result data generated by the pepsigp program; tt_pblast: blastn result parsing software, for automatic machine analysis of large volumes of result output;
  • the software used to assist other programs in running is: tt-cycle: auxiliary software, mainly used to automate programs that cannot otherwise run automatically;
  • the re-optimized software is:
  • ed_cap4: recompiled Cap4 program that automatically completes the configuration of the cap4 runtime environment; extractcontigs: converts the score matrix data output by cap4 into a file in fasta format; pepsigp: recompiled Signalp software, in which the original program, which predicts signal peptides for a single sequence only, is improved to perform automated batch prediction; primers-for-fulllength-clone: batch primer design software; tt_fasty_l: improved fasty program, whose main purpose is more convenient operation; tt-tmpred: recompiled protein transmembrane region prediction program, improved to allow batch analysis of sequences.
  • the computer system platform of the present invention may also include other authoritative biological information data analysis software packages, such as Emboss (a biological sequence analysis software package, available at http://emboss.sourceforge.net).
  • The Linux version of the source code can be purchased at http://www.ch.embnet.org/
  • Signalp: secretory signal peptide analysis, available at http://www.cbs.dtu.
  • Predict Protein: analysis and prediction of basic protein information, available for free download at http://www.predictprotein.org/
  • Clustalw: multiple sequence alignment analysis; the free Linux software can be downloaded at http://www.ebi.ac.uk/
  • Primer: primer design analysis, available at http://primer3.sourceforge.net/.
  • the preferred operating environment of the software of the present invention is a Linux system. To screen and filter large amounts of data, it is necessary to obtain the source code of the above software and to recompile its core functional parts into software suited to running on the platform.
  • nucleotide sequence libraries are downloaded from the NCBI remote database.
  • the process and method of the database are downloaded from the NCBI remote database.
  • the use of various biological information analysis software and systems is enumerated; in particular, the computer analysis system platform of the present invention is compiled so that independent software analysis systems can work together to discover and analyze new genes.
  • a comprehensive process framework diagram of a computer operational analysis program for discovering new genes is presented.
  • the computer and software engineering completed by the present invention is an independent and complete biological information processing system platform, which can be copied and transplanted, and can be used for, but is not limited to, discovery and functional analysis of new genes, demonstration, teaching, business purposes, clinical treatment, and medical diagnostic applications.
  • This information processing platform in the present invention is actually applied to the discovery and analysis of new protein sequences (see Example 3 for specific operations) to obtain 35 possible new protein sequences.
  • Two of them are similar to the known apolipoprotein A1BP gene and are disclosed as an example.
  • the two new genes, BFC06016 and BFC06104, have the nucleotide sequences shown in Seq ID No. 1 and Seq ID No. 3, respectively; their GenBank accession numbers are DQ778079 and DQ778080, respectively.
  • the amino acids encoded by the nucleotide sequence are the sequences shown by Seq ID No. 2 and Seq ID No. 4, respectively.
  • genes can be obtained by whole DNA sequence synthesis methods and used in biological and clinical application research and product development applications.
  • a method and technique for synthesizing a whole gene DNA sequence are exemplified in detail.
  • the main method is to use PCR to assemble synthetic DNA fragments step by step into full gene sequences; DNA sequencing confirmed the synthesis.
  • Quantitative PCR showed that the two new human genes are expressed and that their expression profiles vary across human tissues and organs; these preliminary experimental results indicate that the two genes are biologically active.
  • the DNA and protein amino acid sequences of the present invention are applicable to new drug development and clinical diagnosis; preferably these genes are drug or drug target genes for diagnostic and therapeutic purposes related to cardiovascular diseases, more preferably gene drugs or gene therapy drug targets.
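The idea of building a full gene from overlapping synthetic fragments can be illustrated in silico. The Python sketch below splits a target sequence into fragments that share a fixed overlap and then rejoins them by checking each overlap. The fragment size, overlap length, and function names are illustrative assumptions; the sketch ignores strand orientation, melting temperatures, and the other design constraints of a real PCR assembly.

```python
def split_overlapping(seq, size, overlap):
    """Cut seq into fragments of at most `size` bases sharing `overlap` bases."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(seq[start:start + size])
        if start + size >= len(seq):
            break
        start += step
    return fragments


def assemble(fragments, overlap):
    """Rejoin fragments, verifying that neighbours agree on the overlap."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        if not assembled.endswith(fragment[:overlap]):
            raise ValueError("fragments do not overlap as expected")
        assembled += fragment[overlap:]
    return assembled


# Usage: a 27-base toy sequence, 10-base fragments, 4-base overlaps.
target = "ATGGCCAAGCTTGGATCCGAATTCTAA"
pieces = split_overlapping(target, size=10, overlap=4)
```

Comparing the reassembled string with the target mirrors, in a simplified way, the sequencing check that confirms a synthesized gene.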
  • nucleotides of the novel gene of the present invention can be introduced into a host cell by recombinant cloning techniques to allow expression of the encoded protein.
  • the host cell is genetically engineered (transduced, transformed, or transfected) with a vector containing the novel gene of the present invention, for example a plasmid introduced in an invasive manner, a viral infection, a phage, or the like.
  • the engineered host cells can be cultured in a medium containing conventional nutrients, modified as appropriate to activate the promoter, select transformants, or amplify the nucleotide strands encoding these genes.
  • culture conditions such as temperature and pH are controlled in an appropriate manner for the expression cells selected.
  • the recombinant vector carries a nucleotide sequence encoding a protein of a novel gene.
  • the recombinant vector may be an expression vector that expresses a fusion protein in a host cell from the encoding nucleotide sequence.
  • the form can be, but is not limited to, fusion or separate insertion.
  • Host organisms and somatic cells include, but are not limited to, vertebrates (such as humans, monkeys, rats, rabbits, fish, chickens, etc.), insects, plants, yeast, fungi, bacteria, and the like.
  • Nucleotides encoding the proteins of the present invention can be expressed as proteins under the action of a suitable promoter.
  • suitable promoters include, but are not limited to: adenovirus promoters, such as the major late promoter of adenovirus; heterologous promoters, such as the CMV promoter and the RSV promoter; inducible promoters, such as the MMT promoter, a heat-stimulated promoter, the albumin promoter, the ApoAI promoter, and the human globulin promoter; viral thymidine kinase promoters, such as the herpesvirus thymidine kinase promoter; retroviral LTR promoters, including modified LTR promoters; the beta-actin promoter; and the human growth hormone promoter.
  • a native promoter can also be used to control the expression of a protein encoded by a nucleotide.
  • recombinant cells have the ability to express a nucleic acid sequence encoding a protein as described herein.
  • the recombinant engineered cells can express the novel protein of the present invention continuously or in the presence or absence of an inducing agent.
  • Recombinant engineered cell forms include, but are not limited to, cells of vertebrates (e.g., humans, monkeys, mice, rabbits, fish, chickens, etc.), insects, plants, yeast, fungi, and bacteria.
  • Antibodies to novel proteins obtained according to the present invention and known techniques include, but are not limited to, polyclonal, monoclonal or humanized antibodies and the practical application of such antibodies.
  • the specific antibody is preferably produced by immunizing an animal.
  • the specific antibodies have important applications in clinical diagnosis, treatment, and as biological agents.
  • the present invention also provides a nucleic acid having a nucleotide sequence with at least 95%, preferably at least 96%, more preferably at least 97%, further preferably at least 98%, and still further preferably at least 99% sequence homology to Seq ID No. 1 or Seq ID No. 3, wherein the protein encoded by the nucleic acid has the same function as the protein encoded by Seq ID No. 1 or Seq ID No. 3, respectively.
  • the protein encoded by the nucleic acid of the present invention is as follows: a protein derived from the protein represented by Seq ID No. 2 or Seq ID No. 4 by substitution, deletion, or addition of one or several amino acids in the amino acid sequence defined by Seq ID No. 2 or Seq ID No. 4, and having the same function as the protein represented by Seq ID No. 2 or Seq ID No. 4.
  • the present invention also provides a protein having an amino acid sequence with at least 90%, preferably at least 92%, more preferably at least 95%, further preferably at least 97%, and still more preferably at least 99% sequence homology to Seq ID No. 2 or Seq ID No. 4, wherein the protein has the same function as the protein shown by Seq ID No. 2 or Seq ID No. 4.
  • the protein of the present invention is a protein derived from the protein represented by Seq ID No. 2 or Seq ID No. 4 by substitution, deletion, or addition of one or several amino acids in the amino acid sequence defined by Seq ID No. 2 or Seq ID No. 4, and having the same function as the protein represented by Seq ID No. 2 or Seq ID No. 4.

DRAWINGS

  • Figure 1 is a diagram of a computer network hardware connection framework for new gene discovery.
  • Figures 2A, 2B and 2C are, respectively, a flow chart of the method for novel gene discovery, a flow chart of protein sequence clustering, and a comprehensive process framework diagram of the computer operation analysis program.
  • Figure 3 shows the DNA nucleotide sequence (A1) and corresponding amino acid sequence (A2) of BFC06016 (A), and the DNA nucleotide sequence (B1) and corresponding amino acid sequence (B2) of BFC06104 (B), the two newly discovered apolipoprotein A1BP-like genes.
  • Figure 4 shows the results of protein hydrophobicity/hydrophilicity prediction for BFC06016 (A) and BFC06104 (B), computed using ProParam software.
  • Figure 5A and Figure 5B show the results of protein transmembrane region analysis using the Tmpred/tmap analysis software, which show that BFC06016 has no transmembrane region (Figure 5A). Similarly, BFC06104 has no transmembrane region (Figure 5B).
  • Figures 6A and 6B show the helical wheel of each amino acid residue in the protein sequence, drawn using pepwheel.
  • Figure 6A shows the helical wheel analysis results for the BFC06016 protein and Figure 6B those for the BFC06104 protein.
  • Figures 7A and 7B show the content and distribution of the various amino acids in the protein sequences, computed with pepinfo.
  • Figure 7A shows the results of the BFC06016 gene analysis
  • Figure 7B shows the results of the BFC06104 gene analysis.
  • Figure 8 shows that the BFC06016 (A) and BFC06104 (B) genes are located on the human chromosome 19 DNA sequence.
  • Figure 9 shows the amino acid homology between BFC06016 and BFC06104 and the known apolipoprotein A1BP gene.
  • The asterisk (*) marks amino acids that are identical among the three genes; a blank ( ) marks amino acids that differ among the three; the period (.) marks a semi-conservative substitution; the colon (:) marks a conservative substitution.
  • the amino acid homology between BFC06016 and apolipoprotein A1BP was 40.0%; the amino acid homology between BFC06104 and apolipoprotein A1BP was 41.5%.
  • Figure 10 is a flow chart showing the complete synthesis of a new gene nucleotide sequence predicted by a computer.
  • Figure 11 shows the quantitative expression of these genes in different human tissues and cell lines, measured with qPCR detection primers designed from the DNA sequence common to the two newly discovered genes.
  • Figure 12 shows the cloning of the newly discovered genes and the results of their protein expression in bacteria.
  • DETAILED DESCRIPTION
  • The bioinformatics analysis programs used in the present invention come from public channels or commercial software and mainly include:
  • blastall: the NCBI (National Center for Biotechnology Information) BLAST software package, which compares similar gene sequences;
  • WU-Blast: the BLAST software package from Washington University, which performs well in the search for and analysis of new genes;
  • Fasta: the EMBL (European Molecular Biology Laboratory) sequence alignment software package;
  • cap4/Phrap: sequence assembly software from the University of Washington genome sciences research center;
  • Tmpred: predicts transmembrane regions of protein sequences;
  • Signalp: predicts signal peptides of protein sequences;
  • clustalw: multiple sequence alignment software;
  • pressdb: database editing software, the nucleotide sequence database formatting tool specific to the WU-blast programs;
  • sim4: software for aligning expressed sequences with chromosomal genome sequences;
  • im_index: database editing software, mainly used to index sequence databases so that large databases remain workable;
  • setdb: database editing software, WU-blast program
  • The computer programs written for carrying out the present invention mainly include the following:
  • tbl2fasta-n / fasta2tbl-n: sequence format conversion software, which can convert sequences in FASTA format into table format;
  • gb2fasta: sequence format conversion software, which can convert sequences in GenBank format into FASTA format;
  • drawBlast: BLAST result drawing program, which produces a rough alignment diagram from BLAST result data;
  • ed-cap4: recompiled Cap4 program, which automatically completes the configuration of the cap4 runtime environment;
  • extractcontigs: converts the score matrix data output by cap4 into a FASTA-format file;
  • im-del: database editing software, which can delete any sequence in the database;
  • im-insert: database editing software, which inserts sequences into the sequence database;
  • im-retrieve: database editing software, which retrieves sequences from large databases singly or in batches;
  • pepsigp: recompiled Signalp software, only
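  • The FASTA-to-table conversion described for fasta2tbl-n can be illustrated with a minimal Python sketch. The patent does not disclose the source of its conversion tools, so the function names `fasta_to_table` and `table_to_fasta`, the tab-separated output and the example records are assumptions made only for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def fasta_to_table(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            header, chunks = (line[1:].split() or [""])[0], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def table_to_fasta(rows: List[Tuple[str, str]], width: int = 60) -> str:
    """Render (identifier, sequence) pairs back into FASTA text."""
    out = []
    for name, seq in rows:
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Hypothetical two-record input, not taken from the patent.
example = """>seq1 demo protein
MKTAYIAK
QRQISFVK
>seq2
MSSAAGPD
"""
rows = list(fasta_to_table(example.splitlines()))
for name, seq in rows:
    print(f"{name}\t{seq}")
```

  • One sequence per row makes the later length filtering and database editing steps simple line-oriented operations.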
  • Example 3: New gene acquisition process
  • the flow chart of the novel gene discovery method of the present invention, the protein sequence clustering flow chart, and the computer operation analysis program integrated flow framework are shown in Figures 2A, 2B and 2C.
  • The patent protein database is parsed to obtain all protein sequences of 500 aa or less, preferably 400 aa or less, more preferably 300 aa or less (using the programs fasta2tbl-n and tbl2fasta_n), and all of these sequences are passed to pepsigp for secretory signal peptide analysis.
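  • The length cut-off in this step can be sketched as follows. This is an illustrative Python fragment, not the patent's own program; `filter_by_length` and the example records are hypothetical.

```python
def filter_by_length(records, max_len=500):
    """Keep (name, sequence) pairs whose sequence has at most max_len residues."""
    return [(name, seq) for name, seq in records if len(seq) <= max_len]


# Hypothetical records of 120, 450 and 800 residues.
records = [("short", "M" * 120), ("medium", "M" * 450), ("long", "M" * 800)]

# The three thresholds named in the text: 500, 400 and 300 aa.
for limit in (500, 400, 300):
    kept = [name for name, _ in filter_by_length(records, limit)]
    print(limit, kept)
```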
  • The results are passed to ps-signal for analysis, which separates the protein sequences that contain a secretory signal peptide from those that do not. Through the tt-cycle, all sequences containing a signal peptide are subjected to dynamic transmembrane region prediction by tt-tmpred, and the results are sent directly to the program tt-tmpred_p for analysis. The sequences are then divided into: sequences containing a secretory signal peptide and no transmembrane region; sequences containing a secretory signal peptide and a transmembrane region; and sequences containing a secretory signal peptide and 5 to 8 transmembrane regions (greater than or equal to 5 and less than or equal to 8) (see Figure 2B for details).
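  • The clustering rule above can be written as a small decision function. The sketch below assumes that signal peptide and transmembrane predictions are already available as a boolean and a count; the class labels, the `classify` function and the example predictions are hypothetical, and the 5-to-8 class is treated here as separate from the general transmembrane class, which the text does not state explicitly.

```python
from collections import defaultdict


def classify(has_signal_peptide: bool, tm_regions: int) -> str:
    """Assign a protein to one of the clusters described in the text."""
    if not has_signal_peptide:
        return "no_signal_peptide"
    if tm_regions == 0:
        return "signal_peptide_no_tm"
    if 5 <= tm_regions <= 8:
        return "signal_peptide_tm_5_to_8"
    return "signal_peptide_with_tm"


# Hypothetical predictions: name -> (signal peptide found, number of TM regions).
predictions = {"p1": (True, 0), "p2": (True, 2), "p3": (True, 6), "p4": (False, 1)}

clusters = defaultdict(list)
for name, (has_sp, tm) in predictions.items():
    clusters[classify(has_sp, tm)].append(name)

for label, names in clusters.items():
    print(label, names)
```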
  • the amino acid sequence fragment obtained as a model is tblastn-aligned with the human expression
  • Expressed sequence tags that match (with a sequence similarity of 15% to 95%, preferably 20% to 90%, more preferably greater than 25% and less than 90%, and with the mutations preferably distributed as uniformly as possible across the matched sequence), for example
  • In this way, all sequence fragments that meet the preferred parameters are obtained and sent through the pipeline to tt_pblast for analysis, yielding all expressed sequence tags that meet the preferred conditions.
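  • The similarity window can be sketched as a filter over alignment hits. The example assumes hits are available as 12-column tab-separated BLAST tabular output with percent identity in the third column; the patent does not say which output format tt_pblast reads, so that format, the function name `filter_hits_by_identity` and the example hits are assumptions.

```python
def filter_hits_by_identity(lines, low=15.0, high=95.0):
    """Keep hits whose percent identity lies within [low, high]."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        identity = float(fields[2])  # third column: percent identity
        if low <= identity <= high:
            kept.append((fields[0], fields[1], identity))
    return kept


# Hypothetical tabular hits (query, subject, % identity, then the other columns).
hits = [
    "prot1\tEST_A\t98.50\t80\t1\t0\t1\t80\t10\t249\t1e-40\t160",
    "prot1\tEST_B\t42.00\t75\t40\t2\t3\t77\t30\t254\t2e-12\t65.1",
    "prot2\tEST_C\t12.00\t50\t44\t0\t1\t50\t1\t150\t5.0\t20.0",
]

for query, subject, identity in filter_hits_by_identity(hits):
    print(query, subject, identity)
```

  • Tightening `low` and `high` to 20 and 90 reproduces the preferred range given in the text.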
  • The tag sequences are sent to the script for filterA and polyT replacement filtering, and the environment necessary for cap4 to run is established.
  • These fasta format sequences are all converted to xml data exchange format files using fastaclust2caml.
  • Cap4 and Phrap are started to assemble these sequences completely; the contig data files are then converted back to FASTA format files by extractcontigs, and the sequences are merged.
  • These sequences are first aligned against the non-secreted protein database with blastx, and parser-bx parsing excludes all perfectly matched sequences; the remaining sequences are then aligned against the patent protein sequence database with tt-fasty-1, and parser-fasta parsing yields the sequences that remain.
  • The remaining sequences are aligned with blastn against the human chromosome sequence database for verification; aligned with blastn against the patent nucleotide sequence database to correct mutations or deletions in the sequence; aligned with blastn against the nucleotide sequence database and the human expressed sequence tag database to resolve insufficient sequence length; and aligned with blastx against the non-redundant protein database to check whether the sequence has already been discovered. By comparing the results of five repeated runs of this analysis, the full-length gene sequence can be obtained. This can be confirmed using Sim4 software.
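  • The exclusion of already-known sequences can be sketched as follows. This is a simplified illustration, assuming that each hit has been reduced to a tuple of query, subject, percent identity and alignment length, and that query lengths are given in the same units as the alignment length; the function `exclude_known` and the example data are hypothetical and are not the parser-bx program itself.

```python
def exclude_known(query_lengths, hits):
    """Return the query ids that have no perfect, full-length database hit.

    query_lengths: {query_id: length}, in the same units as the alignment length.
    hits: iterable of (query_id, subject_id, percent_identity, alignment_length).
    """
    known = {
        query
        for query, _subject, identity, aln_len in hits
        if identity >= 100.0 and aln_len >= query_lengths.get(query, float("inf"))
    }
    return [query for query in query_lengths if query not in known]


# Hypothetical assembled contigs and their best database hits.
lengths = {"contig1": 120, "contig2": 95, "contig3": 210}
hits = [
    ("contig1", "knownA", 100.0, 120),  # perfect full-length match: excluded
    ("contig2", "knownB", 100.0, 40),   # perfect but only partial: kept
    ("contig3", "knownC", 87.5, 210),   # full-length but divergent: kept
]

candidates = exclude_known(lengths, hits)
print(candidates)
```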
  • ProtParam can be used to predict the hydrophobicity/hydrophilicity of the protein.
  • Signalp can be used for secretory signal peptide analysis.
  • Tmpred and tmap can be used for protein transmembrane region analysis.
  • The secondary structure of the protein can be analyzed using garnier; the helical wheel of the amino acid residues in the protein sequence can be displayed graphically using pepwheel; the content and distribution of amino acids of various properties in the protein sequence can be computed and displayed using pepinfo; pepstats can be used to calculate the content of the various amino acids in the protein sequence and to obtain information such as molecular weight, isoelectric point, charge and absorbance at 280 nm.
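  • The kind of calculation these tools perform can be sketched with a short Python example that computes amino acid composition and a sliding-window hydropathy profile. It uses the Kyte-Doolittle hydropathy scale, which is an assumption: the text does not say which scale or window size the cited programs were run with, and the demonstration sequence is hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Fraction of each amino acid present in the sequence."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


def gravy(seq):
    """Grand average of hydropathy: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every window of the given size along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


demo = "MKTLLLTLVVVTIVCLDLGYT"  # hypothetical sequence, not from the patent
print(composition(demo))
print(round(gravy(demo), 3))
print([round(x, 2) for x in hydropathy_profile(demo)])
```

  • Positive window averages indicate hydrophobic stretches and negative ones hydrophilic stretches.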
  • PubMed literature search: a large body of related literature is collected through PubMed searches and used to predict the biological functions of the discovered genes.
  • Example 4: Obtaining the new apolipoprotein A1BP-like genes BFC06016 and BFC06104
  • Following the new gene acquisition procedure above, with the actual operations performed at a terminal of the server, the present invention obtained 35 computer-predicted protein sequences that are possible new gene candidates. Two new genes similar to the apolipoprotein A1BP gene are numbered BFC06016 and BFC06104, respectively. Seq ID No. 1 and Seq ID No. 2 are the DNA sequence and amino acid sequence of BFC06016 and are listed in Figure 3 (A). Seq ID No. 3 and Seq ID No. 4 are the DNA sequence and amino acid sequence of BFC06104 and are listed in Figure 3 (B).
  • FIG. 4 (B) also has no secretory signal peptide); protein transmembrane region analysis with the Tmpred/tmap analysis software showed that the BFC06016 protein has no transmembrane region (Fig. 5A) and, likewise, that BFC06104 has no transmembrane region (Fig. 5B); the helical wheel of the amino acid residues in the protein sequence was displayed with pepwheel, and Fig. 6A and Fig. 6B show the helical wheel analysis of the BFC06016 and BFC06104 proteins, respectively; pepinfo was used to compute the content and distribution of the various amino acids in the protein sequences, with Fig. 7A showing the results for BFC06016 and Fig. 7B the results for BFC06104.
  • Example 5: Comparison of the human genes BFC06016 and BFC06104, which are similar to the apolipoprotein A1 binding protein gene, with the known apolipoprotein A1BP
  • The Sim4 application software is used to determine the position of a full-length gene on the chromosome. The known human apolipoprotein A1BP gene is located on human chromosome 1 (see Ritter et al., Genomics, 79: 693-702, 2002).
  • The BFC06016 and BFC06104 genes predicted by the computer analysis method of the present invention are both located on human chromosome 19, as shown in Fig. 8 (A) and Fig. 8 (B), respectively.
  • the full-length cDNA sequences of the BFC06016 and BFC06104 genes were obtained in a human cDNA library, which are Seq ID No. 5 and Seq ID No. 6, respectively.
  • Amino acid sequence comparisons with the known human apolipoprotein A1BP are shown in Figure 9. The amino acid homology of BFC06016 and BFC06104 with apolipoprotein A1BP was 40.0% and 41.5%, respectively.
  • The asterisk (*) marks amino acids that are identical among the three genes; a blank ( ) marks amino acids that differ among the three; the period (.) marks a semi-conservative substitution; the colon (:) marks a conservative substitution.
  • the amino acid homology between BFC06016 and apolipoprotein A1BP was 40.0%; the amino acid homology between BFC06104 and apolipoprotein A1BP was 41.5%.
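  • A percentage of this kind can be computed from a pairwise alignment as sketched below. The text does not state which denominator was used for the 40.0% and 41.5% figures, so this example assumes identity is counted over columns in which both sequences have a residue; the function `percent_identity` and the short aligned strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments; "-" marks a gap.
print(percent_identity("MKT-AY", "MRTQAY"))
```

  • Other conventions divide by the full alignment length or by the shorter sequence length, which gives slightly different percentages for the same alignment.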
  • Example 6: A brief description of molecular cloning techniques
  • coli was purchased from GIBCO/BRL. Purification of plasmid DNA, recovery of DNA fragments, and the like were carried out using a commercial Qiagen purification column. Pichia pastoris or BL21DE3 strains were used for protein expression and preparation.
  • Seq ID No. 7: 5'-CACATATGAGCAGCGCAGCCGGCCCAGACCCGTCGGAGGCGCCCGAAGAGCGGC-3'; positions 1-57, positive strand, 54 bases long.
  • Seq ID No. 8: 5'-GGGCGGCTGCCTCCGCGGTGCTGAGGAAATGCCGCTCTTCGGGCGCCTCCG-3'; positions 37-87, reverse complement, 51 bases long.
  • Seq ID No. 9: 5'-CCGCGGAGGCAGCCGCCCTGGAGCGGGAGCTGCTGGAGGATTATCGCTTTGGGCGGC-3'; positions 70-126, positive strand, 57 bases long.
  • Seq ID No. 10: 5'-CAGCCACGGCACTAGCATGACCGCACAGCTCCACGAGCTGCTGCCGCCCAAAGCGATA-3'; positions 111-168, reverse complement, 58 bases long.
  • Seq ID No. 11: 5'-TGCTAGTGCCGTGGCTGTGACCAAGGCGTTCCCGTTGCCCGCTCTCTCCCGGAAGCAG-3'; positions 152-209, positive strand, 58 bases long.
  • Seq ID No. 12: 5'-CTGCCCCGTTCTGCTCCGGGCCACACACGACCAGCACCGTCCTCTGCTTCCGGGAGAG-3'; positions 195-252, reverse complement, 58 bases long.
  • Seq ID No. 13: 5'-GCAGAACGGGGCAGTGGGGCTGGTCTGTGCCCGGCACC
  • Seq ID No. 14: 5'-GCAGGTCCAGCGAGCGTGTGGGGTAGAAGATGGTGGGTTCATACTCAAACACCCGC-3'; positions 278-333, reverse complement, 56 bases long.
  • Seq ID No. 15: 5'-CACGCTCGCTGGACCTGCTGCATCGGGACCTGACCACCCAGTGCGAGAAGATGGAC-3'; positions 316-371, positive strand, 56 bases long.
  • Seq ID No. 16: 5'-ATGAGCTGCACCTCAGTGGGCAGGTAGCTCAGGAAGGGGATGTCCATCTTCTCGC-3'; positions 358-412, reverse complement, 55 bases long.
  • Seq ID No. 17: 5'-CCTGCCCACTGAGGTGCAGCTCATTAACGAAGCCTATGGGCTGGTGGTGGATGCCGT-3'; positions 389-445, positive strand, 57 bases long.
  • Seq ID No. 18: 5'-GGGGCCCCCGACCTCGCCCGGCTCCACGCCGGGGCCCAGTACGGCATCCACCACC-3'; positions 431-485, reverse complement, 55 bases long.
  • Seq ID No. 19: 5'-GCCGGGCGAGGTCGGGGGCCCCTGCACCCGCGCGCTGGCCACGCTCAAGCTGCTGTCC-3'; positions 464-521, positive strand, 58 bases long.
  • Seq ID No. 20: 5'-GCCTGAGGGGATGTCCAGGCTCACGAGGGGGATGGACAGCAGCTTGAGCGTGGCC-3'; positions 500-554, reverse complement, 55 bases long.
  • Seq ID No. 21: 5'-CATCCCCTCAGGCTGGGACGCAGAGACCGGCAGCGATTCGGAGGACGGGCTGCGGCCTG-3'; positions 542-600, positive strand, 59 bases long.
  • Seq ID No. 22: 5'-GCGCAGCGCTTGGGCGCCGCGAGAGACACCAGCACGTCAGGCCGCAGCCCGTCCTCCGA-3'; positions 579-637, reverse complement, 59 bases long.
  • Seq ID No. 23: 5'-CGTGCTGGTGTCTCTCGCGGCGCCCAAGCGCTGCGCTGGCCGCTTCTCCGGGCGCCACC-3'; positions 602-660, positive strand, 59 bases long.
  • Seq ID No. 24: 5'-CTTGCGGCGCACGTCATCGGGCACGAACCTGCCGGCCACGAAGTGGTGGCGCCCGGAGA-3'; positions 646-704, reverse complement, 59 bases long.
  • Seq ID No. 25: 5'-TGACGTGCGCCGCAAGTTCGCTCTGCGCCTGCCGGGATA
  • Seq ID No. 26: 5'-TAGCGGCCGCTCACAGTGCCGCGACGCAGTCGGTGCCCGTGTATCCCGGC-3'; positions 719-768, reverse complement, 50 bases long.
  • Seq ID No. 27: 5'-CACATATGATGAGCAGCGCAG-3'; positions 1-21, positive strand, 21 bases long.
  • Seq ID No. 28: 5'-TAGCGGCCGCTCACAGTGCCGC-3'; positions 747-768, reverse complement, 22 bases long.
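  • For PCR assembly, each positive-strand oligonucleotide must overlap the reverse complement of its reverse-strand neighbour. The sketch below checks this for Seq ID No. 7 and Seq ID No. 8 as listed above (with the extraction spaces removed); the helper functions `reverse_complement` and `longest_overlap` are illustrative and are not part of the patent's software.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_overlap(left: str, right: str, min_len: int = 10) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


SEQ7 = "CACATATGAGCAGCGCAGCCGGCCCAGACCCGTCGGAGGCGCCCGAAGAGCGGC"
SEQ8 = "GGGCGGCTGCCTCCGCGGTGCTGAGGAAATGCCGCTCTTCGGGCGCCTCCG"

# Seq ID No. 8 is a reverse-strand oligo, so compare against its reverse complement.
overlap = longest_overlap(SEQ7, reverse_complement(SEQ8))
print(f"Seq ID No. 7 / No. 8 overlap: {overlap} bases")
```

  • The 21-base overlap found here agrees with the stated coordinates, since positions 37-57 are shared by the two oligonucleotides.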
  • Starting from the first oligonucleotide primer of the DNA sequence to be synthesized, the oligonucleotides are first grouped four at a time, and a longer DNA fragment is synthesized from each group by PCR. For example, Seq ID No. 7, Seq ID No. 8, Seq ID No. 9 and Seq ID No. 10 form one group. The 25 μl PCR reaction contains 100 pM of each primer, 20 mM dNTP, an appropriate amount of water and 1 U of T4 DNA polymerase (T4 Taq Polymerase).
  • A PCR program of 5 cycles is run first, and then the oligonucleotide primers at the two ends are added (for example, if the products of the first and second groups are combined, Seq ID No. 7 and Seq ID No. 14 are each added to 100 pM).
  • A larger DNA fragment is then prepared with the same PCR cycling program, although the 72 °C hold time in each cycle can be increased as appropriate.
  • The full-sequence synthesis of the designed DNA can be completed following the procedure illustrated in Figure 10.
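  • The grouping scheme can be sketched in a few lines of Python. The example works on the Seq ID numbers 7 to 26 only and assumes that the outer primers of a combined product are the first oligonucleotide of the first group and the last of the last group, which matches the Seq ID No. 7 and Seq ID No. 14 example in the text; the function names are hypothetical.

```python
def group_oligos(ids, size=4):
    """Split an ordered list of oligo identifiers into consecutive groups."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def outer_primers(groups):
    """Outer primers for combining adjacent groups: first and last oligo overall."""
    return groups[0][0], groups[-1][-1]


oligo_ids = list(range(7, 27))  # Seq ID No. 7 through Seq ID No. 26
groups = group_oligos(oligo_ids)
for number, group in enumerate(groups, start=1):
    print(f"group {number}: Seq ID No. {group}")

# Combining the first and second group products.
print(outer_primers(groups[:2]))
```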
  • The computer-predicted BFC06016 and BFC06104 gene sequences were synthesized and prepared in this way, with an Nde I restriction site at the 5' end.
  • The full-length DNA synthesized by PCR was inserted into the pTA vector, which contains EcoRI and NotI sites to the left and right of the insertion site, respectively.
  • the DNA sequence was verified by sequencing to confirm that the synthesized DNA sequence was correct.
  • This plasmid was named pTA-BFC06016.
  • For quantitative PCR analysis, cDNA was synthesized from mRNA of human tissues and cell lines from various sources. A 25 μl reaction contained 100 units of M-MLV reverse transcriptase (Ambion), 0.5 mM dNTPs (Epicentre) and 40 ng/ml random hexanucleotide primers (Fisher). The sample was incubated at 25 °C for 10 minutes, at 42 °C for 50 minutes and then at 70 °C for 15 minutes, diluted to 500 μl and finally stored at -20 °C. cDNA can also be purchased from a reagent company (for example, the Clontech MTC cDNA library).
  • PCR primers and probes (labeled with 6-FAM at the 5' end and TAMRA at the 3' end) were designed with ABI primer design software, using the DNA sequence common to the two newly discovered genes to design the PCR detection primers. Synthesis was performed by Qiagen, Biosearch Technologies Inc or Applied Biosystems Inc.
  • Primer used Primer-F (nucleotide position 244-263) Seq ID No 29: 5'- CTGGAGGA
  • The qPCR reaction uses the ABI 7700 Sequence Detection System. Each 25 μl reaction contains 5 μl of cDNA template, 1X TaqMan Universal PCR Master Mix (ABI), the PCR primers, 200 nM probe, and 1X VIC-labeled Beta-2-Microglobulin endogenous control (ABI).
  • The PCR conditions were 50 °C for 2 minutes and 90 °C for 10 minutes, followed by 40 cycles of 95 °C for 15 seconds and 60 °C for 1 minute. The results were analyzed with sequence detection software (ABI), and the comparative CT method was applied to calculate fold differences in the gene product.
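  • The comparative CT calculation can be sketched as follows. The example assumes the usual 2^(-ΔΔCT) form with roughly 100% amplification efficiency and uses Beta-2-Microglobulin as the endogenous control, as in the reaction described above; the CT values and the function name `fold_change` are hypothetical.

```python
def fold_change(ct_target_sample, ct_control_sample,
                ct_target_calibrator, ct_control_calibrator):
    """Relative expression by the comparative CT (2^-ddCT) method.

    Each delta-CT normalizes the target gene to the endogenous control;
    the calibrator is the tissue or cell line used as the baseline.
    """
    delta_sample = ct_target_sample - ct_control_sample
    delta_calibrator = ct_target_calibrator - ct_control_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical CT values: target gene and Beta-2-Microglobulin control
# in a test tissue and in a calibrator tissue.
print(fold_change(24.0, 18.0, 26.0, 18.0))
```

  • In this hypothetical case the target reaches threshold two cycles earlier in the test tissue after normalization, which corresponds to a four-fold higher expression.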
  • The cDNA fragment was cloned into an expression vector (a DNA sequence encoding six histidines was inserted in the reading frame of the gene).
  • IPTG was used to induce expression and purification of the target protein in E. coli (BL21/DE3) under the action of the T7 promoter (Fig. 12).
  • The target protein (with or without 6XHis) was purified by affinity chromatography on a Ni column. The purified protein was mixed directly with an immunoadjuvant and injected subcutaneously into rabbits weighing 3-4 kg, and the immunization was repeated 3 times at intervals of 15-20 days.
  • the purified protein antigen is then injected directly into the vein to boost the immune response.
  • the antibody titer obtained is greater than 1:68, preferably 1:500.
  • The prepared antibody can be used directly in immunological tests for biological activity, functional and clinical detection purposes, including but not limited to ELISA, Western blot and the like.

Abstract

The present invention relates to a method for identifying a novel gene by means of bioinformatics analysis, computer simulation prediction techniques and molecular biology techniques. More particularly, using human genome sequence data, computational analysis and prediction tools are obtained through in-house programming and analysis. A series of novel genes is thereby identified and prepared. These novel genes, named BFC06016 and BFC06104, are similar to apolipoprotein AIBP, and their GenBank accession numbers are DQ778079 and DQ778080, respectively. The genes and their encoded proteins may be associated with cholesterol metabolism in the body and may be used as candidate drug targets in the diagnosis and treatment of human cardiovascular disease.
PCT/CN2007/070153 2006-06-21 2007-06-21 Method for identifying a novel gene and the resulting novel genes Ceased WO2008000186A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2007800202904A CN101460625A (zh) 2006-06-21 2007-06-21 Method for discovering new genes and the new genes discovered

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200610089339.9 2006-06-21
CNA2006100893399A CN1884521A (zh) 2006-06-21 2006-06-21 Method for discovering new genes, the computer system platform used, and new genes

Publications (2)

Publication Number Publication Date
WO2008000186A1 true WO2008000186A1 (fr) 2008-01-03
WO2008000186A8 WO2008000186A8 (fr) 2009-07-09

Family

ID=37582826

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2007/070153 Ceased WO2008000186A1 (fr) 2006-06-21 2007-06-21 Method for identifying a novel gene and the resulting novel genes

Country Status (2)

Country Link
CN (2) CN1884521A (fr)
WO (1) WO2008000186A1 (fr)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1884521A (zh) * 2006-06-21 2006-12-27 北京未名福源基因药物研究中心有限公司 Method for discovering new genes, the computer system platform used, and new genes
CN101930502B (zh) * 2010-09-03 2011-12-21 深圳华大基因科技有限公司 Method and system for phenotype gene detection and bioinformatic analysis
CN103186716B (zh) * 2011-12-29 2017-02-08 上海生物信息技术研究中心 Metagenomics-based system and analysis method for rapid identification of unknown pathogens
CN105095623B (zh) * 2014-05-13 2017-11-17 中国人民解放军总医院 Screening and analysis method, platform, server and system for disease biomarkers
CN110019155B (zh) * 2017-09-30 2023-04-07 山西医科大学 microRNA omics data perturbation platform
CA3100607A1 (fr) * 2018-05-23 2019-11-28 Envisagenics, Inc. Systems and methods for analysis of alternative splicing
US20210225455A1 (en) * 2018-08-15 2021-07-22 Zymergen Inc. Bioreachable prediction tool with biological sequence selection
EP3867400A4 (fr) * 2018-10-17 2022-07-27 Quest Diagnostics Investments LLC Genomic sequencing selection system
JP7435821B2 (ja) * 2020-11-12 2024-02-21 日本電信電話株式会社 Learning device, psychological state sequence prediction device, learning method, psychological state sequence prediction method, and program

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004086568A (ja) * 2002-08-27 2004-03-18 Hitachi Ltd Method and program for producing a novel gene
US20060069512A1 (en) * 1999-04-15 2006-03-30 Andrey Rzhetsky Gene discovery through comparisons of networks of structural and functional relationships among known genes and proteins
CN1884521A (zh) * 2006-06-21 2006-12-27 北京未名福源基因药物研究中心有限公司 Method for discovering new genes, the computer system platform used, and new genes

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DATABASE GENBANK [online] Database accession no. (EAW84827) *
DATABASE GENBANK [online] Database accession no. (XP_001117298) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
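CN110033826A (zh) * 2018-12-10 2019-07-19 上海派森诺生物科技股份有限公司 Analysis method applied to virome high-throughput sequencing data
CN110033826B (zh) * 2018-12-10 2023-08-08 上海派森诺生物科技股份有限公司 Analysis method applied to virome high-throughput sequencing data
CN109785900A (zh) * 2018-12-12 2019-05-21 上海派森诺生物科技股份有限公司 Method for analyzing functional genes of microbial communities based on protein sequence similarity
CN109785900B (zh) * 2018-12-12 2023-05-23 上海派森诺生物科技股份有限公司 Method for analyzing functional genes of microbial communities based on protein sequence similarity
CN111199772A (zh) * 2019-12-27 2020-05-26 上海派森诺生物科技股份有限公司 PEDV viral genome analysis method based on next-generation sequencing
CN111199772B (zh) * 2019-12-27 2023-05-23 上海派森诺生物科技股份有限公司 PEDV viral genome analysis method based on next-generation sequencing
CN112750501A (zh) * 2020-12-29 2021-05-04 上海派森诺生物科技股份有限公司 Optimized analysis method for a virome pipeline
CN112750501B (zh) * 2020-12-29 2024-04-02 上海派森诺生物科技股份有限公司 Optimized analysis method for a virome pipeline
CN119832988A (zh) * 2024-12-23 2025-04-15 济南迪杰信息技术有限公司 Method and system for predicting the function of biological reagents
CN119832988B (zh) * 2024-12-23 2025-10-28 济南迪杰信息技术有限公司 Method and system for predicting the function of biological reagents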

Also Published As

Publication number Publication date
CN101460625A (zh) 2009-06-17
WO2008000186A8 (fr) 2009-07-09
CN1884521A (zh) 2006-12-27

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200780020290.4

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07721771

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: RU

122 Ep: pct application non-entry in european phase

Ref document number: 07721771

Country of ref document: EP

Kind code of ref document: A1