
WO2024125260A1 - Dna information storage method based on natural and non-natural bases - Google Patents


Info

Publication number
WO2024125260A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
dna
dna information
data information
data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/133791
Other languages
French (fr)
Chinese (zh)
Inventor
梅辉
王钰
黄小罗
戴俊彪
熊成鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Application filed by Shenzhen Institute of Advanced Technology of CAS
Publication of WO2024125260A1
Anticipated expiration
Ceased (current legal status)


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present invention belongs to the technical field of data storage and relates in particular to a DNA information storage method based on natural and non-natural bases.
  • DNA: deoxyribonucleic acid.
  • Current methods do not make full use of the characteristics of bases, especially non-natural bases, so the coding density is limited to 2 bits/nt. In addition, with current methods the DNA must be synthesized from scratch every time information is stored, which is costly.
  • Patent CN113066534A proposes reading data on a biochip using quaternary encoding with the four bases A, T, C and G. The information is read by sequencing-by-synthesis of the DNA sequences on the chip.
  • The biochip can be stored at low temperature for a long time, but this still does not solve the problem of errors introduced by sequencing.
  • DNA information storage technology mainly relies on DNA synthesis and sequencing. Errors are often introduced during synthesis and sequencing, which makes the information difficult to decode.
  • Synthesis and sequencing also impose restrictions on DNA sequences. For example, the sequence to be synthesized cannot contain long runs of the same base (i.e., homopolymer runs must be shorter than 4 nt), the GC content must be kept between 40% and 60%, and the sequences must not be complementary in ways that form secondary structures.
  • As a result, both the writing and the reading of information are subject to various technical limitations.
  • Information writing still relies only on natural DNA encoding, the characteristics of non-natural bases are not fully utilized, and the encoding density is limited. In oligo synthesis, the synthesized length is limited to 200 bp and the sequence cannot contain homopolymers longer than 3 bp. Sequencing imposes further restrictions; for example, large numbers of repeated sequences cannot appear.
  • The sequencer itself has a considerable sequencing error rate, and synthesizing DNA from scratch is expensive.
  • These problems not only limit the randomness of information encoding but also distort the results, so that the information sometimes cannot be decoded.
  • Based on natural and non-natural bases, and taking advantage of mass spectrometry analysis, the present invention establishes a DNA information storage method.
  • This storage process does not require a synthesis step, so it is free of the various restrictions on sequence synthesis. It not only meets the needs of information storage but also breaks the inherent framework of existing DNA information storage.
  • The encoding density is high, providing a new direction for the development of DNA information storage technology.
  • One aspect of the present invention provides a DNA storage method based on natural and unnatural bases, the DNA storage method comprising the following steps:
  • Step S13): split the data information from step S11) into data information units, and look up the DNA information corresponding to each unit in the coding table obtained in step S12);
  • the DNA information is a deoxyribonucleotide, a polynucleotide sequence consisting of two or more deoxyribonucleotides, or a combination of any two of these;
  • the molecular weights of the DNA information entries all differ, with a difference of no less than 10 between any two.
  • The computer data information is text, numbers, pictures, videos, programs, or audio.
  • The data information is character information, RGB information, or binary, octal, hexadecimal, or decimal data information.
  • The deoxyribonucleotide is a natural or a non-natural deoxyribonucleotide.
  • The non-natural deoxyribonucleotide is a base-modified deoxyribonucleotide.
  • The deoxyribonucleotides in a polynucleotide sequence composed of two or more deoxyribonucleotides are natural or non-natural deoxyribonucleotides.
  • Polynucleotide sequences composed of two or more deoxyribonucleotides are given different molecular weights by adjusting the combination of types and quantities of deoxyribonucleotides they contain.
  • The coding table forms 32 to 1024 mapping relationships, for example 32, 64, 128, 256, 512, or 1024.
  • A data information unit is one of the units into which the data information is divided for recording.
  • The data information unit can be a 4-bit or 8-bit binary number, a single character, or the RGB pixel information of a single pixel.
  • The bases in the nucleic acid sequence are modified or unmodified.
  • The modified bases carry at least one of the following: DBCO modification, AMCA modification, thio modification, amino modification, biotin modification, digoxigenin modification, a phosphate group, or a sulfhydryl group.
  • The coding table contains 32 types of polynucleotide sequences, and 128 different combined polynucleotide sequences are formed by combining them in pairs.
  • The coding table maps DNA information one-to-one to 4-bit or 8-bit binary numbers.
  • The coding table maps DNA information one-to-one to the ASCII code table.
  • The coding table maps DNA information one-to-one to 128 characters.
  • The coding table maps DNA information one-to-one to RGB pixel information.
  • The DNA information in the coding table consists of 128 or 256 different nucleotides with modified bases, obtained by applying 32 or 64 different modifications to each of A, T, C, and G.
  • One type of DNA information is mapped one-to-one to the color channel of the RGB pixel information, namely R, G, or B, and another type is mapped one-to-one to the intensity values 0-255 of that channel. The combination of the two types of DNA information forms a one-to-one mapping with the RGB pixel information.
  • The coding table comprises 128 combined polynucleotide sequences. The polynucleotide sequences are divided into 8 groups of successively increasing length, each sequence comprising 10-24 nucleotides. Each group comprises 4 polynucleotide sequences, which differ from one another in base types and/or quantities.
  • Another aspect of the present invention provides a method for reading information from a DNA information storage carrier obtained by the above DNA storage method, the information reading method comprising the following steps:
  • Step S23): interpret the data information unit according to the DNA information obtained in step S22) and the coding table from step S12);
  • The mass spectrometry detection method is MALDI mass spectrometry sequencing.
  • In step S21), the mass spectrometry detection comprises the following steps:
  • The purification method in step S211) is ethanol precipitation, microdialysis, or Millipore ZipTip micro-column chromatography.
  • Computer data information is any data that can exist on a computer, preferably pictures, text, programs, audio, or video.
  • Another aspect of the present invention provides a method for storing and decoding DNA information based on natural and unnatural bases, the method comprising:
  • The DNA storage method based on natural and unnatural bases described above comprises the following steps:
  • the DNA information is a deoxyribonucleotide, a polynucleotide sequence consisting of two or more deoxyribonucleotides, or a combination of any two of these;
  • the molecular weights of the DNA information entries all differ, with a difference of no less than 10 between any two;
  • The information reading method comprises the following steps:
  • Step S23): interpret the data information unit according to the DNA information obtained in step S22) and the coding table from step S12);
  • Another aspect of the present invention provides a DNA information storage device based on natural and unnatural bases, the device comprising:
  • a data information extraction unit, used to extract the computer data to be stored and convert it into the corresponding data information;
  • a data information and DNA information conversion unit, used to split or assemble the data information sequence and convert it into DNA information according to a preset mapping relationship;
  • a synthesis and storage unit, used to synthesize the nucleic acid sequences produced by the data information and DNA information conversion unit, and to store the deoxyribonucleotides, polynucleotide sequences of two or more deoxyribonucleotides, or combinations thereof that correspond to the DNA information, in order, in different wells of the storage unit chip;
  • the mapping relationship is a one-to-one mapping between DNA information and data information units;
  • the DNA information is a deoxyribonucleotide, a polynucleotide sequence consisting of two or more deoxyribonucleotides, or a combination of any two of these;
  • the molecular weights of the DNA information entries all differ, with a difference of no less than 10 between any pair.
  • The data information and DNA information conversion unit may include a DNA information encoding unit, a DNA information and data information matching unit, and a DNA information conversion unit.
  • The DNA information encoding unit records the combination of base types and quantities corresponding to each type of DNA information.
  • The DNA information and data information matching unit calls up different DNA information from the DNA information encoding unit and matches it one-to-one with the data information units.
  • The DNA information conversion unit converts the digital information in the data information extraction unit into DNA information one by one, according to the information in the DNA information and data information matching unit.
  • Another aspect of the present invention provides a decoding device based on nucleic acid storage with natural and non-natural bases, the device comprising:
  • a reading unit, used to detect the sequences to be tested that are stored in the synthesis and storage unit with a mass spectrometer, and to determine their DNA information from their molecular weights;
  • a DNA information and data information conversion unit, used to convert the DNA information obtained by the reading unit into data information according to a preset mapping relationship, namely the same mapping between data information and DNA information used in the DNA information storage device;
  • a computer data output unit, used to convert the data information obtained by the DNA information and data information conversion unit into the stored computer data.
  • The reading unit includes a mass spectrometer for performing mass spectrometry on the nucleic acid sequence in each well of the chip to detect the stored information. It may also include a unit for purifying and/or enzymatically hydrolyzing the nucleic acid sequence in each well.
  • The present invention provides a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the steps of the above DNA storage method based on natural and non-natural bases, or of the information reading method for a DNA information storage carrier obtained by that storage method.
  • The present invention provides a computer device including a memory and a processor, the memory storing a computer program that can run on the processor. When the processor executes the computer program, it implements the steps of the above DNA storage method based on natural and non-natural bases, or of the information reading method for a DNA information storage carrier obtained by that storage method.
  • The scheme of the present invention can use any modified or unmodified bases as well as unnatural base pairs, and the efficiency of DNA storage increases as the number of available bases increases.
  • Each base can encode an 8-bit binary code, so the logical encoding capacity of the present invention can reach 8 bits/nt, exceeding the theoretical limit of 2 bits/nt for 4-base encoding.
  • The encoding method of the present invention is designed according to the types and numbers of bases, without having to consider issues such as repeats and secondary structure in the sequence.
  • The present invention addresses, at the source, the series of problems caused by synthesis and sequencing. It abandons both technologies and instead stores bases at fixed positions and detects them with a mass spectrometer.
  • The prior art maps the four natural bases directly to quaternary codes, but the coding efficiency remains low.
  • The present invention can encode any natural and non-natural bases, and combinations of bases of different numbers and lengths can also serve as coding variables. This removes the limitations on coding efficiency and coding method.
  • The coding efficiency and the degree of freedom in coding are greatly improved. Since many types of non-natural bases are commercially available, they can fully meet the coding requirements.
  • The storage and reading method of the present invention abandons the prior-art practice of identifying the nucleic acid sequence of the recorded information with a sequencer. Instead, it uses a mass spectrometer to read the base types, overcoming the errors and limitations of the sequencing process.
  • Chains of different molecular weights can be formed by combining different bases in different ratios.
  • FIG. 1 is a schematic diagram of a DNA storage method based on natural and unnatural bases with mass spectrometry decoding according to the present invention.
  • FIG. 2 is a schematic diagram of a direct-encoded DNA storage method based on natural and unnatural bases with mass spectrometry decoding according to Example 1 of a specific embodiment of the present invention.
  • FIG. 3 is a schematic diagram of an information reading method of a DNA information storage carrier of the present invention.
  • FIG. 4 is a schematic diagram of mass spectrometry detection according to the present invention.
  • FIG. 5 is a schematic diagram of the structure of an encoding device provided in Embodiment 4 of the present invention.
  • FIG. 6 is a schematic diagram of the structure of a decoding device provided in Embodiment 4 of the present invention.
  • FIG. 7 is a schematic diagram of the structure of a terminal device provided in Embodiment 4 of the present invention.
  • FIG. 8 is an overall technical flow chart of the present invention.
  • FIG. 9 is a schematic diagram of a picture to be encoded according to the present invention.
  • References in this specification to "one embodiment", "some embodiments", and the like mean that one or more embodiments of the present application include a specific feature, structure, or characteristic described in conjunction with that embodiment. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in other embodiments", etc. appearing in different places in this specification do not necessarily refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized.
  • The terms "including", "comprising", "having", and their variations all mean "including but not limited to", unless otherwise specifically emphasized.
  • The present invention establishes a DNA storage method based on natural and non-natural bases that replaces sequencing with mass spectrometry, departing from traditional information storage and decoding methods. Moreover, the storage process requires little or no synthesis, so it is not subject to the various limitations on sequence synthesis.
  • The method of the present invention is described below by way of embodiments with reference to the accompanying drawings.
  • Example 1: DNA storage method based on natural and unnatural bases with mass spectrometry decoding:
  • The computer data information to be stored may be data in any format, such as text, numbers, pictures, videos, programs, or audio.
  • The data information may be RGB information, binary, octal, hexadecimal, or decimal data information, or character information.
  • The computer data information to be stored is converted into digital information by any method known in the art.
  • Image information can be converted into RGB information.
  • The RGB information is RGB pixel information: the image is converted into the RGB pixel information of its individual pixels.
  • The RGB pixel information consists of the color channel, namely R, G, or B, and the color intensity 0-255.
  • The information to be stored may also be character information.
  • The DNA information is a deoxyribonucleotide, a polynucleotide sequence consisting of two or more deoxyribonucleotides, or a combination of any two of these;
  • the molecular weights of the DNA information entries all differ, with a difference of no less than 5 between any two.
  • The molecular weight difference between two DNA information entries is greater than 10, for example any value between 10 and 100.
  • The coding table contains 32 to 1024 mapping relationships, for example 32, 64, 128, 256, 512, or 1024.
  • In one example, the data information in the DNA information and data information coding table is 4-bit or 8-bit binary data, equal in number to the DNA information entries in the coding table, forming a one-to-one mapping.
  • In another example, the data information in the coding table is character information, equal in number to the DNA information entries, forming a one-to-one mapping.
  • In a further example, the data information in the coding table is RGB pixel information, forming a one-to-one mapping with the DNA information in the coding table.
  • The DNA information may be natural or non-natural deoxyribonucleotides; the non-natural deoxyribonucleotides are base-modified deoxyribonucleotides.
  • The modified base refers to at least one, or a combination of two or more, of: DBCO modification, AMCA modification, thio modification, amino modification, biotin modification, digoxigenin modification, phosphate group, sulfhydryl group, amino group, NHBOC modification, Fmoc modification, carboxylic acid modification, Mal modification, NHS modification, azide modification, Cy3/Cy5/Cy7 modification, THP modification, benzyl modification, propynyl modification, bromine modification, tert-butyl propionate modification, tert-butyl acetate modification, methyl modification, pentafluorophenol modification, and sulfonate modification.
  • The unnatural base is selected from 2-aminoadenin-9-yl, 2-aminoadenine, 2-F-adenine, 2-thiouracil, 2-thiothymine, 2-thiocytosine, 2-propyl and alkyl derivatives of adenine and guanine, 2-amino-adenine, 2-amino-propyl-adenine, 2-aminopyridine, 2-pyridone, 2'-deoxyuridine, 2-amino-2'-deoxyadenosine, 3-deazaguanine, 3-deazaadenine, 4-thiouracil, 4-thiothymine, uracil-5-yl, hypoxanthine-9-yl (I), 5-methyl-cytosine, 5-hydroxymethylcytosine, xanthine, hypoxanthine, 5-bromo- and 5-trifluoromethyl-uracil and cytosine, 5-halouracil, 5-halocytosine, 5-propynyl-urac
  • The deoxyribonucleotides in a polynucleotide sequence of two or more deoxyribonucleotides in the DNA information are natural or non-natural deoxyribonucleotides.
  • Polynucleotide sequences composed of two or more deoxyribonucleotides are given different molecular weights by adjusting the combination of types and quantities of deoxyribonucleotides in each sequence. For example, a combination of the four natural bases can be used, or a mixed encoding of natural and non-natural bases.
  • Because the present invention determines the DNA information by mass spectrometry rather than by sequencing during decoding, chains composed of different ribonucleotides or deoxyribonucleotides can be distinguished by their different masses in the mass spectrometry data.
  • The coding table contains 128 combined polynucleotide sequences. The polynucleotide sequences are divided into 8 groups of successively increasing length, each sequence containing 10-24 nucleotides. Each group has 4 chain types, which differ from one another in base types and/or quantities.
  • Deoxyribonucleotides can be modified with different modification groups.
  • For example, applying 32 different modifications to each deoxyribonucleotide yields 128 different deoxyribonucleotides, and applying 64 modifications yields 256. If the number of nucleotides is increased, the number of encodable types can be doubled again on this basis.
  • A combination of two polynucleotides, a combination of two non-natural deoxyribonucleotides, or a combination of a polynucleotide and a non-natural deoxyribonucleotide may also be used.
  • The number of DNA information types in the coding table can be increased through such combinations, improving coding efficiency. For example, preparing 32 different nucleotide sequences and combining them in pairs gives up to 32 × 32 = 1024 combinations, of which 128 or 256 can be selected to design the coding table.
  • The DNA information in the coding table can be determined as follows.
  • Chain a1 corresponds to A4T2C2G2.
  • The number of nucleotides in a polynucleotide sequence can be adjusted: the smaller the upper limit of the length range chosen for the polynucleotide sequences, the fewer bases are required.
  • The exemplary scheme above uses 4 base types; if more base types are used for coding, the number of bases required drops sharply.
  • The polynucleotide sequences are mapped one-to-one to the data information to form a coding table.
  • The polynucleotide sequences in Table 1 above (i.e., the DNA information) are mapped one-to-one to 8-bit binary numbers, forming a DNA information and binary data coding table that encodes 128 kinds of information.
  • The four row digits and the four column digits together form an 8-bit binary number.
  • Each 8-bit binary number corresponds to one DNA information entry, as shown in Table 2:
  • 00000000 corresponds to a1a1.
  • The polynucleotide combination corresponding to a1a1 is a sequence composed of A8T4C4G4.
  • The four deoxyribonucleotides can be modified with different modification groups. For example, applying 32 modifications to each yields 128 different deoxyribonucleotides, and applying 64 modifications yields 256.
  • The nucleotides with modified bases are matched with the data information to form a coding table.
  • A1 denotes the first modified deoxyadenosine nucleotide, and so on: A, T, C, or G followed by a number N denotes the Nth modified form of that deoxyribonucleotide.
  • Alternatively, 32 differently modified forms of each of the four nucleotides can be selected, giving 128 nucleotides with different modified bases, which are matched directly with characters. The encoding table is shown in Table 4 below:
  • Direct coding can also be used, mapping each combination directly to a character, so that a text file is converted directly into a DNA information file.
  • The DNA information in the coding table of Table 4 is mapped one-to-one to the 128 characters of the ASCII table to form a coding table, with which the characters in a text can be converted directly into DNA information.
  • Step S13): split the data information from step S11), and look up, in sequence, the DNA information corresponding to each split unit in the DNA information and data information coding table obtained in step S12).
  • The data information mapped in the above DNA information and data information coding table is an 8-bit binary number.
  • The data information from step S11) is split into 8-bit binary numbers, and the DNA information corresponding to each 8-bit binary number is looked up in sequence in the coding table.
  • Step S14): obtain the chain types determined in step S13) and arrange them in sequence in different wells of the chip to obtain a DNA information storage carrier;
  • The nucleotides or polynucleotide sequences can be obtained directly as commercially available nucleotides or synthesized for each polynucleotide sequence type. They can also be synthesized on a large scale according to the types in the coding table of step S12), stored, and taken from stock when information is stored.
  • The determined nucleotides or polynucleotide sequences can be placed in order directly in different wells of the chip without being linked to one another, which reduces the number of synthesis steps.
  • The method of the present invention is demonstrated below with several specific data sets to be stored.
  • The first specific implementation case stores English words and characters: Hello world!
  • The information to be stored is the string "Hello world!".
  • The 12 characters of the string are converted in sequence into twelve 8-bit binary numbers, which are then converted into polynucleotide sequence combinations according to the coding tables in Table 1 and Table 2.
  • The chain types corresponding to the characters are synthesized and placed in sequence in different wells of the chip.
  • The chip containing all the information is the storage medium for this information.
  • The characters can also be encoded directly, as shown in FIG. 2, by mapping each character to a different DNA sequence or to a nucleotide with a non-natural base carrying a different modification group, as shown in Table 4 above, and then stored.
  • The information to be stored is the text file "wssnt10.txt" that was encoded in Goldman's 2012 article "Towards practical, high-capacity, low-maintenance information storage in synthesized DNA".
  • The content of wssnt10.txt is as follows:
  • The 107,738-byte text is directly encoded and converted into a base-combination information file.
  • The polynucleotide combination corresponding to "!" is e1e2, i.e., the base sequence is "4A4T4C6G4A4T4C6G".
  • An exemplary fragment is as follows:
  • A biochip is prepared for DNA information storage.
  • The information to be stored is a picture.
  • The generated binary file is encoded according to the coding tables in Tables 1 and 2 above.
  • The polynucleotide combination corresponding to "0100 0001" is e1e2, i.e., the base sequence is "4A4T4C6G4A4T4C6G".
  • Regarding the base encoding method: the first three specific implementation cases use the four natural bases A, T, C, and G for demonstration. Those skilled in the art can choose bases as needed when encoding with the method of the present invention. The DNA information can be a polynucleotide sequence, a combination thereof, or even a single deoxyribonucleotide with a non-natural base, and is not limited to natural deoxyribonucleotides.
  • The present invention uses mass spectrometry as the detection method, which can distinguish not only natural bases but also non-natural bases, something DNA sequencing cannot do.
  • the fourth specific implementation case is demonstrated by storing the first chapter of the original version of Pride and Prejudice.
  • the text file size is 4,501 bytes.
  • the binary file size generated by the binary-encoded text conversion is 36,008 bytes, as shown in the following excerpt:
  • the nucleotides corresponding to the encoded DNA information are stored in different wells of the biochip according to their sequence.
  • each nucleotide can encode an 8-bit binary code, so the logical encoding capacity of the present invention can reach 8 bits/nt, which has broken through the theoretical limit of the existing four-base encoding.
  • the information to be stored in the fifth embodiment is still the text of Chapter 1 of Pride and Prejudice, but unlike the fourth embodiment, in this embodiment, the text information is not converted into binary data, but the characters in the text information are directly used as the information data to be encoded.
  • the encoded nucleotides are sequentially stored in different wells of the chip to obtain a storage medium.
  • the information to be stored is a picture file.
  • the picture information to be stored is shown in FIG. 9.
  • the original picture in FIG. 9 is a color picture. Its RGB information is as follows:
  • the image information is directly encoded and converted into nucleotide information.
  • the encoded sequence information is as follows:
  • the nucleotides are sequentially stored in different wells of the chip to obtain a storage medium.
  • alternatively, the image can be divided into pixels, and the RGB information of each pixel used as the data information.
  • a coding table is then designed containing DNA information for each color channel (R, G, B) at each depth value from 0 to 255, and the image is encoded and stored in this way.
  • Example 2 DNA information decoding method based on matrix-assisted (MALDI) mass spectrometry of natural and non-natural bases
  • the present invention uses MALDI mass spectrometry sequencing to read out the base combination in the chain type at each position on the chip.
  • nucleotides, polynucleotide sequences, or combinations thereof are placed in different chip wells; the sequences to be tested in the different wells are analyzed by mass spectrometry, and the peaks are output well by well in order.
  • the mass spectrometry results show the molecular weight of the sequence to be tested and the peaks of its fragments after bombardment; from these data, the coding-table entry of the DNA information in each well can be identified.
  • the purification method is ethanol precipitation, microdialysis, or Millipore ZipTip microchromatography.
  • the mass spectrometry sequencing method is MALDI mass spectrometry sequencing.
  • MALDI mass spectrometry sequencing displays RNA or DNA with different bases as peaks at different times of flight; the base combinations contained in these peaks are identified and translated directly into their corresponding nucleic acid sequences.
  • step S23): interpreting the data information units according to the DNA information obtained in step S22) and the coding table of step S12);
  • the chain type and data information coding table in step S23) is the chain type and data information coding table obtained in step S12) of the above-mentioned DNA storage method based on natural and non-natural bases.
  • the corresponding binary numbers are looked up in Table 2. Because every two chain types in Table 2 correspond to one 8-bit binary number, every two consecutive chain types are converted into one ASCII character, yielding all of the binary data;
  • the polynucleotide sequence combinations in the 12 different wells in the chip are first digested and purified, and then the specific nucleotide types and quantities in the chain types are obtained by MALDI mass spectrometry.
  • the type of polynucleotide sequence corresponding to each well can be confirmed, and the corresponding binary number can be confirmed according to the correspondence between the DNA information used in the encoding process and the data information encoding table.
  • the binary numbers are then converted directly into the corresponding characters, recovering the original data information, namely the string "Hello world!".
  • the stored information is the text file "wssnt10.txt" encoded in Goldman's 2012 article "Towards practical, high-capacity, low-maintenance information storage in synthesized DNA".
  • the polynucleotide sequences in the different wells of the chip are digested and purified separately, and the specific nucleotide types and quantities in the chain types are then obtained by MALDI mass spectrometry.
  • the type of DNA information in each well can be identified, and, according to the correspondence between the DNA information and the data information coding table used during encoding, the coding bases are converted into the corresponding characters, recovering the original data information, namely the text file "wssnt10.txt".
  • the stored information is picture information.
  • the difference from the first two specific embodiments is that the obtained binary data is converted into picture information.
  • in the fourth specific embodiment, for the stored Chapter 1 of Pride and Prejudice, the nucleotides with modified bases in the different wells of the chip are first purified, and the specific types and quantities of these nucleotides are then obtained by MALDI mass spectrometry.
  • from the MALDI mass spectrometry results, the type of modified-base nucleotide in each well can be identified; according to the correspondence between these nucleotides and the data information matching table used during encoding, the corresponding binary numbers are determined and converted directly into characters, recovering the original data information, namely Chapter 1 of Pride and Prejudice.
  • the stored information is Chapter 1 of Pride and Prejudice.
  • the nucleotides with modified bases in different wells in the chip are purified separately, and then the specific nucleotide types in the nucleotides with modified bases are obtained by MALDI mass spectrometry.
  • from the MALDI mass spectrometry results, the type of modified-base nucleotide in each well can be identified, and the corresponding characters are determined directly from the correspondence between these nucleotides and the data information matching table used during encoding; the characters are then spliced together in order to obtain the original computer data.
  • the stored information is picture information. It is similar to the fourth and fifth specific embodiments, except that the obtained binary data is converted into picture information.
  • Example 3 Encoding device for DNA storage based on natural and unnatural bases with matrix-assisted (MALDI) mass spectrometry decoding
  • FIG. 5 shows a structural block diagram of the encoding device provided by embodiment 3 of the present invention.
  • FIG. 5 only shows the parts related to embodiment 3 of the present invention.
  • the encoding device may include:
  • a data information extraction unit, used for extracting the computer data to be stored and converting it into the corresponding data information;
  • a data information and DNA information conversion unit, used to split or assemble the data information sequence and convert it into DNA information according to a preset mapping relationship;
  • a synthesis and storage unit, used to synthesize the DNA sequences obtained by the data information and DNA information conversion unit and to store them in sequence in different wells of the storage chip.
  • the data information extraction unit may include an information storage unit and a conversion unit, wherein the information storage unit can be used to store and call computer information to be stored, such as text, numbers, pictures, audio, video, etc.
  • the conversion unit can convert computer information into any digital information in a conventional way, such as characters, binary data information, octal data information, hexadecimal data information, decimal data information, RGB pixel information, etc.
  • the data information and DNA information conversion unit may include a DNA information encoding unit, a DNA information and data information matching unit, and a DNA information conversion unit.
  • the DNA information encoding unit is used to record the combination of base types and quantities corresponding to each type of DNA information.
  • the DNA information and data information matching unit is used to call different DNA information in the DNA information encoding unit to match and correspond to the data information units one by one.
  • the DNA information conversion unit is used to convert the digital information in the data information extraction unit into DNA information one by one according to the information of the DNA information and data information matching unit.
  • for example, only four deoxyribonucleotides with different bases may be selected and combined in different numbers to form at least 128 different chain types.
  • the number of chain types can be enlarged or reduced by increasing the number of nucleotide types or adjusting the length of the chain types to meet the information storage requirements of different needs.
  • alternatively, each of the four deoxyribonucleotides may be given 32 or 64 different modifications, forming at least 128 or 256 different nucleotides with modified bases.
  • the number of nucleotides with modified bases can be enlarged or reduced to meet the information storage capacity of different requirements.
  • the synthesis and storage unit includes a synthesis unit and a storage unit; the synthesis unit synthesizes the DNA information determined by the data information and DNA information conversion unit for the information to be stored.
  • the storage unit can store the DNA sequence that records the data information.
  • the storage unit is a chip with multiple wells, and each well accommodates a sequence. In other words, each well corresponds to one type of DNA information.
  • the wells on the chip are arranged in order.
  • Example 4 Decoding device for nucleic acid storage based on natural and unnatural bases with matrix-assisted (MALDI) mass spectrometry decoding
  • FIG. 6 shows a structural block diagram of the decoding device provided in embodiment 4 of the present invention.
  • FIG. 6 only shows the parts related to embodiment 4 of the present invention.
  • the decoding device may include:
  • a reading unit, used to detect with a mass spectrometer the sequences to be tested stored in the synthesis and storage unit and to identify their DNA information from the molecular weight;
  • a DNA information and data information conversion unit, used to convert the DNA information obtained by the reading unit into data information according to a preset mapping relationship, namely the same data information and DNA information mapping relationship used in the DNA information storage device;
  • the reading unit includes a mass spectrometer that performs mass spectrometry on the nucleic acid sequence in each well of the chip to detect the stored information, and may also include pre-treatment of the nucleic acid sequence in each well, such as purification and/or enzymatic digestion.
  • a computer data output unit, used to convert the data information obtained by the DNA information and data information conversion unit into the stored computer data.
  • Fig. 7 is a schematic diagram of the structure of a computer device provided by an embodiment of the present invention.
  • the computer device of this embodiment includes at least one processor (only one is shown in Fig. 7), a memory, and a computer program stored in the memory and executable on the at least one processor; when the processor executes the computer program, any of the storage and decoding methods of the present invention is implemented.
  • the computer device may be a computing device such as a laptop computer, a desktop computer, a tablet computer, a mobile phone, etc.
  • the computer device includes at least a processor and a memory.
  • FIG. 7 is only a schematic diagram of a computer device and does not constitute a limitation on the computer device, and may also include other components, such as information input or output components.
  • An embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in the above-mentioned method embodiments can be implemented.
  • the computer-readable storage medium may also be any available medium or data storage device that can be accessed by a computer, such as a server or data center integrated with the medium.
  • the available medium may be a magnetic medium, a DVD, or a semiconductor medium.
  • An embodiment of the present invention provides a computer program product.
  • the terminal device can implement the steps in the above-mentioned method embodiments when executing the computer program product.
  • the terminal device may be a general-purpose computer, a handheld computer, a mobile phone, a special-purpose computer, a computer network, or other programmable devices, or storage devices with programming functions.
  • the computer program may be stored in a computer-readable storage medium, or transmitted to another computer-readable storage medium via a network.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the systems, devices and methods described in the present application may also be implemented in other ways.
  • the device embodiments described above are merely illustrative, and for example, the division of the functional units may be re-divided according to actual needs without affecting the satisfaction or completion of the above-mentioned functions and steps of the present invention.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the units of the above-mentioned devices may be merged or re-divided according to the storage and decoding methods, and additional functional units may be added according to actual needs to meet the requirements of the above-mentioned steps and methods.
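The first step of the first embodiment (turning "Hello world!" into twelve 8-bit binary numbers) and the size arithmetic of the fourth embodiment (a 4,501-byte text becoming a 36,008-byte binary file) can be sketched in Python as follows. This is an illustrative sketch, assuming ASCII text and a binary file that stores each bit as one '0'/'1' character; the function names are hypothetical, and the mapping from binary numbers to polynucleotide combinations in Tables 1 and 2 is not reproduced.

```python
def to_8bit_units(text: str) -> list[str]:
    """Convert each character of an ASCII string into an 8-bit binary string."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


def from_8bit_units(units: list[str]) -> str:
    """Inverse of to_8bit_units: rebuild the string from 8-bit binary strings."""
    return bytes(int(u, 2) for u in units).decode("ascii")


def binary_text_size(n_bytes: int) -> int:
    """Size of a file that writes every bit as one '0'/'1' character (assumption)."""
    return n_bytes * 8


units = to_8bit_units("Hello world!")
print(len(units))              # 12
print(units[0])                # 01001000 ('H' is 0x48)
print(binary_text_size(4501))  # 36008
```

Each of the twelve 8-bit strings would then be looked up in the coding table and assigned to its own well.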

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a DNA information storage method based on natural and non-natural bases, comprising the following steps: extracting the data information to be stored; designing a coding table formed by mapping DNA information to data information units; sequentially looking up, in the coding table, the DNA information corresponding to each unit of the split data information; and arranging the DNA information in order in the wells of a physical medium to obtain a DNA information storage carrier. During reading, the DNA information is read by mass spectrometry and decoded. These storage and reading methods make full use of natural and non-natural bases and implement q-ary coding, with an expected density greater than 8 bits/nt. They also avoid the need to synthesize DNA de novo for each storage operation, and, instead of identifying the recorded information with a sequencer, they read the base types with a mass spectrometer, overcoming the errors and limitations introduced by sequencing.

Description

A DNA information storage method based on natural and unnatural bases

Technical Field

The present invention belongs to the technical field of data storage, and in particular relates to a DNA information storage method based on natural and non-natural bases.

Background Art

With the advancement of network technology, the exchange and generation of information have grown explosively, and storing such massive amounts of information has become an urgent problem. Existing silicon-based storage technology can no longer keep up with the growth in demand, so researchers have turned to other materials; deoxyribonucleic acid (DNA) as a storage medium has become a particularly active research topic. As an information storage medium, DNA offers high storage density, long and stable storage life, and low energy consumption. However, because the field is still at an early stage, several problems remain unsolved. For example, current methods do not make full use of the properties of bases, especially non-natural bases, so the coding density is limited to 2 bits/nt; moreover, each storage operation requires the bases to be synthesized de novo, which is costly.

In addition, DNA information storage technology is severely limited by DNA synthesis and sequencing technology, and errors introduced during synthesis and sequencing are difficult to eliminate. How to bypass or reduce these errors has therefore become one of the research hotspots in DNA information storage. Patent CN113066534A proposes quaternary encoding using the four bases A, T, C, and G, reading the data from a biochip by sequencing-by-synthesis of the DNA sequences on the chip. Although biochips can be stored long-term at low temperature, this approach still cannot eliminate the errors introduced by sequencing.

Existing DNA information storage technology relies mainly on DNA synthesis and sequencing, both of which tend to introduce errors that make decoding difficult. Synthesis and sequencing also constrain the DNA sequences: for example, a sequence to be synthesized cannot contain long runs of repeated bases (homopolymer length ≤ 4), its GC content must be kept between 40% and 60%, and the sequences must not be complementary to one another in ways that form secondary structures.

Summary of the Invention

In existing DNA information storage technology, both writing and reading information are subject to various technical limitations. For example, information writing still relies solely on natural DNA encoding and does not fully exploit the properties of non-natural bases, so the encoding density is limited. In oligo synthesis, the synthesized length is limited to 200 bp and the sequence cannot contain homopolymers longer than 3 bp; sequencing imposes further restrictions, such as prohibiting large numbers of repeated sequences, and the sequencer itself has a considerable error rate. In addition, synthesizing DNA de novo is costly. These problems not only restrict the randomness of information encoding but also distort the results, sometimes making decoding impossible. To solve these problems, the present invention establishes a DNA information storage method that is based on natural and non-natural bases and exploits the advantages of mass spectrometry analysis. This storage process does not require a synthesis step, so it is not subject to the restrictions of sequence synthesis. The method meets the needs of information storage while breaking out of the existing framework of DNA information storage, and, because it uses both natural and non-natural bases, it achieves a high encoding density, offering a new direction for the development of DNA information storage technology.

One aspect of the present invention provides a DNA storage method based on natural and unnatural bases, the DNA storage method comprising the following steps:

S11) extracting the data information of the computer data to be stored;

S12) designing a coding table formed by a one-to-one mapping between DNA information and data information units;

S13) splitting the data information from step S11) into data information units and looking up, in order, the DNA information corresponding to each unit in the coding table from step S12);

S14) obtaining the deoxyribonucleotides, or polynucleotide sequences consisting of two or more deoxyribonucleotides, corresponding to the DNA information confirmed in step S13), and arranging them in order in different wells of a chip to obtain a DNA information storage carrier;

wherein the DNA information is a deoxyribonucleotide, a polynucleotide sequence consisting of two or more deoxyribonucleotides, or a combination of any two of these;

and wherein, in the coding table, each type of DNA information has a different molecular weight, and any two types differ in molecular weight by no less than 10.
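Steps S11) to S14), together with the molecular-weight constraint, can be sketched in Python as below. The coding table here is hypothetical (256 labels "N000" to "N255" for 8-bit units, with made-up masses spaced 12 apart); a real table would list the actual nucleotides or polynucleotide compositions and their molecular weights. After sorting, the smallest pairwise difference always lies between neighbours, so checking adjacent masses is enough to enforce the minimum gap.

```python
# Hypothetical coding table: 8-bit data unit -> (DNA information label, molecular weight).
# The labels and masses are illustrative placeholders, not values from the patent.
DEMO_TABLE = {unit: (f"N{unit:03d}", 5000.0 + 12.0 * unit) for unit in range(256)}


def check_mass_gap(table: dict[int, tuple[str, float]], min_gap: float = 10.0) -> bool:
    """True if every pair of entries differs in molecular weight by at least min_gap."""
    masses = sorted(mass for _, mass in table.values())
    return all(b - a >= min_gap for a, b in zip(masses, masses[1:]))


def encode_to_wells(data: bytes, table: dict[int, tuple[str, float]]) -> dict[int, str]:
    """S13/S14: take the data as 8-bit units, look each up, and assign wells in order."""
    if not check_mass_gap(table):
        raise ValueError("coding table violates the minimum molecular-weight gap")
    # Wells are numbered from 1 in the order the units appear in the data.
    return {well: table[unit][0] for well, unit in enumerate(data, start=1)}


print(encode_to_wells(b"Hi", DEMO_TABLE))  # {1: 'N072', 2: 'N105'}
```

Validating the table before encoding means a table that could not be read back unambiguously by mass spectrometry is rejected up front rather than discovered at decoding time.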

Further, in step S11), the computer data information is text, numbers, pictures, videos, programs, or audio.

Further, in step S11), the data information is character information, RGB information, binary data information, octal data information, hexadecimal data information, or decimal data information.

Further, in the coding table, each type of DNA information has a different molecular weight, and any two types differ in molecular weight by no less than 10.

Further, the deoxyribonucleotide is a natural deoxyribonucleotide or a non-natural deoxyribonucleotide, the non-natural deoxyribonucleotide being a base-modified deoxyribonucleotide.

Further, the deoxyribonucleotides in the polynucleotide sequence composed of two or more deoxyribonucleotides are natural or non-natural deoxyribonucleotides.

Further, polynucleotide sequences composed of two or more deoxyribonucleotides are given different molecular weights by adjusting the combination of types and quantities of deoxyribonucleotides in each sequence.
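Because reading measures mass rather than order, a polynucleotide entry is effectively defined by its base composition. The Python sketch below illustrates this with the widely used average-mass formula for unmodified single-stranded oligonucleotides (residue masses A 313.21, T 304.20, C 289.18, G 329.21 Da, minus 61.96 for a strand without a 5' phosphate); it is not the patent's own calculation, and modified bases would need their own residue masses, which the source does not give. Assuming the minimum gap of 10 is in daltons, it also shows that swapping a single A for a T shifts the mass by only 9.01, so such a pair could not both appear in one coding table.

```python
import re

# Average residue masses (Da) of unmodified deoxynucleotides in a DNA strand, as used
# in the common formula for the anhydrous molecular weight of a single-stranded oligo.
RESIDUE_MASS = {"A": 313.21, "T": 304.20, "C": 289.18, "G": 329.21}
# Correction for an oligo with 5'-OH and 3'-OH ends (no terminal phosphate).
END_CORRECTION = 61.96


def composition_mass(composition: str) -> float:
    """Average molecular weight of an unmodified oligo written as counts, e.g. '4A4T4C6G'.

    Only the number of each base matters, not the order.
    """
    if not re.fullmatch(r"(?:\d+[ATCG])+", composition):
        raise ValueError(f"unrecognised composition: {composition!r}")
    total = sum(int(n) * RESIDUE_MASS[base]
                for n, base in re.findall(r"(\d+)([ATCG])", composition))
    return total - END_CORRECTION


m1 = composition_mass("4A4T4C6G")
m2 = composition_mass("3A5T4C6G")  # one A swapped for T
print(round(m1, 2), round(m1 - m2, 2))  # 5539.66 9.01
```

In a table design, compositions in the same length group would therefore need to differ by substitutions such as A to C (24.03) or C to G (40.03) rather than A to T alone.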

Further, the coding table forms 32 to 1024 mapping relationships, for example 32, 64, 128, 256, 512, or 1024.

Further, the data information units are the different units into which the data information is divided for recording. In some examples, a data information unit is a 4-bit or 8-bit binary number, a single character, or the RGB pixel information of a single pixel.
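A minimal Python sketch of splitting binary data into 4-bit or 8-bit data information units, and splicing them back, follows. The high-nibble-first order is an assumption, since the source does not specify it; under it, the byte 0100 0001 (ASCII 'A') becomes the two 4-bit units 0100 and 0001.

```python
def split_units(data: bytes, bits: int) -> list[int]:
    """Split bytes into 8-bit units, or into 4-bit units (high nibble first)."""
    if bits == 8:
        return list(data)
    if bits == 4:
        return [nibble for byte in data for nibble in (byte >> 4, byte & 0x0F)]
    raise ValueError("bits must be 4 or 8")


def join_units(units: list[int], bits: int) -> bytes:
    """Inverse of split_units: splice data information units back into bytes."""
    if bits == 8:
        return bytes(units)
    if bits == 4:
        if len(units) % 2:
            raise ValueError("an odd number of 4-bit units cannot form whole bytes")
        return bytes((hi << 4) | lo for hi, lo in zip(units[::2], units[1::2]))
    raise ValueError("bits must be 4 or 8")


print([format(u, "04b") for u in split_units(b"A", 4)])  # ['0100', '0001']
```

With 4-bit units a table needs only 16 entries but twice as many wells per byte; with 8-bit units it needs 256 entries and one well per byte.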

Further, the bases in the nucleic acid sequence are modified or unmodified. More specifically, the modification is at least one of DBCO modification, AMCA modification, thio modification, amino modification, biotin modification, digoxigenin modification, a phosphate group, or a sulfhydryl group.

Further, the coding table contains 32 types of polynucleotide sequences, and combining two polynucleotide sequences forms 128 different combined polynucleotide sequences.

More specifically, the coding table maps DNA information one-to-one to 4-bit or 8-bit binary numbers.

More specifically, the coding table maps DNA information one-to-one to the ASCII code table.

More specifically, the coding table maps DNA information one-to-one to 128 characters.

More specifically, the coding table maps DNA information one-to-one to RGB pixel information.

In some specific embodiments, the DNA information in the coding table consists of 128 or 256 different nucleotides with modified bases, obtained by applying 32 or 64 different types of modification to each of A, T, C, and G.
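The bit capacity per well follows from the number q of distinguishable nucleotides: an alphabet of q symbols carries log2 q bits per symbol, so the two table sizes above give

```latex
q = 4 \times 32 = 128 \;\Rightarrow\; \log_2 128 = 7\ \text{bits/nt}, \qquad
q = 4 \times 64 = 256 \;\Rightarrow\; \log_2 256 = 8\ \text{bits/nt}
```

compared with log2 4 = 2 bits/nt for the four natural bases alone.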

More specifically, in the coding table, one type of DNA information is mapped one-to-one to the color channels of the RGB pixel information (R, G, B), another type is mapped one-to-one to the values 0-255 of each channel, and combinations of the two types are mapped one-to-one to RGB pixel information.
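The two-part RGB mapping can be sketched in Python as follows. The labels "ch_R", "val_000" and so on are hypothetical placeholders for the actual DNA information. The scheme needs only 3 + 256 = 259 table entries, yet their combinations cover all 3 × 256 = 768 channel-value pairs.

```python
# Hypothetical DNA-information labels for the two kinds of entries in the table.
CHANNEL_TOKENS = {"R": "ch_R", "G": "ch_G", "B": "ch_B"}
VALUE_TOKENS = {v: f"val_{v:03d}" for v in range(256)}
# Reverse lookups used when decoding.
CHANNEL_OF = {token: ch for ch, token in CHANNEL_TOKENS.items()}
VALUE_OF = {token: v for v, token in VALUE_TOKENS.items()}


def encode_pixel(rgb: tuple[int, int, int]) -> list[tuple[str, str]]:
    """Map one pixel to three (channel token, value token) combinations."""
    return [(CHANNEL_TOKENS[ch], VALUE_TOKENS[v]) for ch, v in zip("RGB", rgb)]


def decode_pixel(pairs: list[tuple[str, str]]) -> tuple[int, int, int]:
    """Rebuild (R, G, B); the channel token makes the order of the pairs irrelevant."""
    values = {CHANNEL_OF[c]: VALUE_OF[v] for c, v in pairs}
    return values["R"], values["G"], values["B"]


pairs = encode_pixel((255, 128, 0))
print(pairs)  # [('ch_R', 'val_255'), ('ch_G', 'val_128'), ('ch_B', 'val_000')]
```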

In an embodiment using only the four natural bases, the coding table contains 128 combined polynucleotide sequences. The polynucleotide sequences are divided into 8 groups of successively increasing length, containing 10 to 24 nucleotides; each group has 4 polynucleotide sequences, which differ in the types and/or quantities of their bases.

Another aspect of the present invention provides a method for reading information from the DNA information storage carrier obtained by the above DNA storage method, the information reading method comprising the following steps:

S21) performing mass spectrometry on the sequences to be tested in the different wells of the DNA information storage carrier to obtain the molecular weight of the DNA information in each well;

S22) analyzing the base combination from the molecular weight information and thereby identifying the DNA information in each well;

S23) interpreting the data information units according to the DNA information obtained in step S22) and the coding table of step S12);

S24) splicing together the data information units obtained in step S23) and decoding the data information to obtain the stored computer data.
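Steps S22) to S24) amount to matching each measured mass to the nearest coding-table entry and splicing the results in well order. The Python sketch below assumes intact-molecule masses (ignoring fragment peaks), 8-bit data units, and a hypothetical table of (molecular weight, unit) pairs. The tolerance is kept below half of the minimum gap of 10, so a reading within tolerance can match only one entry.

```python
import bisect


def nearest_unit(mass: float, table: list[tuple[float, int]], tol: float = 4.0) -> int:
    """S22/S23: return the data unit whose table mass is closest to the measured mass.

    table is a list of (molecular_weight, data_unit) pairs sorted by molecular weight.
    """
    if not table:
        raise ValueError("empty coding table")
    masses = [m for m, _ in table]
    i = bisect.bisect_left(masses, mass)
    # The closest entry is either just below or just above the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(table)]
    best = min(candidates, key=lambda j: abs(masses[j] - mass))
    if abs(masses[best] - mass) > tol:
        raise ValueError(f"no coding-table entry within {tol} of {mass}")
    return table[best][1]


def decode_wells(measured: dict[int, float], table: list[tuple[float, int]]) -> bytes:
    """S24: read wells in order and splice the recovered 8-bit units into bytes."""
    return bytes(nearest_unit(measured[well], table) for well in sorted(measured))


# Hypothetical table with masses spaced 12 apart, one entry per 8-bit value.
DEMO_TABLE = [(5000.0 + 12.0 * unit, unit) for unit in range(256)]
print(decode_wells({1: 5865.3, 2: 6261.8}, DEMO_TABLE))  # b'Hi'
```

Rejecting readings outside the tolerance, rather than always taking the nearest entry, surfaces contaminated or mis-synthesized wells instead of silently decoding them to the wrong unit.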

Further, in step S21), the mass spectrometry method is MALDI mass spectrometry sequencing.

Further, in step S21), the mass spectrometry method comprises the following steps:

S211) enzymatically digesting and/or purifying the sequence to be tested;

S212) performing mass spectrometry on the purified fragments to obtain their molecular weights.

More specifically, the purification method in step S211) is ethanol precipitation, microdialysis, or Millipore ZipTip microchromatography.

Further, the computer data information is any data that can exist on a computer, preferably selected from pictures, text, programs, audio, and video.

Another aspect of the present invention provides a method for storing and decoding DNA information based on natural and unnatural bases, the method comprising:

the DNA storage method based on natural and unnatural bases as described above, which comprises the following steps:

S11) extracting the binary information of the computer data information to be stored;

S12) designing a coding table formed by a one-to-one mapping between DNA information and data information units;

S13) splitting the data information from step S11) into data information units, and looking up, in order, the DNA information corresponding to each unit in the coding table obtained in step S12);

S14) obtaining the deoxyribonucleotides, or polynucleotide sequences consisting of two or more deoxyribonucleotides, that correspond to the DNA information confirmed in step S13), and placing them in order in different wells of a chip to obtain a DNA information storage carrier;

wherein the DNA information is a deoxyribonucleotide, a polynucleotide sequence consisting of two or more deoxyribonucleotides, or a combination of any two of these;

and in the coding table every item of DNA information has a distinct molecular weight, with a pairwise molecular weight difference of no less than 10;
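The spacing rule can be checked mechanically before a coding table is put to use. The following Python sketch assumes that the theoretical molecular weight of each item of DNA information is already known. The labels `X1` to `X3` and their masses are hypothetical. `min_gap=10` follows the rule stated here; Example 1 below uses 5.

```python
from itertools import combinations


def find_spacing_violations(masses: dict[str, float], min_gap: float = 10.0) -> list[tuple[str, str, float]]:
    """Return every pair of coding-table entries whose molecular weights differ by less than min_gap."""
    violations = []
    for (name_a, mass_a), (name_b, mass_b) in combinations(masses.items(), 2):
        gap = abs(mass_a - mass_b)
        if gap < min_gap:
            violations.append((name_a, name_b, gap))
    return violations


# Hypothetical molecular weights, for illustration only.
candidate_table = {"X1": 3036.1, "X2": 3050.2, "X3": 3058.0}
for name_a, name_b, gap in find_spacing_violations(candidate_table):
    print(f"{name_a} and {name_b} are only {gap:.1f} apart")
```

A difference of exactly `min_gap` is accepted, matching "no less than 10".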

and an information reading method for the DNA information storage carrier obtained by the above DNA storage method, the information reading method comprising the following steps:

S21) performing mass spectrometry on the sequences to be tested in the different wells of the DNA information storage carrier to obtain the molecular weight of the DNA information in each well;

S22) analyzing the base composition from the molecular weight of the DNA information to identify the DNA information in each well;
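Step S22) amounts to matching each measured molecular weight against the theoretical weights in the coding table. Below is a minimal sketch, assuming a hypothetical table of theoretical masses and a user-chosen matching tolerance. When entries are at least 10 apart, any tolerance below 5 guarantees that a measured value lies within tolerance of at most one entry.

```python
from typing import Optional


def identify_entry(measured: float, masses: dict[str, float], tolerance: float) -> Optional[str]:
    """Return the coding-table entry whose theoretical mass is closest to the
    measured mass, or None if even the closest one is outside the tolerance."""
    name, theoretical = min(masses.items(), key=lambda item: abs(item[1] - measured))
    return name if abs(theoretical - measured) <= tolerance else None


# Hypothetical theoretical masses, spaced at least 10 apart.
table = {"X1": 3036.1, "X2": 3050.2, "X3": 3064.0}
print(identify_entry(3036.9, table, tolerance=4.9))  # X1
print(identify_entry(3043.0, table, tolerance=4.9))  # None
```

A `None` result could be treated as a read error for that well rather than being forced onto the nearest entry.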

S23) interpreting the data information units from the DNA information obtained in step S22) and the coding table of step S12);

S24) splicing together the data information units obtained in S23) and decoding the data information to recover the stored computer data information.
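The storage steps S11) to S14) and the reading steps S21) to S24) can be traced end to end in a small simulation. The following is a sketch under stated assumptions:

- S11) is taken to yield raw bytes.
- The coding table maps each 8-bit value to a hypothetical label `D0` to `D255`.
- Each label is given a made-up theoretical mass, spaced 12 apart to satisfy the spacing rule.
- The mass spectrometer is replaced by a function that adds a small fixed error.

```python
# Hypothetical coding table and masses (spaced 12 apart, satisfying the ">= 10" rule).
# Real masses would come from the chosen nucleotides or polynucleotides.
CODING_TABLE = {value: f"D{value}" for value in range(256)}
MASSES = {label: 3000.0 + 12.0 * value for value, label in CODING_TABLE.items()}
INVERSE_TABLE = {label: value for value, label in CODING_TABLE.items()}


def store(data: bytes) -> list[str]:
    """S11-S14: split the binary data into 8-bit units and look each one up.
    The list index stands for the well position on the chip."""
    return [CODING_TABLE[unit] for unit in data]


def simulate_mass_readout(wells: list[str], offset: float = 0.5) -> list[float]:
    """Stand-in for S21: one measured mass per well, with a small fixed error."""
    return [MASSES[label] + offset for label in wells]


def read(measured_masses: list[float]) -> bytes:
    """S22-S24: identify each well by nearest mass, map back to 8-bit units,
    and splice the units together in well order."""
    units = bytearray()
    for mass in measured_masses:
        label = min(MASSES, key=lambda name: abs(MASSES[name] - mass))  # S22
        units.append(INVERSE_TABLE[label])                               # S23
    return bytes(units)                                                  # S24


wells = store("Hello world!".encode("ascii"))
assert read(simulate_mass_readout(wells)) == b"Hello world!"
```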

Another aspect of the present invention provides a DNA information storage device based on natural and unnatural bases, the device comprising:

a data information extraction unit, used to extract the computer data to be stored and convert it into the corresponding data information;

a data-information-to-DNA-information conversion unit, used to split or assemble the data information sequence and convert it into DNA information according to a preset mapping relationship;

a synthesis and storage unit, used to synthesize the nucleic acid sequences produced by the data-information-to-DNA-information conversion unit, and to store the deoxyribonucleotides, polynucleotide sequences consisting of two or more deoxyribonucleotides, or combinations thereof that correspond to the DNA information, in order, in different wells of the storage-unit chip;

wherein the mapping relationship is a one-to-one mapping between DNA information and data information units;

the DNA information is a deoxyribonucleotide, a polynucleotide sequence consisting of two or more deoxyribonucleotides, or a combination of any two of these;

and in the mapping relationship every item of DNA information has a distinct molecular weight, with a pairwise molecular weight difference of no less than 10.

The data-information-to-DNA-information conversion unit may include a DNA information encoding unit, a DNA-information-and-data-information matching unit, and a DNA information conversion unit. The DNA information encoding unit records the combination of base types and quantities for each type of DNA information. The matching unit pairs the different items of DNA information held in the encoding unit one-to-one with data information units. The DNA information conversion unit converts the data information from the data information extraction unit into DNA information, unit by unit, according to the pairings held by the matching unit.

Another aspect of the present invention provides a decoding device for nucleic acid storage based on natural and unnatural bases, the device comprising:

a reading unit, used to detect, with a mass spectrometer, the sequences to be tested that are stored in the synthesis and storage unit, and to identify their DNA information from their molecular weights;

a DNA-information-to-data-information conversion unit, used to convert the DNA information obtained by the reading unit into data information according to a preset mapping relationship, namely the same mapping between data information and DNA information that is used in the DNA information storage device;

a computer data output unit, used to convert the data information obtained by the DNA-information-to-data-information conversion unit into the stored computer data;

wherein the reading unit includes a mass spectrometer that performs mass spectrometry on the nucleic acid sequence in each well of the chip holding the stored information, and may also include a unit for purifying and/or enzymatically digesting the nucleic acid sequence in each well.

Another aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above DNA storage method based on natural and unnatural bases, or of the information reading method for the DNA information storage carrier obtained by that storage method.

Another aspect of the present invention provides a computer device comprising a memory and a processor. The memory stores a computer program executable on the processor. When executing the computer program, the processor implements the steps of the above DNA storage method based on natural and unnatural bases, or of the information reading method for the DNA information storage carrier obtained by that storage method.

Beneficial Effects

1) The scheme of the present invention can use any modified base, unmodified base, or unnatural base pair. As the number of available bases grows, the efficiency of DNA storage grows with it. By using many unnatural bases (256 distinguishable kinds are needed for 8 bits), each base can encode 8 binary bits. The logical encoding capacity of the present invention can therefore reach 8 bit/nt, exceeding the theoretical limit of 2 bit/nt for encoding with four bases.
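The 8 bit/nt figure follows from the size of the alphabet. A symbol drawn from N distinguishable kinds carries log2(N) bits, so the 4 natural bases give 2 bits and 256 distinguishable (for example, modified) nucleotides give 8 bits. Below is a quick Python check; the function name is illustrative only.

```python
import math


def logical_bits_per_nucleotide(distinguishable_kinds: int) -> float:
    """Bits carried by one stored symbol drawn from an alphabet of the given size."""
    return math.log2(distinguishable_kinds)


for kinds in (4, 128, 256):
    print(kinds, logical_bits_per_nucleotide(kinds))
# 4 2.0
# 128 7.0
# 256 8.0
```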

2) The encoding of the present invention is designed around the type and number of bases, so issues such as sequence repeats and secondary structure do not need to be considered.

3) The present invention addresses at the source the problems caused by synthesis and sequencing. It abandons both technologies and instead stores bases at fixed positions and reads them with a mass spectrometer.

4) The prior art maps the four natural bases directly to a quaternary code, which keeps coding efficiency low. The present invention can encode with any natural or unnatural base, and base combinations of different numbers and lengths can also serve as a coding variable. This removes the limits on coding efficiency and coding scheme. Introducing a large number of unnatural bases also greatly increases both coding efficiency and coding freedom. Because many kinds of unnatural bases are commercially available and widely commercialized, they can fully meet the coding requirements.

5) Because of the particular reading and encoding scheme, no long DNA sequences need to be synthesized. Instead, grid-based storage combined with microfluidics is used, and only short sequences are synthesized, which removes the constraints imposed by synthesis.

6) The storage and reading method of the present invention abandons the prior-art practice of identifying the information-bearing nucleic acid sequences with a sequencer. It instead uses a mass spectrometer to read the base types, avoiding the errors and limitations introduced by sequencing. Chains of distinct molecular weights can be formed by combining different bases in different ratios.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of the DNA storage method based on mass-spectrometric decoding of natural and unnatural bases according to the present invention.

FIG. 2 is a schematic diagram of the directly encoded DNA storage method based on mass-spectrometric decoding of natural and unnatural bases according to Example 1 of the detailed description.

FIG. 3 is a schematic diagram of the information reading method for the DNA information storage carrier of the present invention.

FIG. 4 is a schematic diagram of mass spectrometry detection according to the present invention.

FIG. 5 is a schematic structural diagram of the encoding device provided in Embodiment 4 of the present invention.

FIG. 6 is a schematic structural diagram of the decoding device provided in Embodiment 4 of the present invention.

FIG. 7 is a schematic structural diagram of the terminal device provided in Embodiment 4 of the present invention.

FIG. 8 is an overall technical flow chart of the present invention.

FIG. 9 is a schematic diagram of a picture to be encoded according to the present invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS

To make the above objects, features, and advantages of the present invention clearer, specific embodiments are described in detail below. They should not be construed as limiting the scope of the present invention.

It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components. It does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to, and includes, any and all possible combinations of one or more of the associated listed items.

In addition, in this specification and the appended claims, the terms "first", "second", "third", etc. are used only to distinguish descriptions and shall not be understood as indicating or implying relative importance.

References in this specification to "one embodiment", "some embodiments", etc. mean that one or more embodiments of the present application include a specific feature, structure, or characteristic described in connection with that embodiment. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", etc. appearing in different places in this specification do not necessarily all refer to the same embodiment. Unless specifically emphasized otherwise, they mean "one or more but not all embodiments". Likewise, unless specifically emphasized otherwise, the terms "including", "comprising", "having", and their variations all mean "including but not limited to". Some specific embodiments of the present invention are described below with reference to the accompanying drawings.

In existing DNA information storage technology, both writing and reading are subject to various technical limitations, and reliance on sequencing in particular greatly restricts information storage. To solve these problems, the present invention establishes a DNA storage method based on natural and unnatural bases that replaces sequencing with mass spectrometry, overturning the conventional approach to information storage and decoding. Moreover, the storage process of the present invention requires little synthesis, and in some cases no synthesis step at all, so it is not subject to the usual constraints of sequence synthesis. The method of the present invention is described below by way of examples with reference to the accompanying drawings.

Example 1: DNA storage method based on mass-spectrometric decoding of natural and unnatural bases

S11) extracting the data information of the computer data information to be stored;

The computer data information to be stored may be data in any format, such as text, numbers, pictures, video, programs, or audio. In specific embodiments of the present invention, the data information may be RGB information, binary, octal, hexadecimal, or decimal data, or character information.

In some specific embodiments, the computer data information to be stored is converted into data information by any method known in the art.

In some other specific embodiments, picture information may be converted into RGB information, that is, RGB pixel information. The picture is converted into the RGB pixel information of its individual pixels, each consisting of a color channel (R, G, or B) and a color intensity from 0 to 255.
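Below is a minimal sketch of the pixel conversion. It assumes the picture has already been decoded into a list of (R, G, B) tuples, for example by an imaging library, which is not shown. Each pixel becomes three (channel, intensity) data units. The function name is hypothetical.

```python
def pixels_to_units(pixels: list[tuple[int, int, int]]) -> list[tuple[str, int]]:
    """Flatten (R, G, B) pixels into (channel, intensity) data units."""
    units = []
    for r, g, b in pixels:
        for channel, intensity in (("R", r), ("G", g), ("B", b)):
            if not 0 <= intensity <= 255:
                raise ValueError(f"intensity out of range: {intensity}")
            units.append((channel, intensity))
    return units


print(pixels_to_units([(255, 0, 128)]))
# [('R', 255), ('G', 0), ('B', 128)]
```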

In some specific embodiments, if the information to be stored is character information, no conversion is needed. The character information can be used directly as the data information for subsequent conversion and encoding.

S12) designing a coding table formed by a one-to-one mapping between DNA information and data information units;

wherein the DNA information is a deoxyribonucleotide, a polynucleotide sequence consisting of two or more deoxyribonucleotides, or a combination of any two of these;

and in the coding table every item of DNA information has a distinct molecular weight, with a pairwise molecular weight difference of no less than 5.

In some specific embodiments, the pairwise molecular weight difference between items of DNA information is illustratively 10 or more, for example any value between 10 and 100.

In some specific technical solutions, the coding table contains 32 to 1024 mappings, for example 32, 64, 128, 256, 512, or 1024.

In some specific technical solutions, the data information in the coding table is illustratively 4-bit or 8-bit binary data. It equals the items of DNA information in the table in number and forms a one-to-one mapping with them.

In some specific technical solutions, the data information in the coding table is illustratively character information. It equals the items of DNA information in the table in number and forms a one-to-one mapping with them.

In some specific technical solutions, the data information in the coding table is illustratively RGB pixel information, which forms a one-to-one mapping with the DNA information in the table.

In some specific technical solutions, the DNA information may be natural or unnatural deoxyribonucleotides, the unnatural deoxyribonucleotides being base-modified deoxyribonucleotides.

In some specific technical solutions, a modified base carries at least one, or a combination of two or more, of the following modifications: DBCO, AMCA, thio, amino, biotin, digoxigenin, phosphate group, sulfhydryl, amino-type, NHBOC, Fmoc, carboxylic acid, Mal, NHS, azide, Cy3/Cy5/Cy7, THP, benzyl, propynyl, bromo, tert-butyl propionate, tert-butyl acetate, methyl, biotin, pentafluorophenol, and sulfonate modifications. A number of modified bases are already commercially available, and those skilled in the art can select them according to the molecular weights required. The available modification types can be looked up on the suppliers' websites.

Exemplarily, the unnatural base is selected from 2-aminoadenin-9-yl, 2-aminoadenine, 2-F-adenine, 2-thiouracil, 2-thiothymine, 2-thiocytosine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-amino-adenine, 2-amino-propyl-adenine, 2-aminopyridine, 2-pyridone, 2'-deoxyuridine, 2-amino-2'-deoxyadenosine, 3-deazaguanine, 3-deazaadenine, 4-thiouracil, 4-thiothymine, uracil-5-yl, hypoxanthin-9-yl (I), 5-methylcytosine, 5-hydroxymethylcytosine, xanthine, hypoxanthine, 5-bromo- and 5-trifluoromethyl-uracils and -cytosines; 5-halouracil, 5-halocytosine, 5-propynyluracil, 5-propynylcytosine, 5-uracil, 5-substituted, 5-halo, and 5-substituted pyrimidines, 5-hydroxycytosine, 5-bromocytosine, 5-bromouracil, 5-chlorocytosine, chlorinated cytosine, cyclocytosine, cytosine arabinoside, 5-fluorocytosine, fluoropyrimidine, fluorouracil, 5,6-dihydrocytosine, 5-iodocytosine, hydroxyurea, iodouracil, 5-nitrocytosine, 5-bromouracil, 5-chlorouracil, 5-fluorouracil and 5-iodouracil, 6-alkyl derivatives of adenine and guanine, 6-azapyrimidine, 6-azo-uracil, 6-azocytosine, azacytosine, 6-azo-thymine, 6-thioguanine, 7-methylguanine, 7-methyladenine, 7-deazaguanine, 7-deazaguanosine, 7-deaza-adenine, 7-deaza-8-azaguanine, 8-azaguanine, 8-azaadenine, 8-halo-, 8-amino-, 8-thiol-, 8-thioalkyl- and 8-hydroxy-substituted adenines and guanines; N4-ethylcytosine, N-2-substituted purines, N-6-substituted purines, O-6-substituted purines, bases that increase the stability of duplex formation, universal nucleic acids, hydrophobic nucleic acids, promiscuous nucleic acids, size-expanded nucleic acids, fluorinated nucleic acids, tricyclic pyrimidines, phenoxazine cytidine ([5,4-b][1,4]benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido[5,4-b][1,4]benzothiazin-2(3H)-one), G-clamp, phenoxazine cytidine (9-(2-aminoethoxy)-H-pyrimido[5,4-b][1,4]benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido[4,5-b]indol-2-one), pyridoindole cytidine (H-pyrido[3',2':4,5]pyrrolo[2,3-d]pyrimidin-2-one), 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxymethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, β-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, β-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid, wybutoxosine, pseudouracil, queuosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methyl ester, uracil-5-oxyacetic acid, 5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl)uracil, (acp3)w, and 2,6-diaminopurine, as well as bases in which the purine or pyrimidine is replaced by a heterocycle.

In some specific embodiments, the deoxyribonucleotides in the polynucleotide sequences (of two or more deoxyribonucleotides) used as DNA information are natural or unnatural deoxyribonucleotides. These polynucleotide sequences are given different molecular weights by adjusting the combination of types and quantities of deoxyribonucleotides they contain. One option is to encode with combinations of the four natural bases; another is to encode with a mixture of natural and unnatural bases.

Because the present invention decodes by mass spectrometry rather than sequencing, chain types composed of ribonucleotides or deoxyribonucleotides need only differ in mass to be distinguished from the mass spectrometry data.
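The widely used approximation for the anhydrous molecular weight of an unmodified DNA oligonucleotide without a 5'-phosphate shows why only base types and counts matter. It uses residue values of 313.21 (A), 304.2 (T), 289.18 (C), and 329.21 (G), minus 61.96. Two different sequences with the same composition A4T2C2G2 get the same mass, so a mass readout identifies the composition, not the order. This Python sketch ignores modified bases, ionization state, and instrument accuracy, which a real MALDI readout would have to account for.

```python
import math

# Approximate anhydrous residue masses (g/mol) for unmodified DNA, as used by
# common oligonucleotide calculators; the 61.96 term adjusts for the chain
# ends of an oligo that lacks a 5'-phosphate.
RESIDUE_MASS = {"A": 313.21, "T": 304.2, "C": 289.18, "G": 329.21}
END_CORRECTION = 61.96


def oligo_mass(sequence: str) -> float:
    """Approximate molecular weight of a DNA oligonucleotide from its composition."""
    return sum(RESIDUE_MASS[base] for base in sequence) - END_CORRECTION


mass_1 = oligo_mass("AAAATTCCGG")  # composition A4T2C2G2
mass_2 = oligo_mass("GATCATCAGA")  # same composition, different order
print(round(mass_1, 2), math.isclose(mass_1, mass_2))  # 3036.06 True
```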

Illustratively, the coding table contains 128 combined polynucleotide sequences. Their lengths fall into 8 groups of successively increasing length, spanning 10 to 24 nucleotides. Each group contains 4 chain types that differ in base type and/or quantity.

Illustratively, the four deoxyribonucleotides can be modified with different modifying groups. For example, applying 32 modifications to each deoxyribonucleotide yields 128 different deoxyribonucleotides, and applying 64 modifications yields 256. If more kinds of nucleotides are used, the number of encodable types can be doubled on this basis.

In some specific embodiments, a combination of two polynucleotides, a combination of two unnatural deoxyribonucleotides, or a combination of a polynucleotide and an unnatural deoxyribonucleotide may also be used.

Illustratively, combinations increase the number of DNA information types available for the coding table and thus improve coding efficiency. For example, preparing 32 different nucleotide sequences and combining them two at a time yields at most 1024 combinations, of which 128 or 256 can be selected to design the coding table.
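The figure of 1024 is the number of ordered pairs, with repetition, drawn from 32 sequences (32 × 32). The Python check below uses hypothetical labels `s1` to `s32`, and taking the first 256 pairs is only one possible selection. If a readout could not distinguish the order of the two members of a pair, only unordered pairs would be usable; the same code counts 528 of those.

```python
from itertools import combinations_with_replacement, product

# Hypothetical labels for 32 prepared nucleotide sequences.
sequences = [f"s{i}" for i in range(1, 33)]

# Ordered pairs (s1+s2 counted separately from s2+s1), repeats allowed.
ordered_pairs = list(product(sequences, repeat=2))
print(len(ordered_pairs))  # 1024

# Pairs whose member order is ignored.
unordered_pairs = list(combinations_with_replacement(sequences, 2))
print(len(unordered_pairs))  # 528

# One possible selection of 256 ordered pairs for an 8-bit coding table.
coding_entries = ordered_pairs[:256]
```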

In one specific embodiment, the DNA information in the coding table can be determined as follows. Only polynucleotides made of natural nucleotides are used, namely sequences of the deoxyribonucleotides A, T, C, and G with lengths between 10 and 24. Eight length gradients are defined, and each gradient has four chains with different base contents. The four chains of the same length can be paired in any order to give 16 combinations, as shown in Table 1. For example, the 16 combinations in the first length gradient are a1a1, a1a2, a1a3, a1a4, a2a1, a2a2, a2a3, a2a4, a3a1, a3a2, a3a3, a3a4, a4a1, a4a2, a4a3, a4a4. This gives 8 × 16 = 128 combinations in total. For example, chain a1 has the composition A4T2C2G2, so the combination a1a1 contains A8T4C4G4.
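The 8 × 16 enumeration can be reproduced in a few lines of Python. The text names only the first gradient ("a"), so the labels b to h are hypothetical. The only composition modeled is chain a1, read as A4T2C2G2, which is consistent with a1a1 = A8T4C4G4 in the Table 2 example.

```python
from collections import Counter
from itertools import product

# Hypothetical labels for the 8 length gradients; the text names only "a".
GRADIENTS = "abcdefgh"


def gradient_combinations(gradient: str) -> list[str]:
    """All 16 ordered pairings of the four chains x1..x4 in one gradient."""
    chains = [f"{gradient}{i}" for i in range(1, 5)]
    return [first + second for first, second in product(chains, repeat=2)]


entries = [combo for g in GRADIENTS for combo in gradient_combinations(g)]
print(len(entries))   # 128
print(entries[:5])    # ['a1a1', 'a1a2', 'a1a3', 'a1a4', 'a2a1']

# Base counts of a1a1 are the sum of two copies of chain a1.
a1 = Counter({"A": 4, "T": 2, "C": 2, "G": 2})
a1a1 = a1 + a1
```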

Table 1 Polynucleotide length gradients and base compositions

These combinations are mapped to the 128 elements of the ASCII code table, each element being represented by an 8-bit binary number.

It will be understood that the number of nucleotides in each polynucleotide sequence can be adjusted: the smaller the upper bound of the chosen length range, the fewer bases are required. The exemplary scheme above uses 4 kinds of bases; if more kinds of bases take part in the encoding, the number of bases required drops sharply.

The polynucleotide sequences are mapped one-to-one to data information to form a coding table. Illustratively, the polynucleotide sequences (the DNA information) of Table 1 are mapped one-to-one to 8-bit binary values. This gives a coding table between DNA information and binary data that encodes 128 kinds of information. In the table, the four bits heading the columns and the four bits heading the rows together form an 8-bit binary number, and each 8-bit binary number corresponds to one item of DNA information, as shown in Table 2:

Table 2 Coding table between DNA information and data information

For example, 00000000 corresponds to a1a1, and a1a1 corresponds to the polynucleotide combination with composition A8T4C4G4.
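Only one cell of Table 2 is given explicitly (00000000 → a1a1), so the layout below is an assumption:

- The high four bits select one of the 8 length gradients.
- The low four bits select one of the 16 pairings within that gradient, in the order listed for Table 1.

Under this layout only values 0 to 127, the ASCII range, are defined. Gradient labels other than "a" are hypothetical.

```python
# Hypothetical labels for the 8 length gradients; the text names only "a".
GRADIENTS = "abcdefgh"


def byte_to_entry(value: int) -> str:
    """Map an 8-bit value to a Table 2 entry under an assumed row/column layout."""
    if not 0 <= value < 128:
        raise ValueError("this 128-entry table only covers values 0-127")
    row, column = value >> 4, value & 0x0F    # high and low four bits
    first, second = divmod(column, 4)          # which two chains are paired
    gradient = GRADIENTS[row]
    return f"{gradient}{first + 1}{gradient}{second + 1}"


print(byte_to_entry(0b00000000))  # a1a1, matching the example in the text
print(byte_to_entry(ord("H")))    # 0x48 -> gradient "e", column 8 -> e3e1
```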

In another specific embodiment, the four deoxyribonucleotides can be modified with different modifying groups. For example, applying 32 modifications to each deoxyribonucleotide yields 128 different deoxyribonucleotides, and applying 64 modifications yields 256.

The nucleotides carrying modified bases are matched to data information to form a coding table.

Specifically, taking the above 256 nucleotides with modified bases as an example, matching them to 8-bit binary values gives a 256-entry matching table between modified nucleotides and binary data, as shown in Table 3:

Table 3 Matching table between nucleotides with modified bases and data information

In Table 3, A1 denotes the first modified deoxyadenosine nucleotide, and so on: A, T, C, or G followed by a number N denotes the Nth modified form of that deoxyribonucleotide.

Illustratively, 32 ribonucleotides with different modified bases may also be selected for each base, giving 128 modified bases in total. These are matched directly to characters, and the resulting coding table is shown in Table 4:

Table 4

In some specific embodiments, direct encoding can also be used: each combination is mapped directly to a character, so that a text file is converted directly into a DNA information file. For example, the DNA information in the coding table of Table 4 is mapped one-to-one to the 128 characters of the ASCII table. With this coding table, the characters of a text can be converted directly into DNA information.
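Table 4 itself is not reproduced here, so the ordering below is an assumption. ASCII codes 0 to 31 map to A1 to A32, 32 to 63 map to T1 to T32, 64 to 95 map to C1 to C32, and 96 to 127 map to G1 to G32, using the naming convention of Table 3. With such a table, text converts directly into a list of modified-nucleotide labels.

```python
BASES = "ATCG"  # order of bases in the assumed table layout


def char_to_label(character: str) -> str:
    """Map one ASCII character to a hypothetical modified-nucleotide label."""
    code = ord(character)
    if code > 127:
        raise ValueError(f"not an ASCII character: {character!r}")
    base_index, modification = divmod(code, 32)
    return f"{BASES[base_index]}{modification + 1}"


def text_to_labels(text: str) -> list[str]:
    """Directly encode a text string, one label per character."""
    return [char_to_label(character) for character in text]


print(text_to_labels("Hi!"))  # ['C9', 'G10', 'T2']
```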

S13) Split the data information from step S11) and, using the coding table between DNA information and data information obtained in step S12), determine in sequence the DNA information corresponding to each split unit.

For example, when the data information mapped in the coding table consists of 8-bit binary numbers, the data information from step S11) is split into 8-bit binary numbers, and the DNA information corresponding to each 8-bit value is looked up in the coding table in sequence.
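Step S13 can be sketched as follows, assuming the data is available as bytes. The unit width defaults to 8 bits as in the example above; a width of 4 is included because the claims also allow 4-bit units. The coding table passed in is a hypothetical dictionary from unit value to DNA information.

```python
def split_into_units(data, width=8):
    """Split bytes into unsigned integers of `width` bits (width must divide 8)."""
    if width not in (1, 2, 4, 8):
        raise ValueError("width must be 1, 2, 4 or 8")
    bits = "".join(format(byte, "08b") for byte in data)
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]


def to_dna_information(data, table, width=8):
    """Look up each unit, in order, in a coding table {unit value: DNA information}."""
    return [table[unit] for unit in split_into_units(data, width)]


print(split_into_units(b"!"))     # [33]
print(split_into_units(b"!", 4))  # [2, 1]
```

Restricting the width to divisors of 8 keeps every unit aligned inside a byte, so no padding rule is needed.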

S14) Obtain the chain types determined in step S13) and arrange them in sequence in different wells of the chip to obtain a DNA information storage carrier;

The nucleotides or polynucleotide sequences corresponding to the DNA information determined in step S13) can be obtained by using commercially available nucleotides directly or by synthesizing each polynucleotide sequence type. Alternatively, all types in the coding table of step S12) can be synthesized and stockpiled on a large scale in advance and retrieved at storage time.

The selected nucleotides or polynucleotide sequences need not be joined to each other; they are placed directly, in order, in different wells of the chip, which reduces the number of synthesis steps.
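Because the units are not joined, the position of each well is what preserves the order of the data. The chip layout is not specified in the text; the sketch below assumes a standard 96-well plate (rows A to H, columns 1 to 12) filled row by row. Note that well names such as "A1" are unrelated to the nucleotide labels of Table 3.

```python
import string

# Assumption: a 96-well chip with rows A-H and columns 1-12, filled row by row.
ROWS = string.ascii_uppercase[:8]
COLUMNS = 12


def well_name(index):
    """0-based position in the DNA information list -> well name (A1 ... A12, B1, ...)."""
    if not 0 <= index < len(ROWS) * COLUMNS:
        raise IndexError(f"index {index} does not fit on a {len(ROWS) * COLUMNS}-well chip")
    row, col = divmod(index, COLUMNS)
    return f"{ROWS[row]}{col + 1}"


def assign_wells(dna_information):
    """Place item i in well i; the well order is the only record of the data order."""
    return {well_name(i): item for i, item in enumerate(dna_information)}
```

Longer messages would need several chips or a larger plate; the `IndexError` marks that boundary.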

The method of the present invention is demonstrated below with several specific data sets. The first embodiment stores the English words and characters "Hello world!".

The information to be stored is the string "Hello world!". First, its 12 characters are converted in sequence into 12 8-bit binary numbers, which are then converted into combinations of polynucleotide sequences according to the coding tables in Tables 1 and 2. The chain types corresponding to each character are synthesized and placed in sequence in different wells of the chip; the chip holding all of this information is the storage medium.
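The first conversion step, from 12 characters to 12 8-bit binary numbers, is plain ASCII encoding and can be checked directly:

```python
message = "Hello world!"

# One 8-bit binary number per character, in order.
units = [format(byte, "08b") for byte in message.encode("ascii")]

print(len(units))  # 12
print(units[0])    # 01001000  ('H' = 72)
print(units[-1])   # 00100001  ('!' = 33)
```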

Alternatively, the characters can be encoded directly: as shown in Figure 2, each character is mapped to a different DNA sequence or to a non-natural-base nucleotide carrying a different modification group (see Table 4 above), and then encoded and stored.

In the second embodiment, the information to be stored is the text file "wssnt10.txt" encoded in Goldman's 2012 article "Towards practical, high-capacity, low-maintenance information storage in synthesized DNA".

The file wssnt10.txt is as follows:

First, the above 107,738-byte text is directly encoded into a base-combination information file. For example, the polynucleotide combination corresponding to "!" is e1e2, written as base counts "4A4T4C6G4A4T4C6G". An exemplary fragment after conversion into DNA information is shown below:

The fragment, further converted into chain types expressed by base type and count, is shown below:

A biochip for DNA information storage is then prepared from the resulting DNA information.

In the third embodiment, the information to be stored is a picture.

First, the picture information is converted into binary data. An exemplary fragment is as follows:

The resulting binary file is encoded according to the coding tables in Tables 1 and 2 above. For example, the polynucleotide combination corresponding to "0100 0001" is e1e2, written as base counts "4A4T4C6G4A4T4C6G".

An exemplary fragment after conversion into polynucleotides is shown below:

The fragment, further converted into DNA information expressed by base type and count, is shown below:

For the base encoding, the first three embodiments use the four natural bases A, T, C and G for demonstration, but those skilled in the art may choose as needed when applying the method: the DNA information can be a polynucleotide sequence, a combination of polynucleotide sequences, or a single deoxyribonucleotide with a non-natural base, and is not limited to deoxyribonucleotides with natural bases. This is possible because the invention uses mass spectrometry as the detection method: mass spectrometry not only distinguishes natural bases but can also identify non-natural bases, which conventional DNA sequencing cannot do.

Several specific embodiments are provided below, using non-natural-base amino deoxyribonucleic acid as an example.

The fourth embodiment stores the first chapter of the original Pride and Prejudice. The text file is 4,501 bytes; the binary file generated by converting the text into binary is 36,008 bytes. An excerpt is shown below:

In this embodiment, the above text is first converted into binary data; an excerpt of the binary data is shown below:

2. Using the coding table above, i.e. Table 3, the binary data is converted into a multi-base sequence file of 13,478 bytes. An excerpt of the encoded sequence information is as follows:

The nucleotides corresponding to the encoded DNA information are stored in order in different wells of the biochip.

This embodiment shows that each nucleotide encodes an 8-bit binary code, so the logical encoding capacity of the invention reaches 8 bits/nt, exceeding the theoretical limit of 2 bits/nt (log2(4)) for existing four-base encoding.
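The capacity figures follow from the base-2 logarithm of the number of distinguishable nucleotide types. The sketch also checks the fourth embodiment's file sizes, assuming (not stated in the text) that the 36,008-byte binary file stores each bit as one '0' or '1' character, which matches 4,501 × 8.

```python
import math


def bits_per_nucleotide(alphabet_size):
    """Information carried by one symbol drawn from `alphabet_size` distinguishable nucleotides."""
    return math.log2(alphabet_size)


text_bytes = 4501                         # Chapter 1 text, from the fourth embodiment
bit_characters = text_bytes * 8           # assumed: one '0'/'1' character per bit
nucleotides_needed = bit_characters // 8  # 8 bits per modified nucleotide (Table 3)

print(bits_per_nucleotide(4))    # 2.0  (natural A/T/C/G)
print(bits_per_nucleotide(256))  # 8.0  (Table 3)
print(bit_characters)            # 36008
print(nucleotides_needed)        # 4501
```

A 128-entry table such as Table 4 gives log2(128) = 7.0 bits of information per nucleotide; it can still replace one 8-bit ASCII byte per nucleotide because ASCII text uses only 128 of the 256 byte values.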

In the fifth embodiment, the information to be stored is again the text of Chapter 1 of Pride and Prejudice, but unlike the fourth embodiment, the text is not converted into binary data; the characters of the text are used directly as the data information to be encoded.

Encoding with the coding table above (Table 4) directly yields the corresponding DNA information, an example of which is shown below:

Comparing the two embodiments, direct coding (fifth embodiment) and indirect coding (fourth embodiment) both reach a coding density of 8 bit/nt, but direct coding uses fewer types of modified bases (128 instead of 256) and is simpler.

The encoded nucleotides are stored in sequence in different wells of the chip to obtain the storage medium.

In the sixth embodiment, the information to be stored is a picture file, shown in FIG. 9 (the original of FIG. 9 is a color picture). Its RGB information is as follows:

RGB format information:

The picture information is encoded directly into nucleotide information. The encoded sequence information is as follows:

The nucleotides are stored in sequence in different wells of the chip to obtain the storage medium.

Alternatively, the picture can be divided into pixels, with the RGB values of each pixel used as the data information. A coding table is then designed that contains the DNA information for each color channel (R, G and B) at each depth value from 0 to 255, and encoding and storage are performed with it.
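One way to realize such a table, following the pairing scheme in the claims (one kind of DNA information for the channel, another for the value 0 to 255), is sketched below. The labels `ch_R` and `val_0` and so on are hypothetical placeholders for real DNA information; the scheme needs 3 + 256 = 259 distinguishable entries.

```python
# Hypothetical labels: three DNA information types for the channels and 256 for the values.
CHANNEL_TO_DNA = {"R": "ch_R", "G": "ch_G", "B": "ch_B"}
VALUE_TO_DNA = {v: f"val_{v}" for v in range(256)}
DNA_TO_CHANNEL = {d: c for c, d in CHANNEL_TO_DNA.items()}
DNA_TO_VALUE = {d: v for v, d in VALUE_TO_DNA.items()}


def encode_pixel(rgb):
    """(r, g, b) -> three (channel DNA, value DNA) pairs; KeyError if a value is outside 0-255."""
    return [(CHANNEL_TO_DNA[name], VALUE_TO_DNA[value]) for name, value in zip("RGB", rgb)]


def decode_pixel(pairs):
    """Pairs carry their own channel tag, so their order does not matter."""
    channels = {DNA_TO_CHANNEL[c]: DNA_TO_VALUE[v] for c, v in pairs}
    return channels["R"], channels["G"], channels["B"]


print(encode_pixel((255, 0, 128)))
# [('ch_R', 'val_255'), ('ch_G', 'val_0'), ('ch_B', 'val_128')]
```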

Example 2 DNA information decoding method based on mass spectrometry decoding of natural and non-natural bases

S21) Perform mass spectrometry on the sequences to be tested in the different wells of the DNA information storage carrier to obtain the molecular weight of the DNA information in each well;

Unlike the prior art, which uses a sequencer to determine the order of bases, the present invention uses MALDI mass spectrometry to read out the base combination of the chain type at each position on the chip.

During information recording, the different nucleotide or polynucleotide sequences, or their combinations, are placed in different chip wells; mass spectrometry is performed on the sequence in each well, and the peaks are recorded in the order of the wells.

The mass spectrum shows the molecular weight of the sequence under test together with the fragment peaks produced by bombardment; from these data, the type of DNA information in the coding table that the sequence represents can be identified.
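A molecular weight depends only on which bases a chain contains and how many, not on their order, which is why the coding table can be defined by base counts. The sketch below estimates the mass of a chain written in the base-count notation used above (e.g. "4A4T4C6G"). The residue masses are approximate average values from the widely used OligoCalc formula and the end correction assumes a 5'-OH oligonucleotide; the patent does not state which mass model it uses.

```python
import re

# Approximate average masses (Da) of DNA nucleotide residues (OligoCalc convention);
# an assumption, not a value given in the patent.
RESIDUE_MASS = {"A": 313.21, "T": 304.2, "C": 289.18, "G": 329.21}
END_CORRECTION = -61.96  # assumes a 5'-OH oligonucleotide with no terminal phosphate


def parse_composition(text):
    """'4A4T4C6G' -> {'A': 4, 'T': 4, 'C': 4, 'G': 6}"""
    if not re.fullmatch(r"(?:\d+[ATCG])+", text):
        raise ValueError(f"not a base-count string: {text!r}")
    counts = {}
    for number, base in re.findall(r"(\d+)([ATCG])", text):
        counts[base] = counts.get(base, 0) + int(number)
    return counts


def average_mass(counts):
    """Order-independent: any arrangement of the same bases gives the same mass."""
    return sum(RESIDUE_MASS[base] * n for base, n in counts.items()) + END_CORRECTION


chain = parse_composition("4A4T4C6G")
print(round(average_mass(chain), 2))  # 5539.66
```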

Before mass spectrometry, the method may further include:

S211) digesting the sequence to be tested with an enzyme and purifying it;

S212) performing mass spectrometry on the purified fragments to obtain their molecular weight.

In step S211), purification is performed by ethanol precipitation, microdialysis or Millipore ZipTip micro-chromatography.

In step S21), the mass spectrometry method is MALDI mass spectrometry.

S22) Analyze the base combination from the molecular weight of the DNA information to determine the DNA information in each well;

MALDI mass spectrometry displays RNA or DNA molecules with different bases as peaks at different times of flight; the base combination contained in each peak is identified and translated directly into the corresponding nucleic acid sequence.
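Claim 1 requires every entry of the coding table to differ in molecular weight by at least 10. Under that rule, matching a measured peak to the nearest expected mass with a tolerance below 5 can never accept two entries at once. The sketch below checks the gap rule and decodes peaks that way; the table values and the tolerance of 4.0 are hypothetical, and masses are assumed to be in Da.

```python
def check_min_gap(table, min_gap=10.0):
    """Enforce the coding-table rule: all masses distinct and at least `min_gap` apart."""
    masses = sorted(table.values())
    for low, high in zip(masses, masses[1:]):
        if high - low < min_gap:
            raise ValueError(f"masses {low} and {high} are closer than {min_gap}")


def decode_peak(measured_mass, table, tolerance=4.0):
    """Return the table entry whose mass is nearest to the peak, within `tolerance`."""
    name, mass = min(table.items(), key=lambda item: abs(item[1] - measured_mass))
    if abs(mass - measured_mass) > tolerance:
        raise LookupError(f"no entry within {tolerance} of {measured_mass}")
    return name


# Hypothetical coding table {DNA information: expected mass}.
TABLE = {"e1": 3000.0, "e2": 3015.0, "e3": 3031.5}
check_min_gap(TABLE)
print([decode_peak(m, TABLE) for m in (3001.2, 3014.1, 3030.0)])  # ['e1', 'e2', 'e3']
```

Rejecting a peak that lies too far from every entry, instead of forcing a match, turns an ambiguous reading into a detectable error.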

S23) Interpret the data information units from the DNA information obtained in step S22) and the coding table of step S12);

The coding table between chain types and data information used in step S23) is the one obtained in step S12) of the above DNA storage method based on natural and non-natural bases.

In a specific embodiment, the binary value for each chain type obtained in step S22) is looked up in Table 2. Because every two chain types in Table 2 correspond to one 8-bit binary number, the chains are converted in pairs, each pair giving one ASCII character, to obtain all of the binary data;
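Table 2 is not reproduced here. As an illustration only, the sketch assumes that each of 16 hypothetical chain types `c0` to `c15` stands for one 4-bit value (the claims allow 4-bit mappings), so that two consecutive chains, high nibble first, form one 8-bit ASCII character.

```python
# Hypothetical stand-in for Table 2: 16 chain types c0..c15, each worth one 4-bit value,
# so that two consecutive chains (high nibble first) give one 8-bit ASCII character.
CHAIN_TO_NIBBLE = {f"c{i}": i for i in range(16)}


def chains_to_text(chains):
    """Decode chain types read from consecutive wells into ASCII text."""
    if len(chains) % 2:
        raise ValueError("chains must come in pairs")
    nibbles = [CHAIN_TO_NIBBLE[c] for c in chains]
    data = bytes((high << 4) | low for high, low in zip(nibbles[0::2], nibbles[1::2]))
    return data.decode("ascii")


print(chains_to_text(["c4", "c8", "c6", "c9"]))  # Hi
```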

S24) Splice the data information units obtained in S23) and decode them to recover the stored computer data.

For the six embodiments of data to be stored presented in Example 1, the corresponding decoding steps are as follows:

First, for the chip storing the English words and characters "Hello world!", the polynucleotide sequence combinations in the 12 wells of the chip are digested and purified, and the specific nucleotide types and quantities of each chain type are obtained by MALDI mass spectrometry. From the MALDI results, the type of polynucleotide sequence in each well is identified; the corresponding binary number is then found through the coding table used during encoding and converted directly into a character, recovering the original string "Hello world!".

In the second embodiment, the stored information is the text file "wssnt10.txt" encoded in Goldman's 2012 article "Towards practical, high-capacity, low-maintenance information storage in synthesized DNA".

The polynucleotide sequences in the wells are digested and purified separately, and the nucleotide types and quantities of each chain type are obtained by MALDI mass spectrometry. From these results, the type of DNA information in each well is identified, and the encoded bases are converted into characters through the coding table used during encoding, recovering the original text file "wssnt10.txt".

In the third embodiment, the stored information is picture information. Decoding is the same as in the first two embodiments, except that the recovered binary data is converted into picture information.

In the fourth embodiment, for the stored Chapter 1 of Pride and Prejudice, the modified-base nucleotides in the different wells are first purified, and their specific types and quantities are obtained by MALDI mass spectrometry. From these results, the type of modified-base nucleotide in each well is identified; the corresponding binary number is found through the matching table used during encoding and converted directly into a character, recovering the original data, i.e. Chapter 1 of Pride and Prejudice.

In the fifth embodiment, the stored information is Chapter 1 of Pride and Prejudice.

The modified-base nucleotides in the different wells are purified separately, and their specific types are obtained by MALDI mass spectrometry. From these results, the type of modified-base nucleotide in each well is identified, the corresponding character is read directly from the matching table used during encoding, and the characters are spliced together to recover the original computer data.

In the sixth embodiment, the stored information is picture information. Decoding is similar to the fourth and fifth embodiments, except that the recovered binary data is converted into picture information.

Example 3 Encoding device for DNA storage decoded by mass spectrometry of natural and non-natural bases

Corresponding to the encoding method of Example 1, FIG. 5 shows a structural block diagram of the encoding device provided by Example 3 of the present invention. For ease of explanation, FIG. 5 shows only the parts related to Example 3.

Referring to FIG. 5, the encoding device may include:

a data information extraction unit, used to extract the computer data to be stored and convert it into the corresponding data information;

a data information and DNA information conversion unit, used to split or assemble the data information sequence and convert it into DNA information according to a preset mapping relationship;

a synthesis and storage unit, used to synthesize the DNA sequences obtained by the data information and DNA information conversion unit and store them in order in different wells of the storage unit chip.

The data information extraction unit may include an information storage unit and a conversion unit. The information storage unit stores and retrieves the computer information to be stored, such as text, numbers, pictures, audio and video. The conversion unit converts the computer information by conventional methods into any digital form, such as characters, binary, octal, hexadecimal or decimal data, or RGB pixel information.

The data information and DNA information conversion unit may include a DNA information encoding unit, a DNA information and data information matching unit, and a DNA information conversion unit. The DNA information encoding unit records the combination of base types and quantities for each kind of DNA information. The matching unit retrieves the different DNA information from the encoding unit and matches it one-to-one with data information units. The DNA information conversion unit converts the digital information from the data information extraction unit into DNA information, unit by unit, according to the matching unit's table.

In a specific technical solution, only four deoxyribonucleotides with different bases are needed to form at least 128 different chain types, by combining different numbers of the four deoxyribonucleotides. Chain lengths range from 10 to 24 nucleotides across 8 length gradients; each length gradient has four DNA chains with different base contents, and the four chains of the same length can be combined into 16 different pairings. With 8 length gradients, there are 8 × 16 = 128 types in total. The number of chain types can be increased or decreased by adding nucleotide types or adjusting the chain lengths, to meet different storage-capacity requirements.
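The count of 128 can be reproduced under two assumptions not spelled out in the text: the 8 length gradients are the even lengths 10, 12, ..., 24, and "combined into 16" means ordered pairs of same-length chains with repetition allowed (4 × 4 = 16; unordered pairs would give only 10). The chain names are hypothetical.

```python
from itertools import product

LENGTHS = range(10, 25, 2)  # assumption: 8 length gradients at 10, 12, ..., 24 nt
CHAINS_PER_LENGTH = 4        # four chains with different base contents per length


def chain_combinations():
    """All ordered pairs of same-length chains: 4 x 4 = 16 per length."""
    combos = []
    for length in LENGTHS:
        chains = [f"len{length}_chain{k}" for k in range(CHAINS_PER_LENGTH)]
        combos.extend(product(chains, repeat=2))
    return combos


print(len(LENGTHS), len(chain_combinations()))  # 8 128
```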

In another specific technical solution, only four deoxyribonucleotides are selected and each is given 32 or 64 different modifications, forming at least 128 or 256 different nucleotides with modified bases. The number of modified-base nucleotides can be increased or decreased by adding nucleotide types or adjusting the types and number of modifications, to meet different storage-capacity requirements.

The synthesis and storage unit includes a synthesis unit and a storage unit. The synthesis unit produces the DNA information determined by the data information and DNA information conversion unit. The storage unit stores the DNA sequences that record the data information; it is a chip with multiple wells arranged in order, each well holding one sequence, i.e. one type of DNA information.

Example 4 Decoding device for nucleic acid storage decoded by mass spectrometry of natural and non-natural bases

Corresponding to the decoding method of Example 2, FIG. 6 shows a structural block diagram of the decoding device provided by Example 4 of the present invention. For ease of explanation, FIG. 6 shows only the parts related to Example 4.

Referring to FIG. 6, the decoding device may include:

a reading unit, used to detect, with a mass spectrometer, the sequences to be tested stored in the synthesis and storage unit and to determine their DNA information from the molecular weight;

a DNA information and data information conversion unit, used to convert the DNA information obtained by the reading unit into data information according to a preset mapping relationship, namely the same mapping between data information and DNA information used in the DNA information storage device;

The reading unit includes a mass spectrometer that performs mass spectrometry on the nucleic acid sequence in each well of the chip storing the information, and may also include pretreatment of the nucleic acid sequence in each well, such as purification and/or enzymatic digestion.

a computer data output unit, used to convert the data information obtained by the DNA information and data information conversion unit into the stored computer data.

It should be noted that the information exchange and execution between the above devices/units are based on the same concept as the method embodiments of the present invention; for their specific functions and technical effects, see the method embodiments, which are not repeated here.

FIG. 7 is a schematic structural diagram of a computer device provided by an embodiment of the present invention. As shown in FIG. 7, the computer device of this embodiment includes at least one processor (only one is shown in FIG. 7), a memory, and a computer program stored in the memory and executable on the at least one processor; when the processor executes the computer program, any storage method or decoding method of the present invention is implemented.

The computer device may be a computing device such as a laptop, desktop computer, tablet or mobile phone, and includes at least a processor and a memory. Those skilled in the art will appreciate that FIG. 7 is only a schematic diagram of a computer device and does not limit it; the device may include other components, such as information input or output components.

An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method embodiments.

The computer-readable storage medium may also be any available medium accessible by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium, a DVD, or a semiconductor medium.

An embodiment of the present invention provides a computer program product which, when run on a terminal device, causes the terminal device to implement the steps of the above method embodiments.

The terminal device may be a general-purpose computer, handheld computer, mobile phone, special-purpose computer, computer network, other programmable device, or storage device with programming functions. The computer program may be stored in a computer-readable storage medium or transmitted over a network to another computer-readable storage medium.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, may each exist physically separately, or two or more of them may be integrated into one unit.

The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When the computer program instructions are loaded and executed on a computer, the steps and methods described in the embodiments of the present invention are carried out wholly or partly.

It will be understood that the systems, devices and methods described in this application may also be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into functional units may be redrawn as needed without affecting the functions and steps of the invention. Multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. The units of the above devices may be merged or redivided according to the storage and decoding methods, and additional functional units may be added as needed to meet the requirements of those steps and methods.

The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to these embodiments, those of ordinary skill in the art should understand that the technical solutions they describe may still be modified, or some technical features replaced by equivalents; such modifications or replacements do not take the essence of the corresponding technical solutions outside the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within its protection scope.

Claims (17)

1. A DNA information storage method based on natural and non-natural bases, characterized in that the method comprises the following steps:
S11) extracting the data information of the computer data to be stored;
S12) designing a coding table formed by a one-to-one mapping between DNA information and data information units;
S13) splitting the data information from step S11) into data information units, and determining in sequence, in the coding table obtained in step S12), the DNA information corresponding to each data information unit;
S14) obtaining the deoxyribonucleotides, or polynucleotide sequences consisting of two or more deoxyribonucleotides, corresponding to the DNA information determined in step S13), and arranging them in sequence in different wells of a chip to obtain a DNA information storage carrier;
wherein the DNA information is a deoxyribonucleotide, a polynucleotide sequence consisting of two or more deoxyribonucleotides, or a combination of any two of these; and
in the coding table, each kind of DNA information has a different molecular weight, and the molecular weight difference between any two kinds is not less than 10.

2. The DNA storage method according to claim 1, characterized in that the deoxyribonucleotide is a natural deoxyribonucleotide or a non-natural deoxyribonucleotide, the non-natural deoxyribonucleotide being a base-modified deoxyribonucleotide.
3. The DNA storage method according to claim 1, characterized in that the deoxyribonucleotides in the polynucleotide sequence consisting of two or more deoxyribonucleotides are natural or non-natural deoxyribonucleotides; preferably, such polynucleotide sequences are given different molecular weights by adjusting the combination of the types and numbers of deoxyribonucleotides they contain.

4. The DNA storage method according to claim 1, characterized in that the coding table contains 32 to 1024 mapping relationships.

5. The DNA storage method according to claim 1, characterized in that, in step S11), the data information is character information, RGB information, or binary, octal, decimal, or hexadecimal data information.

6. The DNA storage method according to claim 1, characterized in that, in the coding table, each item of DNA information has a different molecular weight, and the molecular weights of any two items differ by no less than 10.
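Claim 3 distinguishes polynucleotides by the types and numbers of their bases. A minimal sketch of that idea, assuming unmodified bases and the commonly used approximation for the average molecular weight of a single-stranded oligonucleotide without a 5'-phosphate (313.21 per A, 304.2 per T, 289.18 per C, 329.21 per G, minus 61.96); non-natural bases would need their own residue masses. The greedy selection is an illustrative choice, not the application's procedure.

```python
from itertools import combinations_with_replacement

# Assumed average residue masses (Da) for unmodified DNA in an ssDNA oligo;
# modified (non-natural) bases would need their own entries.
RESIDUE_MASS = {"A": 313.21, "T": 304.2, "C": 289.18, "G": 329.21}
END_CORRECTION = 61.96  # assumes no 5'-phosphate on the oligonucleotide


def oligo_mass(counts):
    """Approximate molecular weight from base composition alone."""
    return sum(RESIDUE_MASS[base] * n for base, n in counts.items()) - END_CORRECTION


def compositions(length):
    """Yield every base composition (counts, not order) of a given length."""
    for combo in combinations_with_replacement("ACGT", length):
        yield {base: combo.count(base) for base in "ACGT"}


def pick_spaced(lengths, min_gap=10.0):
    """Greedily keep compositions whose masses are pairwise >= min_gap apart."""
    candidates = sorted(
        ((oligo_mass(c), c) for length in lengths for c in compositions(length)),
        key=lambda item: item[0],
    )
    chosen = []
    for mass, comp in candidates:
        # Candidates are sorted, so comparing with the last kept mass suffices.
        if not chosen or mass - chosen[-1][0] >= min_gap:
            chosen.append((mass, comp))
    return chosen


if __name__ == "__main__":
    table = pick_spaced(range(10, 25))
    print(len(table), "compositions spaced at least 10 apart")
```

Because the mass depends only on composition, two sequences that are reorderings of each other (for example ACGT and TGCA) are indistinguishable by mass alone, which is why the claim speaks of base types and numbers rather than order.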
7. The DNA storage method according to claim 1, characterized in that:
the coding table maps DNA information one-to-one to 4-bit or 8-bit binary numbers; or
the coding table maps DNA information one-to-one to the ASCII code table; or
the coding table maps DNA information one-to-one to 128 characters; or
the coding table maps DNA information one-to-one to RGB pixel information; or
the DNA information in the coding table consists of 128 or 256 different nucleotides with modified bases, obtained by applying 32 or 64 different modifications to each of A, T, C, and G; or
in the coding table, one kind of DNA information is mapped one-to-one to the color channel of the RGB pixel information (R, G, or B), another kind is mapped one-to-one to the channel values 0-255, and the combination of the two kinds of DNA information forms a one-to-one mapping to the RGB pixel information; or
the coding table contains 128 combined polynucleotide sequences whose lengths are divided into 8 groups of successively increasing length, ranging from 10 to 24 nucleotides, with 4 polynucleotide sequences per group that differ in the types and/or numbers of bases.
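The two-part RGB mapping in claim 7 (one kind of DNA information for the channel, another for the value 0-255) can be sketched as follows. The symbol names are hypothetical stand-ins for distinct nucleotides or polynucleotides; this only shows how the pairing makes each channel value self-describing.

```python
# Hypothetical symbol names; in practice each would be a distinct nucleotide
# or polynucleotide with its own molecular weight.
CHANNEL_SYMBOLS = {"R": "CH_R", "G": "CH_G", "B": "CH_B"}
VALUE_SYMBOLS = [f"V{v:03d}" for v in range(256)]
SYMBOL_TO_CHANNEL = {sym: ch for ch, sym in CHANNEL_SYMBOLS.items()}
SYMBOL_TO_VALUE = {sym: v for v, sym in enumerate(VALUE_SYMBOLS)}


def encode_pixel(pixel):
    """Map an (R, G, B) pixel to three (channel symbol, value symbol) pairs."""
    if len(pixel) != 3:
        raise ValueError("expected an (R, G, B) triple")
    pairs = []
    for channel, value in zip("RGB", pixel):
        if not 0 <= value <= 255:
            raise ValueError(f"{channel} value {value} outside 0-255")
        pairs.append((CHANNEL_SYMBOLS[channel], VALUE_SYMBOLS[value]))
    return pairs


def decode_pixel(pairs):
    """Invert encode_pixel; the channel symbol makes pair order irrelevant."""
    values = {SYMBOL_TO_CHANNEL[ch]: SYMBOL_TO_VALUE[val] for ch, val in pairs}
    return tuple(values[channel] for channel in "RGB")


if __name__ == "__main__":
    pairs = encode_pixel((255, 128, 0))
    print(pairs)  # [('CH_R', 'V255'), ('CH_G', 'V128'), ('CH_B', 'V000')]
    print(decode_pixel(reversed(pairs)))  # (255, 128, 0)
```

This scheme needs only 3 + 256 = 259 distinct symbols, within the 32 to 1024 mappings of claim 4, whereas a direct one-symbol-per-pixel table would need 256^3 entries.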
8. A method for reading information from a DNA information storage carrier obtained by the DNA storage method according to any one of claims 1 to 7, characterized in that the information reading method comprises the following steps:
S21) performing mass spectrometry on the sequences to be tested in the different wells of the DNA information storage carrier to obtain the molecular weight of the DNA information in each well;
S22) analyzing the base combination from the molecular weight of the DNA information to identify the DNA information corresponding to each well;
S23) interpreting the data information units from the DNA information obtained in step S22) and the coding table of step S12);
S24) splicing the data information units obtained in step S23) and decoding the data information to recover the stored computer data.

9. The information reading method according to claim 8, characterized in that the mass spectrometry in step S21) is MALDI (matrix-assisted laser desorption/ionization) mass spectrometry.

10. The information reading method according to claim 8, characterized in that the mass spectrometry in step S21) comprises the following steps:
S211) enzymatically digesting and/or purifying the sequences to be tested;
S212) performing mass spectrometry on the purified fragments to obtain their molecular weights.
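Steps S22) to S24) amount to a nearest-mass lookup followed by reassembly in well order. A minimal sketch, assuming a hypothetical table of 4-bit units with placeholder reference masses, measured masses already extracted from the spectra, and data that fills whole bytes. The tolerance follows from the claim-1 spacing: if every pair of entries differs by at least 10, a measurement within 5 of its true entry cannot be closer to any other entry.

```python
from bisect import bisect_left

# Hypothetical coding table: 4-bit unit -> reference molecular weight.
# Placeholder masses spaced 25 apart, so every pair differs by >= 10.
TABLE = {format(i, "04b"): 1000.0 + 25.0 * i for i in range(16)}


def nearest_unit(masses, units, measured, tol):
    """S22/S23: match a measured mass to the closest reference mass."""
    i = bisect_left(masses, measured)
    neighbours = [j for j in (i - 1, i) if 0 <= j < len(masses)]
    best = min(neighbours, key=lambda j: abs(masses[j] - measured))
    if abs(masses[best] - measured) >= tol:
        raise ValueError(f"mass {measured} is not within {tol} of any entry")
    return units[best]


def decode(well_masses, table=TABLE, min_gap=10.0):
    """Decode (well, measured mass) readings back into bytes."""
    ref = sorted((mass, unit) for unit, mass in table.items())
    masses = [m for m, _ in ref]
    units = [u for _, u in ref]
    # Entries are >= min_gap apart, so an error below min_gap / 2 is unambiguous.
    tol = min_gap / 2
    # S24: splice units in well order, then regroup the bits into bytes.
    bits = "".join(
        nearest_unit(masses, units, mass, tol) for _, mass in sorted(well_masses)
    )
    return bytes(int(bits[k:k + 8], 2) for k in range(0, len(bits), 8))


if __name__ == "__main__":
    readings = [(2, 1150.4), (0, 1101.3), (3, 1226.0), (1, 1198.9)]
    print(decode(readings))  # b'Hi'
```

Rejecting a reading rather than guessing keeps a mass-calibration problem from silently corrupting the decoded data; a real reader would also need error-correcting redundancy, which this sketch omits.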
11. The information reading method according to claim 8, characterized in that the purification method in step S211) is ethanol precipitation, microdialysis, or Millipore ZipTip micro-chromatography.

12. An encoding device for DNA information storage based on natural and non-natural bases, characterized in that the device comprises:
a data information extraction unit, configured to extract the computer data to be stored and convert it into the corresponding data information;
a data information and DNA information conversion unit, configured to split or assemble the data information sequence and convert it into DNA information according to a preset mapping relationship;
a synthesis and storage unit, configured to synthesize the nucleic acid sequences obtained by the data information and DNA information conversion unit, and to store in order, in different wells of a storage unit chip, the deoxyribonucleotides, the polynucleotide sequences consisting of two or more deoxyribonucleotides, or combinations thereof that correspond to the DNA information;
wherein the mapping relationship is a one-to-one mapping between DNA information and data information units;
the DNA information is a deoxyribonucleotide, a polynucleotide sequence consisting of two or more deoxyribonucleotides, or a combination of any two of these; and
in the mapping relationship, each item of DNA information has a different molecular weight, and the molecular weights of any two items differ by no less than 10.

13. The DNA information storage device according to claim 12, characterized in that the data information and DNA information conversion unit may comprise a DNA information encoding unit, a DNA information and data information matching unit, and a DNA information conversion unit. The DNA information encoding unit records the combination of base types and numbers corresponding to each item of DNA information. The DNA information and data information matching unit retrieves the different items of DNA information from the DNA information encoding unit and matches them one-to-one with data information units. The DNA information conversion unit converts the digital information from the data information extraction unit into DNA information, unit by unit, according to the matches established by the DNA information and data information matching unit.
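The three sub-units named in claim 13 can be pictured as a small pipeline. The class and method names below are hypothetical, and the sketch deliberately leaves out the claim-12 molecular-weight spacing check; it only shows how composition records, one-to-one matching, and conversion relate.

```python
from dataclasses import dataclass, field


@dataclass
class DNAInfoEncodingUnit:
    """Records the base types and counts behind each DNA-information symbol."""
    compositions: dict = field(default_factory=dict)  # symbol -> {base: count}

    def register(self, symbol, counts):
        if symbol in self.compositions:
            raise ValueError(f"duplicate symbol {symbol!r}")
        self.compositions[symbol] = dict(counts)


@dataclass
class MatchingUnit:
    """Pairs data-information units with registered symbols one-to-one."""
    encoding_unit: DNAInfoEncodingUnit
    unit_to_symbol: dict = field(default_factory=dict)

    def match(self, data_units):
        if len(set(data_units)) != len(data_units):
            raise ValueError("data units must be distinct")
        symbols = list(self.encoding_unit.compositions)  # registration order
        if len(data_units) > len(symbols):
            raise ValueError("not enough DNA-information symbols registered")
        self.unit_to_symbol = dict(zip(data_units, symbols))
        return self.unit_to_symbol


@dataclass
class ConversionUnit:
    """Converts a sequence of data units into DNA-information symbols."""
    matching_unit: MatchingUnit

    def convert(self, units):
        return [self.matching_unit.unit_to_symbol[u] for u in units]


if __name__ == "__main__":
    enc = DNAInfoEncodingUnit()
    for i, counts in enumerate([{"A": 10}, {"T": 10}, {"C": 10}, {"G": 10}]):
        enc.register(f"S{i}", counts)
    matcher = MatchingUnit(enc)
    matcher.match(["00", "01", "10", "11"])
    print(ConversionUnit(matcher).convert(["01", "11", "00"]))  # ['S1', 'S3', 'S0']
```

Keeping the composition records separate from the matching step means the same set of synthesizable symbols can be reassigned to a different kind of data unit (bits, characters, or pixel values) without changing the chemistry side.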
14. A decoding device for DNA information storage based on natural and non-natural bases, characterized in that the decoding device comprises:
a reading unit, configured to detect, with a mass spectrometer, the sequences to be tested that are synthesized and stored in the synthesis and storage unit of the DNA information storage device according to claim 12 or 13, and to identify their DNA information from the molecular weight;
a DNA information and data information conversion unit, configured to convert the DNA information obtained by the reading unit into data information according to a preset mapping relationship, namely the same mapping between data information and DNA information used in the DNA information storage device;
a computer data output unit, configured to convert the data information obtained by the DNA information and data information conversion unit into the stored computer data;
wherein the reading unit comprises a mass spectrometer for performing mass spectrometry on the nucleic acid sequence in each well of the chip storing the information.

15. The decoding device according to claim 14, characterized in that the reading unit comprises a mass spectrometer for performing mass spectrometry on the nucleic acid sequence in each well of the chip storing the information, and may further comprise a unit for preprocessing the sequence to be tested in each well.
16. A computer-readable storage medium, characterized in that a computer program is stored thereon, wherein the computer program, when executed by a processor, implements the steps of the DNA storage method based on natural and non-natural bases according to any one of claims 1 to 7, or of the method according to any one of claims 8 to 11 for reading information from a DNA information storage carrier obtained by the DNA storage method.

17. A computer device, characterized in that it comprises a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the steps of the DNA storage method based on natural and non-natural bases according to any one of claims 1 to 7, or of the method according to any one of claims 8 to 11 for reading information from a DNA information storage carrier obtained by the DNA storage method.
PCT/CN2023/133791 2022-12-13 2023-11-23 Dna information storage method based on natural and non-natural bases Ceased WO2024125260A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211594766.8A CN116030895B (en) 2022-12-13 2022-12-13 A DNA information storage method based on natural and unnatural bases
CN202211594766.8 2022-12-13

Publications (1)

Publication Number Publication Date
WO2024125260A1 true WO2024125260A1 (en) 2024-06-20

Family

ID=86076754

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/133791 Ceased WO2024125260A1 (en) 2022-12-13 2023-11-23 Dna information storage method based on natural and non-natural bases

Country Status (2)

Country Link
CN (1) CN116030895B (en)
WO (1) WO2024125260A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119649874A (en) * 2024-11-28 2025-03-18 天津大学 A multi-base encoding and readout method for composite base DNA storage

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116030895B (en) * 2022-12-13 2025-08-29 中国科学院深圳先进技术研究院 A DNA information storage method based on natural and unnatural bases
CN116436974B (en) * 2023-06-15 2023-08-11 国能日新科技股份有限公司 Data transmission method and system
CN119475378B (en) * 2024-10-28 2025-12-05 南京大学 Information encryption and decoding methods based on non-natural DNA
CN120336330B (en) * 2025-06-19 2025-09-23 中国电子技术标准化研究院((工业和信息化部电子工业标准化研究院)(工业和信息化部电子第四研究院)) A DNA-encoding-based information storage method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019040871A1 (en) * 2017-08-24 2019-02-28 Miller Julian Device for information encoding and, storage using artificially expanded alphabets of nucleic acids and other analogous polymers
WO2019178551A1 (en) * 2018-03-16 2019-09-19 Catalog Technologies, Inc. Chemical methods for nucleic acid-based data storage
WO2019222561A1 (en) * 2018-05-16 2019-11-21 Catalog Technologies, Inc. Compositions and methods for nucleic acid-based data storage
US20200242482A1 (en) * 2017-10-10 2020-07-30 Roswell Biotechnologies, Inc. Methods, apparatus and systems for amplification-free dna data storage
CN113096742A (en) * 2021-04-14 2021-07-09 湖南科技大学 DNA information storage parallel addressing writing method and system
WO2022062621A1 (en) * 2020-09-28 2022-03-31 清华大学 Array-type nucleic acid information storage method and apparatus
WO2022069022A1 (en) * 2020-09-29 2022-04-07 Ecole Polytechnique Federale De Lausanne (Epfl) Systems and methods for digital information decoding and data storage in hybrid macromolecules
CN116030895A (en) * 2022-12-13 2023-04-28 中国科学院深圳先进技术研究院 A DNA information storage method based on natural and unnatural bases

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114283886B (en) * 2021-12-22 2024-12-03 博奥生物集团有限公司 A method, system and electronic device for identifying drug-resistant genes
CN115206430B (en) * 2022-06-20 2025-09-23 清华大学深圳国际研究生院 DNA-based information encoding method, decoding method, and computer-readable storage medium

Also Published As

Publication number Publication date
CN116030895B (en) 2025-08-29
CN116030895A (en) 2023-04-28

Similar Documents

Publication Publication Date Title
WO2024125260A1 (en) Dna information storage method based on natural and non-natural bases
US11149308B2 (en) Sequence assembly
US10706017B2 (en) Methods and systems for storing sequence read data
US20210057045A1 (en) Determining the Clinical Significance of Variant Sequences
US11164661B2 (en) Integrated system for nucleic acid-based storage and retrieval of digital data using keys
US9177100B2 (en) Method and systems for processing polymeric sequence data and related information
WO2018149405A1 (en) Information storage and reading method
WO2022082573A1 (en) Method and apparatus for processing dna sequence storing data information
Hamdan et al. The Brazilian Atlantic bushmaster Lachesis (linnaeus, 1766) mitogenome with insights on snake evolution and divergence (serpentes: viperidae: crotalinae)
HK40082992A (en) Gene data processing method, device and related equipment under the background of high-throughput sequencing

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 23902459; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 23902459; Country of ref document: EP; Kind code of ref document: A1