
WO2025137825A1 - Nucleic acid molecule sequencing method and related device - Google Patents

Nucleic acid molecule sequencing method and related device

Info

Publication number
WO2025137825A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
image
simulated
sequencing
optical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2023/141583
Other languages
French (fr)
Chinese (zh)
Inventor
张昊
沈梦哲
苏泽宇
刘阳
李俊锋
黎宇翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BGI Shenzhen Co Ltd
Original Assignee
BGI Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BGI Shenzhen Co Ltd
Priority to PCT/CN2023/141583
Publication of WO2025137825A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6874 Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis

Definitions

  • reducing the distance between adjacent nucleic acid molecules and increasing the arrangement density of nucleic acid molecules on the chip can effectively increase the number of bases in a unit field of view, thereby increasing the sequencing throughput and reducing the sequencing cost per unit throughput.
  • Owing to the optical diffraction limit, when the distance between nucleic acid molecules is less than the resolution of the imaging system, the fluorescent signals of adjacent nucleic acid molecules crosstalk, which greatly reduces the accuracy of base calling. How to effectively improve the accuracy of base calling when sequencing nucleic acid molecules has therefore become a major problem that the industry urgently needs to solve.
  • the present disclosure aims to solve at least one of the technical problems existing in the prior art. To this end, the present disclosure proposes a nucleic acid molecule sequencing method and a related device, which can effectively improve the accuracy of base calling in the process of sequencing nucleic acid molecules.
  • the nucleic acid molecule sequencing method according to the first aspect of the present disclosure includes:
  • the sequencing image is an image acquired by collecting an image of the target nucleic acid sample using a preset optical sequencing system
  • the image processing model is a model trained using a constructed true value image of nucleic acid molecules and a corresponding simulated nucleic acid image
  • the simulated nucleic acid image is an image obtained by performing simulated sequencing on a simulated nucleic acid sample corresponding to the true value image of nucleic acid molecules based on a simulated optical system
  • the simulated optical system is an optical system obtained by simulating the preset optical sequencing system
  • Nucleic acid molecule sequencing is performed according to the target image to obtain a sequencing result corresponding to the target nucleic acid sample.
  • the performing image processing on the sequencing image based on the image processing model to obtain the target image further includes training the image processing model, specifically including:
  • the trained image processing model is obtained.
  • performing simulated sequencing based on the simulated nucleic acid sample by the simulated optical system to obtain the simulated nucleic acid image includes:
  • the nucleic acid distribution information is simulated based on the simulated optical calibration information to obtain the simulated nucleic acid image.
  • the simulating the optical calibration information of the preset optical sequencing system includes:
  • the pixel size is the size of a single pixel in the preset optical sequencing system corresponding to the imaging plane;
  • the pixel size is integrated with the optical transfer function to obtain the simulated optical calibration information.
  • performing optical imaging analysis on the fluorescence signal corresponding to the base in the simulated nucleic acid sample to obtain an optical transfer function includes:
  • integrating the pixel size with the optical transfer function to obtain the simulated optical calibration information includes:
  • the pixel size, the optical transfer function and the simulated noise information are integrated to obtain the simulated optical calibration information.
  • simulating the nucleic acid distribution information based on the simulated optical calibration information to obtain the simulated nucleic acid image includes:
  • Environmental noise simulation is performed on the second simulated image based on the simulated noise information to obtain the simulated nucleic acid image.
  • the step of inputting the true value image of the nucleic acid molecule and the simulated nucleic acid image corresponding to the true value image of the nucleic acid molecule into the original image processing model and iteratively training the image processing model includes:
  • weight parameters of the image processing model are updated.
  • obtaining the trained image processing model includes:
  • the training deviation data reflects that the image processing model converges during iterative training
  • FIG. 2 is a schematic diagram of an image processing model of a deep residual channel attention neural network structure provided by an embodiment of the present disclosure
  • FIG. 6 is a flow chart of step S501 in FIG. 5;
  • FIG. 9 is a flow chart of step S603 in FIG. 6;
  • FIG. 10 is a schematic diagram of determining the average value and standard deviation of the signal in the blank area and the average maximum value of the nucleic acid molecule area provided by an embodiment of the present disclosure
  • FIG. 11 is a flow chart of step S502 in FIG. 5;
  • FIG. 12(a) to FIG. 12(c) are schematic diagrams of obtaining simulated nucleic acid images provided by embodiments of the present disclosure.
  • FIG. 13 is a flow chart of step S403 in FIG. 4;
  • FIG. 14 is a sequencing result of a multi-spacing nucleic acid molecule chip obtained with a sequencer provided in a specific embodiment of the present disclosure
  • FIG. 15 is a sequencing result of a multi-spacing nucleic acid molecule chip using a high sampling optical machine provided in the second specific embodiment of the present disclosure
  • FIG. 17 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present disclosure.
  • orientations such as up, down, left, right, front, back, etc., indicating orientations or positional relationships
  • orientations or positional relationships are based on the orientations or positional relationships shown in the accompanying drawings, and are only for the convenience of describing the present disclosure and simplifying the description, rather than indicating or implying that the device or element referred to has a specific orientation, is constructed and operated in a specific orientation, and therefore cannot be understood as a limitation on the present disclosure.
  • High-throughput sequencing is a technology for sequencing nucleic acid molecules that can determine the sequences of a large number of nucleic acid molecules in parallel. In the high-throughput sequencing process, nucleic acid molecules can be attached to a solid surface, complementary probes carrying fluorescent groups can be bound to the nucleic acid molecules, and the bases can then be identified one after another through fluorescent imaging.
  • nucleic acid molecules are fixed on arrayed sequencing chips. Through each round of reaction between nucleic acid molecules, specific enzymes and fluorescent probes, different bases will emit fluorescent signals of different wavelengths. This process is collected by the imaging system. On this basis, the collected images are reconstructed and identified, and the base sequence can be determined from them. Among them, reducing the distance between adjacent nucleic acid molecules and increasing the arrangement density of nucleic acid molecules on the chip can effectively increase the number of bases per unit field of view, thereby increasing the sequencing throughput and reducing the sequencing cost per unit throughput.
  • Step S102 performing image processing on the sequencing image based on an image processing model to obtain a target image
  • the image processing model is a model trained using a constructed true value image of a nucleic acid molecule and a corresponding simulated nucleic acid image
  • the simulated nucleic acid image is an image obtained by performing simulated sequencing on a simulated nucleic acid sample corresponding to the true value image of the nucleic acid molecule based on a simulated optical system
  • the simulated optical system is an optical system obtained by simulating a preset optical sequencing system
  • a sequencing image of a target nucleic acid sample can be first obtained, and the sequencing image is an image obtained by collecting an image of the target nucleic acid sample using a preset optical sequencing system; the sequencing image is image processed based on an image processing model to obtain a target image, and the image processing model is a model trained using a constructed nucleic acid molecule true value image and a corresponding simulated nucleic acid image, and the simulated nucleic acid image is an image obtained by simulated sequencing of a simulated nucleic acid sample corresponding to the nucleic acid molecule true value image based on a simulated optical system, and the simulated optical system is an optical system obtained by simulating the preset optical sequencing system; and then nucleic acid molecule sequencing is performed according to the target image to obtain a sequencing result corresponding to the target nucleic acid sample.
  • Because the image processing model is trained with the nucleic acid molecule true value image and the corresponding simulated nucleic acid image, where the simulated nucleic acid image is obtained by simulated sequencing of the simulated nucleic acid sample corresponding to the true value image based on a simulated optical system, and the simulated optical system is obtained by simulating the preset optical sequencing system, processing the sequencing image with the image processing model can improve its resolution, so that the accuracy of base calling can be effectively improved in the process of sequencing nucleic acid molecules.
  • a sequencing image of a target nucleic acid sample is obtained, and the sequencing image is an image obtained by capturing an image of the target nucleic acid sample using a preset optical sequencing system.
  • a sequencing image of the target nucleic acid sample can be first obtained.
  • the target nucleic acid sample refers to a nucleic acid sample that serves as a sequencing target, and by capturing an image of the target nucleic acid sample, a sequencing image of the target nucleic acid sample can be obtained.
  • This process is collected by the imaging system, and the collected image is reconstructed and identified on this basis, and the base sequence can be determined therefrom. It can be pointed out that both the target nucleic acid sample and the simulated nucleic acid sample can be prepared in the above manner, the difference being that the target nucleic acid sample is a nucleic acid sample used as a sequencing target in actual applications, and the simulated nucleic acid sample is used to construct the training data of the image processing model.
  • Some embodiments can effectively increase the number of bases per unit field of view by reducing the distance between adjacent nucleic acid molecules and increasing the arrangement density of nucleic acid molecules on the chip, thereby increasing sequencing throughput and reducing unit throughput sequencing costs.
  • Owing to the optical diffraction limit, when the distance between nucleic acid molecules is less than the resolution of the imaging system, the fluorescent signals of adjacent nucleic acid molecules crosstalk, which greatly reduces the accuracy of base calling.
  • the sequencing image can be processed based on the image processing model to obtain the target image.
  • the image processing of the sequencing image based on the image processing model is intended to improve the resolution of the sequencing image through the image processing model, that is, super-resolution processing.
  • Super-resolution processing improves the resolution of the original image through hardware or software methods; obtaining a high-resolution image from a series of low-resolution images is called super-resolution reconstruction.
  • Using the true value image of nucleic acid molecules and the corresponding simulated nucleic acid image to train the image processing model can improve the model's super-resolution processing capability for images.
  • a target image with a higher resolution can be obtained, so as to facilitate nucleic acid molecule sequencing according to the target image in subsequent steps.
  • When the spacing between nucleic acid molecules is less than the resolution of the imaging system, the crosstalk caused by the fluorescent signals of adjacent nucleic acid molecules can be reduced, and the accuracy of base calling can be effectively improved in the process of sequencing nucleic acid molecules.
  • the image processing model can be a deep residual channel attention neural network structure.
  • the target image can be obtained by performing image processing on the sequencing image based on the image processing model.
  • the sequencing image can be first subjected to shallow feature extraction to obtain the sequencing image features, and then sequentially passed through multiple layers of RG and Conv for output, and input together with the initial sequencing image features into the reconstruction module to generate the target image.
  • RG: residual shallow feature extraction module
  • FCAB: channel attention module
  • Conv: convolution layer
  • ReLU: rectified linear unit activation layer
  • FFT: Fourier transform layer
  • Sigmoid: sigmoid (S-shaped) function activation layer.
  • nucleic acid molecules are sequenced according to the target image to obtain sequencing results corresponding to the target nucleic acid sample. It can be explained that, since the sequencing image is processed by the image processing model, the resolution of the nucleic acid molecules in the target image is improved compared to the resolution of the nucleic acid molecules in the sequencing image. On this basis, sequencing nucleic acid molecules according to the target image helps to reduce the crosstalk caused by the fluorescent signals of adjacent nucleic acid molecules when the spacing between nucleic acid molecules is less than the resolution of the imaging system, so that the accuracy of base calling can be effectively improved in the process of sequencing nucleic acid molecules.
  • Step S403 inputting the true value image of the nucleic acid molecule and the simulated nucleic acid image corresponding to the true value image of the nucleic acid molecule into the original image processing model, and iteratively training the image processing model;
  • step S401 constructs a true value image of nucleic acid molecules based on a simulated nucleic acid sample.
  • nucleic acid molecules are fixed on an arrayed sequencing chip, and through each round of reaction between nucleic acid molecules, specific enzymes and fluorescent probes, different bases will emit fluorescent signals of different wavelengths. This process is collected by the imaging system, and the collected images are reconstructed and identified on this basis, and the bases can be determined from them.
  • the simulated nucleic acid sample can be prepared in the above manner, and the simulated nucleic acid sample is used to construct the training data of the image processing model.
  • the base sequence of the simulated nucleic acid sample is known, and on this basis, the true value image of the nucleic acid molecule can be constructed based on the known base sequence of the simulated nucleic acid sample.
  • each nucleic acid molecule of the simulated nucleic acid sample is regularly loaded into the sequencing chip for arrangement, and the arrangement unit of the nucleic acid arrangement is called a block.
  • Each imaging field of view contains multiple arrangement unit blocks, and the spacing between adjacent arrangement unit blocks is called a tracking line.
  • the information such as the number of arrangement unit blocks, the arrangement order, and the spacing between nucleic acid molecules divided in each imaging field of view of the simulated nucleic acid sample is recorded in the mask file. Since the mask file contains the known nucleic acid arrangement in the simulated nucleic acid sample, the true value image of the nucleic acid molecule can be constructed based on the mask file to obtain the true value image of the nucleic acid molecule.
  • a group of simulations can include multiple mask files describing different imaging fields.
  • Each nucleic acid molecule is composed of a base sequence, which can be randomly generated or taken from a standard genome library such as that of Escherichia coli or human.
  • one base can be imaged in turn when the nucleic acid molecule is imaged for multiple rounds. After several rounds of imaging of the nucleic acid molecule in each imaging field of view, the base of the nucleic acid molecule in each imaging field of view can be clearly presented in the true value image of the nucleic acid molecule obtained by simulation sequencing.
  • The brightness of the fluorescent signal corresponding to each base can be represented by a normalized four-dimensional vector [i_a, i_c, i_g, i_t].
  • When the base type is one of A, C, G, and T, the brightness at the corresponding position of the vector is set to a value within a certain range.
  • If there is no base at a given position, all four elements of the vector are 0. In this way, the true value of the simulated nucleic acid sample can be clearly presented in the true value image of the nucleic acid molecule.
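  • The vector encoding described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the fixed brightness of 1.0 (the text only says "a certain range of values"), and the toy list of sites are all assumptions made for the example.

```python
BASES = "ACGT"  # vector order [i_a, i_c, i_g, i_t]

def base_to_vector(base, brightness=1.0):
    """Return [i_a, i_c, i_g, i_t] for one site; None means no base is present."""
    vec = [0.0, 0.0, 0.0, 0.0]
    if base is None:
        return vec  # empty site: all four elements stay 0
    vec[BASES.index(base)] = brightness  # assumed fixed brightness
    return vec

def truth_cycle(sequences, cycle):
    """Truth vectors for every site in one imaging round (one base per round)."""
    vectors = []
    for seq in sequences:
        has_base = seq is not None and cycle < len(seq)
        vectors.append(base_to_vector(seq[cycle] if has_base else None))
    return vectors

# Hypothetical sites: two loaded molecules and one empty site.
sites = ["ACGT", None, "TTGA"]
cycle0 = truth_cycle(sites, 0)
```

  Each imaging round yields one such set of vectors, so a true value image for a given round is simply these vectors placed at the site positions recorded in the mask file.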
  • a simulated nucleic acid sample is simulated and sequenced by a simulated optical system to obtain a simulated nucleic acid image.
  • the simulated optical system is an optical system obtained by simulating a preset optical sequencing system, and thus the simulated optical system simulates the optical conditions of the preset optical sequencing system, and can perform simulated sequencing based on the simulated nucleic acid sample to obtain a simulated nucleic acid image.
  • the sequencing image is an image obtained by collecting the target nucleic acid sample using a preset optical sequencing system. Therefore, in order to enable the trained image processing model to be adapted to super-resolution processing of the sequencing image, in the process of training the image processing model, a simulated nucleic acid sample can be simulated sequenced by a simulated optical system to obtain a simulated nucleic acid image.
  • Step S402, performing simulated sequencing based on a simulated nucleic acid sample through a simulated optical system to obtain a simulated nucleic acid image, may include, but is not limited to, the following steps S501 to S502.
  • Step S501 simulating optical calibration information of a preset optical sequencing system
  • Step S502 simulating the nucleic acid distribution information based on the simulated optical calibration information to obtain a simulated nucleic acid image.
  • the optical calibration information of the preset optical sequencing system is simulated. It can be explained that since the simulated optical system is an optical system obtained by simulating the preset optical sequencing system, the simulated optical system simulates the optical conditions of the preset optical sequencing system, and can perform simulated sequencing based on the simulated nucleic acid sample to obtain a simulated nucleic acid image. Specifically, the simulated optical system simulates the optical conditions of the preset optical sequencing system, which can be a simulation of the optical calibration information of the preset optical sequencing system.
  • Optical calibration information here means the optical physical quantities that the simulated optical system needs in order to simulate the image acquisition process of the preset optical sequencing system.
  • the preset optical sequencing system is a nucleic acid image acquisition and sequencing system that is pre-set with certain optical conditions
  • the optical calibration information of the preset optical sequencing system can be simulated by querying its pre-set parameters.
  • the nucleic acid molecules in the simulated nucleic acid sample include a plurality of bases, each base being labeled with a fluorescent signal.
  • Step S501 simulates the optical calibration information of the preset optical sequencing system, which may include, but is not limited to, the following steps S601 to S603.
  • Step S601 determining the pixel size of a preset optical sequencing system; wherein the pixel size is the size of a single pixel in the preset optical sequencing system corresponding to the imaging plane.
  • Step S602 performing optical imaging analysis based on the fluorescence signals corresponding to the bases in the simulated nucleic acid sample to obtain an optical transfer function
  • Step S603 Integrate the pixel size and the optical transfer function to obtain simulated optical calibration information.
  • the pixel size of the preset optical sequencing system is determined; wherein the pixel size is the size of a single pixel in the preset optical sequencing system corresponding to the imaging plane.
  • If the preset optical sequencing system uses a camera to collect images, the pixel size represents the size of a single camera pixel corresponding to the imaging plane. For example, if the physical size of the camera pixel is 2 μm and the magnification of the preset optical sequencing system is 20 times, the corresponding pixel size of the preset optical sequencing system is 0.1 μm.
  • the determination of the pixel size of the preset optical sequencing system is intended to establish a mapping of the simulated nucleic acid sample to the camera imaging space.
  • The pixel size of the preset optical sequencing system can be determined by using a high-precision stage to move the sequencing chip loaded with the simulated nucleic acid sample over a large distance (e.g., 100 μm). During this movement, the number of pixels by which the corresponding position is displaced is measured from the image, and the distance moved is divided by the number of displacement pixels to calculate the system pixel size. It should be understood that there are various ways of determining the pixel size of the preset optical sequencing system, which may include, but are not limited to, the specific embodiments listed above.
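  • The two ways of obtaining the pixel size mentioned above reduce to simple divisions. The sketch below is illustrative only; the function names and the 1000-pixel displacement are assumptions, while the 2 μm, 20 times, and 100 μm figures come from the text.

```python
def pixel_size_from_magnification(camera_pixel_um, magnification):
    """Size of one camera pixel projected onto the sample (imaging) plane."""
    if magnification <= 0:
        raise ValueError("magnification must be positive")
    return camera_pixel_um / magnification

def pixel_size_from_stage_move(distance_um, displacement_px):
    """Stage travel divided by the displacement measured in the image."""
    if displacement_px <= 0:
        raise ValueError("displacement must be a positive number of pixels")
    return distance_um / displacement_px

# Values from the text: 2 um camera pixel at 20x magnification.
nominal = pixel_size_from_magnification(2.0, 20.0)

# Stage method: 100 um travel; the 1000 px displacement is a made-up measurement.
measured = pixel_size_from_stage_move(100.0, 1000.0)
```

  The stage-based measurement is useful as a check on the nominal value, since the real magnification of an assembled system can differ from its specification.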
  • optical imaging analysis is performed based on the fluorescent signals corresponding to the bases in the simulated nucleic acid sample to obtain an optical transfer function.
  • the optical transfer function is used to model the imaging performance of the optical system, and the modeled imaging performance can be used to generate a simulated nucleic acid image.
  • Step S602, performing optical imaging analysis based on the fluorescence signals corresponding to the bases in the simulated nucleic acid sample to obtain the optical transfer function, may include, but is not limited to, the following steps S701 to S704.
  • Step S701 scanning and imaging the simulated nucleic acid sample to obtain a scanned nucleic acid image
  • Step S702 performing a local maximum search based on the scanned nucleic acid image to obtain a fluorescent image reflecting each fluorescent signal;
  • Step S703 performing Gaussian fitting and averaging processing on the fluorescence images corresponding to the multiple fluorescence signals to obtain a point spread function
  • Step S704 Perform Fourier transform on the point spread function to obtain an optical transfer function.
  • the simulated nucleic acid sample is scanned and imaged to obtain a scanned nucleic acid image.
  • scanning imaging refers to the process of capturing a physical object to accurately represent its geometric shape in a digital environment. Since the optical transfer function is used to model the imaging performance of an optical system, some embodiments can scan and image a sparsely dotted area in a simulated nucleic acid sample, so that a scanned nucleic acid image can be obtained more efficiently in a sparsely dotted simulated nucleic acid sample. It should be understood that the sparsely dotted area in a simulated nucleic acid sample refers to an area where the spacing between nucleic acid molecules is much greater than the optical resolution.
  • a local maximum search is first performed based on the scanned nucleic acid image to obtain a fluorescent image reflecting each fluorescent signal; then the fluorescent images corresponding to multiple fluorescent signals are subjected to Gaussian fitting and averaging to obtain a point spread function; the point spread function is subjected to Fourier transformation to obtain an optical transfer function.
  • The scanned nucleic acid image can also be filtered with a difference-of-Gaussians function, after which the single fluorescent sphere corresponding to each base is extracted through a local maximum search to form a fluorescent image containing multiple fluorescent spheres. A point spread function is then obtained through Gaussian fitting and averaging, and the system optical transfer function is obtained after Fourier transformation.
  • the simulated nucleic acid sample is first scanned and imaged to obtain a scanned nucleic acid image; then a local maximum search is performed based on the scanned nucleic acid image to obtain a fluorescent image reflecting each fluorescent signal; the fluorescent images corresponding to multiple fluorescent signals are Gaussian-fitted and averaged to obtain a point spread function; and the point spread function is Fourier transformed to obtain an optical transfer function.
  • the optical transfer function obtained in this way can more accurately model the imaging performance of the optical system.
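  • Steps S701 to S704 can be sketched in plain Python as below. This is a simplified illustration under stated assumptions: the synthetic two-spot image, the threshold, and the crop radius are invented for the example, and the patches are centred on the integer local maximum rather than on a fitted Gaussian centre as the text describes.

```python
import cmath
import math

def local_maxima(img, threshold, r):
    """Pixels above threshold that exceed all 8 neighbours, at least r from the border."""
    h, w = len(img), len(img[0])
    peaks = []
    for y in range(r, h - r):
        for x in range(r, w - r):
            v = img[y][x]
            if v < threshold:
                continue
            neigh = [img[y + dy][x + dx] for dy in (-1, 0, 1)
                     for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
            if all(v > n for n in neigh):
                peaks.append((y, x))
    return peaks

def average_psf(img, peaks, r):
    """Average the (2r+1)x(2r+1) patches around the peaks and normalise to unit sum."""
    size = 2 * r + 1
    acc = [[0.0] * size for _ in range(size)]
    for y, x in peaks:
        for dy in range(size):
            for dx in range(size):
                acc[dy][dx] += img[y - r + dy][x - r + dx]
    total = sum(map(sum, acc))
    return [[v / total for v in row] for row in acc]

def dft2(a):
    """Direct 2-D discrete Fourier transform (fine for small PSF patches)."""
    n, m = len(a), len(a[0])
    out = [[0j] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            s = 0j
            for y in range(n):
                for x in range(m):
                    s += a[y][x] * cmath.exp(-2j * math.pi * (u * y / n + v * x / m))
            out[u][v] = s
    return out

def spot(y, x, cy, cx, sigma=1.2):
    """Synthetic Gaussian spot standing in for one imaged fluorescent sphere."""
    return math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma * sigma))

# Synthetic sparse field with two well-separated spots (assumed test data).
img = [[spot(y, x, 5, 5) + spot(y, x, 14, 15) for x in range(21)] for y in range(21)]
peaks = local_maxima(img, threshold=0.5, r=3)
psf = average_psf(img, peaks, r=3)
otf = dft2(psf)
mtf = [[abs(v) for v in row] for row in otf]  # modulus of the OTF
```

  Because the averaged point spread function is normalised to unit sum, the zero-frequency term of the resulting optical transfer function equals 1.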
  • nucleic acid molecules in the simulated nucleic acid sample include a plurality of bases, each of which is labeled with a fluorescent signal.
  • the simulated nucleic acid sample is scanned and imaged to obtain a scanned nucleic acid image.
  • The scanned nucleic acid image shown in FIG. 8(a) shows a plurality of fluorescent microspheres discretely distributed on a sequencing chip loaded with a simulated nucleic acid sample;
  • the region where the fluorescent microspheres are located is extracted based on the scanned nucleic acid image
  • FIG. 8(d) shows one of the fluorescence images of a single fluorescent microsphere
  • FIG. 8(e) shows another fluorescence image of a single fluorescent microsphere
  • a Gaussian function is fitted to each cropped single fluorescent sphere image to obtain the image center, and the multiple fluorescent sphere images are aligned and averaged; that is, the fluorescent images corresponding to multiple fluorescent signals are Gaussian-fitted and averaged to obtain the point spread function;
  • FIG. 8(f) and FIG. 8(g) are schematic diagrams of performing Fourier transform on the point spread function to obtain the optical transfer function.
  • The pixel size is integrated with the optical transfer function to obtain simulated optical calibration information. The pixel size establishes a mapping from the simulated nucleic acid sample to the camera imaging space, and the optical transfer function models the imaging performance of the optical system. By integrating the two, the simulated optical system can reproduce the image acquisition process of the preset optical sequencing system, which determines the simulated optical calibration information.
  • Step S603, integrating the pixel size with the optical transfer function to obtain simulated optical calibration information, may include, but is not limited to, steps S901 to S902.
  • Step S901 extracting noise from blank areas between nucleic acid molecules in a simulated nucleic acid sample to obtain simulated noise information
  • Step S902 integrating the pixel size, the optical transfer function and the simulated noise information to obtain simulated optical calibration information.
  • In steps S901 to S902, noise is extracted from the blank areas between the nucleic acid molecules in the simulated nucleic acid sample to obtain simulated noise information, and the pixel size, optical transfer function and simulated noise information are then integrated to obtain simulated optical calibration information.
  • the blank areas between the nucleic acid molecules in the simulated nucleic acid sample do not contain nucleic acids that can be sequenced, so the blank areas between the nucleic acid molecules are suitable for extracting noise therefrom to obtain simulated noise information.
  • the disclosed embodiment aims to incorporate the optical noise of the preset optical sequencing system during image acquisition into the consideration of simulation modeling, improve the accuracy of the optical calibration information in simulating the nucleic acid distribution information, and obtain a higher quality simulated nucleic acid image.
  • the optical noise can be extracted manually or by algorithm.
  • when extracted manually, image software can be used to read the average signal bg_ave and the standard deviation bg_std of the blank area on the tracking line, and the average peak signal sig of the nucleic acid molecule area; when extracted by algorithm, the algorithm can automatically locate the tracking line and the blank area and extract the corresponding signals.
  • when simulating noise, normalized Gaussian noise can be generated first, with each pixel independent. The Gaussian noise has a mean of bg_ave/sig and a standard deviation of bg_std/sig. The Gaussian noise is then added to the simulated signal, and the pixel values with superimposed shot noise are calculated using the Poisson distribution.
  • the white box is the tracking line background area, and the average value and standard deviation of the signal in this area are measured as bg_ave and bg_std.
  • the white cross indicates that the bright spot is a nucleic acid molecule signal, and the brightness of the center position of 10 bright spots is measured and the average value is sig.
  • Step S1102 performing imaging performance simulation on the first simulation image based on the optical transfer function to obtain a second simulation image
  • Step S1103 performing environmental noise simulation on the second simulated image based on the simulated noise information to obtain a simulated nucleic acid image.
  • the nucleic acid distribution information is first mapped to the imaging space based on the pixel size to obtain a first simulated image of the simulated nucleic acid sample in the imaging space; then the imaging performance of the first simulated image is simulated based on the optical transfer function to obtain a second simulated image; the environmental noise of the second simulated image is simulated based on the simulated noise information to obtain a simulated nucleic acid image.
  • the pixel size is used to establish a mapping of the simulated nucleic acid sample to the camera imaging space.
  • the nucleic acid distribution information can be mapped to the imaging space based on the pixel size to obtain the first simulated image of the simulated nucleic acid sample in the imaging space. Since the optical transfer function is used to model the imaging performance of the optical system, the imaging performance of the first simulated image can be simulated based on the optical transfer function to obtain the second simulated image. Since the simulated noise information is the optical noise when the preset optical sequencing system performs image acquisition, the environmental noise of the second simulated image can be simulated based on the simulated noise information, and finally the simulated nucleic acid image is obtained.
  • the optical calibration information helps to simulate the preset optical sequencing system more accurately, so that the simulated nucleic acid image obtained by simulation can more realistically simulate the actual acquisition of the sequencing image.
  • the distribution of nucleic acid molecules in each imaging field of view can first be determined and a simulated nucleic acid molecule list generated, as shown in Figure 12(a); the list can then be mapped to a two-color or four-color imaging space (Figure 12(b) shows the mapping to a four-color imaging space); the mapped image can then be low-pass filtered using the measured optical transfer function, and finally simulated noise can be added to obtain an image close to the low-resolution sequencing images captured during nucleic acid molecule sequencing, that is, a simulated nucleic acid image, as shown in Figure 12(c).
  • the actual signal and noise levels are extracted based on the images actually captured by the system, and the decrease in signal-to-noise ratio in long-cycle sequencing can also be simulated.
  • the optical calibration information of the preset optical sequencing system is first simulated, and then the nucleic acid distribution information is simulated based on the simulated optical calibration information to obtain a simulated nucleic acid image.
  • the optical calibration information obtained by simulating the preset optical sequencing system is used to simulate the simulated nucleic acid image, which helps to improve the simulation effect, so that the simulated nucleic acid image can more realistically simulate the actual collection of the sequencing image.
  • step S403 of some embodiments the true value image of nucleic acid molecules and the simulated nucleic acid image corresponding to the true value image of nucleic acid molecules are input into the original image processing model, and the image processing model is iteratively trained.
  • the simulated nucleic acid image is an image obtained by simulating the simulated nucleic acid sample corresponding to the true value image of nucleic acid molecules, which can simulate the acquisition of sequencing images more realistically.
  • inputting the simulated nucleic acid image into the original image processing model for training helps adapt the trained model to super-resolution processing of sequencing images; the true value image of nucleic acid molecules reflects the real and accurate arrangement of base sequences in the simulated nucleic acid sample, and inputting it into the original image processing model for training can improve the super-resolution processing capability of the image processing model.
  • step S403, inputting the true value image of the nucleic acid molecule and the simulated nucleic acid image corresponding to the true value image of the nucleic acid molecule into the original image processing model and iteratively training the image processing model, may include, but is not limited to, the following steps S1301 to S1303:
  • Step S1301 in each round of iterative training, the simulated nucleic acid image is input into the image processing model for image processing training to obtain the processing result of this round;
  • Step S1302 comparing the processing result of this round with the true value image of the nucleic acid molecule to obtain training deviation data
  • Step S1303 updating the weight parameters of the image processing model based on the training deviation data.
  • step S1301 of some embodiments in each round of iterative training, the simulated nucleic acid image is input into the image processing model for image processing training to obtain the processing result of this round.
  • because the simulated nucleic acid image is obtained by simulating the simulated nucleic acid sample corresponding to the true value image of the nucleic acid molecule, it can realistically simulate the acquisition of sequencing images. Since the rounds of iterative training are intended to optimize the super-resolution processing capability of the image processing model, in each round the simulated nucleic acid image can be input into the image processing model for image processing training to obtain the processing result of that round.
  • the weight parameters of the image processing model are updated based on the training deviation data; that is, once the training deviation data has been obtained, the weight parameters can be updated. A neural network model generally has two types of parameters. The first type consists of the tuning parameters of the machine learning algorithm, which can be set flexibly according to existing experience and are also known as hyperparameters, for example the regularization coefficient or the depth of the tree in a decision tree model.
  • a hyperparameter is still a parameter and shares the characteristics of one: it is not a known constant but a configurable value that can be assigned a "correct" value based on existing experience, that is, a flexibly set value that is not obtained through learning by the system. The second type of parameter can be learned and estimated from the data and is called a model parameter, that is, a learnable parameter of the model itself.
  • the weight coefficient (slope) and the deviation term (intercept) of the linear regression line are both model parameters.
  • Learnable parameters specifically refer to parameter values learned during the training process of the neural network model. For learnable parameters, they usually start from a set of random values, and then update these values in an iterative manner as the neural network model learns.
  • the simulated nucleic acid image is input into the image processing model for image processing training to obtain the processing result of this round, and then the processing result of this round is compared with the true value image of the nucleic acid molecule to obtain the training deviation data, and then the weight parameters of the image processing model are updated based on the training deviation data.
  • the super-resolution processing capability of the image processing model for images can be continuously optimized in several rounds of iterative training.
  • step S404 of some embodiments when the image processing model meets the first predetermined condition in iterative training, a trained image processing model is obtained. It can be explained that when the image processing model meets the first predetermined condition in iterative training, it means that the image processing model's super-resolution processing capability for images has reached the expected level for practical application, and the image processing model can be adapted to super-resolution processing of sequencing images. In this case, the iterative training of the image processing model can be terminated to obtain the trained image processing model.
  • the image processing model meeting the first predetermined condition in iterative training may mean that, after the model performs image processing training on the simulated nucleic acid image, the processing result reaches the resolution level of the true value image of the nucleic acid molecule; alternatively, it may mean that the model's super-resolution processing capability for images converges to a certain level during iterative training.
  • step S404 when the image processing model meets the first predetermined condition in iterative training, the trained image processing model is obtained, which may include, but is not limited to: when the training deviation data reflects that the image processing model converges in the iterative training, it is determined that the image processing model meets the first predetermined condition in the iterative training, and the trained image processing model is obtained.
  • when the training deviation data reflects that the image processing model converges in the iterative training, it means that the super-resolution processing capability of the image processing model has been optimized to its current limit and is difficult to improve further. In this case, it can be determined that the image processing model meets the first predetermined condition in the iterative training, and the trained image processing model is obtained.
  • a true image of nucleic acid molecules is first constructed based on a simulated nucleic acid sample; then a simulated sequencing is performed based on the simulated nucleic acid sample through a simulated optical system to obtain a simulated nucleic acid image; the true image of nucleic acid molecules and the simulated nucleic acid image corresponding to the true image of nucleic acid molecules are input into the original image processing model, and the image processing model is iteratively trained; when the image processing model meets the first predetermined condition in the iterative training, a trained image processing model is obtained.
  • the super-resolution processing capability of the image processing model can be improved during training until it reaches the level expected for practical application, so that the image processing model can be adapted to super-resolution processing of sequencing images.
  • the trained image processing model is used for image processing of sequencing images, which helps to improve the image resolution, and helps to reduce the crosstalk caused by the fluorescent signals of adjacent nucleic acid molecules when the spacing between nucleic acid molecules is less than the resolution of the imaging system, so that the accuracy of base calling can be effectively improved in the process of sequencing nucleic acid molecules.
  • Specific embodiment 1 sequencing a multi-spaced nucleic acid molecule chip on a sequencer optical machine.
  • the bases of the target image are called and compared.
  • Processor 1701 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits to execute related programs to implement the technical solutions provided by the embodiments of the present disclosure;
  • a bus 1705 that transmits information between each component of the device (e.g., the processor 1701, the memory 1702, the input/output interface 1703, and the communication interface 1704);
  • the processor 1701 , the memory 1702 , the input/output interface 1703 and the communication interface 1704 are connected to each other in communication within the device via the bus 1705 .
  • the present disclosure also provides a computer program product, which includes a computer program.
  • a processor of a computer device reads and executes the computer program, so that the computer device executes and implements the above-mentioned nucleic acid molecule sequencing method.
  • “At least one (item)” means one or more, and “multiple” means two or more.
  • “And/or” describes an association between associated objects and indicates that three relationships may exist. For example, “A and/or B” can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • the character “/” generally indicates that the objects before and after it are in an “or” relationship.
  • “At least one of the following” or similar expressions refers to any combination of these items, including any combination of single or plural items.
  • “At least one of a, b or c” can mean: a, b, c, “a and b”, “a and c”, “b and c”, or “a and b and c”, where a, b, c can be single or plural.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of units is only a logical function division. There may be other division methods in actual implementation.
  • multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices or units, which can be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium, including multiple instructions for a computer device (which can be a personal computer, server, or network device, etc.) to perform all or part of the steps of each embodiment method of the present disclosure.
  • the aforementioned storage medium includes a USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, or other media that can store program code.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Wood Science & Technology (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Zoology (AREA)
  • Biochemistry (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A nucleic acid molecule sequencing method of the present disclosure comprises: acquiring a sequencing image of a target nucleic acid sample; performing image processing on the sequencing image by means of an image processing model to obtain a target image, the image processing model being trained on constructed nucleic acid molecule true value images and corresponding simulated nucleic acid images; and performing nucleic acid molecule sequencing on the basis of the target image to obtain a sequencing result corresponding to the target nucleic acid sample.

Description

Nucleic acid molecule sequencing method and related device

Technical Field

The present disclosure relates to the fields of image processing and biomolecular sequencing, and in particular to a nucleic acid molecule sequencing method and related device.

Background Art

High-throughput sequencing is a technology for sequencing nucleic acid molecules that can determine the sequences of a large number of nucleic acid molecules in parallel. In high-throughput sequencing, nucleic acid molecules can be fixed on an arrayed sequencing chip. In each round of reaction between the nucleic acid molecules, specific enzymes and fluorescent probes, different bases emit fluorescent signals of different wavelengths. The imaging system captures this process, and the captured images are then reconstructed and analyzed to determine the base sequence.

In the related art, reducing the spacing between adjacent nucleic acid molecules and increasing their arrangement density on the chip effectively increases the number of bases in a unit field of view, thereby increasing sequencing throughput and reducing the sequencing cost per unit of throughput. However, because of the optical diffraction limit, when the spacing between nucleic acid molecules is smaller than the resolution of the imaging system, the fluorescent signals of adjacent nucleic acid molecules crosstalk, which greatly reduces the accuracy of base calling. How to effectively improve the accuracy of base calling when sequencing nucleic acid molecules has therefore become a major problem that the industry urgently needs to solve.

Summary of the invention

The present disclosure aims to solve at least one of the technical problems existing in the prior art. To this end, the present disclosure proposes a nucleic acid molecule sequencing method and a related device that can effectively improve the accuracy of base calling in the process of sequencing nucleic acid molecules.

The nucleic acid molecule sequencing method according to an embodiment of the first aspect of the present disclosure includes:

acquiring a sequencing image of a target nucleic acid sample, wherein the sequencing image is an image obtained by imaging the target nucleic acid sample with a preset optical sequencing system;

performing image processing on the sequencing image based on an image processing model to obtain a target image, wherein the image processing model is a model trained using a constructed true value image of nucleic acid molecules and a corresponding simulated nucleic acid image, the simulated nucleic acid image is an image obtained by performing simulated sequencing, based on a simulated optical system, on a simulated nucleic acid sample corresponding to the true value image of nucleic acid molecules, and the simulated optical system is an optical system obtained by simulating the preset optical sequencing system;

performing nucleic acid molecule sequencing according to the target image to obtain a sequencing result corresponding to the target nucleic acid sample.

According to some embodiments of the present disclosure, before performing image processing on the sequencing image based on the image processing model to obtain the target image, the method further includes training the image processing model, which specifically includes:

constructing the true value image of the nucleic acid molecule based on the simulated nucleic acid sample;

performing simulated sequencing based on the simulated nucleic acid sample by the simulated optical system to obtain the simulated nucleic acid image;

inputting the true value image of the nucleic acid molecule and the simulated nucleic acid image corresponding to the true value image of the nucleic acid molecule into the original image processing model, and iteratively training the image processing model;

when the image processing model meets a first predetermined condition during iterative training, obtaining the trained image processing model.

According to some embodiments of the present disclosure, performing simulated sequencing based on the simulated nucleic acid sample by the simulated optical system to obtain the simulated nucleic acid image includes:

simulating optical calibration information of the preset optical sequencing system;

simulating the nucleic acid distribution information based on the simulated optical calibration information to obtain the simulated nucleic acid image.

According to some embodiments of the present disclosure, the nucleic acid molecules in the simulated nucleic acid sample include a plurality of bases, each of which is labeled with a fluorescent signal;

simulating the optical calibration information of the preset optical sequencing system includes:

determining the pixel size of the preset optical sequencing system, wherein the pixel size is the size on the imaging plane corresponding to a single pixel of the preset optical sequencing system;

performing optical imaging analysis based on the fluorescent signals corresponding to the bases in the simulated nucleic acid sample to obtain an optical transfer function;

integrating the pixel size with the optical transfer function to obtain the simulated optical calibration information.

According to some embodiments of the present disclosure, performing optical imaging analysis based on the fluorescent signals corresponding to the bases in the simulated nucleic acid sample to obtain an optical transfer function includes:

scanning and imaging the simulated nucleic acid sample to obtain a scanned nucleic acid image;

performing a local maximum search on the scanned nucleic acid image to obtain a fluorescent image reflecting each of the fluorescent signals;

performing Gaussian fitting and averaging on the fluorescent images corresponding to the multiple fluorescent signals to obtain a point spread function;

performing a Fourier transform on the point spread function to obtain the optical transfer function.
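
The four steps above (local maximum search, per-spot alignment and averaging, Fourier transform) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the disclosed implementation: the function names, the window half-width `half` and the `threshold` are hypothetical, and the spots are aligned on their brightest pixel rather than on a fitted Gaussian center, which is a simplification of the Gaussian fitting described in the text.

```python
import numpy as np

def find_local_maxima(img, half, threshold):
    """Return (row, col) of pixels that are the maximum of their window."""
    peaks = []
    rows, cols = img.shape
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            window = img[r - half:r + half + 1, c - half:c + half + 1]
            if img[r, c] >= threshold and img[r, c] == window.max():
                peaks.append((r, c))
    return peaks

def estimate_psf(img, half=4, threshold=0.5):
    """Crop a window around every spot, average the crops, normalize to sum 1.

    Assumption: integer-pixel alignment on the brightest pixel stands in
    for the sub-pixel Gaussian-fit alignment described in the text.
    """
    peaks = find_local_maxima(img, half, threshold)
    if not peaks:
        raise ValueError("no spots above threshold")
    crops = [img[r - half:r + half + 1, c - half:c + half + 1] for r, c in peaks]
    psf = np.mean(crops, axis=0)
    return psf / psf.sum()

def psf_to_otf(psf):
    """Fourier-transform the PSF; ifftshift puts the PSF center at index (0, 0)."""
    otf = np.fft.fft2(np.fft.ifftshift(psf))
    return otf / otf[0, 0]  # normalize so the zero-frequency gain is 1
```

Because the averaged point spread function is non-negative and sums to 1, the magnitude of the resulting optical transfer function never exceeds 1, which is what makes it act as a low-pass filter in the later simulation step.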

According to some embodiments of the present disclosure, integrating the pixel size with the optical transfer function to obtain the simulated optical calibration information includes:

extracting noise from the blank areas between the nucleic acid molecules in the simulated nucleic acid sample to obtain simulated noise information;

integrating the pixel size, the optical transfer function and the simulated noise information to obtain the simulated optical calibration information.
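
As a rough illustration of the noise extraction step, the sketch below measures the blank-area statistics and the spot brightness, using the quantity names bg_ave, bg_std and sig that the embodiments use for these measurements. The function name, the `blank_box` and `spot_centers` arguments and the returned dictionary layout are assumptions made for this example; how the blank area and the spots are located (manually or by algorithm) is left outside the sketch.

```python
import numpy as np

def extract_noise_stats(img, blank_box, spot_centers):
    """Measure background and signal levels from a captured image.

    blank_box    : (row_start, row_stop, col_start, col_stop) of a blank area
    spot_centers : list of (row, col) centers of nucleic acid molecule spots
    """
    r0, r1, c0, c1 = blank_box
    blank = img[r0:r1, c0:c1]
    bg_ave = float(blank.mean())   # average background signal
    bg_std = float(blank.std())    # background standard deviation
    sig = float(np.mean([img[r, c] for r, c in spot_centers]))  # mean spot peak
    return {
        "bg_ave": bg_ave,
        "bg_std": bg_std,
        "sig": sig,
        # normalized values used later as Gaussian noise mean and std
        "noise_mean": bg_ave / sig,
        "noise_std": bg_std / sig,
    }
```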

According to some embodiments of the present disclosure, simulating the nucleic acid distribution information based on the simulated optical calibration information to obtain the simulated nucleic acid image includes:

mapping the nucleic acid distribution information to an imaging space based on the pixel size to obtain a first simulated image of the simulated nucleic acid sample in the imaging space;

performing imaging performance simulation on the first simulated image based on the optical transfer function to obtain a second simulated image;

performing environmental noise simulation on the second simulated image based on the simulated noise information to obtain the simulated nucleic acid image.
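
The three stages above can be sketched for a single color channel as follows. This is a minimal sketch under several assumptions, not the disclosed implementation: all function names are hypothetical, molecule positions are given in micrometers, `gaussian_otf` is a stand-in for a measured optical transfer function, and `photons_at_peak` is an assumed scale factor that converts normalized intensity to photon counts so that Poisson shot noise can be drawn.

```python
import numpy as np

def map_to_imaging_space(positions_um, intensities, pixel_size_um, shape):
    """First simulated image: place each molecule on the pixel grid."""
    img = np.zeros(shape)
    for (y, x), a in zip(positions_um, intensities):
        r = int(round(y / pixel_size_um))
        c = int(round(x / pixel_size_um))
        if 0 <= r < shape[0] and 0 <= c < shape[1]:
            img[r, c] += a
    return img

def gaussian_otf(shape, sigma_px):
    """Stand-in OTF: Fourier transform of a Gaussian PSF with the given sigma."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.exp(-2.0 * (np.pi * sigma_px) ** 2 * (fx ** 2 + fy ** 2))

def apply_otf(img, otf):
    """Second simulated image: low-pass filter by multiplying in Fourier space."""
    return np.real(np.fft.ifft2(np.fft.fft2(img) * otf))

def add_noise(img, bg_ave, bg_std, sig, photons_at_peak, rng):
    """Final image: Gaussian background noise, then Poisson shot noise."""
    background = rng.normal(bg_ave / sig, bg_std / sig, size=img.shape)
    expected = np.clip(img + background, 0.0, None)  # Poisson rate must be >= 0
    return rng.poisson(expected * photons_at_peak) / photons_at_peak
```

Multiplying by the transfer function in Fourier space is equivalent to a circular convolution with the point spread function, so the image edges wrap around; padding the image before filtering would avoid that if it matters for a given field of view.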

According to some embodiments of the present disclosure, inputting the true value image of the nucleic acid molecule and the simulated nucleic acid image corresponding to the true value image of the nucleic acid molecule into the original image processing model and iteratively training the image processing model includes:

in each round of iterative training, inputting the simulated nucleic acid image into the image processing model for image processing training to obtain the processing result of this round;

comparing the processing result of this round with the true value image of the nucleic acid molecule to obtain training deviation data;

updating the weight parameters of the image processing model based on the training deviation data.
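
The structure of one training round can be illustrated with a deliberately tiny model. The disclosure does not specify the model architecture or the form of the training deviation data, so everything here is an assumption made for illustration: the "model" is a per-pixel affine map with two learnable parameters, the deviation is the mean squared error, and the update is plain gradient descent.

```python
import numpy as np

def train_step(params, sim_img, truth_img, lr):
    """One round: forward pass, comparison with truth, weight update."""
    w, b = params
    pred = w * sim_img + b                       # process the simulated image
    diff = pred - truth_img                      # compare with the true value image
    loss = float(np.mean(diff ** 2))             # training deviation (assumed: MSE)
    grad_w = float(2.0 * np.mean(diff * sim_img))
    grad_b = float(2.0 * np.mean(diff))
    return (w - lr * grad_w, b - lr * grad_b), loss  # updated weight parameters

def train(sim_img, truth_img, lr=0.5, rounds=500):
    """Run several rounds and record the deviation of each round."""
    params = (0.0, 0.0)
    losses = []
    for _ in range(rounds):
        params, loss = train_step(params, sim_img, truth_img, lr)
        losses.append(loss)
    return params, losses
```

A real super-resolution model would replace the affine map with a neural network and compute the gradients by automatic differentiation, but the three-part loop (process, compare, update) stays the same.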

According to some embodiments of the present disclosure, obtaining the trained image processing model when the image processing model meets the first predetermined condition during iterative training includes:

when the training deviation data reflects that the image processing model converges during iterative training, determining that the image processing model meets the first predetermined condition during iterative training, and obtaining the trained image processing model.
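
One possible way to decide that the training deviation data "reflects convergence" is to compare the average deviation over the most recent rounds with the average over the rounds just before them. The criterion below, including the function name and the `window` and `rel_tol` parameters, is an assumption chosen for illustration; the disclosure does not prescribe a specific convergence test.

```python
def has_converged(losses, window=5, rel_tol=1e-3):
    """Return True when the mean loss has stopped changing between windows."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    # converged when the relative change between the two windows is small
    return abs(previous - recent) <= rel_tol * abs(previous)
```

In a training loop, this check would be evaluated after each round, and training would stop once it returns True.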

The nucleic acid molecule sequencing device according to an embodiment of the second aspect of the present disclosure includes:

an image acquisition module, used to acquire a sequencing image of a target nucleic acid sample, wherein the sequencing image is an image obtained by imaging the target nucleic acid sample with a preset optical sequencing system;

an image processing module, used to perform image processing on the sequencing image based on an image processing model to obtain a target image, wherein the image processing model is a model trained using a constructed true value image of nucleic acid molecules and a corresponding simulated nucleic acid image, the simulated nucleic acid image is an image obtained by performing simulated sequencing, based on a simulated optical system, on a simulated nucleic acid sample corresponding to the true value image of nucleic acid molecules, and the simulated optical system is an optical system obtained by simulating the preset optical sequencing system;

a sequence determination module, used to perform nucleic acid molecule sequencing according to the target image to obtain a sequencing result corresponding to the target nucleic acid sample.

In a third aspect, an embodiment of the present disclosure provides an electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the nucleic acid molecule sequencing method described in any one of the embodiments of the first aspect of the present disclosure.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium, wherein the storage medium stores a program, and the program, when executed by a processor, implements the nucleic acid molecule sequencing method described in any one of the embodiments of the first aspect of the present disclosure.

The nucleic acid molecule sequencing method and related device according to the embodiments of the present disclosure have at least the following beneficial effects:

In the nucleic acid molecule sequencing method of the present disclosure, a sequencing image of a target nucleic acid sample is first acquired, the sequencing image being an image obtained by imaging the target nucleic acid sample with a preset optical sequencing system. The sequencing image is then processed based on an image processing model to obtain a target image. The image processing model is a model trained using a constructed true value image of nucleic acid molecules and a corresponding simulated nucleic acid image; the simulated nucleic acid image is an image obtained by performing simulated sequencing, based on a simulated optical system, on the simulated nucleic acid sample corresponding to the true value image; and the simulated optical system is an optical system obtained by simulating the preset optical sequencing system. Nucleic acid molecule sequencing is then performed according to the target image to obtain a sequencing result corresponding to the target nucleic acid sample. Because the image processing model is trained on true value images paired with simulated nucleic acid images produced by an optical system that simulates the preset optical sequencing system, applying the model to the sequencing image improves its resolution, which in turn effectively improves the accuracy of base calling during nucleic acid molecule sequencing.

Additional aspects and advantages of the present disclosure will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and/or additional aspects and advantages of the present disclosure will become apparent and readily understood from the description of the embodiments in conjunction with the following drawings, in which:

FIG. 1 is a flow chart of a nucleic acid molecule sequencing method provided by an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of an image processing model with a deep residual channel attention neural network structure provided by an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of the principle of the nucleic acid molecule sequencing method provided by an embodiment of the present disclosure;

FIG. 4 is a flow chart of training the image processing model before step S102 in FIG. 1;

FIG. 5 is a flow chart of step S402 in FIG. 4;

FIG. 6 is a flow chart of step S501 in FIG. 5;

FIG. 7 is a flow chart of step S602 in FIG. 6;

FIG. 8(a) to FIG. 8(g) are schematic diagrams of obtaining the optical transfer function provided by an embodiment of the present disclosure;

FIG. 9 is a flow chart of step S603 in FIG. 6;

FIG. 10 is a schematic diagram of determining the mean and standard deviation of the blank-region signal and the mean of the maxima in the nucleic acid molecule region, provided by an embodiment of the present disclosure;

FIG. 11 is a flow chart of step S502 in FIG. 5;

FIG. 12(a) to FIG. 12(c) are schematic diagrams of obtaining a simulated nucleic acid image provided by an embodiment of the present disclosure;

FIG. 13 is a flow chart of step S403 in FIG. 4;

FIG. 14 shows the sequencing results obtained by a sequencer on a multi-spacing nucleic acid molecule chip, provided by Specific Embodiment 1 of the present disclosure;

FIG. 15 shows the sequencing results obtained by a high-sampling optical machine on a multi-spacing nucleic acid molecule chip, provided by Specific Embodiment 2 of the present disclosure;

FIG. 16 is a schematic structural diagram of a nucleic acid molecule sequencing device provided by an embodiment of the present disclosure;

FIG. 17 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.

DETAILED DESCRIPTION

Embodiments of the present disclosure are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present disclosure, and are not to be construed as limiting it.

In the description of the present disclosure, "a plurality" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. Where "first" and "second" are used, they serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or the order of the indicated technical features.

In the description of the present disclosure, it is to be understood that orientation terms such as up, down, left, right, front, and back indicate orientations or positional relationships based on those shown in the drawings. They are used only to facilitate and simplify the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be construed as limiting the present disclosure.

In this specification, a description referring to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example", or "some examples" means that the specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present disclosure. Such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

In the description of the present disclosure, unless otherwise expressly defined, terms such as "arrange", "install", and "connect" are to be understood broadly, and a person skilled in the art can reasonably determine their specific meaning in the present disclosure in light of the specific content of the technical solution. In addition, the labels given to specific steps below do not limit the order or execution logic of the steps; the execution order and logic between steps are to be understood and inferred from the content of the embodiments.

High-throughput sequencing is a technology for sequencing nucleic acid molecules that determines the sequences of a large number of nucleic acid molecules in parallel in a single run. In high-throughput sequencing, nucleic acid molecules can be attached to a solid-phase surface, complementary probes carrying fluorescent groups can be bound to the nucleic acid molecules, and the base sequence can then be read out base by base through fluorescence imaging.

In the related art, nucleic acid molecules are immobilized on an arrayed sequencing chip. In each round, through the reaction between the nucleic acid molecules, specific enzymes, and fluorescent probes, different bases emit fluorescent signals of different wavelengths. The imaging system captures these signals, and the base sequence is determined by reconstructing and recognizing the captured images. Reducing the spacing between adjacent nucleic acid molecules and increasing their arrangement density on the chip effectively increases the number of bases per field of view, thereby increasing sequencing throughput and reducing the sequencing cost per unit of throughput. However, because of the optical diffraction limit, when the spacing between nucleic acid molecules is smaller than the resolution of the imaging system, the fluorescent signals of adjacent nucleic acid molecules crosstalk, which greatly reduces the accuracy of base calling. How to effectively improve base-calling accuracy during nucleic acid molecule sequencing has therefore become a major problem that the industry urgently needs to solve.

The present disclosure aims to solve at least one of the technical problems existing in the prior art. To this end, the present disclosure proposes a nucleic acid molecule sequencing method and a related device that can effectively improve the accuracy of base calling during nucleic acid molecule sequencing.

The following description is given with reference to the accompanying drawings.

Referring to FIG. 1, the nucleic acid molecule sequencing method provided according to an embodiment of the present disclosure may include, but is not limited to, the following steps S101 to S103.

Step S101: acquire a sequencing image of a target nucleic acid sample, the sequencing image being an image obtained by imaging the target nucleic acid sample with a preset optical sequencing system;

Step S102: perform image processing on the sequencing image based on an image processing model to obtain a target image, the image processing model being a model trained using a constructed true value image of nucleic acid molecules and a corresponding simulated nucleic acid image, the simulated nucleic acid image being an image obtained by performing simulated sequencing, based on a simulated optical system, on the simulated nucleic acid sample corresponding to the true value image, and the simulated optical system being an optical system obtained by simulating the preset optical sequencing system;

Step S103: perform nucleic acid molecule sequencing according to the target image to obtain a sequencing result corresponding to the target nucleic acid sample.

With the nucleic acid molecule sequencing method shown in steps S101 to S103, a sequencing image of the target nucleic acid sample is first acquired with the preset optical sequencing system; the sequencing image is processed by the image processing model to obtain a target image; and nucleic acid molecule sequencing is then performed according to the target image to obtain the sequencing result corresponding to the target nucleic acid sample. Because the image processing model is trained on true value images paired with simulated nucleic acid images, and those simulated images are produced by a simulated optical system that simulates the preset optical sequencing system, applying the model to the sequencing image improves its resolution, which in turn effectively improves the accuracy of base calling during nucleic acid molecule sequencing.
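The three steps can be read as a simple pipeline. The sketch below is only an illustration of that control flow: the function names and the toy stand-ins for acquisition, enhancement, and base calling are hypothetical, and the real model and base caller are not specified in this passage.

```python
from typing import Callable, List

Image = List[List[float]]
BASES = "ACGT"


def sequence_sample(
    acquire: Callable[[], Image],
    enhance: Callable[[Image], Image],
    call_bases: Callable[[Image], str],
) -> str:
    """Run S101 -> S102 -> S103 for one imaging round."""
    sequencing_image = acquire()                # S101: image from the optical system
    target_image = enhance(sequencing_image)    # S102: trained image processing model
    return call_bases(target_image)             # S103: sequencing from the target image


def toy_acquire() -> Image:
    # Hypothetical data: one row per spot, intensities in the A, C, G, T channels.
    return [[0.9, 0.2, 0.1, 0.1], [0.1, 0.3, 0.8, 0.2]]


def toy_enhance(image: Image) -> Image:
    # Placeholder for the trained model: subtract each row's minimum.
    return [[value - min(row) for value in row] for row in image]


def toy_call_bases(image: Image) -> str:
    # Call the base whose channel is brightest at each spot.
    return "".join(BASES[row.index(max(row))] for row in image)


result = sequence_sample(toy_acquire, toy_enhance, toy_call_bases)
```

Passing the three stages as callables keeps the sketch independent of any particular instrument or network; in practice `enhance` would wrap the trained image processing model.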

In step S101 of some embodiments, a sequencing image of the target nucleic acid sample is acquired, the sequencing image being an image obtained by imaging the target nucleic acid sample with a preset optical sequencing system. To sequence nucleic acid molecules, a sequencing image of the target nucleic acid sample is acquired first. The target nucleic acid sample is the nucleic acid sample to be sequenced, and imaging it yields its sequencing image. This imaging relies on the preset optical sequencing system, that is, a nucleic acid imaging and sequencing system configured in advance with particular optical conditions. Preset optical sequencing systems with different optical conditions produce different imaging results for the same target nucleic acid sample.

In step S102 of some embodiments, the sequencing image is processed based on the image processing model to obtain a target image. The image processing model is a model trained using a constructed true value image of nucleic acid molecules and a corresponding simulated nucleic acid image; the simulated nucleic acid image is an image obtained by performing simulated sequencing, based on a simulated optical system, on the simulated nucleic acid sample corresponding to the true value image; and the simulated optical system is an optical system obtained by simulating the preset optical sequencing system. As noted above, nucleic acid molecules are immobilized on an arrayed sequencing chip; in each round, through the reaction between the nucleic acid molecules, specific enzymes, and fluorescent probes, different bases emit fluorescent signals of different wavelengths; the imaging system captures these signals; and the base sequence is determined by reconstructing and recognizing the captured images. Both the target nucleic acid sample and the simulated nucleic acid sample can be prepared in this manner. The difference is that the target nucleic acid sample is the sample actually being sequenced, whereas the simulated nucleic acid sample is used to construct training data for the image processing model.

In some embodiments, the spacing between adjacent nucleic acid molecules can be reduced and the arrangement density of nucleic acid molecules on the chip increased, which effectively increases the number of bases per field of view, thereby increasing sequencing throughput and reducing the sequencing cost per unit of throughput. However, because of the optical diffraction limit, when the spacing between nucleic acid molecules is smaller than the resolution of the imaging system, the fluorescent signals of adjacent nucleic acid molecules crosstalk, which greatly reduces the accuracy of base calling.
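To make the crosstalk condition concrete, the sketch below compares a chip pitch against the Rayleigh criterion, d = 0.61 λ / NA, a standard estimate of the resolution of a diffraction-limited objective. The criterion is not named in the disclosure, and the wavelength, numerical aperture, and pitch values are illustrative assumptions.

```python
def rayleigh_limit_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Smallest resolvable separation under the Rayleigh criterion."""
    return 0.61 * wavelength_nm / numerical_aperture


def spots_overlap(pitch_nm: float, wavelength_nm: float, numerical_aperture: float) -> bool:
    """True when neighbouring spots are closer than the optics can resolve."""
    return pitch_nm < rayleigh_limit_nm(wavelength_nm, numerical_aperture)


# Illustrative values only: 600 nm emission, NA 0.8 objective.
limit = rayleigh_limit_nm(600.0, 0.8)          # about 457.5 nm
dense_chip = spots_overlap(400.0, 600.0, 0.8)  # pitch below the limit
sparse_chip = spots_overlap(700.0, 600.0, 0.8) # pitch above the limit
```

Under these assumed optics, a 400 nm pitch falls below the limit, so neighbouring signals would blend, while a 700 nm pitch would not.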

To solve this problem, in step S102 of the embodiments of the present disclosure, the sequencing image is processed based on the image processing model to obtain the target image. The purpose of this processing is to raise the resolution of the sequencing image, that is, to perform super-resolution processing. Super-resolution processing increases the resolution of an original image by hardware or software means; obtaining one high-resolution image from a series of low-resolution images is referred to as super-resolution reconstruction.

The image processing model is a model trained using a constructed true value image of nucleic acid molecules and a corresponding simulated nucleic acid image, where the simulated nucleic acid image is an image obtained by performing simulated sequencing, based on a simulated optical system, on the simulated nucleic acid sample corresponding to the true value image, and the simulated optical system is an optical system obtained by simulating the preset optical sequencing system. An image obtained in this way can realistically reproduce how sequencing images are actually acquired. Training the image processing model on true value images and their corresponding simulated nucleic acid images therefore improves the model's super-resolution capability. After the sequencing image is processed by the model, a target image with higher resolution is obtained, which can then be used for nucleic acid molecule sequencing in the subsequent step. In this way, when the spacing between nucleic acid molecules is smaller than the resolution of the imaging system, the crosstalk caused by the fluorescent signals of adjacent nucleic acid molecules is reduced, effectively improving the accuracy of base calling during nucleic acid molecule sequencing.

Referring to FIG. 2, in some more specific embodiments, the image processing model may have a deep residual channel attention neural network structure. Processing the sequencing image with this model yields the target image. Specifically, shallow feature extraction is first performed on the sequencing image to obtain sequencing image features. These features pass through multiple RG layers in sequence and then through a Conv layer, and the output is fed, together with the initial sequencing image features, into the reconstruction module to generate the target image. Here, RG denotes the residual shallow feature extraction module, FCAB the channel attention module, Conv a convolution layer, ReLU a rectified linear unit activation layer, FFT a Fourier transform layer, and Sigmoid a sigmoid activation layer.
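The following NumPy sketch shows one way a channel attention step built from the named FFT, ReLU, and Sigmoid layers could rescale feature channels inside a residual connection. The wiring is an assumption: this passage lists the layer types but not their order or sizes, the convolution layers are omitted for brevity, and the weights `w_down` and `w_up` are random placeholders rather than trained parameters.

```python
import numpy as np


def fourier_channel_attention(features, w_down, w_up):
    """Rescale channels by weights derived from their frequency content.

    features: array of shape (C, H, W)
    w_down:   array of shape (C_reduced, C), hypothetical bottleneck weights
    w_up:     array of shape (C, C_reduced), hypothetical expansion weights
    """
    spectrum = np.abs(np.fft.fft2(features, axes=(-2, -1)))  # FFT layer (magnitude)
    pooled = spectrum.mean(axis=(1, 2))                      # global average per channel
    hidden = np.maximum(w_down @ pooled, 0.0)                # ReLU
    weights = 1.0 / (1.0 + np.exp(-(w_up @ hidden)))         # Sigmoid, values in (0, 1)
    return features * weights[:, None, None]


def residual_block(features, w_down, w_up):
    """Residual connection around the attention step."""
    return features + fourier_channel_attention(features, w_down, w_up)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16, 16))         # 8 feature channels of 16 x 16 pixels
w_down = rng.normal(size=(2, 8)) * 0.1   # placeholder weights, not trained
w_up = rng.normal(size=(8, 2)) * 0.1
y = residual_block(x, w_down, w_up)
```

Because the sigmoid weights lie strictly between 0 and 1, each output channel equals its input scaled by a factor between 1 and 2, which is the sense in which the attention step emphasizes some channels over others.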

In step S103 of some embodiments, nucleic acid molecule sequencing is performed according to the target image to obtain the sequencing result corresponding to the target nucleic acid sample. Because the sequencing image has been processed by the image processing model, the nucleic acid molecules are resolved better in the target image than in the sequencing image. Sequencing from the target image therefore helps reduce the crosstalk caused by the fluorescent signals of adjacent nucleic acid molecules when their spacing is smaller than the resolution of the imaging system, which effectively improves the accuracy of base calling during nucleic acid molecule sequencing.

In some embodiments, a gene sequencer may be used to perform nucleic acid molecule sequencing. A gene sequencer, also called a DNA sequencer, is an instrument for determining the base order, type, and quantity of DNA fragments. It is mainly applied in human genome sequencing; genetic diagnosis of human hereditary diseases, infectious diseases, and cancer; forensic paternity testing and individual identification; screening of bioengineered drugs; and hybrid breeding of animals and plants. The working principle of current DNA sequencers is mainly based on the dideoxy chain termination method or the chemical degradation method. Although the two methods differ in principle, both start the extension of the nucleotide chain at a fixed site and terminate it randomly at a specific base, producing four groups of nucleotide chains of different lengths ending in A, T, C, and G. The fragments are separated and detected by electrophoresis on a denaturing polyacrylamide gel, from which the DNA sequence is obtained. Because the dideoxy chain termination method is simpler and better suited to automated optical detection, it is widely used in fully automated DNA sequencers whose sole purpose is determining DNA sequences. The chemical degradation method is valuable for studying the secondary structure of DNA and protein-DNA interactions. It should be understood that there are many ways to perform nucleic acid molecule sequencing according to the target image, including but not limited to the specific embodiments given above.

Refer to the schematic diagram of the principle of the nucleic acid molecule sequencing method shown in FIG. 3. Simulated sequencing of the simulated nucleic acid sample corresponding to the true value image of nucleic acid molecules yields a simulated nucleic acid image, and the image processing model is then trained on the true value image and the corresponding simulated nucleic acid image. During nucleic acid molecule sequencing, the sequencing image of the target nucleic acid sample is acquired first, the sequencing image is processed by the image processing model to obtain the target image, and sequencing is performed according to the target image to obtain the sequencing result corresponding to the target nucleic acid sample. Applying the image processing model to the sequencing image improves its resolution, which effectively improves the accuracy of base calling during nucleic acid molecule sequencing.

Since steps S101 to S103 of the embodiments of the present disclosure have been described in detail above, the steps that may precede step S102 are described in detail next.

The following describes embodiments of the present disclosure in which the image processing model is trained before step S102.

Referring to FIG. 4, according to some embodiments of the present disclosure, before step S102 of processing the sequencing image based on the image processing model to obtain the target image, the method may further include training the image processing model, which may specifically include, but is not limited to, the following steps S401 to S404.

Step S401: construct a true value image of nucleic acid molecules based on a simulated nucleic acid sample;

Step S402: perform simulated sequencing on the simulated nucleic acid sample through the simulated optical system to obtain a simulated nucleic acid image;

Step S403: input the true value image of nucleic acid molecules and the simulated nucleic acid image corresponding to it into the original image processing model, and iteratively train the image processing model;

Step S404: when the image processing model meets a first predetermined condition during iterative training, obtain the trained image processing model.
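Steps S403 and S404 describe an iterative loop with a stopping condition. The sketch below illustrates that loop on a deliberately tiny model with a single scalar gain; it is not the disclosed network. The "first predetermined condition" is not defined in this passage, so a mean-squared-error threshold is assumed here, and the data are made up.

```python
def train(pairs, learning_rate=1.0, loss_threshold=1e-6, max_iters=10_000):
    """Fit gain so that gain * simulated pixel approximates the true value pixel.

    pairs: list of (simulated_pixels, true_value_pixels) tuples.
    Returns (gain, final_loss, iterations_used).
    """
    gain = 1.0  # the "original" (untrained) model: one scalar parameter
    count = sum(len(simulated) for simulated, _ in pairs)
    loss = float("inf")
    step = 0
    for step in range(max_iters):
        errors = [
            (gain * s - t, s)
            for simulated, truth in pairs
            for s, t in zip(simulated, truth)
        ]
        loss = sum(e * e for e, _ in errors) / count
        if loss < loss_threshold:      # S404: assumed stopping condition
            break
        gradient = sum(2 * e * s for e, s in errors) / count
        gain -= learning_rate * gradient   # S403: one training iteration
    return gain, loss, step


# Hypothetical training pair: the simulated image is the true value image at half brightness.
truth = [0.0, 1.0, 0.0, 1.0]
simulated = [0.5 * value for value in truth]
gain, final_loss, iterations = train([(simulated, truth)])
```

With this toy data the loop converges toward a gain of 2, undoing the assumed attenuation; a real implementation would replace the scalar with the network's parameters and the hand-written gradient with an optimizer.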

In step S401 of some embodiments, a true value image of nucleic acid molecules is constructed based on a simulated nucleic acid sample. As noted above, nucleic acid molecules are immobilized on an arrayed sequencing chip; in each round, through the reaction between the nucleic acid molecules, specific enzymes, and fluorescent probes, different bases emit fluorescent signals of different wavelengths; the imaging system captures these signals; and the base sequence is determined by reconstructing and recognizing the captured images. The simulated nucleic acid sample can be prepared in this manner and is used to construct training data for the image processing model. Because the base sequence of the simulated nucleic acid sample is known, the true value image of nucleic acid molecules can be constructed from that known sequence.

According to some more specific embodiments, the nucleic acid molecules of the simulated nucleic acid sample are loaded onto the sequencing chip in a regular arrangement, and the unit of this arrangement is called a block. Each imaging field of view contains multiple blocks, and the gap between adjacent blocks is called a tracking line. For the simulated nucleic acid sample, information such as the number of blocks in each imaging field of view, their order, and the spacing between nucleic acid molecules is recorded in a mask file. Because the mask file contains the known arrangement of nucleic acids in the simulated nucleic acid sample, the true value image of nucleic acid molecules can be constructed from the mask file. One set of simulations may include multiple mask files describing different imaging fields of view.
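As an illustration of how a mask-style description can be expanded into spot positions, the sketch below lays out square blocks of regularly spaced spots separated by tracking lines. The parameter names and the square layout are hypothetical; the actual mask file format is not described in this passage, and the tracking line width is assumed to be measured between the nearest spots of neighbouring blocks.

```python
def spot_positions(blocks_per_side, spots_per_block_side, pitch, track_width):
    """Return the (x, y) centre of every spot in a square field of view."""
    block_span = (spots_per_block_side - 1) * pitch
    block_step = block_span + track_width  # distance between origins of adjacent blocks
    positions = []
    for block_row in range(blocks_per_side):
        for block_col in range(blocks_per_side):
            for spot_row in range(spots_per_block_side):
                for spot_col in range(spots_per_block_side):
                    x = block_col * block_step + spot_col * pitch
                    y = block_row * block_step + spot_row * pitch
                    positions.append((x, y))
    return positions


# Hypothetical field of view: 2 x 2 blocks, 3 x 3 spots each, pitch 1, tracking line width 2.
positions = spot_positions(blocks_per_side=2, spots_per_block_side=3, pitch=1, track_width=2)
x_coordinates = sorted({x for x, _ in positions})
```

In this example the x coordinates skip from 2 to 4, which is the tracking line between the two columns of blocks.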

According to some more specific embodiments, each nucleic acid molecule consists of a base sequence, which may be randomly generated or taken from a standard genome library such as Escherichia coli or human. When the true value image is constructed from the mask file, each imaging round images one base of each nucleic acid molecule in turn; after several rounds, the bases of the nucleic acid molecules in each imaging field of view are fully presented in the true value images obtained by simulated sequencing. Because different bases emit fluorescent signals of different wavelengths, the brightness of the fluorescent signal of each base can be represented by a normalized four-dimensional vector [i_a, i_c, i_g, i_t]. When the base is one of A, C, G and T, the brightness at the corresponding position is set to a value within a certain range; if a position has no base, all four elements of the vector are 0. In this way, the true value image of the nucleic acid molecules clearly presents the true values of the simulated nucleic acid sample.
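The four-element brightness vector described above can be sketched as follows. This is only an illustration under stated assumptions: the site list, the function name `truth_image`, the brightness range 0.8 to 1.0, and the choice to leave the three non-matching channels at zero are all hypothetical and are not taken from the disclosure or from any mask file format.

```python
import numpy as np

BASES = "ACGT"  # channel order follows [i_a, i_c, i_g, i_t]

def truth_image(sites, shape, cycle, rng, lo=0.8, hi=1.0):
    """Build an (H, W, 4) true value image for one imaging round.

    sites: list of (row, col, sequence) tuples (hypothetical layout).
    A site whose sequence has no base at `cycle` keeps an all-zero vector.
    """
    img = np.zeros(tuple(shape) + (4,), dtype=np.float32)
    for row, col, seq in sites:
        if cycle < len(seq):
            # Assumed: only the channel of the current base is non-zero.
            img[row, col, BASES.index(seq[cycle])] = rng.uniform(lo, hi)
    return img

rng = np.random.default_rng(0)
sites = [(1, 1, "ACG"), (1, 3, "TT"), (3, 2, "G")]
img0 = truth_image(sites, (5, 5), cycle=0, rng=rng)  # first base of every site
img2 = truth_image(sites, (5, 5), cycle=2, rng=rng)  # only the first site still has a base
```

Calling the function once per round gives one true value image per sequencing cycle, which matches the round-by-round imaging described above.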

In step S402 of some embodiments, simulated sequencing is performed on the simulated nucleic acid sample through a simulated optical system to obtain a simulated nucleic acid image. The simulated optical system is obtained by simulating a preset optical sequencing system. Because it reproduces the optical conditions of the preset optical sequencing system, it can perform simulated sequencing on the simulated nucleic acid sample and produce the simulated nucleic acid image.

When the nucleic acid molecule sequencing method of the present disclosure is executed, the sequencing image is acquired from the target nucleic acid sample by the preset optical sequencing system. Therefore, so that the trained image processing model is suited to super-resolution processing of sequencing images, the simulated nucleic acid images used for training can be obtained by simulated sequencing of the simulated nucleic acid sample through the simulated optical system.

Referring to FIG. 5, according to some embodiments of the present disclosure, step S402, in which simulated sequencing is performed on the simulated nucleic acid sample through the simulated optical system to obtain the simulated nucleic acid image, may include, but is not limited to, the following steps S501 to S502.

Step S501: simulate the optical calibration information of the preset optical sequencing system;

Step S502: simulate the nucleic acid distribution information based on the simulated optical calibration information to obtain the simulated nucleic acid image.

In step S501 of some embodiments, the optical calibration information of the preset optical sequencing system is simulated. As noted above, the simulated optical system reproduces the optical conditions of the preset optical sequencing system; specifically, it can do so by simulating that system's optical calibration information. Here, optical calibration information means the optical physical quantities that the simulated optical system requires in order to simulate the image acquisition process of the preset optical sequencing system.

In some embodiments, because the preset optical sequencing system is a nucleic acid image acquisition and sequencing system configured in advance with certain optical conditions, its optical calibration information can be simulated by querying its preconfigured parameters.

Referring to FIG. 6, according to some more specific embodiments of the present disclosure, the nucleic acid molecules in the simulated nucleic acid sample include multiple bases, each labeled with a fluorescent signal. Step S501, in which the optical calibration information of the preset optical sequencing system is simulated, may include, but is not limited to, the following steps S601 to S603.

Step S601: determine the pixel size of the preset optical sequencing system, where the pixel size is the size in the imaging plane that corresponds to a single pixel element of the preset optical sequencing system;

Step S602: perform optical imaging analysis on the fluorescent signals corresponding to the bases in the simulated nucleic acid sample to obtain an optical transfer function;

Step S603: integrate the pixel size and the optical transfer function to obtain the simulated optical calibration information.

In step S601 of some embodiments, the pixel size of the preset optical sequencing system is determined, where the pixel size is the size in the imaging plane that corresponds to a single pixel element of the preset optical sequencing system. The preset optical sequencing system acquires images with a camera, so the pixel size is the size in the imaging plane corresponding to a single pixel element of the camera. For example, if the physical size of a camera pixel element is 2 μm and the magnification of the preset optical sequencing system is 20×, the pixel size of the system is 2 μm / 20 = 0.1 μm. Determining the pixel size establishes the mapping from the simulated nucleic acid sample to the camera imaging space.

In some more specific embodiments, the pixel size of the preset optical sequencing system can be determined by using a high-precision stage to move the sequencing chip loaded with the simulated nucleic acid sample over a relatively large distance (e.g., 100 μm), measuring from the images the number of pixels by which the corresponding position is displaced, and dividing the distance moved by that number of pixels. It should be understood that the pixel size can be determined in many ways, including but not limited to the specific example above.
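The two ways of obtaining the pixel size described above reduce to simple divisions, sketched below. The function names are hypothetical, and the displacement of 1000 pixels is an assumed measurement chosen only so that both routes agree with the 0.1 μm example.

```python
def pixel_size_from_optics(sensor_pixel_um, magnification):
    """Pixel size from the camera pixel element size and the system magnification."""
    if magnification <= 0:
        raise ValueError("magnification must be positive")
    return sensor_pixel_um / magnification

def pixel_size_from_stage(stage_move_um, shift_px):
    """Pixel size from a known stage move and the measured image displacement."""
    if shift_px <= 0:
        raise ValueError("shift_px must be positive")
    return stage_move_um / shift_px

# 2 um camera pixel element at 20x magnification, as in the example above.
nominal_um = pixel_size_from_optics(2.0, 20.0)
# 100 um stage move; the 1000-pixel displacement is an assumed measurement.
measured_um = pixel_size_from_stage(100.0, 1000.0)
```

Comparing the nominal and measured values is one way to check the magnification actually delivered by the optics.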

In step S602 of some embodiments, optical imaging analysis is performed on the fluorescent signals corresponding to the bases in the simulated nucleic acid sample to obtain an optical transfer function. The optical transfer function models the imaging performance of the optical system, and the modeled imaging performance is used to generate the simulated nucleic acid image. Performing optical imaging analysis on the fluorescent signals corresponding to the bases in the simulated nucleic acid sample amounts to modeling that imaging performance, which yields the corresponding optical transfer function.

Referring to FIG. 7, according to some more specific embodiments of the present disclosure, step S602, in which optical imaging analysis is performed on the fluorescent signals corresponding to the bases in the simulated nucleic acid sample to obtain the optical transfer function, may include, but is not limited to, the following steps S701 to S704.

Step S701: scan and image the simulated nucleic acid sample to obtain a scanned nucleic acid image;

Step S702: perform a local maximum search on the scanned nucleic acid image to obtain a fluorescence image for each fluorescent signal;

Step S703: apply Gaussian fitting and averaging to the fluorescence images of multiple fluorescent signals to obtain a point spread function;

Step S704: Fourier transform the point spread function to obtain the optical transfer function.

In step S701 of some embodiments, the simulated nucleic acid sample is scanned and imaged to obtain a scanned nucleic acid image. Scanning imaging refers to the process of capturing a physical object so that its geometry is accurately represented in a digital environment. Because the optical transfer function models the imaging performance of the optical system, some embodiments scan and image a sparse, point-like region of the simulated nucleic acid sample, which yields the required scanned nucleic acid image more efficiently. It should be understood that a sparse, point-like region of the simulated nucleic acid sample is a region in which the spacing between nucleic acid molecules is much larger than the optical resolution.

In steps S702 to S704 of some embodiments, a local maximum search is first performed on the scanned nucleic acid image to obtain a fluorescence image for each fluorescent signal; the fluorescence images of multiple fluorescent signals are then Gaussian-fitted and averaged to obtain a point spread function; and the point spread function is Fourier transformed to obtain the optical transfer function. The scanned nucleic acid image may also first be filtered with a difference-of-Gaussians filter. A local maximum search then extracts the single fluorescent microsphere corresponding to each base, forming fluorescence images of multiple microspheres; these are Gaussian-fitted and averaged to give the point spread function, and a Fourier transform of the point spread function gives the system optical transfer function.

In the embodiment of the present disclosure shown by steps S701 to S704, the simulated nucleic acid sample is first scanned and imaged to obtain a scanned nucleic acid image; a local maximum search is then performed on the scanned nucleic acid image to obtain a fluorescence image for each fluorescent signal; the fluorescence images of multiple fluorescent signals are Gaussian-fitted and averaged to obtain a point spread function; and the point spread function is Fourier transformed to obtain the optical transfer function. An optical transfer function obtained in this way models the imaging performance of the optical system more accurately.
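Steps S702 to S704 can be illustrated with the NumPy sketch below. It is not the implementation of the disclosure: the input is a noise-free synthetic image of three Gaussian spots, the function names are hypothetical, the difference-of-Gaussians pre-filter is omitted, and the Gaussian fit is reduced to a three-point log-parabola fit along each axis, which recovers the center exactly only for a clean, positive, Gaussian-shaped spot.

```python
import numpy as np

def local_maxima(img, threshold, border):
    """S702: pixels above threshold that equal the maximum of their 3x3 window."""
    peaks = []
    h, w = img.shape
    for r in range(border, h - border):
        for c in range(border, w - border):
            if img[r, c] >= threshold and img[r, c] == img[r - 1:r + 2, c - 1:c + 2].max():
                peaks.append((r, c))
    return peaks

def gaussian_peak_offset(left, centre, right):
    """Sub-pixel peak offset from a 3-point Gaussian (log-parabola) fit."""
    la, lb, lc = np.log(left), np.log(centre), np.log(right)
    return 0.5 * (la - lc) / (la - 2.0 * lb + lc)

def shift_patch(patch, dy, dx):
    """Shift a patch by (dy, dx) pixels using the Fourier shift theorem."""
    fy = np.fft.fftfreq(patch.shape[0])[:, None]
    fx = np.fft.fftfreq(patch.shape[1])[None, :]
    phase = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    return np.fft.ifft2(np.fft.fft2(patch) * phase).real

def estimate_psf(img, threshold, half):
    """S703: crop each spot, centre it with the Gaussian fit, then average."""
    patches = []
    for r, c in local_maxima(img, threshold, border=half):
        patch = img[r - half:r + half + 1, c - half:c + half + 1]
        dy = gaussian_peak_offset(patch[half - 1, half], patch[half, half], patch[half + 1, half])
        dx = gaussian_peak_offset(patch[half, half - 1], patch[half, half], patch[half, half + 1])
        patches.append(shift_patch(patch, -dy, -dx))
    psf = np.mean(patches, axis=0)
    return psf / psf.sum()

def otf_from_psf(psf):
    """S704: Fourier transform of the PSF, with the PSF centre moved to the origin."""
    return np.fft.fft2(np.fft.ifftshift(psf))

# Synthetic sparse image: three spots at sub-pixel positions (assumed test data).
yy, xx = np.mgrid[0:64, 0:64]
centres = [(16.3, 16.0), (16.0, 47.6), (47.8, 31.2)]
sigma = 1.5
image = sum(np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2)) for cy, cx in centres)

peaks = local_maxima(image, threshold=0.5, border=7)
psf = estimate_psf(image, threshold=0.5, half=7)
otf = otf_from_psf(psf)
```

On measured data the patches would need background subtraction before the logarithm is taken, and a full two-dimensional least-squares Gaussian fit would be more robust to noise than the three-point fit used here.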

FIGS. 8(a) to 8(g) illustrate some embodiments of the present disclosure for obtaining the optical transfer function. As noted above, the nucleic acid molecules in the simulated nucleic acid sample include multiple bases, each labeled with a fluorescent signal.

In FIG. 8(a), the simulated nucleic acid sample is scanned and imaged to obtain a scanned nucleic acid image, which contains multiple fluorescent microspheres discretely distributed on the sequencing chip loaded with the simulated nucleic acid sample;

In FIG. 8(b), the regions where the fluorescent microspheres are located are extracted from the scanned nucleic acid image;

In FIG. 8(c), a filtering algorithm (e.g., local maximum search) extracts the sparse, isolated single-microsphere regions that meet the requirements, giving a fluorescence image for each fluorescent signal;

FIG. 8(d) shows one fluorescence image of a single fluorescent microsphere, and FIG. 8(e) shows another fluorescence image of a single fluorescent microsphere;

A Gaussian function is fitted to each cropped single-microsphere image to locate its center, and the microsphere images are then aligned and averaged; that is, the fluorescence images of multiple fluorescent signals are Gaussian-fitted and averaged to obtain the point spread function;

FIGS. 8(f) and 8(g) are schematic diagrams of Fourier transforming the point spread function to obtain the optical transfer function.

In step S603 of some embodiments, the pixel size and the optical transfer function are integrated to obtain the simulated optical calibration information. The pixel size establishes the mapping from the simulated nucleic acid sample to the camera imaging space, and the optical transfer function models the imaging performance of the optical system. Integrating the two gives the simulated optical calibration information that the simulated optical system requires in order to simulate the image acquisition process of the preset optical sequencing system.

Referring to FIG. 9, according to some embodiments of the present disclosure, step S603, in which the pixel size and the optical transfer function are integrated to obtain the simulated optical calibration information, may include, but is not limited to, steps S901 to S902.

Step S901: extract noise from the blank regions between nucleic acid molecules in the simulated nucleic acid sample to obtain simulated noise information;

Step S902: integrate the pixel size, the optical transfer function and the simulated noise information to obtain the simulated optical calibration information.

In the embodiment shown by steps S901 to S902, noise is extracted from the blank regions between nucleic acid molecules in the simulated nucleic acid sample to obtain simulated noise information, and the pixel size, the optical transfer function and the simulated noise information are then integrated to obtain the simulated optical calibration information. The blank regions between nucleic acid molecules contain no nucleic acid to be sequenced, so they are well suited to noise extraction. This embodiment brings the optical noise present when the preset optical sequencing system acquires images into the simulation model, which improves the accuracy with which the optical calibration information simulates the nucleic acid distribution information and yields higher-quality simulated nucleic acid images.

In some more specific embodiments, the optical noise can be extracted manually or by an algorithm. For manual extraction, image software is used to read the mean bg_ave and the standard deviation bg_std of the signal in a blank region on a tracking line, and the mean sig of the peak values in the nucleic acid molecule regions. For algorithmic extraction, an algorithm automatically locates the tracking lines and blank regions and extracts the corresponding signals. To simulate the noise, normalized Gaussian noise that is independent for each pixel is first generated, with mean bg_ave/sig and standard deviation bg_std/sig. The Gaussian noise is then added to the simulated signal, and the pixel values with superimposed shot noise are computed through a Poisson distribution.
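The noise model just described can be sketched with NumPy as follows. The function name and the numeric values of bg_ave, bg_std and sig are hypothetical. One step is an assumption not spelled out in the text: before the Poisson draw, the normalized image is multiplied by sig to bring it back to a count scale, and negative values are clipped to zero because a Poisson rate cannot be negative.

```python
import numpy as np

def add_noise(signal, bg_ave, bg_std, sig, rng):
    """Add per-pixel Gaussian background noise, then Poisson shot noise.

    signal: normalised simulated image (a molecule peak is about 1).
    """
    # Normalised, independent Gaussian noise with mean bg_ave/sig and std bg_std/sig.
    gauss = rng.normal(bg_ave / sig, bg_std / sig, size=signal.shape)
    # Assumption: rescale by sig to a count scale before drawing shot noise.
    expected_counts = np.clip((signal + gauss) * sig, 0.0, None)
    return rng.poisson(expected_counts).astype(np.float64)

rng = np.random.default_rng(1)
clean = np.zeros((64, 64))
clean[32, 32] = 1.0  # one molecule at the centre of an otherwise blank field
noisy = add_noise(clean, bg_ave=100.0, bg_std=5.0, sig=1000.0, rng=rng)
```

With these assumed values the blank pixels fluctuate around 100 counts while the molecule pixel sits near 1100 counts, that is, signal plus background.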

FIG. 10 shows a more specific embodiment of the present disclosure for determining the mean bg_ave and standard deviation bg_std of the blank-region signal and the mean sig of the peak values in the nucleic acid molecule regions. In FIG. 10, the white box marks a background region on a tracking line; the mean and standard deviation of the signal in this region are measured as bg_ave and bg_std. The bright spots marked by white crosses are nucleic acid molecule signals; the brightness at the centers of 10 of these spots is measured and averaged to give sig.

In the embodiment of the present disclosure shown by steps S601 to S603, the pixel size of the preset optical sequencing system is first determined, where the pixel size is the size in the imaging plane that corresponds to a single pixel element of the preset optical sequencing system; optical imaging analysis is performed on the fluorescent signals corresponding to the bases in the simulated nucleic acid sample to obtain the optical transfer function; and the pixel size and the optical transfer function are then integrated to obtain the optical calibration information. The pixel size in the optical calibration information establishes the mapping from the simulated nucleic acid sample to the camera imaging space, and the optical transfer function in the optical calibration information models the imaging performance of the optical system. It should be understood that optical calibration information obtained in this way helps to simulate the nucleic acid distribution information more accurately and to obtain higher-quality simulated nucleic acid images.

In step S502 of some embodiments, the nucleic acid distribution information is simulated based on the simulated optical calibration information to obtain a simulated nucleic acid image. Once the optical calibration information of the preset optical sequencing system has been simulated, it can be used to simulate the nucleic acid distribution information and obtain the simulated nucleic acid image. Because the simulated optical calibration information is obtained by simulating the preset optical sequencing system, the resulting simulated nucleic acid image reproduces the actual acquisition of sequencing images more realistically.

Referring to FIG. 11, according to some embodiments of the present disclosure, step S502, in which the nucleic acid distribution information is simulated based on the simulated optical calibration information to obtain the simulated nucleic acid image, may include, but is not limited to, the following steps S1101 to S1103.

Step S1101: map the nucleic acid distribution information to the imaging space based on the pixel size to obtain a first simulated image of the simulated nucleic acid sample in the imaging space;

Step S1102: simulate the imaging performance on the first simulated image based on the optical transfer function to obtain a second simulated image;

Step S1103: simulate the environmental noise on the second simulated image based on the simulated noise information to obtain the simulated nucleic acid image.

In steps S1101 to S1103 of some embodiments, the nucleic acid distribution information is first mapped to the imaging space based on the pixel size to obtain a first simulated image of the simulated nucleic acid sample in the imaging space; the imaging performance is then simulated on the first simulated image based on the optical transfer function to obtain a second simulated image; and the environmental noise is simulated on the second simulated image based on the simulated noise information to obtain the simulated nucleic acid image. The pixel size establishes the mapping from the simulated nucleic acid sample to the camera imaging space, so it is used to map the nucleic acid distribution information to the imaging space and obtain the first simulated image. The optical transfer function models the imaging performance of the optical system, so it is used to simulate the imaging performance on the first simulated image and obtain the second simulated image. The simulated noise information represents the optical noise present when the preset optical sequencing system acquires images, so it is used to simulate the environmental noise on the second simulated image and obtain the final simulated nucleic acid image. It should be understood that, because the optical calibration information includes the pixel size, the optical transfer function and the simulated noise information, it helps to simulate the preset optical sequencing system more accurately, so that the resulting simulated nucleic acid image reproduces the actual acquisition of sequencing images more realistically.

Referring to some more specific embodiments shown in FIGS. 12(a) to 12(c), the simulation may first determine the distribution of nucleic acid molecules in each imaging field of view and generate a simulated nucleic acid molecule list, as in FIG. 12(a). The nucleic acid molecule list is then mapped to a two-color or four-color imaging space; FIG. 12(b) shows the list mapped to a four-color imaging space. The mapped image is then low-pass filtered with the measured optical transfer function, and finally simulated noise is added to obtain an image close to the low-resolution sequencing images captured during nucleic acid molecule sequencing, that is, the simulated nucleic acid image, as in FIG. 12(c).
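The three stages above (map the molecule list to the pixel grid, low-pass filter with the optical transfer function, add noise) can be sketched end to end as follows. All function names, the site list and the numeric values are hypothetical. A Gaussian optical transfer function stands in for the measured one, and the rescaling by sig before the Poisson draw is an assumption.

```python
import numpy as np

def render_sites(sites_um, pixel_size_um, shape):
    """S1101: map (y_um, x_um, channel) entries onto a (4, H, W) pixel grid."""
    img = np.zeros((4,) + tuple(shape))
    for y_um, x_um, channel in sites_um:
        r = int(round(y_um / pixel_size_um))
        c = int(round(x_um / pixel_size_um))
        if 0 <= r < shape[0] and 0 <= c < shape[1]:
            img[channel, r, c] += 1.0
    return img

def gaussian_otf(shape, sigma_px):
    """Stand-in OTF: Fourier transform of a unit-sum Gaussian PSF of width sigma_px."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.exp(-2.0 * (np.pi * sigma_px) ** 2 * (fy ** 2 + fx ** 2))

def apply_otf(img, otf):
    """S1102: low-pass filter each channel by multiplying its spectrum by the OTF."""
    return np.fft.ifft2(np.fft.fft2(img) * otf).real

def add_noise(img, bg_ave, bg_std, sig, rng):
    """S1103: Gaussian background plus Poisson shot noise (count scale sig assumed)."""
    gauss = rng.normal(bg_ave / sig, bg_std / sig, size=img.shape)
    return rng.poisson(np.clip((img + gauss) * sig, 0.0, None)).astype(float)

rng = np.random.default_rng(2)
# Hypothetical molecule list: position in micrometres and colour channel (0=A ... 3=T).
sites = [(1.0, 1.0, 0), (1.0, 2.0, 3), (2.2, 1.6, 2)]
first = render_sites(sites, pixel_size_um=0.1, shape=(32, 32))
second = apply_otf(first, gaussian_otf((32, 32), sigma_px=1.5))
simulated = add_noise(second, bg_ave=100.0, bg_std=5.0, sig=1000.0, rng=rng)
```

Because the stand-in optical transfer function equals 1 at zero frequency, the filtering step spreads each molecule over neighboring pixels without changing the total intensity of a channel.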

In some more specific embodiments, when multiple rounds are simulated, the actual signal and noise levels are extracted from images actually captured by the system, so that the decline in signal-to-noise ratio over long sequencing runs can also be simulated.

In the embodiment of the present disclosure shown by steps S501 to S502, the optical calibration information of the preset optical sequencing system is first simulated, and the nucleic acid distribution information is then simulated based on the simulated optical calibration information to obtain the simulated nucleic acid image. Using optical calibration information obtained by simulating the preset optical sequencing system improves the simulation, so that the simulated nucleic acid image reproduces the actual acquisition of sequencing images more realistically.

In step S403 of some embodiments, the true value image of the nucleic acid molecules and the simulated nucleic acid image corresponding to it are input into the original image processing model, and the image processing model is iteratively trained. The simulated nucleic acid image is obtained by simulation from the simulated nucleic acid sample that corresponds to the true value image, and it reproduces the acquisition of sequencing images fairly realistically; training the original image processing model on the simulated nucleic acid image therefore helps the trained model adapt to super-resolution processing of sequencing images. The true value image reflects the real, exact arrangement of the base sequences in the simulated nucleic acid sample; training the image processing model against the true value image therefore improves its super-resolution processing capability. Consequently, inputting both images into the original image processing model and training it iteratively continuously improves the model's super-resolution processing capability and adapts the model to super-resolution processing of sequencing images.

Referring to FIG. 13, according to some embodiments of the present disclosure, step S403, in which the true value image of the nucleic acid molecules and the corresponding simulated nucleic acid image are input into the original image processing model and the image processing model is iteratively trained, may include, but is not limited to, the following steps S1301 to S1303:

Step S1301: in each round of iterative training, input the simulated nucleic acid image into the image processing model for image processing training to obtain the processing result of the current round;

Step S1302: compare the processing result of the current round with the true value image of the nucleic acid molecules to obtain training deviation data;

Step S1303: update the weight parameters of the image processing model based on the training deviation data.

In step S1301 of some embodiments, in each round of iterative training, the simulated nucleic acid image is input into the image processing model for image processing training to obtain the processing result of the current round. The simulated nucleic acid image is obtained by simulation from the simulated nucleic acid sample that corresponds to the true value image and reproduces the acquisition of sequencing images fairly realistically, and the rounds of iterative training aim to optimize the super-resolution processing capability of the image processing model. The simulated nucleic acid image can therefore be input into the image processing model in every round to obtain that round's processing result.

In step S1302 of some embodiments, the processing result of the current round is compared with the true value image of the nucleic acid molecules to obtain training deviation data. The true value image reflects the real, exact arrangement of the base sequences in the simulated nucleic acid sample, so the processing result of every round can be compared with it to obtain the training deviation data between the two. The closer the training deviation data shows the processing result to be to the true value image, the better the image processing model improves resolution, and the stronger its super-resolution processing capability.

In step S1303 of some embodiments, the weight parameters of the image processing model are updated based on the training deviation data; this can be done as soon as the training deviation data is available. A neural network model generally has two kinds of parameters. The first kind is the tuning parameters of the machine learning algorithm, also called hyperparameters, which are set flexibly on the basis of existing experience; examples are the regularization coefficient λ and the depth of the trees in a decision tree model. A hyperparameter is still a parameter and shares a parameter's characteristics, such as being unknown: it is not a known constant but a configurable value that is assigned a "correct" value from experience rather than learned by the system. The second kind can be learned and estimated from data and is called a model parameter, that is, a learnable parameter of the model itself; for example, the weighting coefficient (slope) and the bias term (intercept) of a linear regression line are both model parameters.

Learnable parameters are the parameter values learned while the neural network model is trained. They usually start from a set of random values and are then updated iteratively as the model learns. Saying that "the neural network model is learning" more precisely means that its parameters are being iteratively updated and their appropriate values are gradually being determined, where an appropriate value can be one at which the loss function is minimized or converges. Therefore, in some embodiments of the present disclosure, in order to gradually determine appropriate values of these model parameters while the image processing model is being iteratively updated, the weight parameters of the image processing model are updated based on the training deviation data.

In the embodiments of the present disclosure shown in steps S1301 to S1303, in each round of iterative training the simulated nucleic acid image is input into the image processing model for image processing training to obtain that round's processing result; the result is compared with the nucleic acid molecule true value image to obtain training deviation data; and the weight parameters of the image processing model are updated based on the training deviation data. In this way, the super-resolution capability of the image processing model is optimized continuously over the rounds of iterative training.
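The three steps can be illustrated with a deliberately small sketch. It is not the model of the disclosure: the "image processing model" here is assumed to be a per-pixel affine map with two learnable weights, the training deviation is assumed to be the mean squared error, and the update rule is assumed to be plain gradient descent. The function names `forward`, `deviation`, `update` and `train` are illustrative only.

```python
import random

def forward(weights, simulated):
    """S1301: run the (toy) model on the simulated image to get this round's result."""
    w, b = weights
    return [w * x + b for x in simulated]

def deviation(output, truth):
    """S1302: compare the result with the true value image (assumed metric: MSE)."""
    return sum((o - t) ** 2 for o, t in zip(output, truth)) / len(truth)

def update(weights, simulated, output, truth, lr):
    """S1303: update the weights along the negative gradient of the MSE."""
    w, b = weights
    n = len(truth)
    grad_w = sum(2 * (o - t) * x for o, t, x in zip(output, truth, simulated)) / n
    grad_b = sum(2 * (o - t) for o, t in zip(output, truth)) / n
    return (w - lr * grad_w, b - lr * grad_b)

def train(simulated, truth, rounds=1000, lr=0.5):
    """Repeat S1301 to S1303 and record the deviation of every round."""
    weights = (0.0, 0.0)  # starting values; real models usually start from random weights
    history = []
    for _ in range(rounds):
        output = forward(weights, simulated)
        history.append(deviation(output, truth))
        weights = update(weights, simulated, output, truth, lr)
    return weights, history

# Flattened toy "images": the true value image is an affine function of the simulated one.
rng = random.Random(0)
simulated = [rng.random() for _ in range(64)]
truth = [2.0 * x + 1.0 for x in simulated]
weights, history = train(simulated, truth)
```

The recorded `history` plays the role of the training deviation data: it shrinks as the weights approach the values that reproduce the true value image.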

In step S404 of some embodiments, when the image processing model meets the first predetermined condition during iterative training, the trained image processing model is obtained. Meeting the first predetermined condition means that the model's super-resolution capability has reached the level expected for practical application and that the model is suited to super-resolution processing of sequencing images. At that point the iterative training can be ended and the trained image processing model obtained.

In some embodiments, the image processing model meets the first predetermined condition during iterative training when, after image processing training on the simulated nucleic acid image, its processing result reaches the resolution level of the nucleic acid molecule true value image. Alternatively, the first predetermined condition may be met when the model's super-resolution capability converges to a certain level during iterative training.

According to some more specific embodiments of the present disclosure, step S404, in which the trained image processing model is obtained when the image processing model meets the first predetermined condition during iterative training, may include, but is not limited to: when the training deviation data shows that the image processing model has converged during iterative training, determining that the model meets the first predetermined condition and obtaining the trained image processing model. Convergence of the training deviation data means that the model's super-resolution capability has been optimized to its current limit and is difficult to improve further, so the first predetermined condition can be regarded as met.
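One possible way to decide that the training deviation data "reflects convergence" is sketched below. The disclosure does not fix a criterion; the rule used here (the deviation has varied by less than a tolerance over the last few rounds) and the parameters `window` and `tolerance` are assumptions chosen for illustration.

```python
def has_converged(deviations, window=5, tolerance=1e-4):
    """Return True once the deviation has varied by less than `tolerance`
    over the most recent `window` rounds (assumed convergence rule)."""
    if len(deviations) < window + 1:
        return False  # not enough rounds yet to judge
    recent = deviations[-(window + 1):]
    return max(recent) - min(recent) < tolerance

still_improving = has_converged([1.0, 0.5, 0.3, 0.2, 0.15, 0.12])
plateaued = has_converged([0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
```

A training loop would call such a check after each round and stop as soon as it returns `True`.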

It should be understood that there are many ways of determining that the image processing model meets the first predetermined condition during iterative training; they may include, but are not limited to, the specific embodiments given above.

In the embodiments of the present disclosure shown in steps S401 to S404, a nucleic acid molecule true value image is first constructed from the simulated nucleic acid sample; simulated sequencing is then performed on the simulated nucleic acid sample by the simulated optical system to obtain the simulated nucleic acid image; the nucleic acid molecule true value image and its corresponding simulated nucleic acid image are input into the original image processing model, which is trained iteratively; and when the image processing model meets the first predetermined condition during iterative training, the trained image processing model is obtained. Training in this way improves the model's super-resolution capability until it reaches the level expected for practical application and the model is suited to super-resolution processing of sequencing images. Using the trained image processing model to process sequencing images then helps raise image resolution and, when the spacing between nucleic acid molecules is smaller than the resolution of the imaging system, helps reduce crosstalk from the fluorescence signals of adjacent nucleic acid molecules, which in turn effectively improves base-calling accuracy during sequencing.

Two more specific embodiments are provided below to illustrate the technical effects of the nucleic acid molecule sequencing method of the present disclosure.

Specific embodiment 1: sequencing a multi-pitch nucleic acid molecule chip on the optical engine of a sequencer.

The sequencer uses a 0.8 NA air objective, with an optical resolution of about 500-600 nm and a sampling resolution of 566 nm. In normal use the nucleic acid molecule pitch is 715 nm, and when an E. coli genome library is sequenced the single-end 100-cycle (SE100) alignment rate exceeds 90%. On this basis, this specific embodiment sequences multi-pitch sequencing chips (576 nm, 480 nm, 450 nm).

First, nucleic acid molecule patterns with pitches of 576 nm, 480 nm and 450 nm are generated from the chip template, and the nucleic acid molecule true value images and corresponding simulated nucleic acid images for multiple sequencing cycles are generated to train the image processing model. Taking 480 nm as an example, each imaging field of view (FoV) of the chip template contains 10*10 = 100 blocks; the numbers of rows/columns of nucleic acid molecules in the blocks are 145, 120, 180, 210, 240, 240, 210, 180, 120, 145, and the track line between adjacent blocks is 1440 nm wide. The list of nucleic acid molecules needed for the simulation is generated from the template, and the molecules are mapped into the corresponding four-color imaging space using the measured pixel size of 566 nm.
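The mapping from template coordinates to pixel coordinates can be sketched as follows for one axis of the 480 nm field of view. The exact chip geometry is not given here, so the layout is an assumption: molecules inside a block sit on a regular grid at the stated pitch, and the centre-to-centre distance across a track line is taken to be the track line width. The function names are hypothetical.

```python
BLOCK_COUNTS = [145, 120, 180, 210, 240, 240, 210, 180, 120, 145]  # from the template
PITCH_NM = 480.0      # molecule pitch of the example chip
TRACK_NM = 1440.0     # track line width between adjacent blocks
PIXEL_NM = 566.0      # measured pixel size

def axis_positions_nm(counts, pitch_nm, track_nm):
    """Positions (nm) of the molecule rows or columns along one axis of a FoV.

    Assumed layout: regular pitch inside a block, and a centre-to-centre gap
    of `track_nm` between the last molecule of one block and the first of the next.
    """
    positions = []
    cursor = 0.0
    for block_index, count in enumerate(counts):
        if block_index > 0:
            cursor += track_nm
        for i in range(count):
            if i > 0:
                cursor += pitch_nm
            positions.append(cursor)
    return positions

def to_pixel(position_nm, pixel_nm):
    """Sub-pixel coordinate of a position in imaging space."""
    return position_nm / pixel_nm

xs_nm = axis_positions_nm(BLOCK_COUNTS, PITCH_NM, TRACK_NM)
xs_px = [to_pixel(x, PIXEL_NM) for x in xs_nm]
```

Under these assumptions each axis carries 1790 molecule positions. Because the 480 nm pitch is smaller than the 566 nm pixel, neighbouring molecules can fall into the same camera pixel, which is why the coordinates are kept as sub-pixel values rather than rounded.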

The nucleic acid molecules on the sequencing chip are sequenced on the sequencer, and sequencing images are captured over multiple cycles. The image signal-to-noise ratio is extracted from the images of each cycle, and simulated data pairs are generated from the system's optical transfer function and the signal-to-noise ratio for training the image processing model.
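A minimal sketch of generating one simulated data pair is given below. Several simplifications are assumed: the system response is represented by a Gaussian point spread function applied in the spatial domain (equivalent to multiplying by its optical transfer function in the frequency domain), the noise is zero-mean Gaussian, and the signal-to-noise ratio is defined as peak signal divided by noise standard deviation. None of these choices is stated in the disclosure, and the function names are illustrative.

```python
import math
import random

def gaussian_kernel(sigma_px, radius):
    """Normalised 2-D Gaussian, used here as a stand-in point spread function."""
    size = 2 * radius + 1
    kernel = [[math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma_px ** 2))
               for x in range(size)] for y in range(size)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

def blur(image, kernel):
    """Apply a symmetric kernel to `image`; pixels outside the image count as zero."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(-r, r + 1):
                for kx in range(-r, r + 1):
                    yy, xx = y + ky, x + kx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[ky + r][kx + r]
            out[y][x] = acc
    return out

def add_noise(image, snr, rng):
    """Add Gaussian noise with sigma = peak signal / snr (assumed SNR definition)."""
    peak = max(max(row) for row in image)
    sigma = peak / snr
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in image]

def make_pair(truth, sigma_px, snr, seed=0):
    """Return (simulated image, true value image) for one training pair."""
    rng = random.Random(seed)
    blurred = blur(truth, gaussian_kernel(sigma_px, radius=3))
    return add_noise(blurred, snr, rng), truth

# A single emitter in the middle of a 9x9 true value image.
truth = [[0.0] * 9 for _ in range(9)]
truth[4][4] = 1.0
simulated, target = make_pair(truth, sigma_px=1.0, snr=20.0)
```

In practice the measured optical transfer function and the per-cycle signal-to-noise ratio would replace the Gaussian width and the fixed `snr` used here.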

The sequencing images are then processed by the image processing model, and base calling and alignment are performed on the resulting target images.

Referring to FIG. 14, the sequencing results show that, after a multi-pitch nucleic acid molecule chip is sequenced using the nucleic acid molecule sequencing method of the present disclosure, the alignment rate reaches 90% for nucleic acid molecules at 576 nm pitch and 75% for nucleic acid molecules at 480 nm pitch, an improvement of 18% over the conventional algorithm.

Specific embodiment 2: sequencing a multi-pitch nucleic acid molecule chip on a self-built high-sampling optical engine.

The self-built high-sampling optical engine uses a 0.8 NA air objective, with an optical resolution of about 500-600 nm and a sampling resolution of 260 nm. In normal use the nucleic acid molecule pitch is 715 nm.

When an E. coli genome library is sequenced using the nucleic acid molecule sequencing method of the present disclosure, the single-end 100-cycle (SE100) alignment rate exceeds 90%. Multi-pitch sequencing chips (576 nm, 480 nm, 450 nm, 400 nm, 360 nm) are sequenced using the nucleic acid molecule sequencing method of the present disclosure. The implementation steps are the same as in specific embodiment 1 above.

Referring to FIG. 15, the final sequencing results show that, compared with the conventional algorithm, processing with the algorithm of this technical solution raises the alignment rates of nucleic acid molecules at 576 nm, 480 nm, 450 nm, 400 nm and 360 nm pitch by 10%, 14%, 18%, 17% and 24%, respectively.

Referring to FIG. 16, according to some embodiments, the present disclosure provides a nucleic acid molecule sequencing device 1600, comprising:

an image acquisition module 1601, configured to acquire a sequencing image of a target nucleic acid sample, the sequencing image being an image obtained by image acquisition of the target nucleic acid sample using a preset optical sequencing system;

an image processing module 1602, configured to perform image processing on the sequencing image based on an image processing model to obtain a target image, the image processing model being a model trained using a constructed nucleic acid molecule true value image and a corresponding simulated nucleic acid image, the simulated nucleic acid image being an image obtained by performing simulated sequencing, based on a simulated optical system, on the simulated nucleic acid sample corresponding to the nucleic acid molecule true value image, and the simulated optical system being an optical system obtained by simulating the preset optical sequencing system;

a sequence determination module 1603, configured to perform nucleic acid molecule sequencing according to the target image to obtain a sequencing result corresponding to the target nucleic acid sample.
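The division into three modules can be pictured as a simple pipeline. The sketch below is only an illustration of that structure: the class name `SequencingDevice`, its field names and the placeholder lambdas are hypothetical, and the placeholders stand in for a real camera, a trained model and a real base caller.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[float]]

@dataclass
class SequencingDevice:
    acquire: Callable[[], Image]          # role of image acquisition module 1601
    enhance: Callable[[Image], Image]     # role of image processing module 1602
    call_bases: Callable[[Image], str]    # role of sequence determination module 1603

    def run(self) -> str:
        """Acquire a sequencing image, process it, then derive the sequencing result."""
        sequencing_image = self.acquire()
        target_image = self.enhance(sequencing_image)
        return self.call_bases(target_image)

# Placeholder modules, for illustration only.
device = SequencingDevice(
    acquire=lambda: [[0.1, 0.9], [0.8, 0.2]],
    enhance=lambda img: [[float(round(v)) for v in row] for row in img],
    call_bases=lambda img: "".join("A" if v else "C" for row in img for v in row),
)
result = device.run()
```

Keeping the modules as interchangeable callables mirrors the later statement that the division into units is only a logical division of functions.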

It can be seen that the content of the above method embodiments applies equally to this device embodiment: the functions implemented by the nucleic acid molecule sequencing device are the same as those of the nucleic acid molecule sequencing method embodiments, and the beneficial effects achieved are the same as well.

Referring to FIG. 17, which illustrates the hardware structure of an electronic device according to another embodiment, the electronic device includes:

a processor 1701, which may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and which executes the relevant programs to implement the technical solutions provided by the embodiments of the present disclosure;

a memory 1702, which may be implemented as a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1702 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 1702 and is called by the processor 1701 to execute the nucleic acid molecule sequencing method of the embodiments of the present disclosure;

an input/output interface 1703, used for information input and output;

a communication interface 1704, used for communication between this device and other devices, which may take place over a wired connection (for example USB or a network cable) or a wireless connection (for example a mobile network, Wi-Fi or Bluetooth);

a bus 1705, which transfers information between the components of the device (for example the processor 1701, the memory 1702, the input/output interface 1703 and the communication interface 1704);

wherein the processor 1701, the memory 1702, the input/output interface 1703 and the communication interface 1704 are communicatively connected to one another inside the device through the bus 1705.

Embodiments of the present disclosure further provide a computer program product comprising a computer program. A processor of a computer device reads and executes the computer program, causing the computer device to carry out the nucleic acid molecule sequencing method described above.

The terms "first", "second", "third", "fourth" and the like (if present) in the specification of the present disclosure and in the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It is to be understood that the data so used are interchangeable where appropriate, so that the embodiments of the present disclosure described herein can, for example, be implemented in orders other than those illustrated or described herein. In addition, the terms "comprise" and "include" and any variations of them are intended to cover non-exclusive inclusion: for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.

It should be understood that, in the present disclosure, "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" describes the relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean that only A is present, that only B is present, or that both A and B are present, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be single or multiple.

It should be understood that, in the description of the embodiments of the present disclosure, "a plurality of" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it.

In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or units, and may be electrical, mechanical or of another form.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in each embodiment of the present disclosure may be integrated into one processing unit, may each exist physically on their own, or may be integrated two or more to a unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present disclosure in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the method of each embodiment of the present disclosure. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

It should also be understood that the various implementations provided by the embodiments of the present disclosure may be combined in any way to achieve different technical effects.

The above is a specific description of the implementations of the present disclosure, but the present disclosure is not limited to them. Those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the present disclosure, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present disclosure.

Claims (13)

[根据细则91更正 15.07.2024]
一种核酸分子测序方法,其特征在于,包括:
[Corrected 15.07.2024 in accordance with Article 91]
A method for nucleic acid molecule sequencing, comprising:
获取目标核酸样本的测序图像,所述测序图像为采用预设光学测序系统对所述目标核酸样本进行图像采集得到的图像;Acquire a sequencing image of the target nucleic acid sample, wherein the sequencing image is an image acquired by collecting an image of the target nucleic acid sample using a preset optical sequencing system; 基于图像处理模型对所述测序图像进行图像处理,得到目标图像,所述图像处理模型为采用构建的核酸分子真值图像以及对应的仿真核酸图像训练得到的模型,所述仿真核酸图像为基于仿真光学系统对所述核酸分子真值图像对应的仿真核酸样本进行仿真测序得到的图像,所述仿真光学系统为对所述预设光学测序系统进行仿真得到的光学系统;Performing image processing on the sequencing image based on an image processing model to obtain a target image, wherein the image processing model is a model trained using a constructed true value image of nucleic acid molecules and a corresponding simulated nucleic acid image, wherein the simulated nucleic acid image is an image obtained by performing simulated sequencing on a simulated nucleic acid sample corresponding to the true value image of nucleic acid molecules based on a simulated optical system, and the simulated optical system is an optical system obtained by simulating the preset optical sequencing system; 根据所述目标图像进行核酸分子测序,得到所述目标核酸样本对应的测序结果。Nucleic acid molecule sequencing is performed according to the target image to obtain a sequencing result corresponding to the target nucleic acid sample.
根据权利要求1所述的方法,其特征在于,所述基于图像处理模型对所述测序图像进行图像处理,得到目标图像之前,还包括对所述图像处理模型进行训练,包括:The method according to claim 1, characterized in that, before performing image processing on the sequencing image based on the image processing model to obtain the target image, it also includes training the image processing model, including: 基于所述仿真核酸样本构建所述核酸分子真值图像;Constructing the true value image of the nucleic acid molecule based on the simulated nucleic acid sample; 通过所述仿真光学系统基于所述仿真核酸样本进行仿真测序,得到所述仿真核酸图像;Performing simulated sequencing based on the simulated nucleic acid sample by the simulated optical system to obtain the simulated nucleic acid image; 将所述核酸分子真值图像、与所述核酸分子真值图像对应的所述仿真核酸图像输入原始的所述图像处理模型,对所述图像处理模型进行迭代训练;Inputting the true value image of the nucleic acid molecule and the simulated nucleic acid image corresponding to the true value image of the nucleic acid molecule into the original image processing model, and iteratively training the image processing model; 当所述图像处理模型在迭代训练中符合第一预定条件,得到训练后的所述图像处理模型。When the image processing model meets the first predetermined condition during iterative training, the trained image processing model is obtained. 根据权利要求2所述的方法,其特征在于,所述通过所述仿真光学系统基于所述仿真核酸样本进行仿真测序,得到所述仿真核酸图像,包括:The method according to claim 2, characterized in that the step of performing simulated sequencing based on the simulated nucleic acid sample by the simulated optical system to obtain the simulated nucleic acid image comprises: 对所述预设光学测序系统的光学标定信息进行模拟;Simulating optical calibration information of the preset optical sequencing system; 基于模拟的所述光学标定信息对所述核酸分布信息进行仿真,得到所述仿真核酸图像。The nucleic acid distribution information is simulated based on the simulated optical calibration information to obtain the simulated nucleic acid image. 
根据权利要求3所述的方法,其特征在于,所述仿真核酸样本中的核酸分子包括多个碱基,每一个所述碱基被标记有一个荧光信号;The method according to claim 3, characterized in that the nucleic acid molecule in the simulated nucleic acid sample comprises a plurality of bases, each of which is marked with a fluorescent signal; 所述对所述预设光学测序系统的光学标定信息进行模拟,包括:The simulating the optical calibration information of the preset optical sequencing system includes: 确定所述预设光学测序系统的像素尺寸;其中,所述像素尺寸为所述预设光学测序系统中单个像元对应在成像平面的尺寸;Determine the pixel size of the preset optical sequencing system; wherein the pixel size is the size of a single pixel in the preset optical sequencing system corresponding to the imaging plane; 基于所述仿真核酸样本中所述碱基对应的所述荧光信号进行光学成像解析处理,得到光学传递函数;Performing optical imaging analysis based on the fluorescence signal corresponding to the base in the simulated nucleic acid sample to obtain an optical transfer function; 将所述像素尺寸与所述光学传递函数进行整合,得到模拟的所述光学标定信息。The pixel size is integrated with the optical transfer function to obtain the simulated optical calibration information. 根据权利要求4所述的方法,其特征在于,所述基于所述仿真核酸样本中所述碱基对应的所述荧光信号进行光学成像解析处理,得到光学传递函数,包括:The method according to claim 4, characterized in that the optical imaging analysis processing based on the fluorescence signal corresponding to the base in the simulated nucleic acid sample to obtain the optical transfer function comprises: 对所述仿真核酸样本进行扫描成像,得到扫描核酸图像;Scanning and imaging the simulated nucleic acid sample to obtain a scanned nucleic acid image; 基于所述扫描核酸图像进行局部极大值搜索,得到反映每一个所述荧光信号的荧光图像;Performing a local maximum search based on the scanned nucleic acid image to obtain a fluorescent image reflecting each of the fluorescent signals; 将多个所述荧光信号对应的所述荧光图像进行高斯拟合平均化处理,得到点扩展函数;Performing Gaussian fitting and averaging processing on the fluorescence images corresponding to the multiple fluorescence signals to obtain a point spread function; 对所述点扩展函数进行傅里叶变换,得到所述光学传递函数。Performing Fourier transform on the point spread function to obtain the optical transfer function. 
根据权利要求4所述的方法,其特征在于,所述将所述像素尺寸与所述光学传递函数进行整合,得到模拟的所述光学标定信息,包括:The method according to claim 4, characterized in that the step of integrating the pixel size with the optical transfer function to obtain the simulated optical calibration information comprises: 对所述仿真核酸样本中所述核酸分子之间的空白区域进行噪声提取,得到仿真噪声信息;Extracting noise from blank areas between the nucleic acid molecules in the simulated nucleic acid sample to obtain simulated noise information; 将所述像素尺寸、所述光学传递函数与所述仿真噪声信息进行整合,得到模拟的所述光学标定信息。The pixel size, the optical transfer function and the simulated noise information are integrated to obtain the simulated optical calibration information. 根据权利要求6所述的方法,其特征在于,所述基于模拟的所述光学标定信息对所述核酸分布信息进行仿真,得到所述仿真核酸图像,包括:The method according to claim 6, characterized in that the simulating the nucleic acid distribution information based on the simulated optical calibration information to obtain the simulated nucleic acid image comprises: 基于所述像素尺寸将所述核酸分布信息映射到成像空间,得到所述仿真核酸样本在所述成像空间的第一仿真图像;Mapping the nucleic acid distribution information to an imaging space based on the pixel size to obtain a first simulated image of the simulated nucleic acid sample in the imaging space; 基于所述光学传递函数对所述第一仿真图像进行成像性能仿真,得到第二仿真图像;Performing imaging performance simulation on the first simulation image based on the optical transfer function to obtain a second simulation image; 基于所述仿真噪声信息对所述第二仿真图像进行环境噪声仿真,得到所述仿真核酸图像。Environmental noise simulation is performed on the second simulated image based on the simulated noise information to obtain the simulated nucleic acid image. 
根据权利要求2所述的方法,其特征在于,所述将所述核酸分子真值图像、与所述核酸分子真值图像对应的所述仿真核酸图像输入原始的所述图像处理模型,对所述图像处理模型进行迭代训练,包括:The method according to claim 2, characterized in that the step of inputting the true value image of the nucleic acid molecule and the simulated nucleic acid image corresponding to the true value image of the nucleic acid molecule into the original image processing model and iteratively training the image processing model comprises: 每一轮迭代训练中,将所述仿真核酸图像输入所述图像处理模型进行图像处理训练,得到本轮处理结果;In each round of iterative training, the simulated nucleic acid image is input into the image processing model for image processing training to obtain the processing result of this round; 将本轮处理结果与所述核酸分子真值图像进行比对,得到训练偏差数据;Comparing the processing result of this round with the true value image of the nucleic acid molecule to obtain training deviation data; 基于所述训练偏差数据,对所述图像处理模型的权重参数进行更新。Based on the training bias data, weight parameters of the image processing model are updated. 根据权利要求8所述的方法,其特征在于,所述当所述图像处理模型在迭代训练中符合第一预定条件,得到训练后的所述图像处理模型,包括:The method according to claim 8, characterized in that when the image processing model meets the first predetermined condition in iterative training, obtaining the trained image processing model comprises: 当所述训练偏差数据反映所述图像处理模型在迭代训练中收敛,确定所述图像处理模型在迭代训练中符合所述第一预定条件,得到训练后的所述图像处理模型。When the training deviation data reflects that the image processing model converges during iterative training, it is determined that the image processing model meets the first predetermined condition during the iterative training, and the trained image processing model is obtained. 
A nucleic acid molecule sequencing device, characterized by comprising:
an image acquisition module, configured to acquire a sequencing image of a target nucleic acid sample, wherein the sequencing image is an image obtained by imaging the target nucleic acid sample with a preset optical sequencing system;
an image processing module, configured to perform image processing on the sequencing image based on an image processing model to obtain a target image, wherein the image processing model is a model trained using a constructed true value image of nucleic acid molecules and a corresponding simulated nucleic acid image, the simulated nucleic acid image is an image obtained by performing simulated sequencing, based on a simulated optical system, on a simulated nucleic acid sample corresponding to the true value image of nucleic acid molecules, and the simulated optical system is an optical system obtained by simulating the preset optical sequencing system; and
a sequence determination module, configured to perform nucleic acid molecule sequencing according to the target image to obtain a sequencing result corresponding to the target nucleic acid sample.

An electronic device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the nucleic acid molecule sequencing method according to any one of claims 1 to 9.
A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the nucleic acid molecule sequencing method according to any one of claims 1 to 9.

A computer program product, comprising a computer program, wherein the computer program is read and executed by a processor of a computer device, so that the computer device performs the nucleic acid molecule sequencing method according to any one of claims 1 to 9.
PCT/CN2023/141583 2023-12-25 2023-12-25 Nucleic acid molecule sequencing method and related device Pending WO2025137825A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/141583 WO2025137825A1 (en) 2023-12-25 2023-12-25 Nucleic acid molecule sequencing method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/141583 WO2025137825A1 (en) 2023-12-25 2023-12-25 Nucleic acid molecule sequencing method and related device

Publications (1)

Publication Number Publication Date
WO2025137825A1 (en) 2025-07-03

Family

ID=96216278

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/141583 Pending WO2025137825A1 (en) 2023-12-25 2023-12-25 Nucleic acid molecule sequencing method and related device

Country Status (1)

Country Link
WO (1) WO2025137825A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090076735A1 (en) * 2007-08-15 2009-03-19 Opgen, Inc. Method, system and software arrangement for comparative analysis and phylogeny with whole-genome optical maps
CN112313666A (en) * 2019-03-21 2021-02-02 因美纳有限公司 Training data generation for artificial intelligence based sequencing
CN112823352A (en) * 2019-08-16 2021-05-18 深圳市真迈生物科技有限公司 Base recognition method, system, computer program product and sequencing system
CN114921537A (en) * 2017-03-17 2022-08-19 雅普顿生物系统公司 Sequencing and high-resolution imaging
US20230092006A1 (en) * 2020-02-12 2023-03-23 Mgi Tech Co., Ltd. Optical imaging system and biochemical substance detection system using same
CN116994246A (en) * 2023-09-20 2023-11-03 深圳赛陆医疗科技有限公司 Base recognition method and device based on multitasking combination, gene sequencer and medium
CN117237198A (en) * 2023-11-10 2023-12-15 深圳赛陆医疗科技有限公司 Super-resolution sequencing method and device based on deep learning, sequencer and medium
CN117274739A (en) * 2023-09-20 2023-12-22 深圳赛陆医疗科技有限公司 Base identification method and training set construction method, gene sequencer and medium

Similar Documents

Publication Publication Date Title
Shang et al. Spatially aware dimension reduction for spatial transcriptomics
Eng et al. The use of VARI, GLI, and VIgreen formulas in detecting vegetation in aerial images
JP7712297B2 (en) Equalizer-Based Intensity Correction for Base Calling
Cruz et al. Multi-modality imagery database for plant phenotyping
CN112634987B (en) Method and device for detecting copy number variation of single-sample tumor DNA
US20120197533A1 (en) Identifying rearrangements in a sequenced genome
US20030219151A1 (en) Method and system for measuring a molecular array background signal from a continuous background region of specified size
CN105980578A (en) Basecaller for DNA sequencing using machine learning
CN117274739B (en) Base recognition method, training set construction method thereof, gene sequencer and medium
CN114283882B (en) Non-destructive poultry egg quality character prediction method and system
Ceccarelli et al. A deformable grid-matching approach for microarray images
CN118571324B (en) Data processing method and device, storage medium and electronic device
CN116189763A (en) A single-sample copy number variation detection method based on next-generation sequencing
CN117999359A (en) Method and device for base recognition of nucleic acid samples
JPWO2002001477A1 (en) Gene expression data processing method and processing program
WO2025137825A1 (en) Nucleic acid molecule sequencing method and related device
US7068828B2 (en) Biochip image analysis system and method thereof
Villoutreix et al. Synthesizing developmental trajectories
Anderegg et al. SYMPATHIQUE: image-based tracking of symptoms and monitoring of pathogenesis to decompose quantitative disease resistance in the field
CN114729397B (en) Random emulsified digital absolute quantitative analysis method and device
EP1089211B1 (en) Method and apparatus for displaying gene expression patterns
Brown et al. The topology of representational geometry
EP1134687A2 (en) Method for displaying results of hybridization experiments
CN119418365B (en) A method and system for monitoring body temperature in group-raised pigs
CN119068977B (en) A method and system for evaluating crop germplasm resources based on prior knowledge

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23962523

Country of ref document: EP

Kind code of ref document: A1