WO2024248014A1 - Method for sorting seeds - Google Patents
- Publication number
- WO2024248014A1 (PCT/JP2024/019621)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- seeds
- quality
- rate
- discrimination
- seed
- Legal status: Pending (assumed status as listed by Google Patents; not a legal conclusion)
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
- G01N21/3563—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing solids; Preparation of samples therefor
Definitions
- The present invention relates to a method for selecting seeds and, more specifically, to a method for selecting plant seeds that have desired characteristics by analyzing the spectral data of the seeds with multivariate statistical analysis.
- The quality rate (proportion of good seeds) of seeds that are not sorted after harvest is usually below 90%.
- Defective seeds arise partly from insufficient maturation of the seeds on the mother plant, but damage, deterioration, or contamination during post-harvest seed preparation can also be a factor.
- Large-scale equipment used for such sorting includes spiral density sorters, trommel rotary sorters, vibration sorters, magnetic sorters, weight sorters, and color sorters.
- If the final quality rate after sorting falls below the target, seeds that have gone through sowing, cultivation, harvesting, and sorting simply become waste. At present, there is no way at this stage to raise the quality rate, even by the "last 1%."
- The seed sorting devices described above rest on the premise that the quality of a seed is reflected in its outer surface or in characteristics detectable from outside. Such characteristics include shape, surface texture or stickiness, specific gravity, and color.
- These devices are suited to mass sorting and can bring most harvested seeds to a high-quality state. However, their operating conditions are difficult to fine-tune. Furthermore, from the standpoint of seed biochemistry, changes in the embryo and endosperm, which are hidden from the outside, can affect seed function far more than changes in the exposed seed coat.
- NIR spectroscopy is an effective method for non-destructively evaluating differences and changes in the chemical composition of substances that are primarily composed of organic compounds.
- NIR spectroscopy is well suited to food, pharmaceuticals, and agricultural products, and its use in their quality control is expanding.
- Imaging spectroscopy, developed primarily in the field of remote sensing, is also attracting attention as a quality-control technology for these products, which are characterized by their heterogeneity.
- Near-infrared imaging spectroscopy, which combines these new technologies, is widely expected to be the key to eliminating "seed loss."
- Non-Patent Documents 1-4 report applications of NIR imaging spectroscopy to evaluating seed quality in terms of survival, growth potential, genetic purity, and the presence or absence of insect or fungal damage.
- Despite the reports in Non-Patent Documents 1-4, the fact that "seed loss" still occurs indicates that this technology has not yet reached a practical level, or that its social implementation has not progressed sufficiently.
- The present invention is as follows.
- A method for selecting a plant seed having a desired trait, comprising the following Steps 1 to 7: (Step 1) removing a portion of seeds from a seed population; (Step 2) constructing a data set for the extracted seeds, the data set construction including: (a) irradiating the extracted seeds with light to obtain spectral data and/or derived spectral data; (b) subjecting the seeds from which the spectral data have been acquired to a quality test for the desired trait and determining a preliminary pass rate of the extracted seeds;
- (c) correlating the spectral data of each seed with the results of the quality test; (Step 3) applying multivariate discriminant analysis to the data set constructed in Step 2 to derive candidates for a good/bad discrimination model for calculating a …
- The trait is at least one trait selected from the group consisting of germination ability, germination vigor, genotype, stress resistance, dormancy, disease resistance, insect resistance, QTL characteristics, eating quality, heading time, and morphological characteristics.
- [4] The method for selecting a plant seed according to any one of [1] to [3], wherein two or more desired traits are predicted by any one of the following methods (i) to (iii): (i) scoring seeds individually with a discrimination model for each trait, focusing on the quality of each trait, and predicting that seeds whose scores are all equal to or above the threshold are good seeds; (ii) selecting a single discrimination model by treating a seed as good only if all of the desired traits are good and as bad otherwise, regardless of whether each individual trait is good or bad, scoring the seeds with that single model, and predicting that seeds with a score equal to or above a threshold are good seeds; (iii) selecting a discrimination model for each trait, focusing on the quality of each trait, integrating the scores obtained from the per-trait models into an integrated score, and predicting the quality of each seed from the ranking of the integrated score.
- 1 is a diagram showing the appearance of vegetable seeds classified by quality category. The numbers below the boxes represent the scores of each seed calculated using the discrimination model in Table 2.
- 1 is a diagram showing near-infrared reflectance spectra of vegetable seeds classified by quality category, where A to I show the average reflectance spectra for each category, and J shows the reflectance spectrum of an individual seed.
- FIG. 11 is a flowchart showing a procedure for deriving a pass/fail discrimination model.
- FIG. 13 is a schematic diagram showing a procedure for deriving a pass/fail discrimination model. This figure shows the relationship between the distribution range of the discrimination score and the discrimination accuracy.
- A shows a hypothetical situation in which a high-precision discrimination model is used, and B one in which a low-precision model is used.
- The preliminary pass rate is set to 80%.
- Black and gray triangles indicate the preliminary pass rate and the standard threshold value (see Table 3, footnote 2), respectively.
- (c) Distribution range of the discrimination scores of good and bad seeds (box plot and scatter plot).
- Figure 1 shows the internal validity of the good/bad discrimination model.
- F shows the discrimination results and accuracy for the germination trait only.
- I shows the discrimination results and accuracy for the hybrid trait only.
- J shows the results and accuracy of discriminating good seeds (F1 hybrids with normal germination ability) from all other seeds.
- K shows the results and accuracy of good/bad discrimination by integrated score, reflecting predictions of both traits based on the two discrimination models used in F and I.
- FIG. 8 shows the external validity of the pass/fail discrimination model.
- A-F correspond to Figures 8A-F.
- G-I correspond to Figures 8I-K.
- Symbols, abbreviations, and the outline of graphs (a)-(e) are the same as those in Figure 5.
- Figure 1 shows the PR/rPR curves for distinguishing between good and bad cauliflower seeds.
- PR: Precision-Recall
- This figure shows the rPR curve for each combination of the pass/fail discrimination model and the dataset.
- The relationship between recall and relative precision is referred to as the "rPR curve," and the area under the curve as the "rPR-AUC."
- Figure 13
- Pictorial representation of pass/fail discrimination scores. A: pumpkin variety 5; B: pea variety 19.
- Figure 1 shows the germination and early growth characteristics of seeds classified by the quality discrimination score.
- A and B: pea cultivar 19;
- C: lettuce cultivar 803; D: leek cultivar 22; E: cauliflower cultivar 47.
- Image B is an image in which only the pixels corresponding to the green leaves of A have been extracted.
- The rankings shown in the upper rows of A through D and E are the predicted rankings from the discrimination model for germination traits, and those in the lower row of E are the predicted rankings from the discrimination model for mating traits.
- The present invention provides a method for selecting a plant seed having a desired trait, comprising the following Steps 1 to 7 (hereinafter sometimes referred to as the "method of the present invention"): (Step 1) removing a portion of seeds from a seed population; (Step 2) constructing a data set for the extracted seeds, the data set construction including: (a) irradiating the extracted seeds with light to obtain spectral data and/or derived spectral data; (b) subjecting the seeds from which the spectral data have been acquired to a quality test for the desired trait and determining a preliminary pass rate of the extracted seeds;
- (Step 3) applying multivariate discriminant analysis to the data set constructed in …
- The term "plant seed" refers to the seeds of any plant and is not particularly limited.
- The plant is not particularly limited as long as it is a seed plant; for example, it may be an angiosperm or a gymnosperm.
- When the plant is an angiosperm, it may be either dicotyledonous or monocotyledonous.
- When the plant is dicotyledonous, it may be either sympetalous or polypetalous.
- The method of the present invention can be used to select seeds of high-value plants. Examples of such plants include, but are not limited to, horticultural crops and plants that can be used as building timber.
- Horticultural crops include vegetables, fruit trees, and ornamental plants.
- Vegetables include, but are not limited to, pumpkin, peas, lettuce, green onions, tomatoes, cauliflower, bitter melon, okra, onions, Japanese ginger, soybeans, butterbur, asparagus, Chinese chives, broad beans, celery, carrots, mizuna, komatsuna, chrysanthemum, radish, broccoli, spinach, Chinese cabbage, arugula, lotus root, turnip, avocado, cucumber, paprika, garlic, corn, zucchini, parsley, cilantro, eggplant, and green peppers.
- Fruit trees include, but are not limited to, plum, fig, akebia, acerola, avocado, olive, orange, persimmon, quince, guava, cranberry, walnut, grapefruit, cherry, and pomegranate.
- Examples of flowers include, but are not limited to, morning glory, cockscomb, cosmos, zinnia, columbine, globe amaranth, petunia, periwinkle, cabbage, sunflower, impatiens, portulaca, balloon vine, marigold, gypsophila, snapdragon, calendula, sweet pea, stock, dianthus, daisy, nigella, nemesia, nemophila, poppy, verbena, pansy, viola, corn poppy, cornflower, lupine, and forget-me-not.
- Examples of plants that can be used as building timber include, but are not limited to, cedar, cypress, Japanese cypress, chestnut, zelkova, cherry, beech, walnut, falcata, and red pine.
- The term "desired trait" encompasses both traits possessed by the seed itself and traits expressed in the plant that germinates from the seed when it is grown.
- Traits that are the subject of selection in the method of the present invention include, but are not limited to, germination ability, germination vigor, genotype, stress resistance, dormancy, disease resistance, insect resistance, QTL characteristics, taste, heading time, and morphological characteristics such as leaf size.
- In the first step of the present invention, a portion of the seeds is taken out of the seed population.
- The number of seeds constituting the seed population may be 2 or more, with no particular upper limit.
- The seed population may consist of, but is not limited to, usually 2 to 10,000,000 seeds, preferably 1,000 to 10,000,000 seeds, and more preferably 10,000 to 10,000,000 seeds.
- The seed population may also be expressed by weight.
- For example, the seed population may weigh, but is not limited to, 1 g to 10,000 kg, preferably 1 kg to 10,000 kg, and more preferably 10 kg to 10,000 kg.
- The "portion" of the seed population may vary depending on the "traits" to be selected, but is usually, though not limited to, about 50 to 50,000 seeds, preferably 100 to 30,000 seeds, and more preferably 200 to 3,000 seeds.
- The second step of the present invention is to construct a data set for the extracted seeds.
- Constructing the data set includes at least the following steps (a) to (c):
- The "preliminary conforming rate" (preliminary pass rate) may also be referred to as the "initial conforming rate."
- The method of irradiating the seeds with light to obtain spectral data (reflection, absorption, and/or transmission spectral data) and/or derived spectral data may be any method generally used in the technical field of optical analysis.
- The light irradiated onto the seeds is not particularly limited as long as spectral data can be obtained with it.
- The light may be, for example, microwaves, terahertz waves, infrared light, visible light, ultraviolet light, X-rays, or gamma rays, but is not limited to these.
- The light may be visible light or infrared light.
- The spectral data may be obtained, for example, by the method and conditions used in the examples of this application or by the method taught in Patent No. 6782408, but are not limited to these.
- Derived spectral data may be generated from the obtained spectral data by a method known per se in the technical field of optical analysis.
- For example, derived spectral data can be generated by applying a reciprocal (1/R) transformation, a logarithmic transformation, standard normalization, smoothing and smoothed differentiation with a Savitzky-Golay (SG) filter, or any combination of these to the obtained spectral data (R).
- SG: Savitzky-Golay
- The spectral data used for modeling may be whichever of the spectral data and the derived spectral data generated from them yields the most accurate prediction model.
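As a minimal Python sketch (NumPy and SciPy assumed), the transformations above can be combined systematically so that each combination yields one candidate set of explanatory variables. The helper names, the log10(1/R) form of the logarithmic transformation, the per-spectrum reading of "standard normalization", and the SG window length and polynomial order are illustrative assumptions, not values from the source.

```python
import itertools

import numpy as np
from scipy.signal import savgol_filter


def snv(X):
    """Per-spectrum standardization (assumed reading of 'standard normalization')."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    return (X - mu) / sd


# Stage 1: optional reciprocal / logarithmic transformation of reflectance R.
BASE = {
    "R": lambda R: R,
    "1/R": lambda R: 1.0 / R,
    "log10(1/R)": lambda R: np.log10(1.0 / R),  # log form is an assumption
}
# Stage 2: optional standardization.
SCALE = {"none": lambda X: X, "snv": snv}
# Stage 3: optional Savitzky-Golay smoothing or first derivative.
# window_length=7 and polyorder=2 are illustrative, not values from the source.
SG = {
    "none": lambda X: X,
    "sg_smooth": lambda X: savgol_filter(X, 7, 2, axis=1),
    "sg_deriv1": lambda X: savgol_filter(X, 7, 2, deriv=1, axis=1),
}


def derived_spectra(R):
    """Return {name: array} for every combination of the three stages.

    R: (n_seeds, n_bands) reflectance spectra with values > 0.
    """
    out = {}
    for (b, fb), (s, fs), (g, fg) in itertools.product(
        BASE.items(), SCALE.items(), SG.items()
    ):
        out[f"{b}|{s}|{g}"] = fg(fs(fb(R)))
    return out


rng = np.random.default_rng(0)
R = rng.uniform(0.2, 0.8, size=(5, 200))
Xs = derived_spectra(R)
print(len(Xs))  # 3 x 2 x 3 = 18 candidate explanatory-variable sets
```

Each entry of the returned dictionary would then be tried as explanatory variables, keeping the one that gives the most accurate model.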
- The equipment used to acquire the spectral data may be any known equipment.
- The spectral characteristics of biological tissues can be measured either point by point with a fiber-optic spectrometer (point measurement) or together with coordinate information (images) using a hyperspectral camera, a type of remote sensor (surface measurement).
- Equipment with high detection sensitivity in the target wavelength range may be selected as appropriate.
- For example, equipment with a photodetector such as a CCD, CMOS, CQD, InGaAs, HgCdTe (MCT), or Type II superlattice (T2SL) sensor may be used, but the equipment is not limited to these.
- If necessary, a reflectance-corrected image may be generated and a seed recognition model applied (see ST1-3 in FIG. 3).
- The quality inspection in step (b) of data set construction is likewise not particularly limited, as long as the preliminary pass rate for the desired trait can be determined. For example, if the desired trait is "germination ability," it can easily be confirmed by subjecting the seeds from which spectral data have been acquired to a germination test that complies with the International Rules for Seed Testing established by ISTA, or to a similar germination test.
- The preliminary pass rate (%) (here, the germination rate) is 100 × B/A, where A is the number of seeds subjected to quality evaluation and B is the number of those seeds that germinated.
- For example, let the spectral data (reflection, absorption, or transmission spectral data, or derived spectral data) obtained by irradiating seeds A, B, and C with light be SA, SB, and SC, respectively.
- Each spectrum is then matched with the corresponding quality result, for example SA with normal germination, SB with abnormal germination, and SC with non-germination.
- Dummy variables may be assigned to make good/bad judgments easier.
- For example, 1 (good) can be assigned to normal germination and -1 (bad) to abnormal germination and non-germination, but the assignment is not limited to this.
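The dummy coding and the preliminary pass rate defined above can be illustrated with a small Python sketch. The outcome strings and function names are hypothetical; here B counts only seeds labeled good (normal germination).

```python
import numpy as np

# Hypothetical quality-test outcomes for five seeds.
outcomes = np.array(["normal", "abnormal", "normal", "none", "normal"])


def dummy_labels(outcomes):
    """1 (good) for normal germination; -1 (bad) for abnormal or no germination."""
    return np.where(outcomes == "normal", 1, -1)


def preliminary_pass_rate(labels):
    """100 * B / A: A = seeds evaluated, B = seeds judged good."""
    A = labels.size
    B = int(np.count_nonzero(labels == 1))
    return 100.0 * B / A


y = dummy_labels(outcomes)
print(y.tolist(), preliminary_pass_rate(y))  # [1, -1, 1, -1, 1] 60.0
```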
- The third step of the present invention derives candidates for a good/bad discrimination model, which calculates a discrimination score for the desired trait of each seed, by applying multivariate discriminant analysis to the data set constructed in the second step.
- The multivariate discriminant analysis in this step may be performed by a method known per se; any method may be used as long as the desired effect of the present invention is obtained.
- The multivariate statistical analysis technique may be, but is not limited to, partial least squares (PLS) or principal component analysis (PCA).
- The multivariate discriminant analysis is preferably partial least squares discriminant analysis (PLS-DA).
- PLS-DA fits a model of the form of Equation 1 below, using the explanatory and target variables that have been set:
  $$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 \qquad (1)$$
- $x_1$, $x_2$, $x_3$, and $x_4$ are explanatory variables.
- $y$ is the response variable.
- $a$ is the intercept (constant).
- $b_1$, $b_2$, $b_3$, and $b_4$ are partial regression coefficients (constants).
- In this step, the spectral data acquired in the second step are used as explanatory variables and the quality inspection results as objective variables to derive candidate good/bad discrimination models that calculate a discrimination score for the desired trait of each seed.
- The candidate models may also be derived by a method known per se; the method used in the examples of the present invention is described below as an example.
- Derived spectral data Xs are generated using the transformations described above or any combination of them.
- In the discriminant analysis, all of the derived spectra are tried as explanatory variables; if the resulting discriminant models perform equivalently, the simplest derivation pattern can be adopted.
- A reflection, absorption, or transmission spectrum is acquired, and derived spectra are generated to prepare the explanatory variable data. If necessary, the objective variable, the explanatory variables, or both are standardized.
- A discriminant model is then derived by multivariate discriminant analysis, for example by linear sparse modeling as follows. For every combination of objective variable and explanatory variable data, an initial weight for each explanatory variable is first determined by PLS regression or ridge regression.
- Next, Adaptive LASSO regression is used to compute a solution path, which shows how the partial regression coefficients, and the number of explanatory variables with non-zero coefficients, change with the regularization coefficient λ. If p is the number of explanatory variables that minimizes the residual sum of squares of the regression, candidate combinations of p or fewer explanatory variables can be read from the solution path.
- The final discriminant model is derived by PLS-DA.
- Specifically, discriminant models are derived using 2 to p explanatory variables determined from the Adaptive LASSO solution path.
- Each derived discriminant model becomes a candidate good/bad discrimination model. Many candidates are produced at this stage; the one that gives higher relative scores to good seeds is selected in the following Step 4.
- The fourth step of the present invention selects a discrimination model from the candidate pass/fail discrimination models derived in the third step.
- Sorting conditions: conditions under which the preliminary pass rate and the recovery rate are equal.
- A discrimination model is selected that shows a high rank correlation between changes in the recovery rate and changes in the precision rate (i.e., lowering the recovery rate reliably raises the precision rate, and raising the recovery rate reliably lowers it).
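One possible way to quantify this rank-correlation criterion is sketched below in Python, under stated assumptions: the recovery rate is taken as the fraction of seeds kept from the top of the score ranking, the precision as the share of good seeds among them, and the criterion as the Spearman correlation between the two over a grid of recovery rates (values near -1 mean that lowering the recovery rate reliably raises precision). The grid size and all names are hypothetical, and the source does not specify which correlation statistic it uses.

```python
import numpy as np
from scipy.stats import spearmanr


def recovery_precision(scores, labels, n_steps=20):
    """Precision (share of good seeds) among the top-k seeds for a grid of k.

    Recovery rate = k / n, i.e. the fraction of seeds kept.
    """
    order = np.argsort(-scores)
    good = labels[order] == 1
    n = len(scores)
    ks = np.unique(np.clip(np.round(np.linspace(n / n_steps, n, n_steps)), 1, n).astype(int))
    recovery = ks / n
    precision = np.array([good[:k].mean() for k in ks])
    return recovery, precision


def rank_consistency(scores, labels):
    """Spearman rho between recovery rate and precision (closer to -1 is better)."""
    recovery, precision = recovery_precision(scores, labels)
    rho, _ = spearmanr(recovery, precision)
    return rho


def select_model(candidate_scores, labels):
    """Index of the candidate whose scores give the most negative rho."""
    rhos = [rank_consistency(s, labels) for s in candidate_scores]
    return int(np.nanargmin(rhos)), rhos


rng = np.random.default_rng(0)
s = rng.normal(size=1000)
labels = np.where(s + 0.5 * rng.normal(size=1000) > 0, 1, -1)
best, rhos = select_model([-s, s], labels)  # an inverted and a sensible model
print(best, [round(r, 2) for r in rhos])
```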
- The fifth step of the present invention irradiates the remaining seeds not extracted from the seed population in the first step, and/or other seeds obtained under substantially the same conditions, with light to obtain spectral data, and determines the discrimination score of each seed by applying the discrimination model selected in the fourth step to those spectral data.
- In other words, the fifth step obtains spectral data for the non-training seeds in the seed population and determines the discrimination score of each seed by applying the discrimination model obtained in the fourth step.
- The method and conditions for obtaining the spectral data may be the same as those used for the training seeds, or may partially differ as long as the desired effect is obtained.
- The sixth step of the present invention determines a threshold for the discrimination score based on the scores determined in the fifth step. If the preliminary pass rate of the seed population is k%, the score ranked k% from the top can be set as the threshold for standard selection (the standard threshold). In addition, if a discrimination model with a high rank correlation between changes in the recovery rate and changes in the post-sorting precision rate was selected in the fourth step, the precision rate can be adjusted by raising or lowering the threshold relative to the standard threshold.
- The seventh step of the present invention recovers the seeds predicted to have the desired trait by comparing the discrimination score of each seed determined in the fifth step with the threshold determined in the sixth step.
- The seeds may be recovered either by picking out the seeds predicted to have the desired trait, or by removing the seeds predicted not to have it and recovering the remaining seeds as seeds having the desired trait.
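The standard threshold of the sixth step and the recovery of the seventh step can be sketched in Python as follows. The function names and the `offset` parameter (used to raise or lower the threshold relative to the standard threshold) are hypothetical; all seeds tied at the threshold score are kept.

```python
import numpy as np


def standard_threshold(scores, k_percent):
    """Score of the seed ranked k% from the top (k = preliminary pass rate)."""
    n = len(scores)
    n_keep = min(max(int(round(n * k_percent / 100.0)), 1), n)
    return np.sort(scores)[::-1][n_keep - 1]


def recover(scores, threshold, offset=0.0):
    """True for seeds kept; a positive offset trades recovery for precision."""
    return scores >= threshold + offset


scores = np.array([0.9, 0.1, 0.5, -0.3, 0.7])
t = standard_threshold(scores, k_percent=60)
print(t, recover(scores, t).tolist())  # 0.5 [True, False, True, False, True]
```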
- The method of the present invention can also be used to select plant seeds having two or more desired traits.
- Examples of two or more traits include, but are not limited to, combinations of two or more of the traits listed above (i.e., germination ability, germination vigor, genotype, stress resistance, dormancy, disease resistance, insect resistance, QTL characteristics, eating quality, heading time, and morphological characteristics such as leaf size).
- When the method of the present invention selects for two or more desired traits, the traits are predicted by any one of methods (i) to (iii) set out in [4] above.
- The method of the present invention may further include a step of evaluating the accuracy of the discrimination model.
- This accuracy evaluation assesses the "generalization performance" of the selected discrimination model.
- The accuracy of the pass/fail discrimination model can be evaluated as follows.
- The discrimination model obtained through the steps above undergoes internal validation, in which it is applied to the internal data used in its derivation (i.e., the seed population from which the data set was constructed), and external validation, in which it is applied to external data obtained independently of model derivation.
- ROC: Receiver Operating Characteristic
- PR: Precision-Recall
- A discrimination model is considered superior when its AUC (area under the curve) is closer to the maximum value of 1.
- For a no-skill (random) model, ROC-AUC is always 0.5.
- For a no-skill model, PR-AUC equals the initial precision rate, i.e., the preliminary pass rate, and therefore varies with the data set to which it is applied.
- PR-AUC is suitable for verifying which of several discriminant models performs best on a specific data set, but not for verifying whether a single discriminant model performs equivalently on different data sets, because it is affected by the preliminary pass rate.
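The dependence of the no-skill baseline on the preliminary pass rate can be checked numerically with a small Python sketch using scikit-learn. Here `average_precision_score` serves as a step-wise estimate of PR-AUC; the sample size and pass rates are arbitrary choices.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)


def no_skill_aucs(prevalence, n=20000):
    """ROC-AUC and average precision for scores unrelated to the labels."""
    y = rng.random(n) < prevalence  # True = good seed
    scores = rng.random(n)          # uninformative scores
    return roc_auc_score(y, scores), average_precision_score(y, scores)


for p in (0.5, 0.8, 0.95):
    roc, ap = no_skill_aucs(p)
    print(f"pass rate {p:.2f}: ROC-AUC {roc:.3f}, PR-AUC (AP) {ap:.3f}")
```

With uninformative scores, ROC-AUC stays near 0.5 while the PR-AUC estimate tracks the pass rate, which is why raw PR-AUC values are hard to compare across data sets.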
- rPR: relative Precision-Recall
- rPR-AUC: the area under the rPR curve
- When n discrimination models are used together, the relationship between precision (or relative precision) and recall forms a hypersurface in an (n+1)-dimensional space.
- Regardless of the number of discrimination models used, these relationships are still referred to as PR curves and rPR curves, and the proportions of the space below the hypersurfaces they form as PR-AUC and rPR-AUC.
- PR and rPR curves are created by applying each discrimination model to the data set derived from the target crop species.
- PR-AUC and rPR-AUC are calculated from each curve.
- The accuracy of the discrimination model is evaluated from the shape of the resulting curves and their AUC values.
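The source does not state how relative precision is computed. The Python sketch below therefore assumes one plausible normalization, relative precision = (precision - p0) / (1 - p0) clipped to [0, 1], with p0 the preliminary pass rate, so that a no-skill model maps to 0 and a perfect one to 1; treat both the formula and the clipping as assumptions. It covers a single discrimination model, building the curve with scikit-learn's `precision_recall_curve` and computing areas by trapezoidal integration with `auc`.

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve


def pr_and_rpr_auc(y_true, scores):
    """PR-AUC and an rPR-AUC under an ASSUMED relative-precision definition.

    relative precision = (precision - p0) / (1 - p0), clipped to [0, 1],
    where p0 is the preliminary pass rate (share of good seeds, must be < 1).
    """
    y_true = np.asarray(y_true).astype(bool)
    p0 = y_true.mean()
    precision, recall, _ = precision_recall_curve(y_true, scores)
    rel_precision = np.clip((precision - p0) / (1.0 - p0), 0.0, 1.0)
    # Trapezoidal areas over recall (recall is returned in decreasing order).
    return auc(recall, precision), auc(recall, rel_precision)


rng = np.random.default_rng(0)
s = rng.normal(size=500)
y = s + rng.normal(size=500) > -0.8  # mostly good seeds, imperfect scores
print(pr_and_rpr_auc(y, s))
```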
- Vegetable seeds (pumpkin, pea, lettuce, green onion, tomato, and cauliflower) harvested from 2015 to 2022 provided by Tokita Seed Co., Ltd. were used as test materials (Table 1).
- The seeds were irradiated with near-infrared light from two 24 V, 250 W halogen lamps with aluminum mirrors (JTR24V250W10H/5-AL, GX5.3 base, manufactured by Kahoku Lighting Solutions) driven by DC voltage, and near-infrared hyperspectral images were taken with a line-scan hyperspectral camera (CV-N801HS, manufactured by Sumitomo Electric Industries, Ltd.).
- The near-infrared lens was an image-side telecentric lens (manufactured by Sumitomo Electric Industries, Ltd.) with a focal length of 30 mm.
- The working distance during imaging was 28 cm, the spatial resolution was 90 ppi, and the wavelength sampling interval was 6 nm.
- The wavelength sensitivity range of the camera was 980 to 2,350 nm.
- The reflectance at each wavelength was calibrated to the equivalent of diffuse reflectance using images of a standard reflector with 99% reflectance, part of a contrast target (SRT-MS-050, Labsphere, USA), and of the dark current (0% reflectance).
- The reflectance spectrum of each seed was the average of the reflectance spectra recorded for the pixels in the area occupied by that seed.
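The calibration and per-seed averaging described above can be sketched in Python. The sketch assumes the common flat-field formula R = 0.99 × (raw - dark) / (white - dark) and that the white and dark reference images broadcast against the seed image; the function names, the array layout, and the synthetic values are hypothetical.

```python
import numpy as np


def calibrate_reflectance(raw, white, dark, white_reflectance=0.99):
    """Flat-field calibration: 0 at dark current, white_reflectance at the reference."""
    return white_reflectance * (raw - dark) / (white - dark)


def seed_mean_spectrum(cube, mask):
    """Mean spectrum over the pixels of one seed.

    cube: (rows, cols, bands) calibrated reflectance; mask: (rows, cols) bool.
    """
    return cube[mask].mean(axis=0)


# Tiny synthetic example: 4 x 4 pixels, 3 bands.
dark = np.full((4, 4, 3), 100.0)
white = np.full((4, 4, 3), 4100.0)
raw = dark + 0.5 * (white - dark)  # halfway between dark and white
cube = calibrate_reflectance(raw, white, dark)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True  # pixels occupied by one seed
print(seed_mean_spectrum(cube, mask))  # [0.495 0.495 0.495]
```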
- The appearance of the seeds was photographed with an 8k color line-scan camera (e2v EV71C4CCL8005-BA0) under illumination from a white LED light source (Leimac IDBA-HMS150WHV-S).
- The lens was an object-side telecentric lens (Optoart FT04-150CL) with an optical magnification of 0.4x, and the spatial resolution was 2,032 ppi.
- The white balance of the camera was adjusted with a standard reflector of 25% reflectance (for dark seed surfaces) or 50% reflectance (for light seed surfaces) from the contrast target (the SRT-MS-050 mentioned above).
- Cauliflower variety 47 (variety number omitted below) is used as an F1 hybrid, but due to incomplete self-incompatibility, self-fertilized seeds may be formed.
- Dummy variables were assigned to good and bad seeds and used as the objective variables in the discriminant analysis described below.
- For the germination trait, the dummy variable was set to 1 (good) for normally germinated seeds and -1 (bad) for all others.
- For the hybrid trait, F1 hybrids were assigned 1 and all others -1.
- For good seeds overall, F1 hybrids with normal germination ability were assigned 1 and all others -1.
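The combined dummy variable for good seeds can be derived from the two per-trait dummy variables, as in this short Python sketch (function and variable names are hypothetical):

```python
import numpy as np


def combined_label(germ_label, hybrid_label):
    """1 only for seeds good on both traits (normal germination AND F1 hybrid)."""
    germ_label = np.asarray(germ_label)
    hybrid_label = np.asarray(hybrid_label)
    return np.where((germ_label == 1) & (hybrid_label == 1), 1, -1)


germ = np.array([1, 1, -1, -1])    # germination trait
hybrid = np.array([1, -1, 1, -1])  # hybrid (F1) trait
print(combined_label(germ, hybrid).tolist())  # [1, -1, -1, -1]
```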
- A model for discriminating between good and bad seeds was derived with a multivariate linear sparse modeling technique that combines Adaptive LASSO (Least Absolute Shrinkage and Selection Operator) and PLS-DA (Partial Least Squares Discriminant Analysis). Many candidate models are produced at this stage, and the one that gives higher relative scores to good seeds must be selected. The model selection method is described below.
- LASSO: Least Absolute Shrinkage and Selection Operator
- PLS-DA: Partial Least Squares Discriminant Analysis
- The sorting conditions under which the preliminary pass rate equals the recovery rate are defined as the "standard sorting conditions," and the lower discrimination-score threshold for the seeds recovered under those conditions is defined as the "standard threshold."
- Method 1: The germination and mating traits are scored individually with two models (m-Ca47g and m-Ca47x), and seeds with the highest scores are predicted to be good seeds.
- Method 2: The quality of the individual traits is ignored, and a single model (m-Ca47) predicts only whether a seed is good.
- Let M1 through Mk be models that discriminate between good and bad for traits 1 through k.
- Let y1(X) through yk(X) be the discrimination scores obtained by substituting the explanatory variable X into these models.
- Suppose that n seeds (several hundred) making up a particular lot are measured, and scores y1(x) to yk(x) are obtained from the corresponding explanatory variables x (x1 to xn). If the preliminary pass rates for the traits are p1 to pk, the (n × p)-th score from the top of y (n × p1 to n × pk for y1 to yk) is defined as yp (yp1 to ypk). The standard deviation of y (y1 to yk) is defined as s (s1 to sk).
- The minimum of the scores z (z1 to zk) is the combined score for discrimination using M1 to Mk.
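The source does not define the per-trait score z. The Python sketch below assumes z_j = (y_j - y_pj) / s_j, i.e., each score is measured from its own standard threshold in units of its standard deviation, so that the combined score min_j z_j is non-negative only for seeds that clear every trait's standard threshold. This definition, the sample standard deviation (ddof=1), and the function name are assumptions.

```python
import numpy as np


def integrated_score(Y, pass_rates):
    """Combined score from k per-trait models (z definition is ASSUMED).

    Y: (n_seeds, k) scores y_1..y_k; pass_rates: k preliminary pass rates in (0, 1].
    z_j = (y_j - y_pj) / s_j, with y_pj the (n * p_j)-th score from the top and
    s_j the standard deviation of y_j; the combined score is min_j z_j.
    """
    Y = np.asarray(Y, dtype=float)
    n, k = Y.shape
    Z = np.empty_like(Y)
    for j in range(k):
        r = min(max(int(round(n * pass_rates[j])), 1), n)
        y_p = np.sort(Y[:, j])[::-1][r - 1]
        Z[:, j] = (Y[:, j] - y_p) / Y[:, j].std(ddof=1)
    return Z.min(axis=1)


Y = np.array([[3.0, 0.0], [2.0, 2.0], [1.0, 1.0], [0.0, 3.0]])
z = integrated_score(Y, pass_rates=[0.5, 0.5])
print(np.argsort(-z))  # seed ranking by integrated score
```

Seeds 0 and 3 each excel on one trait but fail the other, so only seed 1, which reaches both standard thresholds, receives a non-negative combined score.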
- Figures 8 and 9 respectively correspond to the results of internal validity verification when each discriminant model is applied to the internal data used in its derivation, and external validity verification when it is applied to external data obtained independently of the model derivation.
- Figure 9 shows the evaluation results of the generalization performance of the model.
- to evaluate the overall accuracy of a pass/fail discrimination model independently of the chosen lower score threshold, ROC (Receiver Operating Characteristic) curves and PR (Precision-Recall) curves are often used.
- the PR curve, which is sensitive to false positives, is often used.
- PR-AUC is suitable for determining which of several discrimination models performs best on a specific data set, but not for verifying whether a single discrimination model performs equivalently on different data sets, because its value is affected by the prior pass rate (a model that ranks seeds at random still achieves a precision close to the prior pass rate at every recall level).
- the relationship between relative precision and recall is called the rPR (relative Precision-Recall) curve, and the area under the curve is called the rPR-AUC.
- the rPR-AUC does not depend on the dataset to which it is applied, and always takes a value in the closed interval [0, 1].
- the relationship between the precision rate (or relative precision rate) and the recall rate likewise forms an (n+1)-dimensional hypersurface.
- the relationships of the precision rate and the relative precision rate to the recall rate are referred to as the PR curve and the rPR curve, and the proportions of the region below the hypersurfaces they form are referred to as the PR-AUC and the rPR-AUC, regardless of the number of discrimination models used.
- the distribution range of the discrimination scores differed between test occasions, even when seeds of the same variety and lot were used (Fig. 8 and Fig. 9 (a) to (c); different lots of tomato variety 94 were used in Fig. 8 and Fig. 9).
- the derived discrimination model was capable of repeated application, at least to the same variety, but it was desirable to optimize the lower threshold score for selecting good seeds for each occasion.
- deriving a discrimination model does not require a great deal of time or effort. It is possible to continue updating a model applied to a specific crop variety to improve its accuracy, with the intention of reusing it on another occasion; however, the reasons for the occurrence of defective seeds often differ from occasion to occasion. It is important not only to enrich the library of discrimination models compatible with various crop varieties, but also to establish a seed quality control system that can derive the optimal good/bad discrimination model for each occasion when a problem occurs and respond immediately to resolve it.
- Near-infrared hyperspectral images can be acquired all at once, and the data obtained from them can be used to distinguish multiple traits, such as seed germination characteristics and genotype.
- the images contain information about grain size and are also effective in estimating grain weight.
- the present invention provides a means of performing advanced seed sorting at low cost and in a small space, without introducing and installing multiple sorting machines with different functions.
- according to the present invention, it is possible to select plant seeds having desired traits from a population of plant seeds with and without those traits non-destructively, highly efficiently, and with high accuracy. The present invention is therefore extremely useful, for example, in the seed and seedling industry.
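Entries in the list above note that PR-AUC depends on the prior pass rate. The following is a minimal pure-Python sketch, not the evaluation code used in this application. It computes a step-wise PR-AUC estimate, commonly called average precision, and simulates a model that scores seeds at random. The names `average_precision` and `random_model_pr_auc`, the simulated data, and the assumption of untied scores are illustrative only.

```python
import random


def average_precision(scores, labels):
    """Step-wise PR-AUC estimate (average precision).

    scores: discrimination scores (higher = more likely good).
    labels: True if the seed actually passed the quality test.
    Assumes no tied scores; ties would make the ranking order-dependent.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_good = sum(1 for _, ok in ranked if ok)
    if n_good == 0:
        raise ValueError("at least one good seed is required")
    hits = 0
    area = 0.0
    for rank, (_, ok) in enumerate(ranked, start=1):
        if ok:
            hits += 1
            # Precision at the recall step contributed by this good seed.
            area += hits / rank
    return area / n_good


def random_model_pr_auc(prior_pass_rate, n_seeds=20000, rng_seed=0):
    """PR-AUC of a model that scores seeds at random, on simulated pass/fail labels."""
    rng = random.Random(rng_seed)
    labels = [rng.random() < prior_pass_rate for _ in range(n_seeds)]
    scores = [rng.random() for _ in range(n_seeds)]
    return average_precision(scores, labels)
```

Calling `random_model_pr_auc(0.5)` and `random_model_pr_auc(0.9)` should give values close to 0.5 and 0.9. A "useless" model can therefore look strong on a lot with a high prior pass rate. If scikit-learn is available, its `average_precision_score` computes the same step-wise estimate.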
Description
The present invention relates to a method for selecting seeds, and more specifically, to a method for selecting plant seeds having desired characteristics by analyzing the spectral data of the seeds using multivariate statistical analysis.
The world population continues to grow: it reached 8 billion in 2022 and is projected to reach 8.5 billion in 2030 and 9.7 billion in 2050. Stable food production and supply are therefore becoming ever more important to the survival of human society. One effective way to address this problem is to develop high-yielding varieties and spread them around the world. However, whatever variety is used to produce a crop, its excellent characteristics cannot be realized without good-quality seedlings. An old Japanese saying, "nae hansaku," means "half of the growth is determined by the seedlings." Going further, good-quality seedlings cannot be raised without good-quality seeds. How to produce and prepare high-quality seeds, how to store them without deterioration, and how to evaluate their current quality are therefore urgent technical issues that humanity must work together to address.
The importance of seed quality control is strongly recognized by public institutions in Japan and abroad. The International Seed Testing Association (ISTA) provides guidelines on quality testing methods for the seeds of almost all major crop species. The Japanese government also enforces the Plant Variety Protection and Seed Act (Shubyo-ho), which sets quality standards for seeds for agricultural production distributed on the market. One reason for such public monitoring is that seed quality, especially germination and genetic purity, is usually very difficult to judge from the appearance of the seed or other simple criteria. If low-quality seeds are inadvertently distributed, the result may be not only local food shortages but also poverty among farmers.
Although official quality control certainly helps prevent seed-related crop failures, there are two sides to everything. For example, if official quality control requires seeds to meet a quality rate of 90%, a seed lot whose quality rate is just one percentage point lower (i.e., 89%) loses its commercial value. In practice, fierce quality competition among seed suppliers has made even such legal standards irrelevant or uncompetitive. As a result, large quantities of seed with no commercial value continue to be produced despite their potential contribution to crop production, a situation that can be called merciless "seed loss."
Even for major crop species, the quality rate of unsorted seed immediately after harvest is usually below 90%. Defective seeds arise partly from incomplete maturation on the mother plant, but damage, deterioration, or contamination during post-harvest seed preparation can also contribute. A quality rate high enough to meet the legal standard is achieved only after multiple stages of sorting with large-scale equipment such as spiral density sorters, trommel rotary sorters, vibration sorters, magnetic sorters, weight sorters, and color sorters. However, if the final quality rate after sorting falls short of the target, seed that has been produced through sowing, cultivation, harvesting, and sorting becomes nothing more than a pile of waste. At present, there is no way to improve the quality rate at this stage, not even by the "last one percent."
The seed sorting devices described above rest on the premise that seed quality is reflected in the seed's outer surface or in features detectable from outside, such as shape, surface texture or stickiness, specific gravity, and color. These devices are suited to mass sorting and can bring most harvested seed to a high-quality state. On the other hand, their operating conditions are difficult to fine-tune. Furthermore, from the standpoint of seed biochemistry, changes in the embryo and endosperm, which are hidden from the outside, can affect seed function far more than changes in the exposed seed coat. For example, oxidation of endosperm lipids or feeding on the embryo by certain pests does not immediately change the appearance or weight of the seed, yet it greatly impairs germination ability. "Seed loss" is presumed to be caused by such defective seeds.
Near-infrared (NIR) spectroscopy is an effective method for non-destructively evaluating differences and changes in the chemical composition of substances composed primarily of organic compounds. NIR spectroscopy is therefore well suited to food, pharmaceuticals, and agricultural products, and its use in their quality control is expanding. Imaging spectroscopy, developed primarily in the field of remote sensing, is also attracting attention for the quality control of these products, which are characteristically heterogeneous. Near-infrared imaging spectroscopy, which combines these two technologies, is widely expected to hold the key to eliminating "seed loss."
In fact, NIR imaging spectroscopy has been applied to evaluate seed quality in terms of viability, growth potential, genetic purity, and the presence or absence of insect or fungal damage (Non-Patent Documents 1 to 4). However, the fact that "seed loss" persists appears to indicate that this technology has not yet reached a practical level, or that its adoption in practice has not progressed sufficiently.
All of the methods reported in the above prior art documents enable non-destructive seed selection under certain conditions and are useful. However, each is limited to specific traits of specific plant seeds, so there remains a strong need for a new, versatile seed selection method that can be applied to a wider range of varieties and traits.
As a result of intensive research into the above problems, the inventors found the following. A data set for each plant seed, constructed using optical analysis techniques or the like, can be subjected to multivariate discriminant analysis to generate a large number of candidate discrimination models. Introducing the concept of "standard selection conditions" when choosing among these candidates makes it possible to select a model with high predictive accuracy. Through further research based on these findings, the inventors completed the present invention.
That is, the present invention is as follows.
[1]
A method for selecting plant seeds having a desired trait, comprising the following Steps 1 to 7:
(Step 1) extracting a portion of the seeds from a seed population;
(Step 2) constructing a data set for the extracted seeds, the constructing including the following (a) to (c):
(a) irradiating the extracted seeds with light to obtain spectral data and/or derived spectral data;
(b) subjecting the seeds for which spectral data have been acquired to a quality test for the desired trait, and determining the preliminary non-defective rate of the extracted seeds,
where the preliminary non-defective rate is defined by the following formula:
[Preliminary non-defective rate (%)] = [Number of seeds having the desired trait among the seeds subjected to the quality evaluation] / [Number of seeds subjected to the quality evaluation] × 100
(c) associating the spectral data of each seed with the results of the quality test;
(Step 3) applying multivariate discriminant analysis to the data set constructed in Step 2 to derive candidate good/bad discrimination models for calculating a discrimination score for the desired trait of each seed;
(Step 4) selecting a discrimination model from the candidate good/bad discrimination models derived in Step 3, wherein the recovery rate and the precision rate are defined by the following formulas:
[Recovery rate (%)] = [Number of seeds selected based on the discrimination score among the seeds subjected to the quality evaluation] / [Number of seeds subjected to the quality evaluation] × 100
[Precision rate (%)] = [Number of seeds having the desired trait among the seeds subjected to the quality evaluation and selected based on the discrimination score] / [Number of seeds selected based on the discrimination score among the seeds subjected to the quality evaluation] × 100
and, where the selection conditions under which the preliminary non-defective rate and the recovery rate are equal are defined as the "standard selection conditions",
the selecting includes selecting, as the good/bad discrimination model, a candidate that satisfies the criterion that the precision rate exceeds the preliminary non-defective rate under the standard selection conditions;
(Step 5) irradiating the remaining seeds not extracted from the seed population in Step 1, and/or other seeds obtained under substantially the same conditions as those seeds, with light to obtain spectral data, and applying the discrimination model selected in Step 4 to the spectral data to determine a discrimination score for each seed;
(Step 6) determining a discrimination score threshold based on the discrimination scores determined in Step 5;
(Step 7) recovering seeds predicted to have the desired trait by comparing the discrimination score of each seed determined in Step 5 with the threshold determined in Step 6.
[2]
The selection method according to [1], which is a method for selecting a plant seed having two or more desired traits.
[3]
The selection method according to [1] or [2], wherein the trait is at least one trait selected from the group consisting of germination ability, germination vigor, genotype, stress resistance, dormancy, disease resistance, insect resistance, QTL characteristics, eating quality, heading time, and morphological characteristics.
[4]
The selection method according to any one of [1] to [3], wherein the two or more desired traits are predicted by any one of the following methods (i) to (iii):
(i) a method in which, focusing on the quality of each trait, seeds are scored separately with a discrimination model for each trait, and seeds whose scores are all equal to or above the threshold are predicted to be good seeds;
(ii) a method in which, without regard to the quality of each individual trait, a seed is labeled good if all of the desired traits are good and bad otherwise, a single discrimination model is selected on that basis, the seeds are scored with that single model, and seeds with a score equal to or above a threshold are predicted to be good seeds;
(iii) a method in which, focusing on the quality of each trait, a discrimination model is selected for each trait, the scores obtained with these per-trait models are integrated into an integrated score, and the quality of each seed is predicted based on its rank in the integrated score.
[5]
The selection method according to any one of [1] to [4], further comprising a step of evaluating the accuracy of the discrimination model.
[6]
The selection method according to any one of [1] to [5], wherein the plant is a horticultural crop.
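Methods (i) and (iii) of [4] can be sketched as follows, assuming each per-trait model has already produced one score per seed. Method (i) keeps seeds whose scores clear every per-trait threshold. For method (iii), the definitions listed earlier specify the reference score $y_{p_j}$ (the $(n \times p_j)$-th highest score) and the standard deviation $s_j$ of each trait, and they take the minimum of the per-trait scores $z_j$ as the combined score. The formula $z_j = (y_j - y_{p_j}) / s_j$ used here is a hypothetical standardization, not one stated in this excerpt. All function names are illustrative.

```python
import statistics


def method_i(score_table, thresholds):
    """[4](i): keep seeds whose score for every trait meets that trait's threshold.

    score_table[j][i] is the score of seed i under the model for trait j.
    """
    n_seeds = len(score_table[0])
    return [
        i for i in range(n_seeds)
        if all(scores[i] >= t for scores, t in zip(score_table, thresholds))
    ]


def reference_score(scores, pass_rate):
    """The (n x p)-th highest score; rounding n x p to the nearest integer is an assumption."""
    n = len(scores)
    rank = min(n, max(1, round(n * pass_rate)))
    return sorted(scores, reverse=True)[rank - 1]


def method_iii(score_table, pass_rates):
    """[4](iii) sketch: combined score = minimum over traits of a standardized score.

    z = (y - y_p) / s is a hypothetical standardization; the source excerpt
    defines y_p and s but not z itself.
    """
    z_table = []
    for scores, p in zip(score_table, pass_rates):
        y_p = reference_score(scores, p)
        s = statistics.stdev(scores)  # sample standard deviation (an assumption)
        z_table.append([(y - y_p) / s for y in scores])
    combined = [min(per_seed) for per_seed in zip(*z_table)]
    # Rank seeds from the highest to the lowest combined score.
    ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return combined, ranking
```

Consider germination scores `[0.9, 0.4, 0.8, 0.6]`, purity scores `[0.7, 0.9, 0.2, 0.8]`, thresholds `[0.5, 0.5]`, and pass rates `[0.5, 0.5]`. Method (i) keeps seeds `[0, 3]`, and method (iii) ranks the seeds `[0, 3, 1, 2]`. Because the combined score is a minimum, a seed ranks high only if it is good in its weakest trait.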
According to the present invention, it is possible to select plant seeds having desired traits from a population of plant seeds having and not having the desired traits non-destructively, highly efficiently, and with high accuracy.
The present invention will now be described in detail.
The present invention provides a method for selecting a plant seed having a desired trait, the method comprising the following steps 1 to 7 (hereinafter, sometimes referred to as the "method of the present invention"):
(Step 1) extracting a portion of the seeds from a seed population;
(Step 2) constructing a data set for the extracted seeds, the constructing including the following (a) to (c):
(a) irradiating the extracted seeds with light to obtain spectral data and/or derived spectral data;
(b) subjecting the seeds for which spectral data have been acquired to a quality test for the desired trait, and determining the preliminary non-defective rate of the extracted seeds,
where the preliminary non-defective rate is defined by the following formula:
[Preliminary non-defective rate (%)] = [Number of seeds having the desired trait among the seeds subjected to the quality evaluation] / [Number of seeds subjected to the quality evaluation] × 100
(c) associating the spectral data of each seed with the results of the quality test;
(Step 3) applying multivariate discriminant analysis to the data set constructed in Step 2 to derive candidate good/bad discrimination models for calculating a discrimination score for the desired trait of each seed;
(Step 4) selecting a discrimination model from the candidate good/bad discrimination models derived in Step 3, wherein the recovery rate and the precision rate are defined by the following formulas:
[Recovery rate (%)] = [Number of seeds selected based on the discrimination score among the seeds subjected to the quality evaluation] / [Number of seeds subjected to the quality evaluation] × 100
[Precision rate (%)] = [Number of seeds having the desired trait among the seeds subjected to the quality evaluation and selected based on the discrimination score] / [Number of seeds selected based on the discrimination score among the seeds subjected to the quality evaluation] × 100
and, where the selection conditions under which the preliminary non-defective rate and the recovery rate are equal are defined as the "standard selection conditions",
the selecting includes selecting, as the good/bad discrimination model, a candidate that satisfies the criterion that the precision rate exceeds the preliminary non-defective rate under the standard selection conditions;
(Step 5) irradiating the remaining seeds not extracted from the seed population in Step 1, and/or other seeds obtained under substantially the same conditions as those seeds, with light to obtain spectral data, and applying the discrimination model selected in Step 4 to the spectral data to determine a discrimination score for each seed;
(Step 6) determining a discrimination score threshold based on the discrimination scores determined in Step 5;
(Step 7) recovering seeds predicted to have the desired trait by comparing the discrimination score of each seed determined in Step 5 with the threshold determined in Step 6.
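Step 4 and Steps 6 and 7 above can be sketched in Python as follows. Under the standard selection conditions, the recovery rate equals the preliminary non-defective rate. The number of seeds recovered from the tested sample therefore equals the number of good seeds in it, and the score of the last recovered seed is the standard threshold. Candidate models are represented here as plain Python callables for illustration; in practice they would come from a multivariate discriminant analysis such as PLS-DA or Adaptive LASSO. Setting the Step 6 threshold from a target recovery rate is only one hypothetical rule, since the method states only that the threshold is determined from the Step 5 scores.

```python
def standard_condition_check(scores, labels):
    """Evaluate one candidate model on the tested sample (Step 4).

    scores: discrimination scores, higher meaning "more likely good" (assumed).
    labels: True if the seed passed the quality test in Step 2(b).
    Returns (standard threshold, preliminary non-defective rate %, precision rate %).
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equally long")
    n_good = sum(1 for ok in labels if ok)
    if n_good == 0:
        raise ValueError("standard selection conditions need at least one good seed")
    prior_pct = 100.0 * n_good / len(scores)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Recovery rate == preliminary non-defective rate -> recover exactly n_good seeds.
    recovered = ranked[:n_good]
    standard_threshold = recovered[-1][0]
    precision_pct = 100.0 * sum(1 for _, ok in recovered if ok) / n_good
    return standard_threshold, prior_pct, precision_pct


def select_models(candidates, spectra, labels):
    """Keep candidates whose precision rate exceeds the preliminary non-defective rate."""
    kept = []
    for name, model in candidates.items():
        scores = [model(x) for x in spectra]
        _, prior_pct, precision_pct = standard_condition_check(scores, labels)
        if precision_pct > prior_pct:
            kept.append(name)
    return kept


def recover_seeds(scores, target_recovery_pct):
    """Steps 6 and 7 on the remaining (untested) seeds.

    Hypothetical rule: place the threshold so that roughly target_recovery_pct
    of the seeds are recovered. Ties at the threshold may recover a few more.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("no seeds to score")
    k = min(n, max(1, round(n * target_recovery_pct / 100)))
    threshold = sorted(scores, reverse=True)[k - 1]
    recovered = [i for i, s in enumerate(scores) if s >= threshold]
    return threshold, recovered
```

Consider five tested seeds with scores `[0.9, 0.8, 0.7, 0.6, 0.5]` and test results good, good, bad, good, bad. The preliminary non-defective rate is 60%, the standard threshold is 0.7, and the precision rate is about 66.7%, so that model passes the Step 4 criterion. Tied scores at the cut can make the sorted order, and hence the result, depend on input order.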
In the method of the present invention, "plant seed" means the seed of any plant, without particular limitation. The plant may be any seed plant, for example, either an angiosperm or a gymnosperm. When the plant is an angiosperm, it may be either a dicotyledon or a monocotyledon, and when it is a dicotyledon, it may be either sympetalous or polypetalous. In one aspect, the method of the present invention can be used to select seeds of plants with high added value. Examples of such plants include, but are not limited to, horticultural crops and plants usable as building timber. Horticultural crops include vegetables, fruit trees, and ornamental flowers.
Examples of vegetables include, but are not limited to, pumpkin, peas, lettuce, Welsh onion, tomato, cauliflower, bitter melon, okra, onion, myoga (Japanese ginger), soybean, butterbur, asparagus, Chinese chives, broad bean, celery, carrot, mizuna, komatsuna, garland chrysanthemum, radish, broccoli, spinach, Chinese cabbage, arugula, lotus root, turnip, avocado, cucumber, paprika, garlic, corn, zucchini, parsley, cilantro, eggplant, and green pepper. Examples of fruit trees include, but are not limited to, Japanese apricot (ume), fig, akebia, acerola, avocado, olive, orange, persimmon, Chinese quince, guava, cranberry, walnut, grapefruit, cherry, and pomegranate. Examples of ornamental flowers include, but are not limited to, morning glory, cockscomb, cosmos, zinnia, columbine, globe amaranth, petunia, Madagascar periwinkle, ornamental kale, sunflower, garden balsam, moss rose, flowering purslane, balloon vine, marigold, gypsophila, snapdragon, calendula, sweet pea, stock, dianthus, daisy, nigella, nemesia, nemophila, California poppy, verbena, pansy, viola, corn poppy, cornflower, lupine, and forget-me-not. Examples of plants usable as building timber include, but are not limited to, Japanese cedar (sugi), Japanese cypress (hinoki), hiba, chestnut, zelkova, cherry, beech, walnut, falcata, and Japanese red pine.
In the method of the present invention, "desired trait" encompasses both traits of the seed itself and traits expressed by the plant that germinates from the seed when it is grown. Examples of traits subject to selection in the method of the present invention include, but are not limited to, germination ability, germination vigor, genotype, stress resistance, dormancy, disease resistance, insect resistance, QTL characteristics, eating quality, heading time, and morphological characteristics such as leaf size.
(First step)
The first step of the present invention comprises extracting a portion of the seeds from a seed population. The seed population may consist of any number of seeds of 2 or more, with no particular upper limit. For example, it may typically consist of 2 to 10,000,000 seeds, preferably 1,000 to 10,000,000 seeds, and more preferably 10,000 to 10,000,000 seeds, but is not limited to these. Alternatively, the seed population may be specified by weight, for example 1 g to 10,000 kg, preferably 1 kg to 10,000 kg, and more preferably 10 kg to 10,000 kg, but is not limited to these.
The "portion" of the seed population may vary depending on the trait to be selected, but is usually about 50 to 50,000 seeds, preferably 100 to 30,000 seeds, and more preferably 200 to 3,000 seeds, without being limited thereto.
(Second step)
The second step of the present invention is to construct a data set for the extracted seeds. The construction of the data set includes at least the following steps (a) to (c):
(a) Irradiating the extracted seeds with light to obtain spectral data and/or derived spectral data.
(b) Subjecting the seeds for which spectral data have been acquired to a quality test for the desired trait, and determining the preliminary non-defective rate of the extracted seeds. The preliminary non-defective rate in (b) is defined by the following formula:
[Preliminary non-defective rate (%)] = [Number of seeds having the desired trait among the seeds subjected to the quality evaluation] / [Number of seeds subjected to the quality evaluation] × 100
(c) Associating the spectral data of each seed with the results of the quality test.
In this specification, the "preliminary non-defective rate" may also be referred to as the "initial precision rate."
In (a) of the data set construction, spectral data (reflection, absorption, and/or transmission spectral data) and/or derived spectral data may be obtained by irradiating the seeds with light using any method commonly used in the field of optical analysis. The light is not particularly limited as long as spectral data can be obtained with it. Examples include, but are not limited to, microwaves, terahertz waves, infrared light, visible light, ultraviolet light, X-rays, and gamma rays. In a preferred embodiment of the present invention, the light may be visible light or infrared light. Examples of methods for acquiring the spectral data include, but are not limited to, the methods and conditions used in the examples of this application and the method taught in Japanese Patent No. 6782408.
Derived spectral data may likewise be generated from the obtained spectral data by any method known in the field of optical analysis. For example, derived spectral data can be generated by applying to the obtained spectral data (R) a reciprocal (1/R) transformation, a logarithmic transformation, standard normalization, smoothing or smoothed differentiation with a Savitzky-Golay (SG) filter, or any combination of these. Either the spectral data or the derived spectral data may be used alone, or two or more of them may be used together. In a preferred embodiment, the spectral data used may be the single spectrum type, among the original spectral data and the derived spectral data generated from it, that yields the most accurate prediction model.
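The transformations described above can be sketched with NumPy and SciPy (`scipy.signal.savgol_filter`). Several choices here are assumptions for illustration, not parameters stated in this application: the window length, the polynomial order, the base-10 logarithm, and the reading of "standard normalization" as a per-seed standardization in the style of SNV.

```python
import numpy as np
from scipy.signal import savgol_filter


def derived_spectra(reflectance, window_length=11, polyorder=2):
    """Return several derived spectra for an array of shape (n_seeds, n_bands)."""
    R = np.asarray(reflectance, dtype=float)
    if R.ndim != 2:
        raise ValueError("expected a 2-D array (n_seeds, n_bands)")
    if np.any(R <= 0):
        raise ValueError("reflectance must be positive for 1/R and log transforms")
    inv_r = 1.0 / R                  # reciprocal transformation
    log_inv_r = np.log10(inv_r)      # log(1/R), often called pseudo-absorbance
    # Per-seed standardization (SNV-style); our reading of "standard normalization".
    mean = log_inv_r.mean(axis=1, keepdims=True)
    std = log_inv_r.std(axis=1, ddof=1, keepdims=True)
    snv = (log_inv_r - mean) / std
    # Savitzky-Golay smoothing and smoothed first derivative along the band axis.
    smoothed = savgol_filter(log_inv_r, window_length, polyorder, axis=1)
    first_deriv = savgol_filter(log_inv_r, window_length, polyorder, deriv=1, axis=1)
    return {
        "1/R": inv_r,
        "log(1/R)": log_inv_r,
        "SNV": snv,
        "SG smooth": smoothed,
        "SG d1": first_deriv,
    }
```

Each entry of the returned dictionary can then be fed to the same discriminant analysis. The spectrum type that yields the most accurate model is kept, as in the preferred embodiment above.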
Likewise, any known instrument may be used to acquire the spectral data. For example, the spectral characteristics of biological tissue can be measured at single points with a fiber-optic spectrometer (point measurement), or together with coordinate information (images) using a hyperspectral camera, a type of remote sensor (area measurement). An instrument with high detection sensitivity in the target wavelength range should be selected as appropriate. In the method of the present invention, instruments equipped with photodetectors such as CCD, CMOS, CQD, InGaAs, HgCdTe (MCT), or Type II superlattice (T2SL) detectors may be used, but the method is not limited to these. When acquiring the spectral data, a reflectance-corrected image may also be generated and a seed recognition model applied, if necessary (see ST1 to ST3 in FIG. 3).
The quality inspection in step (b) of constructing the dataset is likewise not particularly limited, as long as the prior pass rate for the desired trait can be determined. For example, if the desired trait is germination ability, it can readily be determined by subjecting the seeds whose spectral data have been acquired to a germination test conforming to the International Rules for Seed Testing established by ISTA, or to an equivalent test. If A seeds are subjected to the germination test and B of them actually germinate, the prior pass rate (%) (i.e., the germination rate) is defined as [number of seeds that germinated among those subjected to quality evaluation (B)] / [number of seeds subjected to quality evaluation (A)] × 100, that is, 100B/A (%).
Step (c), "associating the spectral data of each seed with the result of the quality inspection," can be illustrated with germination ability as the trait. Let SA, SB, and SC be the spectral data (reflectance, absorbance, or transmittance spectra, or spectra derived from them) obtained by irradiating seeds A, B, and C with light. If the quality inspection shows that seed A germinates normally, seed B germinates abnormally, and seed C fails to germinate, step (c) associates SA with normal germination, SB with abnormal germination, and SC with non-germination. Dummy variables may be assigned at this point to make the pass/fail determination easier; for germination ability, for example, 1 (good) may be assigned to normal germination and -1 (bad) to abnormal germination and non-germination, although the coding is not limited to this.
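Steps (b) and (c) can be illustrated with a small sketch. The outcome labels ("normal", "abnormal", "none") and function names are hypothetical; counting only normally germinated seeds toward B is an assumption chosen to match the 1/-1 coding above, and the 100B/A formula is the one defined for step (b).

```python
# Hypothetical outcome labels: "normal", "abnormal", "none" (no germination).
GOOD_OUTCOMES = {"normal"}


def dummy_code(outcomes):
    """Step (c) coding: +1 (good) for normal germination, -1 (bad) otherwise."""
    return [1 if o in GOOD_OUTCOMES else -1 for o in outcomes]


def prior_pass_rate(outcomes):
    """Step (b): prior pass rate (%) = 100 * B / A."""
    a = len(outcomes)                                   # seeds tested
    b = sum(1 for o in outcomes if o in GOOD_OUTCOMES)  # seeds that passed
    return 100.0 * b / a


def build_dataset(spectra, outcomes):
    """Pair each seed's spectrum with its dummy-coded inspection result."""
    return list(zip(spectra, dummy_code(outcomes)))
```

With seeds A, B, and C from the example plus one more normally germinating seed, the codes would be 1, -1, -1, 1 and the prior pass rate 50%.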
(Third step)
The third step of the present invention derives candidate pass/fail discrimination models, which calculate a discrimination score for the desired trait in each seed, by applying multivariate discriminant analysis to the dataset constructed in the second step.
Any known method of multivariate discriminant analysis may be used in this step, as long as the desired effect of the present invention is obtained. Multivariate statistical techniques that may be used include, but are not limited to, partial least squares (PLS) and principal component analysis (PCA). The multivariate discriminant analysis is preferably partial least squares discriminant analysis (PLS-DA).
As an example, multivariate discriminant analysis using PLS-DA is outlined below.
PLS-DA models an equation of the form of Equation 1 below, using the specified explanatory variables and response variable.
(Equation 1)
$$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + \cdots$$
Here, $x_1, x_2, x_3, x_4, \ldots$ are explanatory variables, $y$ is the response variable, $a$ is the intercept (a constant), and $b_1, b_2, b_3, b_4, \ldots$ are partial regression coefficients (constants).
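As an illustration of how the intercept a and the coefficients b_1, b_2, ... of Equation 1 can be obtained, the following sketch fits the ±1 dummy response with the NIPALS algorithm for single-response PLS (PLS1) and then evaluates Equation 1 as the discrimination score. It is a minimal NumPy sketch with hypothetical names, not the implementation used in the examples; the number of components is a tuning parameter that would normally be chosen by validation.

```python
import numpy as np


def pls1_fit(X: np.ndarray, y: np.ndarray, n_components: int):
    """NIPALS PLS1: returns intercept a and coefficients b so that y_hat = a + X @ b."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = X - x_mean            # X residual, deflated after each component
    f = y - y_mean            # y residual
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))  # weight vectors
    P = np.zeros((n_features, n_components))  # X loadings
    q = np.zeros(n_components)                # y loadings
    for k in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w                             # component scores
        tt = t @ t
        P[:, k] = E.T @ t / tt
        q[k] = f @ t / tt
        W[:, k] = w
        E = E - np.outer(t, P[:, k])
        f = f - q[k] * t
    # Coefficients in the original X space: b = W (P^T W)^{-1} q
    b = W @ np.linalg.solve(P.T @ W, q)
    a = y_mean - x_mean @ b
    return a, b


def discrimination_score(X: np.ndarray, a: float, b: np.ndarray) -> np.ndarray:
    """Equation 1 evaluated for each seed (row of X)."""
    return a + X @ b
```

With as many components as explanatory variables the fit coincides with ordinary least squares; fewer components give the latent-variable fit that makes PLS-DA usable when neighbouring wavelengths are strongly collinear.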
Using the spectral data acquired in the second step as explanatory variables and the quality inspection results as the response variable, candidate pass/fail discrimination models for calculating a discrimination score for the desired trait in each seed are derived. Any known method may be used to derive the candidates; the method used in the examples of the present invention is described below as one example.
Derived spectral data Xs are generated from the reflectance, absorbance, or transmittance spectral data (R) obtained by irradiating the seeds with light, using the transformations described above and any combination of them; the untransformed data itself is included as X1. The discriminant analysis tries every derived spectrum as the explanatory variables, but when the resulting discriminant models have equivalent predictive performance, the simplest derivation pattern should be adopted.
(Procedure for deriving candidate pass/fail discrimination models)
The dataset is finalized as follows.
Reflectance, absorbance, or transmittance spectra are acquired and their derived spectra are generated to prepare the explanatory variable data. If necessary, the response variable, the explanatory variables, or both are standardized. Discriminant models are then derived by multivariate discriminant analysis; for example, by linear sparse modeling as follows. For every combination of response variable and explanatory variable data, initial weights for the explanatory variables are first determined by PLS regression or ridge regression. Adaptive LASSO regression is then used to compute the solution path, which relates the regularization coefficient λ to the partial regression coefficients and to the number of explanatory variables whose partial regression coefficients are nonzero. If p is the number of explanatory variables that minimizes the residual sum of squares of the regression, candidate combinations of explanatory variables for models with p or fewer variables can be read from the solution path. The final discriminant models are derived by PLS-DA using 2 to p explanatory variables selected from the Adaptive LASSO solution path, and each resulting model becomes a candidate pass/fail discrimination model. Many candidate models are produced at this stage; the one that gives good seeds higher relative scores is selected in the fourth step.
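The Adaptive LASSO part of this procedure can be sketched as below, assuming ridge regression for the initial weights, penalty weights w_j = 1/|β_j|^γ, and plain coordinate descent; all names are hypothetical. The weighted problem is solved by the standard device of rescaling column j by 1/w_j, running an ordinary LASSO, and dividing the result by w_j. The final PLS-DA refit on each candidate subset is not shown, and p is simply passed in as `max_vars`.

```python
import numpy as np


def ridge_coefficients(X, y, alpha=1.0):
    """Initial weights from ridge regression (the text allows PLS or ridge)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                              # residual y - X @ b
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            old = b[j]
            r += X[:, j] * old                # remove variable j from the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            r -= X[:, j] * b[j]               # add it back with the new coefficient
            max_delta = max(max_delta, abs(b[j] - old))
        if max_delta < tol:
            break
    return b


def adaptive_lasso_path(X, y, n_lambdas=30, gamma=1.0, alpha=1.0):
    """Adaptive LASSO coefficients over a decreasing grid of lambda."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize explanatory variables
    y = y - y.mean()
    w = 1.0 / np.abs(ridge_coefficients(X, y, alpha)) ** gamma  # adaptive penalty weights
    Xs = X / w                                # column j scaled by 1 / w_j
    lam_max = np.max(np.abs(Xs.T @ y)) / len(y)  # smallest lambda with an all-zero fit
    lambdas = lam_max * np.logspace(0, -3, n_lambdas)
    path = np.array([lasso_cd(Xs, y, lam) / w for lam in lambdas])
    return lambdas, path


def candidate_subsets(path, max_vars):
    """For k = 2..max_vars, the first set of k nonzero variables met along the path."""
    subsets = {}
    for coef in path:
        active = tuple(int(i) for i in np.flatnonzero(coef))
        if 2 <= len(active) <= max_vars and len(active) not in subsets:
            subsets[len(active)] = active
    return subsets
```

Each subset returned by `candidate_subsets` would then be refitted by PLS-DA to give one candidate model.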
(Fourth step)
The fourth step of the present invention selects a discrimination model from the candidate pass/fail discrimination models derived in the third step.
In the fourth step, the "recovery rate," "precision," and "standard sorting condition" are defined as follows:
[Recovery rate (%)] = [number of seeds selected on the basis of the discrimination score, among the seeds subjected to quality evaluation] / [number of seeds subjected to quality evaluation] × 100
[Precision (%)] = [number of seeds having the desired trait, among the seeds subjected to quality evaluation and selected on the basis of the discrimination score] / [number of seeds selected on the basis of the discrimination score, among the seeds subjected to quality evaluation] × 100
[Standard sorting condition]: the sorting condition under which the [prior pass rate] equals the [recovery rate]
With the recovery rate, precision, and standard sorting condition defined as above, candidates that meet the criterion that precision under the standard sorting condition exceeds the prior pass rate are selected as pass/fail discrimination models. This makes it possible to select, from the many candidates obtained in the third step, a discrimination model capable of classifying seeds as good or bad for the desired trait. The further the precision exceeds the prior pass rate, the better the discrimination model is considered to be. Furthermore, so that the precision can easily be adjusted as needed, it is preferable to select a discrimination model with a strong rank correlation between changes in recovery rate and changes in precision (i.e., one in which lowering the recovery rate reliably raises the precision, and raising the recovery rate reliably lowers it).
In this specification, "precision" is sometimes also referred to as the "posterior pass rate."
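The fourth-step quantities can be computed from a candidate model's scores on the training seeds as in the following sketch (NumPy; names hypothetical). Precision is evaluated under the standard sorting condition, i.e., when the number of seeds kept equals the number of good seeds. The text does not name a rank-correlation coefficient; Spearman's coefficient is used here as an assumption, and a value close to -1 indicates that lowering the recovery rate reliably raises precision.

```python
import numpy as np


def recovery_precision_curve(scores, is_good):
    """Recovery rate and precision (%) when the top-k scoring seeds are kept, k = 1..N."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    good_sorted = np.asarray(is_good, dtype=float)[order]
    k = np.arange(1, len(good_sorted) + 1)
    recovery = 100.0 * k / len(good_sorted)
    precision = 100.0 * np.cumsum(good_sorted) / k
    return recovery, precision


def precision_at_standard_condition(scores, is_good):
    """Precision when the recovery rate equals the prior pass rate."""
    is_good = np.asarray(is_good, dtype=bool)
    k_std = int(round(is_good.mean() * len(is_good)))  # seeds kept = number of good seeds
    _, precision = recovery_precision_curve(scores, is_good)
    return precision[k_std - 1]


def average_ranks(a):
    """Ranks starting at 1; tied values share their average rank."""
    a = np.asarray(a)
    order = np.argsort(a, kind="stable")
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and a[order[j + 1]] == a[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of average ranks)."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def passes_selection(scores, is_good):
    """Criterion: precision under the standard condition exceeds the prior pass rate."""
    prior = 100.0 * np.mean(is_good)
    return precision_at_standard_condition(scores, is_good) > prior
```

Among the candidates that pass, one would prefer a larger margin over the prior pass rate and a `spearman(recovery, precision)` value closer to -1.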
(Fifth step)
The fifth step of the present invention irradiates with light the remaining seeds not taken from the seed population in the first step, and/or other seeds obtained under substantially the same conditions, to obtain spectral data, and determines the discrimination score of each seed by applying the discrimination model selected in the fourth step to those data. In other words, the fifth step obtains spectral data for the non-training seeds of the seed population and applies the discrimination model obtained in the fourth step to determine each seed's discrimination score. The spectral data may be acquired with the same methods and conditions as those used for the training seeds, or with partially different ones as long as the desired effect is obtained.
(Sixth step)
The sixth step of the present invention determines a threshold for the discrimination score based on the scores determined in the fifth step. If the prior pass rate of the seed population is k%, the score of the seed ranked at the top k% can be set as the threshold for standard sorting (the standard threshold). If, in the fourth step, a discrimination model with a strong rank correlation between changes in recovery rate and changes in posterior pass rate was selected, the precision can also be adjusted by raising or lowering the threshold relative to the standard threshold.
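A sketch of the standard threshold and its use, assuming NumPy and hypothetical function names: the threshold is the score of the seed ranked at the top k%, where k is the prior pass rate, and seeds at or above it are kept. Ties at the threshold would admit slightly more than k% of the seeds; how ties are handled in practice is not specified in the text.

```python
import math

import numpy as np


def standard_threshold(scores, prior_pass_rate_pct):
    """Score of the seed ranked at the top k% (k = prior pass rate)."""
    ranked = np.sort(np.asarray(scores, dtype=float))[::-1]
    # Small epsilon guards against floating-point error before rounding up.
    n_keep = max(1, math.ceil(len(ranked) * prior_pass_rate_pct / 100.0 - 1e-9))
    return ranked[n_keep - 1]


def select(scores, threshold):
    """Keep seeds whose discrimination score is at or above the threshold."""
    return np.asarray(scores) >= threshold
```

Raising the threshold above the standard value lowers the recovery rate; for a model whose recovery rate and precision are strongly rank-correlated, this should raise precision.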
(Seventh step)
The seventh step of the present invention recovers the seeds predicted to have the desired trait by comparing each seed's discrimination score, determined in the fifth step, with the score threshold determined in the sixth step. Recovery may be achieved either by sorting out and removing the seeds predicted to have the desired trait, or by sorting out and removing the seeds predicted not to have it and recovering the remaining seeds as seeds having the desired trait.
In one aspect of the present invention, the method may be a method for selecting plant seeds having two or more desired traits.
Examples of two or more traits include, but are not limited to, any combination of two or more of the traits listed above (i.e., germination ability, germination vigor, genotype, stress resistance, dormancy, disease resistance, insect resistance, QTL characteristics, eating quality, heading time, and morphological characteristics such as leaf size).
When the method of the present invention selects for two or more desired traits, the two or more traits are predicted by any one of the following methods (i) to (iii).
(i) A method that focuses on pass/fail for each trait: seeds are scored separately with a discrimination model for each trait, and seeds whose scores are all at or above the respective thresholds are predicted to be good ("sequential prediction using multiple models").
(ii) A method that disregards pass/fail for individual traits: a seed is labeled good only if all desired traits are good and bad otherwise, a single discrimination model is selected on that basis, seeds are scored with that model, and seeds whose score is at or above the threshold are predicted to be good ("direct prediction using a single model").
(iii) A method in which a discrimination model is first selected for each trait by focusing on pass/fail for that trait; the scores obtained with these per-trait models are then combined into an integrated score, and each seed is predicted to be good or bad based on its rank in the integrated score ("direct prediction using an integrated model").
Any of these methods can achieve the selection of seeds having two or more desired traits.
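Methods (i) to (iii) can be sketched as follows for an array of per-trait scores with one row per seed (NumPy; names hypothetical). The text does not specify how per-trait scores are integrated in method (iii); averaging each seed's normalized rank within every trait is only one plausible choice and is an assumption here. For method (ii), the sketch shows only the construction of the combined training label; the single model itself would be derived as in the third and fourth steps.

```python
import numpy as np


def sequential_prediction(trait_scores, thresholds):
    """Method (i): good only if every per-trait score is at or above its threshold.
    trait_scores has shape (n_seeds, n_traits)."""
    return np.all(np.asarray(trait_scores) >= np.asarray(thresholds), axis=1)


def combined_label(trait_labels):
    """Method (ii) training label: +1 only when every trait is good (+1), else -1."""
    return np.where(np.all(np.asarray(trait_labels) == 1, axis=1), 1, -1)


def integrated_score(trait_scores):
    """Method (iii), assumed rule: mean of per-trait ranks scaled to [0, 1]."""
    s = np.asarray(trait_scores, dtype=float)
    ranks = s.argsort(axis=0).argsort(axis=0)   # 0 = lowest score within each trait
    return (ranks / (len(s) - 1)).mean(axis=1)
```

Seeds would then be ranked by `integrated_score`, and a threshold on that ranking applied as in the sixth step.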
In one aspect of the present invention, the method may further include a step of evaluating the accuracy of the discrimination model. Accuracy evaluation here means evaluating the generalization performance of the selected discrimination model. Specifically, the accuracy of a pass/fail discrimination model can be evaluated as follows.
The discrimination model obtained through the steps above is subjected to internal validation, by applying it to the internal data used to derive it (i.e., the seed population from which the dataset was constructed), and to external validation, by applying it to external data acquired independently of the model's derivation.
ROC (receiver operating characteristic) curves and PR (precision-recall) curves can be used to evaluate the overall accuracy of a pass/fail discrimination model independently of the chosen lower score threshold. In particular, in quality inspections where accepting defective products (false positives) is considered a more serious misclassification than rejecting good products (false negatives), the PR curve, which is sensitive to false positives, is often used.
For both ROC and PR curves, a discrimination model is considered better the closer its AUC (area under the curve) is to the maximum value of 1. However, whereas the chance-level baseline of ROC-AUC is always 0.5, the baseline of PR-AUC is the initial precision, i.e., the prior pass rate, which varies with the dataset to which the model is applied.
PR-AUC is suitable for comparing which of several discrimination models performs best on a given dataset, but not for verifying whether a single discrimination model performs equally well on different datasets, because it is affected by the prior pass rate.
To compensate for this shortcoming of the PR curve, relative precision is defined by the following formula:
[Relative precision] = (precision - initial precision) / (1 - initial precision)
The relationship between relative precision and recall is called the rPR (relative precision-recall) curve, and the area under it is called rPR-AUC. rPR-AUC does not depend on the dataset to which it is applied and always takes a value in the closed interval [0, 1].
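PR-AUC and rPR-AUC can be estimated step-wise from a model's scores as in the sketch below (NumPy; names hypothetical). It uses the average-precision form of the area, the mean precision at the rank of each good seed, which is one common estimator of PR-AUC; other interpolation schemes give slightly different values. Because relative precision is an affine function of precision, the resulting rPR-AUC equals (PR-AUC - p0) / (1 - p0), where p0 is the initial precision.

```python
import numpy as np


def pr_points(scores, is_good):
    """Precision and recall after each seed, taking seeds in descending score order."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    good = np.asarray(is_good, dtype=float)[order]
    tp = np.cumsum(good)
    precision = tp / np.arange(1, len(good) + 1)
    recall = tp / good.sum()
    return precision, recall, good


def relative_precision(precision, initial_precision):
    """(precision - initial precision) / (1 - initial precision)."""
    return (precision - initial_precision) / (1.0 - initial_precision)


def pr_auc(scores, is_good):
    """Step-wise PR-AUC (average precision): mean precision at each good seed's rank."""
    precision, _, good = pr_points(scores, is_good)
    return float(precision[good == 1].mean())


def rpr_auc(scores, is_good):
    """The same step-wise area, computed under the relative-precision curve."""
    precision, _, good = pr_points(scores, is_good)
    p0 = good.mean()                       # initial precision = prior pass rate
    return float(relative_precision(precision[good == 1], p0).mean())
```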
When n models are used for pass/fail discrimination, the relationship of precision and of relative precision to recall forms a hypersurface in (n+1)-dimensional space. For convenience, regardless of the number of discrimination models used, these relationships are still called the PR and rPR curves, and the fraction of the space lying below the hypersurfaces is still called PR-AUC and rPR-AUC.
For every derived discrimination model, PR and rPR curves are created by applying the model to the dataset derived from its target crop species, and PR-AUC and rPR-AUC are calculated from each curve. The accuracy of the discrimination models is evaluated from the shapes of these curves and their AUC values.
The present invention is described in more detail in the following examples, but is not limited in any way by them.
[Identification of vegetable seeds whose quality meets shipping requirements]
(Materials) Vegetable seeds (pumpkin, pea, lettuce, green onion, tomato, and cauliflower) produced from 2015 to 2022 and provided by Tokita Seed Co., Ltd. were used as test materials (Table 1).
The seeds were irradiated with near-infrared light from two 24 V, 250 W halogen lamps with aluminum reflectors (JTR24V250W10H/5-AL, GX5.3 base, Kahoku Lighting Solutions) driven by DC voltage, and near-infrared hyperspectral images were captured with a line-scan hyperspectral camera (CV-N801HS, Sumitomo Electric Industries). The near-infrared lens was an image-side telecentric lens (Sumitomo Electric Industries) with a focal length of 30 mm; the working distance during imaging was 28 cm, the spatial resolution 90 ppi, and the wavelength sampling interval 6 nm. The spectral sensitivity range of the camera was 980 to 2,350 nm. Reflectance at each wavelength was calibrated to the equivalent of diffuse reflectance using images of a 99% reflectance standard and of the dark current (0% reflectance); the standard was part of a contrast target (SRT-MS-050, Labsphere, USA). The reflectance spectrum of each seed was the average of the reflectance spectra recorded for the pixels in the region occupied by that seed.
The appearance of the seeds was photographed with an 8k color line-scan camera (EV71C4CCL8005-BA0, e2v) under illumination from a white LED light source (IDBA-HMS150WHV-S, Leimac). The lens was an object-side telecentric lens with an optical magnification of 0.4× (FT04-150CL, Optoart), and the spatial resolution was 2,032 ppi. The camera's white balance was adjusted using a standard reflector of the contrast target (the SRT-MS-050 above) with a reflectance of 25% (for dark seed surfaces) or 50% (for light seed surfaces).
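The white/dark reflectance calibration and per-seed averaging described above can be sketched as follows. This assumes the common two-point flat-field correction, scaled by the 99% reflectance of the standard; the exact formula used in the examples is not stated, and the array layout (rows, columns, bands) and the function names are hypothetical.

```python
import numpy as np


def calibrate_reflectance(raw, white, dark, white_reflectance=0.99):
    """Two-point correction of raw counts to reflectance.
    raw, white, dark: arrays broadcastable to (rows, cols, bands)."""
    return white_reflectance * (raw - dark) / (white - dark)


def seed_spectrum(reflectance_cube, seed_mask):
    """Average the per-pixel spectra inside one seed's region (boolean mask, rows x cols)."""
    return reflectance_cube[seed_mask].mean(axis=0)
```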
(Methods)
I. Construction of the dataset
(1) The seeds were uniformly irradiated with light from the halogen lamps, and near-infrared hyperspectral images containing spectral information in the near-infrared range were captured. For some seeds, high-resolution appearance images were also captured with the 8k line-scan camera (FIG. 1).
(2) The near-infrared reflectance spectrum of the region occupied by each seed was calculated from the near-infrared hyperspectral image (FIG. 2).
(3) The seeds were then subjected to germination tests conforming to the International Rules for Seed Testing established by the International Seed Testing Association (ISTA), and each seed's germination phenotype was classified into one of three groups: normal germination, abnormal germination, or no germination.
(4) Cauliflower variety 47 (variety numbers are omitted hereafter) is used as an F1 hybrid, but incomplete self-incompatibility can lead to the formation of self-pollinated seeds. Whether each seed was an F1 hybrid or self-pollinated was determined by isozyme analysis of its seedling.
(5) Dummy variables were assigned to good and bad seeds and used as the response variable in the discriminant analysis described below. For discrimination of the germination trait, normally germinating seeds were assigned 1 (good) and all others -1 (bad; the same applies below). For discrimination of the hybridization trait in cauliflower, F1 hybrids were assigned 1 and all others -1. For discriminating whether a seed meets the good-seed requirements for both germination and hybridization, F1 hybrids with normal germination ability were assigned 1 and all others -1.
II. Derivation of pass/fail discrimination models for vegetable seeds
(6) Discriminant analysis was performed to predict, from the near-infrared reflectance spectrum of a seed, whether it has the desired traits (normal germination for all crop species and, for cauliflower, additionally being an F1 hybrid).
(7) The discriminant analysis followed the method described in JP 2023-125301 A. In the present invention, smoothed differentiation of the near-infrared reflectance spectra and column-wise concatenation of spectra of different derivative orders were omitted (FIG. 3).
(8) The near-infrared reflectance spectra (R) used as explanatory variables were subjected, in arbitrary combinations, to reciprocal (1/R) and logarithmic (pseudo-absorbance, -log R) transformations, standard normal variate (SNV) normalization (effective wavelength range 980 to 2,200 nm), and Savitzky-Golay (SG) smoothing (5-point window, third-order polynomial), generating 11 derived spectra. These, together with the original spectrum, gave a total of 12 spectra used as explanatory variables (FIG. 4).
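One reading of this step that reproduces the stated count is: three base forms (R, 1/R, -log R), each with or without SNV and with or without SG smoothing, giving 3 × 2 × 2 = 12 spectra, of which 11 are derived. The sketch below only enumerates these combinations; the order in which SNV and SG smoothing are applied is an assumption, and the labels are hypothetical.

```python
from itertools import product

BASE = ["R", "1/R", "-logR"]   # original, reciprocal, pseudo-absorbance
SNV = [False, True]
SG_SMOOTH = [False, True]      # 5-point, third-order Savitzky-Golay smoothing


def pipelines():
    """Every combination of base form, optional SNV, and optional SG smoothing."""
    names = []
    for base, use_snv, use_sg in product(BASE, SNV, SG_SMOOTH):
        steps = [base] + (["SNV"] if use_snv else []) + (["SG(5,3)"] if use_sg else [])
        names.append(" -> ".join(steps))
    return names
```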
(9) Pass/fail discrimination models for the seeds were derived by multivariate linear sparse modeling combining Adaptive LASSO (least absolute shrinkage and selection operator) and PLS-DA (partial least squares discriminant analysis). Many candidate models are produced at this stage, and the one that gives good seeds higher relative scores must be selected. The model selection method is described in detail below.
III. Selection of pass/fail discrimination models
(10) To devise a method for selecting a suitable discrimination model from the many candidates produced above, the following thought experiment was carried out.
1) Assume a high-accuracy discrimination model that gives clearly separated scores to good and bad seeds, and a low-accuracy model that gives widely overlapping scores. With the prior pass rate set to 80%, the former assigns good and bad seeds scores following normal distributions with means of 1 and -1, respectively, and the latter with means of 0.05 and -0.05; the standard deviation is 0.25 in all cases (FIG. 5).
2) Suppose seeds are collected in descending order of score. With the high-accuracy model, precision (the posterior pass rate) stays close to 100% until the recovery rate reaches 80%, equal to the prior pass rate (FIG. 5A(e)). With the low-accuracy model, by contrast, precision quickly converges to the prior pass rate (FIG. 5B(e)). In either case, when the recovery rate equals the prior pass rate, precision and recall (the recovery rate of good seeds) are equal, because the number of seeds collected then equals the number of good seeds, so both ratios have the same numerator and denominator.
3) In practice, when good seeds are sorted with a discrimination model, the top-scoring seeds are not collected after all seeds have been measured; instead, seeds whose scores exceed a predetermined lower threshold are collected one by one. Because the relationship between a score and its rank is easily obtained, the relationship between the recovery rate and each discrimination-accuracy metric, including precision and recall (FIG. 5(e)), can be redrawn as a relationship with the discrimination score (FIG. 5(d)).
4) Here, the sorting condition under which the prior pass rate equals the recovery rate is defined as the "standard sorting condition," and the lower threshold of the discrimination score for the seeds recovered under that condition is defined as the "standard threshold."
From the above, a requirement for an excellent discrimination model is as follows: under the standard selection condition, the precision (i.e., the posterior quality rate) and the recall, which equals it, must be substantially higher than the prior quality rate.
(11) In addition, when selecting good seeds, it is desirable to be able to adjust the precision as needed. More specifically, precision should preferably rise reliably as the recovery rate is lowered (i.e., as the lower score threshold is raised).
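The thought experiment in 1) to 4) can be reproduced numerically. The following is a minimal Python sketch, assuming NumPy. The function names and the lot size of 1,000 seeds are illustrative assumptions, not taken from the source. It draws scores from the two normal distributions described above with a prior quality rate of 80%. It then collects seeds in descending score order and reports precision and recall at several recovery rates.

```python
import numpy as np


def simulate_scores(n, prior, mu_good, mu_bad, sd, rng):
    """Draw discrimination scores for n seeds, a fraction `prior` of them good (label 1)."""
    n_good = int(round(n * prior))
    labels = np.r_[np.ones(n_good, dtype=int), np.zeros(n - n_good, dtype=int)]
    scores = np.where(labels == 1,
                      rng.normal(mu_good, sd, n),
                      rng.normal(mu_bad, sd, n))
    return scores, labels


def precision_recall_at_recovery(scores, labels, recovery):
    """Collect the top `recovery` fraction of seeds in descending score order."""
    k = int(round(len(scores) * recovery))
    collected = labels[np.argsort(-scores)[:k]]
    tp = collected.sum()                    # good seeds among those collected
    return tp / k, tp / labels.sum()        # (precision, recall)


rng = np.random.default_rng(0)
for name, mu_good, mu_bad in [("high-precision", 1.0, -1.0),
                              ("low-precision", 0.05, -0.05)]:
    scores, labels = simulate_scores(1000, 0.8, mu_good, mu_bad, 0.25, rng)
    for recovery in (0.5, 0.8, 1.0):
        precision, recall = precision_recall_at_recovery(scores, labels, recovery)
        print(f"{name}: recovery {recovery:.0%}, "
              f"precision {precision:.3f}, recall {recall:.3f}")
```

At 80% recovery (the standard selection condition), the printed precision and recall are identical for both models, as stated in 2). Only the high-precision model keeps precision well above 0.8 there.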
(12) The quality discrimination models for the crop seeds shown in Tables 2-1, 2-2, and 2-3 below were selected on the requirement that they satisfy the two points above. Figure 6 shows, for each model in these tables, the wavelength bands adopted as necessary explanatory variables. It also shows their standardized partial regression coefficients and variable importance in projection (VIP) values.
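VIP is defined for partial least squares (PLS) models. This passage does not state which multivariate method produced the models in Tables 2-1 to 2-3. The following sketch therefore assumes a PLS-type discriminant with a single response. It uses the common definition VIP_j = sqrt(p · Σ_a SS_a (w_ja/‖w_a‖)² / Σ_a SS_a), where SS_a = q_a² t_aᵀt_a. The inputs are the X-weights W, the X-scores T and the y-loadings q, and the function name is hypothetical.

```python
import numpy as np


def vip_scores(W, T, q):
    """Variable Importance in Projection for a single-response PLS model.

    W: (p, A) X-weights, T: (n, A) X-scores, q: (A,) y-loadings.
    Returns one VIP value per explanatory variable (wavelength band).
    """
    W = np.asarray(W, float)
    T = np.asarray(T, float)
    q = np.asarray(q, float).ravel()
    p = W.shape[0]
    ss = (q ** 2) * np.sum(T ** 2, axis=0)            # y-variance explained per component
    w_norm2 = (W / np.linalg.norm(W, axis=0)) ** 2    # squared, column-normalized weights
    return np.sqrt(p * (w_norm2 @ ss) / ss.sum())
```

If the model is fitted with scikit-learn's `PLSRegression`, these inputs correspond to its fitted attributes `x_weights_`, `x_scores_` and `y_loadings_[0]`. By construction, the squared VIP values sum to the number of variables.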
IV. Integration of Multiple Discrimination Scores
Embodiments in which multiple traits are assessed are described hereafter.
(13)カリフラワーにおける良種子を(1)正常な発芽能を有し、且つ(2)F1雑種と定義したとき、カリフラワーの良種子を判別するには、以下2通りの方法を採用し得る。 (13) When good cauliflower seeds are defined as those that (1) have normal germination ability and (2) are F1 hybrids, the following two methods can be used to distinguish good cauliflower seeds.
Method 1) The quality of the germination and crossing traits is scored separately with two models (m-Ca47g and m-Ca47x). Seeds ranking high on both scores are predicted to be good seeds. Method 2) The quality of the individual traits is disregarded, and the model predicts only whether a seed is good (m-Ca47).
(14) To obtain the desired discrimination accuracy with Method 1), a lower score threshold must be set for each of the two models. When n discrimination models are used, the lower-threshold settings and each discrimination-accuracy index together form an (n+1)-dimensional hypersurface. When n = 2, as in this example, each index forms a curved surface, which can also be drawn in two dimensions as a heat map (Figure 7). However, this method is considerably less convenient to operate than Method 2).
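Method 1) requires choosing a pair of thresholds. The following is a minimal sketch of how heat maps such as those in Figure 7 could be computed. For every combination of lower thresholds (t1, t2), a seed is kept only if both of its scores pass, and the resulting precision and recall are recorded. The function name and the threshold grids are hypothetical.

```python
import numpy as np


def precision_grid(score1, score2, labels, thresholds1, thresholds2):
    """Precision and recall when a seed is kept only if both scores pass their thresholds.

    Rows index thresholds1 (model 1), columns index thresholds2 (model 2).
    Precision is NaN where no seed is kept.
    """
    labels = np.asarray(labels, bool)
    prec = np.full((len(thresholds1), len(thresholds2)), np.nan)
    rec = np.zeros_like(prec)
    for i, t1 in enumerate(thresholds1):
        for j, t2 in enumerate(thresholds2):
            keep = (score1 >= t1) & (score2 >= t2)
            tp = np.sum(keep & labels)               # good seeds kept
            if keep.any():
                prec[i, j] = tp / keep.sum()
            rec[i, j] = tp / labels.sum()
    return prec, rec
```

The two returned arrays can be drawn directly as heat maps (for example with matplotlib's `imshow`). Even so, an operator still has to choose a point on a two-dimensional grid, which is the inconvenience noted above.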
(15)複数の判別モデルのスコアを統合し、設定すべき下限閾を1つで済むようにするため、以下の数理的処理を案出した。 (15) In order to integrate the scores of multiple discriminant models and set only one lower threshold, we devised the following mathematical process.
1) Let M1 to Mk be the models that discriminate good from bad for traits 1 to k. Let y1(X) to yk(X) be the discrimination scores obtained by substituting the explanatory variables X into these models.
2) Measure a sufficient number n (on the order of several hundred) of seeds from a given lot. Obtain the scores y1(x) to yk(x) from the corresponding explanatory variables x (x1 to xn). Assume the prior quality rates for the traits are p1 to pk. For each trait, let yp (yp1 to ypk) be the (n×p)-th highest score of y (y1 to yk), with p = p1 to pk respectively. yp is therefore the standard threshold for each trait. Also, let s (s1 to sk) be the standard deviation of y (y1 to yk).
3) Convert the scores y (y1 to yk) into scores z (z1 to zk) by the following formula:
z = (y - yp) / s
4) The minimum of the scores z (z1 to zk) is taken as the integrated score for discrimination using M1 to Mk.
5) The quality of each seed is judged from the ranking of its integrated score. A standard selection condition and a standard threshold can also be determined for discrimination with the integrated score.
この処理をm-Ca47g及びm-Ca47xに適用し、モデルm-Ca47gxを生成した(表3-1及び表3-2)。 This process was applied to m-Ca47g and m-Ca47x to generate model m-Ca47gx (Table 3-1 and Table 3-2).
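Steps 1) to 5) translate almost directly into code. The sketch below assumes NumPy, and its function names are hypothetical. It computes yp as the (n×p)-th highest score and standardizes each trait's scores. It then takes the row-wise minimum as the integrated score. The source does not say whether s is the population or the sample standard deviation, so `ddof=1` is an assumption.

```python
import numpy as np


def standardize_scores(y, prior):
    """z = (y - yp) / s for one model's scores over a lot of n seeds.

    yp is the (n * prior)-th highest score, i.e. the standard threshold;
    s is the standard deviation of the lot's scores (sample SD assumed).
    """
    y = np.asarray(y, float)
    k = int(round(len(y) * prior))      # rank of the standard threshold
    yp = np.sort(y)[::-1][k - 1]        # k-th highest score
    s = y.std(ddof=1)
    return (y - yp) / s


def integrated_score(score_matrix, priors):
    """score_matrix: (n_seeds, k_traits) raw scores; priors: prior quality rate per trait."""
    score_matrix = np.asarray(score_matrix, float)
    z = np.column_stack([standardize_scores(score_matrix[:, j], p)
                         for j, p in enumerate(priors)])
    return z.min(axis=1)                # minimum over traits
```

z_i ≥ 0 exactly when y_i is at or above its standard threshold yp_i. An integrated score ≥ 0 therefore means that every trait passes its own standard threshold. This is why a single threshold on the integrated score can replace k separate ones.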
V. Accuracy Evaluation of the Good/Bad Discrimination Models
(16) Figures 8 and 9 show, for all selected discrimination models, the distribution ranges of discrimination scores for all seeds or for good and bad seeds separately. They also show how each discrimination-accuracy index relates to the lower score threshold setting or to the seed recovery rate. The figures are organized in the same way as Figure 5.
(17) Figures 8 and 9 show the results of two kinds of validation. Figure 8 shows internal validation, in which each discrimination model was applied to the internal data used to derive it. Figure 9 shows external validation, in which each model was applied to external data acquired independently of model derivation. Figure 9 thus represents an evaluation of the models' generalization performance.
(18) ROC (Receiver Operating Characteristic) curves and PR (Precision-Recall) curves are often used to evaluate the overall accuracy of a pass/fail discrimination model independently of the lower score threshold setting. In particular, some quality inspections treat accepting defective products (false positives) as more serious than rejecting good products (false negatives). These inspections often use the PR curve, which is sensitive to false positives.
(19) For both ROC and PR curves, the closer the AUC (area under the curve) is to its maximum value of 1, the better the discrimination model is considered to be. However, the minimum of ROC-AUC is always 0.5. The minimum of PR-AUC, by contrast, is the initial precision, i.e., the prior quality rate, which varies with the data set to which the model is applied.
(20) PR-AUC is suitable for determining which of several discrimination models performs best on a specific data set. It is not suitable for verifying whether a single discrimination model performs equally well on different data sets, because it is affected by the prior quality rate.
(21) To compensate for the above shortcoming of the PR curve, relative precision is defined by the following formula:
[Relative precision] = (precision - initial precision) / (1 - initial precision)
(22)相対適合率と再現率の関係をrPR(relative Precision-Recall)曲線、その曲線下面積をrPR-AUCと称する。rPR-AUCは、適用先のデータセットに依存せず、必ず閉区間[0,1]の値を取る。 (22) The relationship between relative precision and recall is called the rPR (relative Precision-Recall) curve, and the area under the curve is called the rPR-AUC. The rPR-AUC does not depend on the dataset to which it is applied, and always takes a value in the closed interval [0, 1].
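The following is a minimal sketch of the PR-AUC and rPR-AUC computation for a single model. It assumes seeds are ranked by descending score and takes the initial precision to be the prior quality rate of the data set, as in (19). The source does not specify how the area is integrated. A step-wise sum of precision × Δrecall (the rule used for average precision) is assumed here, and relative precision is not clipped. The function name is hypothetical.

```python
import numpy as np


def pr_and_rpr_auc(scores, labels):
    """Return (PR-AUC, rPR-AUC) for seeds collected in descending score order.

    Area = sum of precision * delta-recall over the ranking (assumed rule).
    Requires at least one good and one bad seed.
    """
    order = np.argsort(-np.asarray(scores, float))
    labels = np.asarray(labels, int)[order]
    tp = np.cumsum(labels)                        # good seeds collected so far
    k = np.arange(1, len(labels) + 1)             # seeds collected so far
    precision = tp / k
    recall = tp / labels.sum()
    p0 = labels.mean()                            # initial precision = prior quality rate
    rel_precision = (precision - p0) / (1 - p0)
    d_recall = np.diff(np.r_[0.0, recall])
    return float(np.sum(precision * d_recall)), float(np.sum(rel_precision * d_recall))
```

For a perfect ranking, both areas equal 1. For a random ranking, PR-AUC stays near the prior quality rate, whereas rPR-AUC stays near 0. This difference is what makes rPR-AUC comparable across data sets with different prior quality rates.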
(23) When n models are used for good/bad discrimination, the relationship of precision or relative precision to recall likewise forms an (n+1)-dimensional hypersurface. For convenience, the following names are used regardless of the number of discrimination models. The relationships of precision and relative precision to recall are called the PR curve and the rPR curve. The proportions below the hypersurfaces they form are called PR-AUC and rPR-AUC.
(24)導出したすべての判別モデルについて、それぞれが対象とする作物種に由来するデータセットに適用した際のPR曲線(図10A及び図11)及びrPR曲線(図10B及び図12)を示した。また、各曲線から算出されたPR-AUC及びrPR-AUCを表3-1及び表3-2に併記した。 (24) For all derived discrimination models, the PR curves (Figures 10A and 11) and rPR curves (Figures 10B and 12) are shown when applied to the data sets derived from the target crop species. In addition, the PR-AUC and rPR-AUC calculated from each curve are shown in Tables 3-1 and 3-2.
VI. Selection of Good Seeds Based on the Discrimination Models
(25) Software was created for near-infrared hyperspectral images of seeds. It calculates scores from the discrimination models shown in Table 2 for each pixel or for each region occupied by a seed. It then converts these scores into pseudocolor or brightness values and visualizes them (Figure 13).
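The visualization software is not described in detail. The following is a minimal sketch that assumes a linear discrimination model (a coefficient vector plus an intercept) and a reflectance cube of shape (height, width, bands). It computes per-pixel scores, 8-bit brightness values and per-seed mean scores. Segmentation of the image into labeled seed regions is assumed to be done elsewhere, and all function names are hypothetical.

```python
import numpy as np


def score_map(cube, coef, intercept):
    """Apply a linear discrimination model to every pixel.

    cube: (H, W, B) spectra per pixel; coef: (B,) coefficients. Returns (H, W) scores.
    """
    return np.tensordot(cube, coef, axes=([2], [0])) + intercept


def to_brightness(score_img, lo=None, hi=None):
    """Linearly rescale scores to 8-bit brightness (0-255); requires hi > lo."""
    lo = np.nanmin(score_img) if lo is None else lo
    hi = np.nanmax(score_img) if hi is None else hi
    scaled = np.clip((score_img - lo) / (hi - lo), 0.0, 1.0)
    return np.round(scaled * 255).astype(np.uint8)


def seed_scores(score_img, seed_labels):
    """Mean score per seed region; seed_labels: (H, W) ints, 0 = background."""
    ids = np.unique(seed_labels[seed_labels > 0])
    return {int(i): float(score_img[seed_labels == i].mean()) for i in ids}
```

A pseudocolor rendering is then a matter of passing the score image through a colormap instead of the grayscale mapping in `to_brightness`.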
(26) Using this software, seeds with high and with low discrimination scores for the germination or crossing traits were collected separately. For each group, seed germination and early growth were evaluated (Figure 14). For cauliflower, isozyme types were also analyzed to determine the posterior quality rate (precision) (Table 4).
(Results and Discussion)
(1) For all crop species, the seed quality revealed by the germination test could not be detected subjectively. Neither the seeds' pre-germination appearance nor the general shape of their near-infrared reflectance spectra revealed it (Figures 1 and 2).
(2) Each derived discrimination model was applied to its target variety. Without exception, the score distributions ran from high to low going from good seeds to bad seeds (Figure 8 and Figures 9(b) and (c)).
(3)発芽形質を予測するモデルにおいては、異常発芽種子が示すスコアの分布域は、正常発芽及び不発芽種子が示すスコアの中間に位置しており(図8A~H並びに図9A~F(b)及び(c))、判別スコアは、概ね種子に備わる成長力を反映しているものと推察された。 (3) In the model predicting germination traits, the distribution range of scores shown by abnormally germinated seeds was located between the scores shown by normally germinated and non-germinated seeds (Figures 8A-H and Figures 9A-F (b) and (c)), and it was inferred that the discrimination scores generally reflect the growth potential of the seeds.
(4) The distributions of discrimination scores overlapped between good and bad seeds, so misclassification could not be avoided for any crop species. The discrimination models were confirmed to be interchangeable between tomato varieties 94 and 221. However, the results suggested lower accuracy than when each model was applied to its own variety (Table 3 and Figures 12H and I). For Welsh onion, sharing a discrimination model between varieties 22 and 51 was judged impractical (Table 3 and Figures 12F and G).
(5) The distributions of discrimination scores were not identical across test occasions, even when seeds of the same variety and lot were used (Figures 8 and 9(a) to (c); for tomato variety 94, different lots were used in Figures 8 and 9). The derived discrimination models could be applied repeatedly, at least to the same variety. However, the lower score threshold for selecting good seeds should preferably be optimized on each occasion.
(6) When a discrimination model is applied to near-infrared spectra from randomly sampled seeds and scores are calculated, the quality of each individual seed is not yet known. Moreover, the density distribution of the scores shows only a featureless single peak (Figures 9C and 9E-I(a); A, B, and D are not randomly sampled). It is therefore difficult to estimate from this shape the lower score threshold that would achieve the desired sorting. On the other hand, it is possible to determine the standard threshold that collects the same proportion of seeds as the estimated prior quality rate. It is reasonable to adopt this as the standard selection condition (standard condition). The prior quality rate of the seeds to be sorted should therefore be known as accurately as possible.
(7) To raise the posterior quality rate (precision) above that obtained under the standard condition, raise the lower threshold of the discrimination score. To raise the recall (recovery rate of good seeds), lower it.
(8) However, it is not easy to estimate how much a given change in the lower threshold of the discrimination score will affect precision and recall. One remedy is to subtract the standard threshold from each score and then divide by the standard deviation of the scores. Like the standardization (normalization) of variables, this transformation puts score spreads on a common scale. It is useful even when good seeds are selected with a single model.
(9)前記のスコア変換により、複数の判別モデルを使用する良種子選別においても、調整すべき閾値を一つに統合することができる。 (9) The score conversion described above makes it possible to consolidate the thresholds to be adjusted into one even when selecting good seeds using multiple discrimination models.
(10) In cauliflower, good seeds are F1 hybrids with normal germination ability. Combining models that predict the germination and crossing traits individually gave higher discrimination accuracy for these seeds than a single model that directly discriminates good from bad seeds (Table 3 and Figures 12L and M).
(11) A germination-trait model was applied, and the germination and early growth of seeds with high and low discrimination scores were compared. The high-scoring seeds not only germinated at a higher rate but also tended to grow more vigorously after germination (Figure 14). The same tendency was observed in lettuce, for which the derived quality discrimination model performed somewhat less well. This further suggests that discrimination scores for germination traits reflect the growth potential of the seeds.
(12) The many candidate discrimination models could be efficiently narrowed down to preferred ones (Table 3 and Figure 3). This was done by confirming that the precision under the standard selection condition was sufficiently higher than the prior quality rate. (This precision is synonymous with the posterior quality rate and equal to the recall.) Furthermore, relative precision was defined, and the area under the rPR curve (rPR-AUC), which relates it to recall, was calculated. This made it possible to compare discrimination accuracy objectively regardless of the combination of model and data set (Table 3 and Figure 12).
(13) With the technique of the present invention, it becomes clear within about one day of receiving the results of a seed quality evaluation test whether good and bad seeds can be discriminated. Where discrimination is possible, a specific discrimination model can be obtained. Defective seeds may sometimes arise accidentally rather than from the genetic properties of the variety. Even then, sorting to restore the good-seed rate to a level that meets shipping standards can begin soon after the problem is discovered.
(14)前記を可能とするためには、品質評価試験の結果と紐づけできる、種子の近赤外ハイパースペクトル画像を予め取得しておく必要がある。 (14) To make the above possible, it is necessary to obtain near-infrared hyperspectral images of the seeds in advance that can be linked to the results of quality evaluation tests.
(15)先述の通り、判別モデルの導出には、多大な時間や労力を要するわけではない。特定の作物品種に適用したモデルを、別の機会に再利用することを考慮して、精度向上のための更新を続けることも可能であるが、不良種子の発生事由は機会ごとに異なる場合が多い。様々な作物品種に対応した判別モデルのライブラリを充実させるだけでなく、問題の発生機会ごとに最適な良否判別モデルを導出し、解決に向けて即応できる種子の品質管理体制を整えることが重要である。 (15) As mentioned above, deriving a discrimination model does not require a great deal of time or effort. It is possible to continue updating a model applied to a specific crop variety to improve its accuracy, with the intention of reusing it on another occasion; however, the reasons for the occurrence of defective seeds often differ from occasion to occasion. It is important not only to enrich the library of discrimination models compatible with various crop varieties, but also to establish a seed quality control system that can derive the optimal good/bad discrimination model for each occasion when a problem occurs and respond immediately to resolve it.
(16) Near-infrared hyperspectral images can be acquired in a single pass. The data obtained from them can be used to discriminate multiple traits, such as seed germination characteristics and genotype. The images also contain information on seed size and are useful for estimating seed weight. The present invention thus provides a means of performing advanced seed sorting at low cost and in a small footprint. It does not require introducing and installing multiple sorting machines with different functions.
本発明によれば、所望の形質を有する植物種子と、所望の形質を有さない植物種子の集団から、非破壊的、高効率、且つ高精度に所望の形質を有する植物種子を選別することができる。従って、本発明は、例えば、種苗産業において極めて有用である。 According to the present invention, it is possible to select plant seeds having desired traits from a population of plant seeds having and not having the desired traits non-destructively, highly efficiently, and with high accuracy. Therefore, the present invention is extremely useful, for example, in the seed and seedling industry.
本出願は、日本で出願された特願2023-089036(出願日:2023年5月30日)を基礎としており、その内容は本明細書に全て包含されるものである。 This application is based on patent application No. 2023-089036 filed in Japan (filing date: May 30, 2023), the contents of which are incorporated in their entirety into this specification.
Claims (6)
A method for selecting a plant seed having a desired trait, comprising the following Steps 1 to 7:
(Step 1) Removing a portion of seeds from a seed population;
(Step 2) Constructing a data set for the extracted seeds, the step including the following (a) to (c):
(a) irradiating the extracted seeds with light to obtain spectral data and/or derived spectral data;
(b) subjecting the seeds for which spectral data have been acquired to a quality test for the desired trait, and determining the prior quality rate of the extracted seeds;
Here, the prior quality rate is defined as follows:
[Prior quality rate (%)] = [Number of seeds having the desired trait among the seeds subjected to quality evaluation] / [Number of seeds subjected to quality evaluation] × 100
(c) Correlating the spectral data of each seed with the results of quality testing;
(Step 3) applying multivariate discriminant analysis to the data set constructed in Step 2 to derive candidates for a good/bad discrimination model for calculating a discriminant score for a desired trait in each seed;
(Step 4) The recovery rate and the precision are defined by the following formulas:
[Recovery rate (%)] = [Number of seeds selected based on the discriminant score among the seeds subjected to the quality evaluation] / [Number of seeds subjected to the quality evaluation] × 100
[Precision (%)] = [Number of seeds having the desired trait among the seeds subjected to the quality evaluation and selected based on the discriminant score] / [Number of seeds selected based on the discriminant score among the seeds subjected to the quality evaluation] × 100
In addition, the selection condition under which the prior quality rate and the recovery rate are equal is defined as the "standard selection condition."
Step 4 is a step of selecting a discrimination model from the candidate quality discrimination models derived in Step 3. It includes selecting, as the quality discrimination model, a candidate that satisfies the criterion that the precision exceeds the prior quality rate under the standard selection condition;
(Step 5) A step of irradiating the remaining seeds not removed from the seed population in Step 1 and/or other seeds obtained under substantially the same conditions as those of the seeds with light to obtain spectral data, and applying the discrimination model selected in Step 4 to the spectral data to determine a discrimination score for each seed;
(Step 6) Determining a threshold value of the discriminant score based on the discriminant scores determined in Step 5;
(Step 7) A step of recovering seeds predicted to have desired traits by comparing the discrimination scores of each seed determined in step 5 with the discrimination score threshold determined in step 6.
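The ratios defined in Steps 2 and 4 and the Step 4 selection criterion can be checked mechanically. The following sketch assumes NumPy, and its names are hypothetical. It computes the prior quality rate, recovery rate and precision as percentages. It tests whether a candidate model's precision exceeds the prior quality rate under the standard selection condition. As one possible reading of Step 6, it also derives the standard threshold from the prior quality rate.

```python
import numpy as np


def prior_quality_rate(labels):
    """Step 2(b): percentage of tested seeds that had the desired trait."""
    labels = np.asarray(labels, bool)
    return 100.0 * labels.sum() / labels.size


def recovery_and_precision(selected, labels):
    """Step 4 definitions: (recovery rate %, precision %) for a boolean selection mask."""
    selected = np.asarray(selected, bool)
    labels = np.asarray(labels, bool)
    recovery = 100.0 * selected.sum() / selected.size
    precision = 100.0 * (selected & labels).sum() / selected.sum()
    return recovery, precision


def passes_standard_condition(scores, labels):
    """Step 4 criterion: with recovery = prior quality rate, is precision > prior?"""
    scores = np.asarray(scores, float)
    prior = prior_quality_rate(labels)
    k = int(round(scores.size * prior / 100.0))   # seeds collected under the standard condition
    selected = np.zeros(scores.size, bool)
    selected[np.argsort(-scores)[:k]] = True
    _, precision = recovery_and_precision(selected, labels)
    return precision > prior


def standard_threshold(scores, prior_rate_pct):
    """Step 6 (one option): score of the k-th ranked seed, k = n * prior rate."""
    s = np.sort(np.asarray(scores, float))[::-1]
    return s[int(round(s.size * prior_rate_pct / 100.0)) - 1]
```

In Step 7, the seeds whose score is at or above the returned threshold would then be collected.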
The method according to claim 1 or 2, wherein two or more desired traits are predicted by any one of the following methods (i) to (iii):
(i) A method of scoring seeds individually using a discrimination model for each trait, focusing on the quality of each trait, and predicting that seeds with all scores equal to or above a threshold are good seeds;
(ii) A method in which the quality of each individual trait is disregarded, and a seed is treated as good if all of the desired traits are good and as bad otherwise. A single discrimination model is selected on this basis, seeds are scored with it, and seeds with a score at or above a threshold are predicted to be good seeds;
(iii) A method in which a discrimination model for each trait is selected by focusing on the quality of each trait, the scores obtained using the discrimination model for each trait are integrated to obtain an integrated score, and the quality of each seed is predicted based on the ranking of the integrated score.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023-089036 | 2023-05-30 | | |
| JP2023089036 | 2023-05-30 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024248014A1 | 2024-12-05 |
Family ID: 93657930
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2024/019621 (Pending; published as WO2024248014A1) | Method for sorting seeds | 2023-05-30 | 2024-05-29 |
Legal Events
| Code | Title | Description |
|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24815496; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2025524117; Country of ref document: JP; Kind code of ref document: A |
| WWE | WIPO information: entry into national phase | Ref document number: CN2024800363543; Country of ref document: CN |