WO2018014002A1 - Systems and methods to facilitate genetic research - Google Patents
- Publication number
- WO2018014002A1 (application PCT/US2017/042265)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- spores
- tetrad
- dye
- tetrads
- markers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N1/00—Sampling; Preparing specimens for investigation
- G01N1/28—Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
- G01N1/30—Staining; Impregnating ; Fixation; Dehydration; Multistep processes for preparing samples of tissue, cell or nucleic acid material and the like for analysis
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N15/00—Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
- G01N15/10—Investigating individual particles
- G01N15/14—Optical investigation techniques, e.g. flow cytometry
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N15/00—Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
- G01N15/10—Investigating individual particles
- G01N15/14—Optical investigation techniques, e.g. flow cytometry
- G01N15/1434—Optical arrangements
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/40—Population genetics; Linkage disequilibrium
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16Z—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
- G16Z99/00—Subject matter not provided for in other main groups of this subclass
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N15/00—Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
- G01N15/10—Investigating individual particles
- G01N15/14—Optical investigation techniques, e.g. flow cytometry
- G01N15/149—Optical investigation techniques, e.g. flow cytometry specially adapted for sorting particles, e.g. by their size or optical properties
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N1/00—Sampling; Preparing specimens for investigation
- G01N1/28—Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
- G01N1/30—Staining; Impregnating ; Fixation; Dehydration; Multistep processes for preparing samples of tissue, cell or nucleic acid material and the like for analysis
- G01N2001/302—Stain compositions
Definitions
- The current disclosure provides systems and methods that facilitate genetic research.
- The systems and methods can utilize (1) fluorescent dyes to sort tetrads from vegetative cells, dyads and dead cells; (2) patterns of natural genetic sequences to capture tetrad relationships of recombinant progeny; and/or (3) markers in parental organisms to identify genetic recombination events in offspring in genomic regions of interest.
- Meiotic mapping is a linkage-based method for analyzing the recombinant progeny of a mating cross.
- The method is possible in a wide range of eukaryotes, including genetically facile yeasts and less tractable microorganisms, such as the filamentous fungus Neurospora crassa and the unicellular green alga Chlamydomonas reinhardtii.
- The approach is enabled by tetrad disruption (e.g., dissection), a technique for isolating and cultivating all four spores (i.e., meiotic progeny) derived from an individual tetrad.
- The throughput of the process has historically been limited by the need to isolate tetrads out of a heterogeneous population of tetrads, vegetative cells, dyads and dead cells, followed by manual separation or dissection of the spores contained in a tetrad.
- The process is time-consuming even for experienced researchers with access to specialized equipment.
- There is a need for methods to isolate and separate spores in a high-throughput manner that maintains or re-creates original spore relationships. There is also a need for methods to detect individuals that harbor genetic recombination events in genomic regions of interest.
- The current disclosure provides systems and methods that improve the ability to perform genetic research.
- Particular embodiments provide systems and methods to quickly and efficiently isolate tetrads from vegetative cells, dyads, and dead cells. These embodiments can utilize fluorescent dyes and flow cytometry.
- Additional embodiments provide systems and methods to retain or re-create original spore relationships during spore analysis, including high throughput spore analysis. These embodiments rely on patterns of natural genetic sequences in an organism and do not require genetic modification of the organism (e.g., use of introduced or expressed fluorescent proteins and/or DNA-based molecular bar codes).
- Additional embodiments provide systems and methods to detect genetic recombination events in genomic regions of interest. These embodiments utilize markers in the genomes of parental organisms.
- In the parental organisms, the markers do not create a detectable signal or create a first or second differential signal. If the markers in parental strains come together in an offspring, the unified markers create a detectable signal and/or a third differential signal.
- The detectable or third differential signal can signify the occurrence of the genetic recombination event.
- Each of the described embodiments can be practiced alone or in combination with other embodiments to generate various systems and methods that improve the ability to perform genetic research.
- Particular embodiments and combinations of embodiments improve the ability to perform genetic research without requiring genetic modification of the organism.
- These embodiments and combinations can be especially useful in industries such as the food and beverage industry, where genetic modification of organisms is discouraged or even prohibited.
- FIG. 1 Depiction of disclosed system and method utilizing (1) markers in parental organisms; (2) dye to stain viable tetrads; (3) flow cytometry to sort tetrads from vegetative cells, dyads and dead cells; (4) generation of colonies from individual spores; and (5) use of patterns of natural genetic sequences to capture tetrad relationships of recombinant progeny.
- FIGs. 2A, 2B Sporulation cultures containing vegetative yeast cells (A), tetrads (B), dyads (C) and dead cells (D) stained with DiBAC4(5).
- FIG. 3 DiBAC4(5) stained yeast tetrads can be isolated from vegetative cells, dyads and dead cells using flow cytometry.
- FIGs. 4A and 4B Each colony is derived from the spore of a hand-dissected tetrad. The four colonies in each column are all derived from the same individual tetrad, so four colonies growing in a column indicate a completely viable tetrad. (4A) Spores stained with the fluorescent dye DiBAC4(5); and (4B) non-stained control. The comparison shows that tetrad staining with DiBAC4(5) does not decrease spore viability.
- FIG. 5 Schematic comparison of prior art bar coding (left panel) versus disclosed methods to identify tetrad relationships of spores (right panel).
- FIGs. 6A, 6B (6A) Behavior of a single chromosome during meiosis. In the initial heterozygous diploid (top) there are two copies of the "A" haplotype (light gray chromatids) and two copies of the "B" haplotype (dark gray chromatids). Centromeres are shown as circles. Note that the two "A" centromeres stay together until the second meiotic division, as do the two "B” centromeres. Spores (haploid meiotic products) shown as dotted ovals. (6B) Segregation pattern shown for 3 chromosomes.
- FIG. 7 Comparison of delta with interaction information for 3- and 4-spore cases. All measures were computed on a simulated dataset (1461 markers and 1140 spores from 285 tetrads).
- The top panel shows the scatter plot of interaction information scores versus delta scores computed on all possible groups of 3 spores. Each group is colored gray (red in original) if all 3 spores of the group came from the same tetrad, dark gray (blue in original) if only 2 spores came from the same tetrad, and light gray (green in original) if all spores came from different tetrads.
- the bottom panel shows the scatter plot of the scores computed on all possible groups of 4 spores. Note that sign reversal occurs between the 3- and 4-way scales for both interaction information and delta.
- FIG. 8 Comparison of the amount of information between three-spore tetrads (II3) and their component two-spore subgroups (II2) (bottom panel) and between four-spore tetrads (II4) and their component three-spore subgroups (II3) (top panel), as measured by interaction information (II). All measures were computed on the same data as in FIG. 7.
- In the top panel, a group is colored dark gray (red in original) for groups of 3 spores from the same tetrad considered along with the remaining spore from that tetrad, gray (blue in original) for groups of 3 spores from the same tetrad considered along with one spore from another tetrad, and light gray (green in original) otherwise.
- In the bottom panel, a group is colored dark gray for sets of 2 spores from the same tetrad considered along with another spore from that tetrad, gray for groups of 2 spores from the same tetrad considered along with one spore from another tetrad, and light gray for groups of 3 unrelated spores. Note that the scores of gray and dark gray sets are plotted in their entirety, whereas to plot the light gray sets 2 million groups were randomly selected.
- FIG. 9 An exemplary method of identifying tetrad relationships from spore genomes.
- FIG. 10 The exemplary method of identifying tetrad relationships of FIG. 9 shown in greater detail.
- FIG. 11 Sub-portions of the exemplary method of FIG. 10 shown in greater detail.
- FIGs. 12A-12C Use of markers in the genome of parents to identify genetic recombination events in a genomic region of interest during reproduction.
- FIG. 12A Placement of genetic constructs encoding components of a marker pair within the genomes of parents around a genomic region of interest.
- FIG. 12B Alignment of chromosomes, and location of marker encoding sequences if no genetic recombination event occurs in the genomic region of interest. No detectable or differential signal is created.
- FIG. 12C Alignment of chromosomes, and location of marker encoding sequences if genetic recombination event occurs in the genomic region of interest. Detectable or differential signal is created in offspring with relevant recombination event.
- FIG. 13 Use of fluorescent dyes to isolate unsporulated diploids from tetrads having a recombination event when the marker is a fluorescent protein (expressed in Spore 2).
- FIG. 14 A dimorphic trait within a population of Saccharomyces cerevisiae natural isolates grown on CHROMagar Candida.
- CHROMagar Candida http://www.chromagar.com/
- FIG. 15 Segregation pattern of the purple and white phenotype among the progeny of a yeast cross is indicative of a monogenic trait.
- FIG. 16 The development of purple color on CHROMagar Candida maps to a region on chromosome II.
- FIG. 17 Fine-mapping region delineated by drug markers.
- FIG. 18 Fine mapping method isolates spores with an informative recombination event.
- FIG. 19 Fine mapping method maps purple trait to a single gene.
- FIG. 20 is a high-level diagram showing components of a data-processing system that can be used with embodiments disclosed herein.
- Particular embodiments provide systems and methods to quickly and efficiently sort (e.g., enrich for or isolate) tetrads from vegetative cells, dyads and dead cells. These embodiments can utilize fluorescent dyes and flow cytometry.
- The systems and methods enrich for tetrads.
- The systems and methods isolate tetrads. Sorted tetrads can be analyzed in bulk form (i.e., without disruption of individual spores).
- Sorted tetrads can be disrupted, and residual dye remaining on spores can be further used to enrich for or isolate spores from vegetative cells and non-digested tetrads.
- Spores isolated in this manner can be used to generate colonies, liquid cultures, or biochemical extracts (e.g. DNA, RNA, proteins, or metabolites) from individual spores. This approach is beneficial for, for example, random spore analysis.
- Additional embodiments provide systems and methods to capture the tetrad relationships of recombinant progeny, including in high throughput spore analysis following disruption of spores from tetrads. These embodiments rely on patterns of natural genetic sequences in an organism and do not require genetic modification of the organism (e.g., use of introduced or expressed fluorescent proteins and/or DNA-based molecular bar codes). Additional embodiments provide systems and methods to detect genetic recombination events in offspring in genomic regions of interest. These embodiments utilize markers in the genomes of parental organisms. In the parental organisms, markers do not create a detectable signal or create first and/or second differential signals.
- If the markers come together in an offspring, the unified marker creates a signal (or lack of signal) that is distinguishable from the signal (or lack of signal) in the original, non-recombined parental strains.
- The detectable or differential signal can signify the occurrence of the genetic recombination event.
- Disclosed herein are methods that allow bulk sorting (e.g., enriching for or isolating) of tetrads from vegetative cells, dyads and dead cells, but that do not require genetic modification of the cells. These methods can also be used to sort spores following removal from the tetrad environment based on residual dye remaining at or near the surface of the spore.
- Enriching for means that the sorted target (tetrads or spores) occurs at a significantly higher frequency in a sample after sorting than before sorting.
- The frequency of the sorted target in a sample can increase by at least 50%, at least 75%, or at least 100% or more between pre- and post-sort. Isolating results in a pure population of a sorted target (tetrads or spores) lacking all cell types intended to be removed by the isolation.
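The enrichment criterion above can be illustrated with a small calculation. This is a minimal sketch that assumes "increase by at least 50%" means a relative increase in target frequency. The function name and the event counts are hypothetical and do not come from the disclosure.

```python
def relative_enrichment(pre_target, pre_total, post_target, post_total):
    """Return the fractional increase in target frequency after sorting.

    Assumption: enrichment is measured as (post - pre) / pre, where each
    frequency is the count of target events divided by all counted events.
    """
    pre_freq = pre_target / pre_total
    post_freq = post_target / post_total
    return (post_freq - pre_freq) / pre_freq


# Hypothetical counts: 200 tetrads per 1000 events before sorting,
# 900 tetrads per 1000 events after sorting.
gain = relative_enrichment(200, 1000, 900, 1000)
is_enriched = gain >= 0.5  # the "at least 50%" threshold from the text
```

Under this reading, a post-sort sample consisting only of the target would count as isolated, not merely enriched.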
- optical characteristics of a dye refer to absorption and/or emission of electromagnetic radiation within the ultraviolet, visible and/or infrared spectrum.
- Particular embodiments label and sort tetrads utilizing fluorescent dyes and fluorescence-activated cell sorting (FACS).
- Exemplary fluorescent dyes include xanthene dyes, fluorescein dyes, rhodamine dyes, fluorescein isothiocyanate (FITC), 6-carboxyfluorescein (FAM), 6-carboxy-2',4',7',4,7-hexachlorofluorescein (HEX), 6-carboxy-4',5'-dichloro-2',7'-dimethoxyfluorescein (JOE or J), N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA or T), 6-carboxy-X-rhodamine (ROX or R), 5-carboxyrhodamine-6G (R6G5 or G5), 6-carboxyrhodamine-6G (R6G6 or G6), and rhodamine 110; cyanine dyes, e.g. Cy3, Cy5 and Cy7 dyes; Alexa dyes, e.g. Alexa-fluor-555; coumarin, Diethylaminocoumarin, umbelliferone; benzamide dyes, e.g. Hoechst 33258; phenanthridine dyes, e.g. Texas Red; ethidium dyes; acridine dyes; carbazole dyes; phenoxazine dyes; porphyrin dyes; polymethine dyes, BODIPY dyes, quinoline dyes, Pyrene, Fluorescein Chlorotriazinyl, R110, Eosin, Tetramethylrhodamine, Lissamine, ROX, Napthofluorescein, and the like, as well as other examples described elsewhere herein.
- Vital dyes are non-toxic dyes that have historically been used to differentiate live and dead cells within a population. Vital dyes stain based on a variety of cell characteristics that differ between live and dead cells, such as membrane potential, membrane permeability and enzyme activity. Examples of vital dyes include oxonol dyes. Oxonol dyes are lipophilic, anionic molecules that selectively stain dead cells due to collapse of membrane potential.
- Particular examples of vital dyes include Bis-(1,3-dibutylbarbituric acid) pentamethine oxonol (also known as DiBAC4(5); Anaspec AS-84701), calcein AM, carboxyfluorescein diacetate, copper phthalocyanine tetrasulfonate [27360-85-6], DiOC (3,3'-dihexyloxacarbocyanine iodide), Evans blue [CAS 314-13-6], gadolinium texaphyrin [156436-89-4], indocyanine green monosodium salt [CAS 3599-32-4], isosulfan blue [also known as Patent blue violet, CAS 68238-36-8], methylene blue [CAS 61-73-4], Nile red, patent blue V [CAS 3536-49-0], patent blue VF [CAS 129-17-9], propidium iodide (PI), rhodamine 123, and sulfobromo
- a buffer such as 1x PBS, pH 7.4 (1 mM potassium phosphate monobasic, 155 mM sodium chloride, 3 mM sodium phosphate dibasic); 1x TBS (Tris-buffered saline; 50 mM Tris-HCl, pH 7.6, 150 mM NaCl); or 200 mM Na2HPO4, 100 mM sodium citrate, pH 6.2.
- Fluorescent dye can then be added to the cell culture at a temperature and concentration, and for a period of time, that allow tetrad staining.
- Room temperature can be used.
- Appropriate concentrations of fluorescent dye can range from, for example, 0.1 µg/ml to 10 µg/ml.
- Tetrads stain quickly, such that no significant minimum incubation time is required. Over time, the fluorescent dye's intensity can decrease. If sorting is not performed soon after incubation with fluorescent dye, steps can be taken to support stain visibility and viability of the cells, for instance by keeping the stained mixture of cells in the dark and/or on ice.
- FACS sorting utilizes a flow cytometry machine, wherein cells are interrogated by a laser.
- Cells can be separated into droplets with differential charges (e.g., either a positive or negative charge or varying degrees of a positive or negative charge), depending on the dye that is used.
- Droplets can then be sorted by charge presence or degree to allow for sorting and collection of populations of cells.
- Diploid cells were put through meiosis (sporulated) and 10^6 cells (tetrads) were washed and resuspended in 1 ml of 1x PBS (phosphate buffered saline).
- DiBAC4(5) was then added to 1 µg/ml (final concentration) and tetrads were stained at room temperature for 5 minutes prior to sorting by FACS. Red fluorescence intensity was used to sort tetrads, dyads, and dead cells away from live vegetative cells. This was accomplished using a BD FACS Aria II with 488 nm emission and a 595LP 610/20 filter (FIG. 3).
- To visualize tetrads by fluorescence microscopy (FIGs. 2A, 2B), cells were incubated for 30 minutes at 30°C in 1 ml YPD (1% yeast extract, 2% peptone, 2% glucose) prior to staining (as described above). Hand dissecting and growing stained (FIG. 4A) and unstained (FIG. 4B) tetrads demonstrated that staining does not affect the viability of the progeny.
- FIG. 2B depicts a tetrad where one of the 4 spores is stained in a way that indicates that it is dead.
- Particular embodiments can also be used to detect the genetic phenomenon of synthetic lethality.
- The ability to isolate (e.g., by FACS) a population of tetrads where one or more of the spores are dead provides a novel method for performing synthetic lethality screens.
- The synthetic lethality screens can be performed in unmodified strains.
- Tested strains include those isolated from oak trees, coffee beans, coconut pods, kefir, sake and Drosophila pseudoobscura (fruit flies). Some tested isolates are tetraploid, triploid or have multiple aneuploidies.
- Tested lab strains include gene deletions (including a strain that is a heterozygous deletion of an essential gene) and are auxotrophic for multiple amino acids. The method has also been confirmed to be effective in a prototrophic lab strain (no deletions or auxotrophies).
- Systems and methods utilizing fluorescent dyes can reasonably be expected to be effective in any S. cerevisiae strain that sporulates, and in any other fungal species that forms ascospores, such as Schizosaccharomyces pombe and Neurospora crassa.
- WO/2014/059370 describes a high-throughput method to replace manual tetrad dissection with fluorescence-activated cell sorting (FACS) of asci onto plates, followed by physical disruption of the tetrads and isolation of individual spores/colonies.
- The current disclosure describes reconstruction of tetrad relationships between recombinant progeny by using data obtained from sequencing natural genetic sequences.
- Natural genetic sequences are DNA sequences encoded by an organism that are not introduced through laboratory-induced genetic manipulation. Meiosis is an example of a naturally-occurring process that alters the genome in ways that can create tetrad-specific markers. Meiosis occurs in diploid cells and produces 4 products, which, in yeast, become the 4 spores of a tetrad. At every position in the diploid that is heterozygous, two spores will inherit the "A" allele and two will inherit the "B" allele (FIG. 6A).
- each tetrad is characterized by a unique pattern of relatively sparse recombination events. For example, there are generally 90 crossovers per yeast meiosis, with each spore having 45 crossovers across the entire 12 Mb genome. Therefore, the number of DNA sequence polymorphisms that can be used as genetic markers is much larger than the number of crossovers.
- Using the information available from these recombination events it is possible to reconstitute tetrads based only on genome sequencing of the meiotic progeny, dispensing with the tetrad-specific barcode. These methods rely on specific features of tetrads that result from the mechanisms of meiosis.
- During meiosis, a diploid cell undergoes one round of DNA replication followed by two rounds of cell division to produce the four recombinant haploid progeny.
- In the first division, the two homologous chromosomes recombine and then segregate to opposite poles of the meiotic spindle.
- In the second division, the two chromatids of each recombinant chromosome segregate, essentially as occurs in mitosis (FIG. 6A).
- Sequencing methods include light-coverage whole genome sequencing by, for example, Illumina, PacBio, or Oxford Nanopore.
- Light-coverage genome sequencing can be defined as a genomic sequencing dataset in which each base in the genome is represented by an average of 5 or fewer reads.
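The light-coverage definition reduces to a simple depth calculation. The sketch below assumes mean depth is estimated as total sequenced bases divided by genome size. The function names, read count and read length are hypothetical, and the 12 Mb genome size is the yeast figure quoted elsewhere in this text.

```python
def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Estimate average reads per base as total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def is_light_coverage(n_reads, read_length_bp, genome_size_bp, max_depth=5.0):
    """True when the average depth is at or below the light-coverage cutoff."""
    return mean_coverage(n_reads, read_length_bp, genome_size_bp) <= max_depth


# Hypothetical run: 300,000 reads of 150 bp over a 12 Mb genome.
depth = mean_coverage(300_000, 150, 12_000_000)
light = is_light_coverage(300_000, 150, 12_000_000)
```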
- Genotyping methods that can be used for tetrad characterization can include high density genotyping by microarray hybridization, RAD-seq, Nanostring hybridization, restriction fragment length polymorphism, and/or polymerase chain reaction (PCR).
- SNPs between the parental chromosomes in the diploid are used as markers.
- The number of markers will depend on the degree of heterozygosity between the parental chromosomes and the proportion of the genome sequenced. In previous datasets using diploids derived from strains from different yeast populations and sequencing 3% of the genome, marker numbers in the range of hundreds to low thousands were obtained. These markers provided a first step for identifying tetrad relationships.
- The heuristic search can be used as a first attempt to partition the set of all spores into clusters of spores whose centromere-flanking markers are either a perfect match or the opposite, a complete mismatch. This can be performed using a greedy algorithm, based on the similarity between spores defined by the edit distance calculated on the centromere-flanking markers. Delta scores can then be computed within each cluster and tetrads identified.
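A greedy partition of this kind can be sketched as follows. This is an illustrative simplification and not the disclosed implementation: it assumes one centromere allele call per chromosome with no missing data, and it uses Hamming distance in place of a general edit distance. The function names and spore labels are hypothetical.

```python
def hamming(a, b):
    """Count positions at which two equal-length allele strings differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_centromere_clusters(spores):
    """Greedily cluster spores by centromere pattern.

    spores: dict mapping a spore name to a string of centromere alleles,
    one character ('A' or 'B') per chromosome. A spore joins the first
    cluster whose representative it matches perfectly (distance 0) or
    mismatches completely (distance equal to the number of chromosomes).
    """
    clusters = []  # list of (representative_pattern, member_names)
    for name, pattern in spores.items():
        for rep, members in clusters:
            d = hamming(rep, pattern)
            if d == 0 or d == len(rep):
                members.append(name)
                break
        else:
            # No compatible cluster found, so start a new one.
            clusters.append((pattern, [name]))
    return [members for _, members in clusters]


# Hypothetical spores from an organism with four chromosomes.
example = {
    "s1": "AABA", "s2": "BBAB", "s3": "AABA",
    "s4": "BBAB", "s5": "ABAB", "s6": "BABA",
}
groups = greedy_centromere_clusters(example)
```

Each resulting cluster can hold spores from more than one tetrad, which is why a scoring step within each cluster is still needed.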
- The relative power of this approach will depend on the number of chromosomes in the organism being analyzed.
- In the yeast Saccharomyces cerevisiae there are 16 chromosomes, so the probability of either a perfect match or a perfect anti-match between two spores from different tetrads is 1/2^15.
- The fission yeast S. pombe has only three chromosomes, so the probability of either a perfect match or anti-match is 1/2^2.
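The two probabilities quoted above follow from a short calculation, sketched here under the assumption that centromere alleles on different chromosomes segregate independently and that "A" and "B" are equally likely at each. The function name is hypothetical.

```python
from fractions import Fraction


def chance_match_or_antimatch(n_chromosomes):
    """Probability that two unrelated spores agree at every centromere
    or disagree at every centromere.

    Each of the two patterns has probability (1/2)**n, so the total is
    2 * (1/2)**n = 1 / 2**(n - 1).
    """
    per_pattern = Fraction(1, 2) ** n_chromosomes
    return 2 * per_pattern


p_cerevisiae = chance_match_or_antimatch(16)  # 16 chromosomes
p_pombe = chance_match_or_antimatch(3)        # 3 chromosomes
```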
- the heuristic should almost always place spores from the same tetrad in a single cluster.
- the heuristic can make assignment mistakes. Therefore, the heuristic search alone can be insufficient for high accuracy, but instead can be used to initially reduce the search space. This is also referred to as the divide-and-conquer approach because all spores can first be split into clusters based on the centromere information and the search for tetrads can be performed within each cluster independently. This groups many of the spores into tetrads thus reducing the search space for subsequent analysis.
- the first grouping step uses a heuristic based on natural genome patterns and can be used to reduce the computational complexity before implementing the second step by subdividing the search space.
- the second step might be used more heavily or even exclusively.
- Particular embodiments of the disclosed methods start with a set of spores representing all of the members of a group of tetrads, but with tetrad identity lost, such as in the high-throughput tetrad isolation and disruption method BEST (Ludlow et al., 2013 Nat Methods. 10: 671-675). These spores are grown into individual colonies from which DNA is isolated followed by, for example, whole-genome sequencing or high-density genome-wide genotyping, for example using RAD-seq (WO 2006/122215; U.S. Patent No. 9,365,893) (Baird et al., PLoS One 3: e3376, 2008).
- A two-step informatics approach can then be used to organize these recombinant progeny into their original tetrad relationships.
- In a first step, spores are grouped into potential tetrads based on their redundant and mirrored features in the natural genome (e.g., redundant and mirrored centromere-linked markers), while in a second step any such groupings that include multiple tetrads are refined down into single tetrads.
- First, redundant and mirrored features in the natural genome are used to group colonies into potential tetrads.
- This example describes the use of redundant and mirrored centromere-linked markers.
- Meiosis includes two divisions, with recombinant homologous chromosomes separating in the first division and sister chromatids in the second (FIG. 6A). Each of the two products of the first meiotic division gives rise to two spores, and each of these pairs has matching alleles at each of its centromeres (the spores are recombinant in their arms) (FIG. 6B).
- Recombinant progeny are only considered for grouping when they lack crossovers between the 2 markers flanking the centromere on every chromosome. Grouping is done with no error allowed; spores discarded at this step have a chance to be grouped into tetrads during the one or more second steps described below.
- the centromeres are short sequences (120bp).
- the centromere allele is defined based on the alleles observed at the closest flanking markers. This is done only when those markers have the same allele (both "A" or both "B"), i.e. no recombination detected in the centromere interval.
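The centromere-calling rule can be written as a small function. This is a minimal sketch that assumes each flanking marker is encoded as "A", "B", or None when the genotype is missing. That encoding and the function name are illustrative assumptions.

```python
def call_centromere_allele(left_marker, right_marker):
    """Infer the centromere allele from its two closest flanking markers.

    Returns 'A' or 'B' when both markers are present and agree. Returns
    None when a marker is missing or when the markers disagree, which
    would indicate a recombination event in the centromere interval.
    """
    if left_marker is None or right_marker is None:
        return None
    if left_marker != right_marker:
        return None
    return left_marker


calls = [
    call_centromere_allele("A", "A"),
    call_centromere_allele("B", "B"),
    call_centromere_allele("A", "B"),
    call_centromere_allele(None, "B"),
]
```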
- the methods disclosed herein can be agnostic as to centromere length, but do require that the flanking markers display strong genetic linkage to the centromere. Note that in particular embodiments, incorrectly grouped recombinant progeny will not be placed into tetrads because they will fail to pass the second step, described below. These spores can be grouped into tetrads at later steps.
- the cut-offs to define a match between two spores using only centromere flanking markers are: at least 50% shared valid markers flanking centromeres, and perfect consensus between these markers. Valid means not missing, and no transition from one parent to another (which might indicate a crossover near the centromere).
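- the cut-offs above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each spore is represented by a list of per-chromosome centromere alleles in which None marks an invalid marker, and the function name and the min_shared parameter are hypothetical.

```python
from typing import Optional, Sequence


def centromere_match(a: Sequence[Optional[str]],
                     b: Sequence[Optional[str]],
                     min_shared: float = 0.5) -> bool:
    """Decide whether two spores match on centromere-flanking markers.

    A position is shared when it is valid (not None) in both spores.
    The spores match when at least min_shared of all positions are shared
    and every shared position carries the same allele.
    """
    if not a or len(a) != len(b):
        return False
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(shared) < min_shared * len(a):
        return False
    return all(x == y for x, y in shared)
```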
- recombinant progeny relationships can also be identified based on allele frequencies and/or cross-over patterns. These patterns are created by the presence of four copies of each chromosome (two from one parent and two from the other parent) and the reciprocal nature of crossing over during meiotic recombination. As depicted schematically in FIGs. 6A and 6B, for a spore from a given tetrad with a given recombination event, another spore from that same tetrad will harbor a reciprocal recombination event.
- the 2:2 allele ratio of a given marker is maintained among the four spores of a true tetrad, but not among four randomly selected spores from different tetrads or even three spores from the same tetrad and one spore assigned to that tetrad in error.
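- the 2:2 check described above can be illustrated with a short Python sketch. The string encoding of genotypes ("A"/"B", any other character meaning a missing call) and the function name are assumptions for this example; a practical implementation would likely tolerate a small fraction of violations caused by genotyping error or gene conversion.

```python
from collections import Counter
from typing import Sequence


def fraction_2_2(spores: Sequence[str]) -> float:
    """Fraction of fully called markers at which four spores segregate 2:2.

    Each spore is a string of "A"/"B" calls of equal length; a marker with
    any other character in any spore is skipped.
    """
    if len(spores) != 4:
        raise ValueError("expected exactly four spores")
    scored = 0
    passed = 0
    for calls in zip(*spores):
        if any(c not in ("A", "B") for c in calls):
            continue
        scored += 1
        if Counter(calls)["A"] == 2:
            passed += 1
    return passed / scored if scored else 0.0
```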
- the patterns of the recombination events themselves or the allele ratios that they generate can be used to identify recombinant progeny that arose from the same meiotic event (sister spores within the same tetrad).
- a second step is additionally or alternatively used to group spores into their original tetrad relationships.
- a Markov chain is a type of Markov process that has either discrete state space or discrete index set.
- Particular embodiments can utilize the second method based on information theory (Sakhanenko & Galas, 2015. J. Comp. Biol. 22(11): 1005-1024) to organize recombinant progeny into tetrads, starting with the groupings identified by the redundant and mirrored features in the natural genome approach.
- this second step can be necessary because, with sufficiently large numbers of progeny to analyze, and particularly if genotyping information is missing for some centromeres, multiple tetrads may share a genome sequence pattern.
- informatics methods consider all markers genome-wide and calculate an information-theory-based score for each pair, triple and quad of progeny within each grouped set as well as within a set of progeny with an ambiguous relationship.
- High scores correspond to progeny that share common information and thus are more likely to have originated from the same tetrad.
- Score cutoffs derived from the whole dataset can then be used to identify the true tetrad-groupings.
- the random background distribution of scores can be constructed, and a cutoff is selected to be significantly distinct from the background distribution.
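- one way to build such a background and derive a cutoff is sketched below in Python. The disclosure states only that the cutoff is significantly distinct from the background; placing it a fixed number of standard deviations above the background mean is an assumption of this example, as are the function names and default values.

```python
import random
import statistics
from typing import Callable, List, Sequence


def background_scores(items: Sequence, k: int,
                      score: Callable[[tuple], float],
                      n_samples: int = 1000, seed: int = 0) -> List[float]:
    """Score n_samples randomly drawn k-tuples to approximate the background."""
    rng = random.Random(seed)
    pool = list(items)
    return [score(tuple(rng.sample(pool, k))) for _ in range(n_samples)]


def cutoff_from_background(scores: Sequence[float], n_sd: float = 3.0) -> float:
    """Cutoff placed n_sd population standard deviations above the mean."""
    return statistics.mean(scores) + n_sd * statistics.pstdev(scores)
```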
- the pair-wise score is mutual information between progeny derived from two spores.
- Mutual information measures how much information one variable carries about the other variable.
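- for illustration, mutual information between two spores can be estimated from their genotype calls across markers, treating each marker as one observation. The sketch below is a plug-in estimate in Python; the function names are hypothetical and the handling of missing calls is left out for brevity.

```python
import math
from collections import Counter


def entropy(*columns) -> float:
    """Joint Shannon entropy, in bits, of one or more aligned sequences."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum(c / n * math.log2(c / n) for c in joint.values())


def mutual_information(x, y) -> float:
    """I(X, Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)
```

Note that two spores with perfectly mirrored genotypes score as highly as two identical spores, because each fully determines the other.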
- conditional interaction information is the expected value of the interaction information for N-1 variables given the value of the Nth variable. Since conditional interaction information is asymmetric relative to the conditional variable, the product of conditional interaction information across all conditional variables is taken.
- [0062] Referring and quoting passages from Sakhanenko and Galas more particularly, in particular embodiments the following approaches can be used: [0063] Interaction information for three-variable dependency is described.
- the three-variable interaction information, $I(X_1, X_2, Y)$, can be thought of as being based on two predictor variables, $X_1$ and $X_2$, and a target variable, $Y$.
- the three-variable interaction information can be written as the difference between the two-variable interaction information, with and without knowledge of the third variable: $I(X_1, X_2, Y) = I(X_1, X_2) - I(X_1, X_2 \mid Y)$
- $I(X_1, X_2)$ is the mutual information
- $I(X_1, X_2 \mid Y)$ is conditional mutual information given $Y$.
- $H(X_i)$ is entropy of a random variable $X_i$
- in Equation 3, the "differential interaction information," $\Delta$, is defined as the difference between values of successive
- the differential interaction information in Equation 4 is based on specifying the target variable, the variable added to the set of n-1 variables.
- the differential is the change that results from this addition and is therefore asymmetric in that variable designation (and thus not invariant under permutation). See Equation 6 for an example of using different target variables. Since the purpose is to detect fully cooperative dependence among the variable set, any single measure should be symmetric. A more general measure then can be created by a simple construct that restores symmetry. If the $\Delta$'s are multiplied with all possible choices of the target variable, the resulting measure will be symmetric and will provide a general measure that is functional and straightforward. To be specific, the symmetric measure is defined as this product over all choices of the target variable.
- an exhaustive search is used to group the remaining spores, when possible, into tetrads.
- the exhaustive search is more computationally expensive than the divide-and-conquer approach (first step), so combining these two techniques in this order results in a more computationally efficient (i.e. quicker) analysis than simply using the exhaustive search on the entire search space.
- the interaction information for three variables quantifies the difference between the two-variable interaction information (mutual information), with and without knowledge of the third variable:
- $I(X, Y, Z) = I(X, Y) - I(X, Y \mid Z)$
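- the three-variable interaction information can be computed from joint entropies, since expanding the difference of mutual information and conditional mutual information gives H(X) + H(Y) + H(Z) - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z). The Python sketch below uses plug-in entropy estimates; the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(*columns) -> float:
    """Joint Shannon entropy, in bits, of one or more aligned sequences."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum(c / n * math.log2(c / n) for c in joint.values())


def interaction_information(x, y, z) -> float:
    """I(X, Y, Z) = I(X, Y) - I(X, Y | Z), expanded into joint entropies."""
    return (entropy(x) + entropy(y) + entropy(z)
            - entropy(x, y) - entropy(x, z) - entropy(y, z)
            + entropy(x, y, z))
```

With this sign convention a purely cooperative dependence, such as Z being the exclusive-or of X and Y, yields a negative value, while three identical variables yield a positive one.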
- differential interaction information is not symmetric, because X in equation 5 is a special variable.
- the product of differential interaction information is taken with all possible choices of the target variable:
- the resulting product, $\bar{\Delta}_m$, is referred to as the delta measure for m variables.
- although this is a general, multi-variable measure, in these embodiments the focus is on delta computed only on 3- and 4-variable sets.
- Three- and 4-variable delta, as well as the pair-wise measure, mutual information, can be used to scan the data from large sets of yeast spores and detect and assemble spore tetrads and their components.
- FIG. 7 compares delta with interaction information for 3 spore (top panel) and 4 spore (bottom panel) cases.
- a simulated data set with 1461 markers and 1140 spores from 285 tetrads was used.
- when interaction information was applied to the genotypes of spores from this simulated spore dataset, groups of 3 and 4 spores from real tetrads scored strongly, as expected.
- a "real" tetrad is a set of genomic data from four spores that are known (because the data is simulated) to have originated from the same tetrad.
- the delta measure (Galas et al., 2014. J. Comp. Biol. 21(2): 118-140; Sakhanenko & Galas, 2015. J. Comp. Biol. 22(11): 1005-1024; Galas & Sakhanenko, 2016. Multivariate information measures: a unification using Möbius operators on subset lattices. arXiv:1601.06780), which is based on differential interaction information, can be used.
- Differential interaction information quantifies the change in interaction information that occurs when another variable is added to a set of variables. Note that, unlike interaction information, differential interaction information is not symmetric, but is specific to which target variable is considered "added." In order to create a symmetric measure, the product of differential interaction information is taken with all possible choices of the target variable.
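- for three variables, adding a target Z to the pair (X, Y) changes the interaction information by I(X, Y, Z) - I(X, Y) = -I(X, Y | Z), so the symmetric product reduces, up to sign, to a product of three conditional mutual informations. The Python sketch below computes that product. It is an illustration only: the published measure may carry an additional sign factor that is not reproduced here, and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(*columns) -> float:
    """Joint Shannon entropy, in bits, of one or more aligned sequences."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum(c / n * math.log2(c / n) for c in joint.values())


def conditional_mi(x, y, z) -> float:
    """I(X, Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def delta3_magnitude(x, y, z) -> float:
    """Product of conditional mutual information over all three targets.

    Assumption: the overall sign convention of the delta measure is ignored.
    """
    return (conditional_mi(x, y, z)
            * conditional_mi(x, z, y)
            * conditional_mi(y, z, x))
```

Because the result is a product, it is large only when every choice of target variable shows dependence, which is what makes it a test for fully cooperative dependence.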
- FIG. 8 shows an example of combining information at the 3- and 4-spore level (top panel) and the 3- and 2-spore level.
- the high interaction information associated with real tetrads was observed at both the 4-spore level and also at the 3-spore level for all subgroups of 3 spores from that tetrad (FIG. 8, top panel). In contrast, while an incorrect tetrad might score highly at the 4-spore level, that did not extend to the 3-spore level for its subgroups, i.e. the noise at the 3- and 4-spore levels is not correlated (FIG. 8, top panel).
- FIGs. 9-11 show aspects of exemplary methods of identifying tetrad relationships based on genomic data obtained from individual spores.
- the methods discussed in this disclosure are delineated as separate operations represented as independent blocks. However, these separately delineated operations should not be construed as necessarily order dependent in their performance.
- the order in which the process is described is not intended to be construed as a limitation, and any number of the described process blocks may be combined in any order to implement the methods, or alternate methods. Moreover, it is also possible that one or more of the provided operations may be modified or omitted.
- FIG. 9 shows an exemplary method 900 for computationally inferring tetrad relationships from randomly arrayed yeast spores.
- Method 900 includes four phases: a data preprocessing and set up phase (block 902), a divide-and-conquer heuristic search phase (blocks 904 and 906), an exhaustive search phase (blocks 908-912), and an output phase (block 914).
- the data is preprocessed to remove strains with low numbers of marker calls and highly "heterozygous" strains, likely reflecting contamination of one strain by another.
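- a preprocessing filter of this kind might look like the Python sketch below. The genotype encoding ("A", "B", "H" for a heterozygous call, any other character for a missing call), the function name, and both threshold defaults are assumptions of this example; the disclosure does not specify numerical cutoffs here.

```python
def keep_spore(calls: str,
               min_call_rate: float = 0.8,
               max_het_rate: float = 0.05) -> bool:
    """Keep a strain only if it has enough marker calls and few 'H' calls.

    A high rate of heterozygous calls in a nominally haploid strain is
    taken as a sign of contamination by another strain.
    """
    called = [c for c in calls if c in ("A", "B", "H")]
    if not called or len(called) / len(calls) < min_call_rate:
        return False
    return called.count("H") / len(called) <= max_het_rate
```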
- tetrads are identified within these clusters based on exhaustive searches using delta, first searching for 4-spore tetrads, and then 3-spore.
- FIG. 10 shows exemplary method 1000, which provides additional details for each of the four phases introduced in FIG. 9.
- the preprocessing portion of method 1000 corresponds to block 902 of FIG. 9.
- an input file is parsed.
- the input file may contain the genomic information from the spores.
- the input file may be in any format suitable for representing genomic information as electronic data such as a text file, a FASTA file, or other file type.
- the input file can be received directly from a DNA sequencer or obtained indirectly via a network, memory device, or other computing device.
- the preprocessing continues by removing spores, identifying missing data, and removing duplicate entries.
- cutoff thresholds are computed.
- the thresholds are used to identify the candidate tetrads of spores (as well as triplets and pairs - incomplete tetrads).
- the next phase can begin at block 1008.
- flanking centromere markers are selected. The information contained in these markers is used to cluster the spores.
- edit-distances are computed for all possible spore pairs based on the flanking markers.
- Edit distance is a way of quantifying how dissimilar two strings (e.g. words) are to one another by counting the minimum number of operations required to transform one string into the other.
- edit distance can be used to quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T.
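- as a concrete illustration, the classic Levenshtein edit distance (insertions, deletions, and substitutions, each costing one operation) can be computed with dynamic programming. The Python sketch below is a generic textbook implementation, not code from the disclosure.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance between s and t using a rolling DP row."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, 1):
        curr = [i]  # transforming s[:i] into the empty string takes i deletions
        for j, b in enumerate(t, 1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # substitute or match
        prev = curr
    return prev[-1]
```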
- these two spores are assigned to the same cluster.
- spores are formed into clusters based on edit-distance using a clustering algorithm. This attempts to partition the set of all spores into clusters of spores whose centromere-flanking markers are either a perfect match or the opposite - a complete mismatch.
- clustering algorithm is a greedy algorithm. Blocks 1008-1014 correspond to block 904 of FIG. 9.
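- a greedy clustering of this kind can be sketched in Python as follows. For simplicity the example uses the Hamming distance (substitutions only) between equal-length centromere profiles in place of a general edit distance, and compares each spore only with the first member of each existing cluster; both choices, and the function names, are assumptions of this illustration.

```python
from typing import List, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(profiles: Sequence[str]) -> List[List[int]]:
    """Assign each spore to the first cluster it matches or mirrors.

    Two profiles are compatible when they are identical (distance 0) or
    complete opposites (distance equal to the profile length).
    Returns clusters as lists of indices into profiles.
    """
    clusters: List[List[int]] = []
    for i, profile in enumerate(profiles):
        for cluster in clusters:
            d = hamming(profile, profiles[cluster[0]])
            if d == 0 or d == len(profile):
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no compatible cluster found
    return clusters
```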
- method 1000 attempts to create all possible groupings of four spores called "quads" Q.
- a quad is not necessarily a tetrad.
- an exhaustive search is performed for all tetrads on all the spores in each cluster. Identified tetrads are not included in subsequent analysis. In particular embodiments, if the number of quads is such that performing an exhaustive search would be too computationally expensive, then method 1000 proceeds from block 1018 along the no path to 1022.
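- the number of quads grows as the binomial coefficient C(n, 4) in the cluster size n, which is why a size check precedes the exhaustive search. A minimal Python sketch of that guard follows; the function name and the default maximum are hypothetical, since the disclosure does not give a numerical limit here.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def quads_if_feasible(cluster: Sequence[int],
                      max_quads: int = 1_000_000) -> Optional[List[Tuple[int, ...]]]:
    """Enumerate all 4-spore groupings unless there are too many.

    Returns None when the number of quads is equal to or greater than
    max_quads, signalling that a cheaper search should be used instead.
    """
    if math.comb(len(cluster), 4) >= max_quads:
        return None
    return list(combinations(cluster, 4))
```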
- at block 1022, for all spores remaining in a cluster of two or more spores that were not included in a tetrad, a "shadow search" is performed. Details of the shadow search are described below in the discussion of FIG. 11. Blocks 1016-1022 correspond to block 906 of FIG. 9.
- the third phase begins at block 1024.
- all remaining spores that are not part of a tetrad are grouped into one cluster G.
- the first search is for tetrads exhaustively.
- method 1000 determines if there are more than three spores in cluster G. If not, method 1000 proceeds to triplet analysis at block 1036. If yes, then method 1000 follows the yes path to block 1028.
- at block 1030, it is determined if the number of quads is less than the maximum number. This is similar to block 1018. In particular embodiments, if the number is equal to or greater than the maximum number, method 1000 proceeds to block 1032 and flags the remaining spores for later analysis. If the number of quads is less than the maximum, then method 1000 proceeds to block 1034.
- at block 1034, the exhaustive search is performed. This is similar to block 1020. Any complete tetrads are identified and those spores are excluded from further analysis. Blocks 1024-1034 correspond to block 908 (search for tetrads) of FIG. 9.
- method 1000 proceeds along the no path to block 1050 and begins pair analysis. If there are at least three spores, method 1000 proceeds along the yes path to block 1038.
- a shadow search is performed to find and remove spores that form complete or incomplete tetrads (triplets).
- the shadow search identifies triplets that can form tetrads by the addition of an additional spore.
- the tetrads are removed from the set of triplets T.
- every triplet in T is identified as a partial tetrad.
- at block 1046, it is determined if the number of quads is less than the maximum number. This is similar to blocks 1018 and 1030. If the number is equal to or greater than the maximum number, method 1000 proceeds to block 1050 and begins pair analysis. If the number of quads is less than the maximum, then method 1000 proceeds to block 1048.
- the exhaustive search is performed. This is similar to blocks 1020 and 1034. Any complete tetrads are identified and those spores are excluded from further analysis.
- Blocks 1036-1048 correspond to block 910 (search for triplets) of FIG. 9.
- method 1000 determines if there is more than one spore remaining in cluster G. If not, then method 1000 proceeds along the no path to block 1058. If there are two or more spores, then method 1000 proceeds along the yes path to block 1052.
- method 1000 proceeds along the no path to block 1058.
- the fourth phase of output begins at block 1058.
- at block 1058, in particular embodiments, all remaining spores that have not been included in a tetrad, triple, or pair are labeled as singles.
- all labeled spores are output.
- the output may include generating data in a human-usable form such as outputting information onto a display of a computing device, printing the output data, etc.
- Blocks 1058 and 1060 correspond to block 914 of FIG. 9.
- FIG. 11 shows methods corresponding to specific functions used in method 1000 and shown in FIG. 10.
- the specific functions are a search function 1100 for N-tuples in a set of tuples T, a shadow search function 1102 for tetrads and triplets in cluster C, a triplet function 1104 for identification of triplets t in cluster C, and a pairs function 1106 for identification of pairs in cluster C.
- delta scores for all tuples in the set are computed ($\Delta_N(T)$), and those tuples that pass the significance filter (above the threshold) are tested for the 2-2 segregation (or 2-1 segregation in case of triplets) and, if successful, labeled and removed from further consideration.
- the search function 1100 returns all labeled N-tuples as well as the set of remaining N-tuples. This search function 1100 is used to perform an exhaustive search in blocks 1020, 1034, and 1048 of FIG. 10.
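- the behavior of such a search function can be sketched in Python as follows. The scoring function, threshold, and segregation test are passed in as parameters. Processing tuples in descending score order and dropping any tuple that shares a spore with an already labeled tuple are assumptions of this example, not details stated in the disclosure.

```python
from typing import Callable, Iterable, List, Tuple

SporeTuple = Tuple[int, ...]


def search(tuples: Iterable[SporeTuple],
           score: Callable[[SporeTuple], float],
           threshold: float,
           segregates: Callable[[SporeTuple], bool]
           ) -> Tuple[List[SporeTuple], List[SporeTuple]]:
    """Label tuples that pass the significance and segregation filters.

    Returns (labeled, remaining), where remaining holds the tuples that
    contain no spore belonging to a labeled tuple.
    """
    tuples = list(tuples)
    labeled: List[SporeTuple] = []
    used = set()
    for t in sorted(tuples, key=score, reverse=True):
        if score(t) <= threshold:
            break  # every later tuple scores at or below the threshold
        if not used.intersection(t) and segregates(t):
            labeled.append(t)
            used.update(t)
    remaining = [t for t in tuples if not used.intersection(t)]
    return labeled, remaining
```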
- the shadow search function 1102 encodes a search based on the heuristic that many tetrads contain triplets of spores with significant delta scores.
- the shadow search function 1102 computes the delta scores for all triplets of spores ($\Delta_3(C)$) from the cluster C. For those triplets that pass the significance filter, the function creates a set of 4-tuples by combining the triplets with all other spores and performs the exhaustive search (i.e. Search(4,Q)).
- the shadow search function 1102 then removes successfully identified tetrads and returns the set of remaining triplets T that passed the significance filter.
- the shadow search function 1102 is used in FIG. 10 at blocks 1022 and 1028.
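- the candidate-generation part of the shadow search can be illustrated with the Python sketch below, which stops short of the subsequent exhaustive search over the generated 4-tuples. The function name and the callable used for the triplet score are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def shadow_candidates(cluster: Sequence[int],
                      delta3: Callable[[Tuple[int, ...]], float],
                      threshold: float
                      ) -> Tuple[List[Tuple[int, ...]], List[Tuple[int, ...]]]:
    """Find significant triplets and extend each one into candidate quads.

    Returns (triplets, quads): the triplets whose score exceeds the
    threshold, and the de-duplicated 4-tuples formed by adding each
    remaining spore of the cluster to each such triplet.
    """
    triplets = [t for t in combinations(cluster, 3) if delta3(t) > threshold]
    quads = set()
    for t in triplets:
        for spore in cluster:
            if spore not in t:
                quads.add(tuple(sorted(t + (spore,))))
    return triplets, sorted(quads)
```

Only quads that contain a significant triplet are examined, which is far fewer than all C(n, 4) groupings when few triplets pass the filter.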
- the triplet function 1104 identifies triplets based on delta scores.
- the triplet function 1104 computes the delta score for the triplet t, checks that it passes the significance filter and the 2-1 segregation filter, labels it as an incomplete or partial tetrad and removes it from the cluster C.
- the triplet function 1104 is included in FIG. 10 at block 1040.
- the pairs function 1106 identifies pairs of spores based on mutual information (MI) scores.
- the pairs function 1106 computes mutual information scores for all pairs in the cluster C and labels those pairs that pass the significance filter as incomplete tetrads (i.e. as pairs). Any spores remaining are returned as single spores.
- the pairs function 1106 is included in FIG. 10 at block 1052.
- a genetic construct encoding a marker of a marker pair is inserted into one parent's genome and a second genetic construct encoding the second marker of the marker pair is inserted into the second parent's genome. If both markers of the pair are expressed together in the offspring, a detectable or differential signal distinct from the signal of either member of the pair alone is generated, thus identifying a genetic recombination event in the genomic region of interest.
- An exemplary marker pair includes two different drug resistance markers or two different fluorescent proteins.
- a genetic construct encoding one element of a split marker pair is inserted into one parent's genome and a second genetic construct encoding a second element of the split marker pair is inserted into the second parent's genome. If the split marker's components are expressed together in the offspring, a detectable or differential signal distinct from the signal of either half of the split marker alone is generated, thus identifying a genetic recombination event in the genomic region of interest.
- drug resistance markers can be utilized as markers. Exemplary drug resistance markers include acetamide assimilation genes (Kelly & Hynes, EMBO J.
- fluorescent proteins and analogs thereof can be utilized as markers.
- Exemplary fluorescent proteins include blue fluorescent proteins (e.g. eBFP, eBFP2, Azurite, mKalama1, GFPuv, Sapphire, T-sapphire); cyan fluorescent proteins (e.g. eCFP, Cerulean, CyPet, AmCyan1, Midoriishi-Cyan); green fluorescent proteins (e.g.
- split fluorescent proteins include those described in Paulmurugan et al. (PNAS USA 99(24): 15608-15613, 2002) and Demidov et al. (PNAS USA 103(7): 2052-2056, 2006). See also International Patent Publication No. WO 2012/135535; U.S. Patent Publications 2012-0282643, 2015-0099271, 2015-0010932, and 2014-0024555; and U.S. Patent Nos. 8,685,667 and 9,081,014.
- cerulenin resistance genes (e.g., fas2m, PDR4; Inokoshi et al., Biochemistry 64: 660, 1992; Hussain et al., Gene 101: 149, 1991)
- copper resistance genes (CUP1; Marin et al., Proc. Natl. Acad. Sci. USA 81: 337, 1984)
- geneticin resistance gene G418r
- Additional useful markers include β-galactosidase (β-gal) and β-glucuronidase (GUS) (see, e.g., European Patent Publication EP2423316). These reporter proteins function by hydrolyzing a secondary marker molecule (e.g., a β-galactoside or a β-glucuronide). Thus it will be understood that methods and systems that employ one of these marker proteins will also involve providing the compound(s) needed to produce a detectable reaction product. Assays for detecting β-gal or GUS activity are well known in the art.
- auxotrophic markers include methionine auxotrophic markers (e.g., met1, met2, met3, met4, met5, met6, met7, met8, met10, met13, met14 or met20); tyrosine auxotrophic markers (e.g., tyr1 or isoleucine); valine auxotrophic markers (e.g., ilv1, ilv2, ilv3 or ilv5); phenylalanine auxotrophic markers (e.g.
- lysine auxotrophic markers (e.g., lys1, lys2, lys4, lys5, lys7, lys9, lys11, lys13 or lys14)
- tryptophan auxotrophic markers (e.g., trp1, trp2, trp3, trp4 or trp5)
- leucine auxotrophic markers (e.g., leu1, leu2, leu3, leu4 or leu5)
- histidine auxotrophic markers (e.g., his1, his2, his3, his4, his5, his6, his7 or his8).
- the genetic constructs include regulatory sequences to control the expression of the nucleic acid molecules.
- the regulatory sequence can result in the constitutive or inducible expression of markers encoded by the genetic construct.
- the regulatory sequences to control expression of the genetic constructs include promoters selected for use to enable autonomous expression in spores.
- Exemplary promoters include Saccharomyces promoters such as pADH1, pTDH3, pPGK1, pADH2, pPDC2, pPMA1 and pGPD1.
- the regulatory sequences can include or encode an interaction domain, for example, to drive sufficient refolding of a split marker protein to allow for function and signal creation.
- exemplary interaction domains include protein-protein interaction domains such as EF1, EF2, SH2, SH3, PDZ, 14-3-3, WW and PTB and Notch and Delta ectodomains, as well as integrin α and β subunits.
- the regulatory sequences can include or encode a restriction site.
- restriction sites can flank the genomic region of interest such that when genomic DNA is digested with the appropriate enzyme, a fragment that can be isolated by size selection or compatible end mediated ligation capture (onto a bead or into a plasmid) is produced. If the restriction site is not naturally present in the genome, the only fragment that should be isolated is the one flanked by the introduced sites. Any naturally occurring restriction sites that are spaced farther apart than the fragments for targeted isolation should not interfere with the process due to use of, for example, size selection. In particular embodiments, less than every 100 kb is reasonable. In particular embodiments, restriction sites need not be used and the whole genome can be sequenced.
- Exemplary restriction sites include sites for homing endonucleases, which are a type of endonuclease that cuts DNA upon recognition of a large specific sequence (12-40 bp). Use of a restriction enzyme with a large recognition sequence can help minimize the likelihood that the enzyme will cut DNA at unintended sites. For example, the likelihood is roughly one in one billion (1 in 4^15, assuming equal base frequencies) that a random sequence will match any given recognition sequence that is 15 bp long.
- One appropriate restriction site for use is I-SceI. Additional examples of restriction sites include DNA sequences recognized by Sfi I, Acc I, Afl III, Sap I, Ple I, Tsp45 I, ScrF I, Tse I, PpuM I, Rsr II, and SgrA I.
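- the one-in-one-billion figure follows from four equally likely bases at each of 15 positions, i.e. 4^15 = 1,073,741,824 possible sequences. The short Python sketch below reproduces the arithmetic and estimates the expected number of chance occurrences on one strand of a genome of a given length. Equal and independent base frequencies are an assumption, and the function names are hypothetical.

```python
def random_match_probability(site_length: int) -> float:
    """Probability that a random sequence matches one given site exactly."""
    return 0.25 ** site_length


def expected_chance_sites(genome_length: int, site_length: int) -> float:
    """Expected chance matches on one strand of a random genome."""
    positions = max(genome_length - site_length + 1, 0)
    return positions * random_match_probability(site_length)
```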
- TALENs transcription activator-like effector nucleases
- TALE transcription activator-like effector
- TALENs are used to edit genes and genomes by inducing double strand breaks (DSBs) in the DNA, which induce repair mechanisms in cells.
- DSBs double strand breaks
- two TALENs must bind and flank each side of the target DNA site for the DNA cleavage domain to dimerize and induce a DSB.
- the DSB is repaired in the cell by non-homologous end-joining (NHEJ) or by homologous recombination (HR) with an exogenous double-stranded donor DNA fragment.
- NHEJ non-homologous end-joining
- HR homologous recombination
- TALENs have been engineered to bind a target sequence of, for example, an endogenous genome, and cut DNA at the location of the target sequence.
- the TALEs of TALENs are DNA binding proteins secreted by Xanthomonas bacteria.
- the DNA binding domain of TALEs includes a highly conserved 33 or 34 amino acid repeat, with divergent residues at the 12th and 13th positions of each repeat. These two positions, referred to as the Repeat Variable Diresidue (RVD), show a strong correlation with specific nucleotide recognition. Accordingly, targeting specificity can be improved by changing the amino acids in the RVD and incorporating nonconventional RVD amino acids.
- RVD Repeat Variable Diresidue
- Examples of DNA cleavage domains that can be used in TALEN fusions are wild-type and variant FokI endonucleases.
- the FokI domain functions as a dimer requiring two constructs with unique DNA binding domains for sites on the target sequence.
- the FokI cleavage domain cleaves within a five or six base pair spacer sequence separating the two inverted half-sites.
- MegaTALs have a single chain rare-cleaving nuclease structure in which a TALE is fused with the DNA cleavage domain of a meganuclease.
- Meganucleases, also known as homing endonucleases, are single peptide chains that have both DNA recognition and nuclease function in the same domain. In contrast to the TALEN, the megaTAL only requires the delivery of a single peptide chain for functional activity.
- ZFNs zinc finger nucleases
- ZFNs are a class of site-specific nucleases engineered to bind and cleave DNA at specific positions. ZFNs are used to introduce DSBs at a specific site in a DNA sequence which enables the ZFNs to target unique sequences within a genome in a variety of different cells. Moreover, subsequent to double-stranded breakage, homologous recombination or non-homologous end joining takes place to repair the DSB, thus enabling genome editing.
- ZFNs are synthesized by fusing a zinc finger DNA-binding domain to a DNA cleavage domain.
- the DNA-binding domain includes three to six zinc finger proteins which are transcription factors.
- the DNA cleavage domain includes the catalytic domain of, for example, FokI endonuclease.
- Guide RNA can be used, for example, with gene-editing agents such as CRISPR-Cas systems.
- CRISPR-Cas systems include CRISPR repeats and a set of CRISPR-associated genes (Cas). See, for example, Mans et al. (FEMS Yeast Res. 15(2), 2015; doi: 10.1093/femsyr/fov004); DiCarlo et al. (NAR 1-8, 2013; doi: 10.1093/nar/gkt135); Laughery et al. (Yeast, 32(12):711-720, 2015; doi: 10.1002/yea.3098);
- the CRISPR repeats include a cluster of short direct repeats separated by spacers of short variable sequences of similar size as the repeats.
- the repeats range in size from 24 to 48 base pairs and have some dyad symmetry which implies the formation of a secondary structure, such as a hairpin, although the repeats are not truly palindromic.
- the spacers separating the repeats match exactly the sequences from prokaryotic viruses, plasmids, and transposons.
- the Cas genes encode nucleases, helicases, RNA-binding proteins, and a polymerase that unwind and cut DNA.
- Cas1 , Cas2, and Cas9 are examples of Cas genes.
- At least three different Cas9 nucleases have been developed for genome editing.
- the first is the wild type Cas9 which introduces DSBs at a specific DNA site, resulting in the activation of DSB repair machinery.
- DSBs can be repaired by the NHEJ pathway or by homology-directed repair (HDR) pathway.
- the second is a mutant Cas9, known as the Cas9D10A, with only nickase activity, which means that it only cleaves one DNA strand and does not activate NHEJ. Thus, the DNA repairs proceed via the HDR pathway only.
- the third is a nuclease-deficient Cas9 (dCas9) which does not have cleavage activity but is able to bind DNA.
- dCas9 nuclease-deficient Cas9
- dCas9 is able to target specific sequences of a genome without cleavage.
- dCas9 can be used either as a gene silencing or activation tool.
- the parental marker aspect of the disclosure can be used to identify and select offspring that have genetic recombination events in a specific area of the genome.
- identification of individuals harboring such genetic recombination events can be used for the purpose of improving the efficiency of genetic mapping.
- Genetic "fine mapping" experiments seek to identify the causative gene(s) that contribute to a trait within a genomic region that contains many genes. In these studies only a small proportion of the progeny resulting from a cross (for instance, those that contain a recombination event within the area of interest) are informative for refining this interval. Thus, selecting individuals that harbor a recombination event in the area (at the outset) improves the efficiency of these experiments by reducing the number of individual progeny that need to be produced, genotyped, phenotyped, and maintained.
- an organism e.g. yeast
- a phenotypic trait of interest e.g. heat tolerance
- the gene leading to the phenotypic trait of interest is believed to be within a particular area of the genome ("genomic region of interest").
- in FIG. 12A, a portion of the haploid genome of Parent 1 with a phenotypic trait of interest is shown as "1".
- a marker construct is inserted within a defined number of base pairs 5' (or 3') of the genomic region of interest.
- the genetic construct includes or encodes (i) a promoter, (ii) an interaction domain, (iii) an N-terminal fragment of a split marker (or the C-terminal fragment), (iv) a restriction site (RS), and (v) a termination signal.
- a portion of the haploid genome of Parent 2 without a phenotypic trait of interest is shown as "3".
- a construct is inserted within a defined number of base pairs 3' (or 5') of the genomic region of interest.
- the genetic construct includes or encodes (i) a restriction site (RS), (ii) a promoter, (iii) an interaction domain, (iv) the respective complementary N- or C-terminal fragment of the split marker, and (v) a termination signal.
- the two chromosomes duplicate at the beginning of meiosis. If recombination does not occur within the genomic region of interest, the haploid progeny (products of meiosis) will harbor and express either the N-terminal construct or the C-terminal construct, but not both. Because differential signal creation requires the expression of both the N-terminal and C-terminal portions of the split marker within the same cell, no differential signal will be observed in any of the four meiotic progeny.
- each genetic construct encodes a complete marker (e.g., one of the drug markers Kan or Nat).
- Recombinant progeny that have had a recombination event in the genomic region of interest have a differential signal in that they include both drug markers.
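- the readout for a two-drug-marker design can be summarized with the Python sketch below. It assumes one parent carries Kan on one side of the genomic region of interest, the other parent carries Nat on the opposite side, and at most a single crossover occurs between them, so that a spore with both markers and its reciprocal sister with neither are the two recombinant classes. The function name and return labels are hypothetical.

```python
def classify_spore(has_kan: bool, has_nat: bool) -> str:
    """Classify a spore by which of the two drug markers it carries."""
    if has_kan and has_nat:
        return "recombinant (both markers)"
    if not has_kan and not has_nat:
        return "reciprocal recombinant (neither marker)"
    return "parental (single marker)"
```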
- different genetic constructs can encode fluorescent proteins of different colors (e.g., full length GFP and YFP).
- recombination events in the genomic region of interest would be indicated by the presence of both green and yellow signals.
- sequences of sorted offspring having recombination events within the genomic region of interest with different phenotypic traits can then be compared, providing faster and cheaper identification of genes of interest.
- An additional useful property of tetrads and the disclosed systems and methods is that every time there is a recombination event (e.g. the one that produces, for example, a nat-kan double) in a region that gets packaged into one of the spores, the reciprocal recombination product is packaged into one of its sister spores.
- a differential signal in the original strains e.g. a single drug marker
- This method greatly enhances efficiency of identifying the gene(s) within an area that confer phenotypic traits of interest.
- a restriction site at the opposite end of the genomic region of interest (* in FIG. 12C) can be included to provide restriction sites to cut the genome in spore 3. Note that the unique (or extremely rare) restriction sites flanking the candidate genes in Spore (2) allow cutting of the genome to the area of interest so that shorter segments require sequencing, saving additional resources over whole genome sequencing.
- Placing restriction sites such that they are proximal to the genomic region of interest allows the genomic region of interest to be excised from recombinant progeny without having to sequence the construct DNA. Given the efficiency with which whole genome sequencing can be performed, however, this feature is optional for organisms with relatively small genomes (e.g. S. cerevisiae) but may provide substantial cost savings for larger genomes (e.g. plant genomes).
- Beer 16 by the circled white colony labeled "Beer" (Schacherer et al., 2009 Nature 458: 342-345; Cromie et al., 2013 Genomic Sequence Diversity and Population Structure of Saccharomyces cerevisiae Assessed by RAD-seq. G3 (Bethesda)).
- FIG. 15 depicts the steps of a yeast cross; a population of heterozygous diploids derived by mating the two parents of interest (e.g. IL-01 and CLIB382r) is sporulated, resulting in individual tetrads which each contain the four recombinant progeny of a single meiotic event.
- the image included in FIG. 15 shows individual S. cerevisiae colonies grown on CHROMagar Candida including the parents of the cross and a sampling of the 1336 progeny obtained from hand-dissecting tetrads.
- the white parent (CLIB382r) and purple parent (IL-01) of the cross are shown in the upper corner of the image while the four sister-spores from individual tetrads are arrayed in columns across the plate.
- the Mendelian (2:2) segregation of the purple and white phenotype among the progeny indicates, in this genetic background, a single gene is linked to the colorimetric trait.
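The 2:2 segregation check described above can be sketched as follows. This is a minimal illustration, assuming each tetrad is recorded as a list of four colony colors; the function names and the example data are hypothetical and are not taken from the cross described here.

```python
from collections import Counter


def is_two_to_two(tetrad):
    """Return True if a four-spore tetrad shows 2:2 segregation of two phenotypes."""
    counts = Counter(tetrad)
    return len(tetrad) == 4 and sorted(counts.values()) == [2, 2]


def fraction_two_to_two(tetrads):
    """Fraction of tetrads showing 2:2 segregation (0.0 for an empty list)."""
    if not tetrads:
        return 0.0
    return sum(is_two_to_two(t) for t in tetrads) / len(tetrads)


# Hypothetical example data, not measurements from the source.
example_tetrads = [
    ["purple", "white", "purple", "white"],
    ["white", "white", "purple", "purple"],
    ["purple", "purple", "purple", "white"],  # 3:1, not Mendelian for one gene
]
```

A fraction close to 1.0 across many tetrads is what would be expected if a single gene controls the trait in this background.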
- RAD-seq is a cost-effective method that sequences the same 1% of the genome in all strains, defining a set of genomic markers for QTL mapping. As demonstrated by the plot shown in FIG.
- QTL mapping identified a single major-effect QTL peak on chromosome II linked to the purple phenotype.
- the LOD (logarithm of odds) score of 159 far exceeds the significance threshold of LOD 4, indicating a very high degree of likelihood that the region under the peak contains the causative gene.
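For readers unfamiliar with the scale, a LOD score is the base-10 logarithm of the ratio between the likelihood of the data under linkage and under no linkage, so a LOD of 159 corresponds to odds of 10^159 to 1. The sketch below uses the standard definition; the function names and the default threshold argument are illustrative choices, and the likelihood values in the usage note are made up.

```python
import math


def lod_score(likelihood_linked, likelihood_unlinked):
    """LOD = log10(L_linked / L_unlinked), the standard definition."""
    return math.log10(likelihood_linked / likelihood_unlinked)


def exceeds_threshold(lod, threshold=4.0):
    """Return True if the LOD score is above the significance threshold."""
    return lod > threshold
```

For example, `lod_score(1e-3, 1e-7)` is approximately 4, which sits at the threshold quoted above, whereas the reported score of 159 exceeds it by 155 orders of magnitude in odds.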
- FIG. 17 provides a close-up view of the QTL peak identified on chromosome II from FIG. 16.
- the x-axis indicates the genomic region on chromosome II with distance in centiMorgans (cM), and the y-axis indicates the LOD score of the peak. While the LOD score of 159 is highly significant, the 1.5 LOD support interval includes a 42 kb region with 30 genes extending from the marker positions 405685 to 447286 (nucleotide positions on chromosome II). A representation of genes (http://chromozoom.org/) included within this interval is shown in FIG. 17.
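The quoted interval size follows directly from the two marker coordinates. A small worked check, assuming inclusive 1-based coordinates (an assumption; the source does not state the convention):

```python
def interval_length_bp(start, end):
    """Length in base pairs of an inclusive interval [start, end]."""
    return end - start + 1


# Marker positions on chromosome II as given in the text.
length_bp = interval_length_bp(405685, 447286)
length_kb = round(length_bp / 1000)
```

Here `length_bp` is 41602 and `length_kb` rounds to 42, consistent with the stated region size.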
- the region was first flanked by integrating a selectable drug marker (natMX4) (Goldstein & McCusker, 1999 Yeast 15: 1541-1553) at the 5' end of the interval in the purple parent (YO2302) and integrating a second selectable drug marker (kanMX4) (Wach et al., 1994 Yeast 10: 1793-1808) at the 3' end of the interval in the white parent (YO2308).
- This cross is henceforth identified as the N-K cross.
- a second diploid using a set of reciprocally marked strains was also constructed.
- FIG. 18 depicts the DNA region marked by the natMX4 (dark gray rectangle) and kanMX4 (light gray rectangle) drug cassettes delineating the fine mapping region.
- An informative recombination event requires a crossover (dark line connecting the two parental chromosomes) within the marked region, thereby linking the causal polymorphism (black or white rectangles) to both drug markers.
- a crossover within the fine mapping region generally results in a tetrad as depicted by the starred tetrad within FIG.
- one spore inherits both natMX4 and kanMX4 markers (represented by the mixed dark gray and light gray rectangle within the spore), one spore inherits neither drug marker (represented by the empty spore), and two spores, with no crossover in the region, inherit the original parental haplotypes and are thus marked with only natMX4 or kanMX4 cassette (represented by the dark gray or light gray rectangles).
- the informative progeny carried forward in this example inherit both drug markers; however, it is noteworthy that unmarked strains also harbor informative crossovers and can be selected based on sensitivity to both G418 and nourseothricin.
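The marker logic for a crossover tetrad can be sketched as follows. This is an illustrative encoding only: each spore is represented by two booleans (carries natMX4, carries kanMX4), and the class labels and function names are hypothetical.

```python
def classify_spore(has_nat, has_kan):
    """Classify a spore by which drug cassettes it carries."""
    if has_nat and has_kan:
        return "double_marked"   # informative: selectable on G418 + nourseothricin
    if not has_nat and not has_kan:
        return "unmarked"        # informative: sensitive to both drugs
    return "nat_only" if has_nat else "kan_only"  # parental haplotypes


def is_crossover_tetrad(spores):
    """True if four spores show one of each class, as expected after a single
    crossover between the two markers.

    spores: list of four (has_nat, has_kan) tuples.
    """
    classes = sorted(classify_spore(nat, kan) for nat, kan in spores)
    return classes == ["double_marked", "kan_only", "nat_only", "unmarked"]
```

A tetrad without a crossover in the interval would instead contain two `nat_only` and two `kan_only` spores, so `is_crossover_tetrad` returns False for it.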
- an event within the fine mapping region may occur at low frequency among a population of diploids undergoing meiosis.
- this fine mapping method overcomes this constraint by selecting and isolating spores with an informative recombination from non-informative spores and unsporulated diploids. To this end, individual diploid colonies of the N-K cross and the K-N cross were grown overnight in 3 mL YPD cultures at 30°C, and the cell pellets were sporulated (Ludlow et al., 2013 Nat Methods.
- Tetrads were stained using DiBAC4(5) (Anaspec AS-84701) as follows: 1 mL sporulation culture was washed once in phosphate buffered saline (PBS) and resuspended and stained for 1 minute in the dark in 1 mL PBS with a final concentration of ⁇ g/mL DiBAC4(5); cells were washed twice in 1 mL PBS, resuspended in 5 mL PBS and briefly sonicated.
- tetrads were separated from dyads by gating the far lower right population on FSC-W/FSC-H and the high FSC-W/PE population (FIG. 3).
- tetrads were sorted onto 8 YPD plates (200 tetrads per plate) supplemented with G418 and nourseothricin (3.2 × 10^3 tetrads in total). Spores were disrupted and plated as described previously (Ludlow et al., 2013 Nat Methods.
- Strains grown in this same 96-well format were grouped into 4 pools for genotyping. All strains included in pools contained both drug marker cassettes, and equal numbers of purple and white strains were selected for pools. Pool 1 included 74 purple strains from cross N-K. Pool 2 included 14 purple strains from cross K-N. Pool 3 included 14 white strains from cross N-K, and Pool 4 included 74 white strains from cross K-N. While only the 42 kb fine mapping region required sequencing, it proved cost-effective to whole genome sequence each of the 4 pools using standard Illumina methods (https://www.illumina.com/).
- the global maximum likelihood strain estimate plot depicted in FIG. 19 indicates the genetic region most likely associated with the purple phenotype. Circles on the plot depict individual RAD-seq markers and their relative likelihood (p-value) of being linked to the purple phenotype. Ovals at the top depict the genes within the region of the markers. Fine mapping results identify the causative genes of the purple trait on CHROMagar Candida as the tandemly arrayed PHO3 and PHO5 acid phosphatases (colored gray ovals).
- PHO3 and PHO5 share 87% amino acid sequence homology (Bajwa et al., 1984 Nucleic Acids Res 12: 7721-7739); however, they are differentially regulated. PHO3 is expressed regardless of internal phosphate concentration while PHO5, known as the repressible acid phosphatase, is expressed only in phosphate-limiting conditions (Nosaka et al., 1989 FEMS Microbiol Lett 51: 55-59; O'Neill et al., 1996 Science 271: 209-212; Sambuk et al., 2011 Acid phosphatases of budding yeast as a model of choice for transcription regulation research. Enzyme Res 2011: 356093).
- fluorescent dyes can be used to further isolate tetrads from diploids. For example, because fluorescence constructs are present in unsporulated diploids, marked recombinant progeny and unsporulated diploids fluoresce, thereby confounding the isolation of recombinant tetrads. However, fluorescent dyes are able to accumulate in the interspore area of a tetrad. Using two channel flow cytometry, tetrads can be isolated from diploids by FACS gating based on size and fluorescent dye staining (e.g. red fluorescence).
- Tetrads within the population that harbor spores with a recombination event in the interval will also be positive for fluorescence conferred by expression of both parts of the fluorescent protein(s) (e.g. green for GFP; FIG. 13).
- This enhancement of the method can further expedite gene mapping by pre-screening unsporulated diploids out of a mapping analysis.
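The two-channel gating idea can be sketched as a simple threshold rule. This is a minimal illustration, assuming each cytometry event has already been reduced to a size measure, a red (dye) intensity, and a green (fluorescent protein) intensity; the `Event` type, the labels, and all threshold values are hypothetical and would need to be set from real instrument data.

```python
from dataclasses import dataclass


@dataclass
class Event:
    size: float   # size proxy, e.g. a forward-scatter measure
    red: float    # dye fluorescence intensity
    green: float  # fluorescent-protein intensity


def gate(event, size_min, red_min, green_min):
    """Label one event using size, dye staining, and marker fluorescence."""
    # Tetrads are separated from unsporulated diploids by size and dye staining.
    if event.size < size_min or event.red < red_min:
        return "not_tetrad"
    # Among tetrads, green signal indicates a recombination event in the interval.
    if event.green >= green_min:
        return "recombinant_tetrad"
    return "tetrad"
```

Under this rule a green-positive diploid that lacks dye staining is labeled `not_tetrad`, which is the confounding case the dye is meant to remove.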
- Fluorescent markers, including fluorescent dyes, have a wide range of absorption/emission profiles. Sorters typically have several filter options to use with each laser so that the user can select narrow bands of the emission profile, which helps to separate fluorescent markers whose emission profiles bleed over into those of other markers.
- DiBAC4(5) is a red fluorescent dye and a structural analog of the commonly used oxonol, DiBAC4(3). However, its emission spectrum has little overlap in the green channel, reducing compensation adjustments required for flow cytometry gating when used in conjunction with green fluorescent markers such as GFP or stains such as FITC (Hernlem and Hua, Curr Microbiol. 61: 57-63, 2010).
- particular embodiments can utilize combinations of fluorescent signals (fluorescent dyes and a unified fluorescent protein) wherein the fluorescent signals are chosen in combinations to reduce or avoid overlap between emission profiles.
- the selected fluorescent signals will have emission wavelength peaks that are separated by at least 50 nm; at least 100 nm; at least 150 nm; or at least 200 nm.
- the emission wavelength peak of DiBAC4(5) is 616 nm and GFP's emission wavelength peak is 510 nm.
- Propidium iodide (PI) has an emission wavelength peak similar to DiBAC4(5), and in particular embodiments is beneficially used in combination with GFP.
- DiBAC4(3) can be used in combination with Red Fluorescent Protein (RFP).
- the emission wavelength peak of DiBAC4(3) is 516 nm and RFP's emission wavelength peak is 584 nm.
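The peak-separation criterion can be checked with simple arithmetic on the emission peaks quoted above. In the sketch below the peak values come from the text; the dictionary and function names are illustrative.

```python
# Emission wavelength peaks in nm, as stated in the text.
EMISSION_PEAK_NM = {
    "DiBAC4(5)": 616,
    "GFP": 510,
    "DiBAC4(3)": 516,
    "RFP": 584,
}


def peak_separation_nm(signal_a, signal_b):
    """Absolute difference between two emission peaks, in nm."""
    return abs(EMISSION_PEAK_NM[signal_a] - EMISSION_PEAK_NM[signal_b])


def meets_separation(signal_a, signal_b, min_nm=50):
    """True if the two peaks are at least min_nm apart."""
    return peak_separation_nm(signal_a, signal_b) >= min_nm
```

With these values, DiBAC4(5) and GFP are 106 nm apart and satisfy the 100 nm tier, while DiBAC4(3) and RFP are 68 nm apart and satisfy only the 50 nm tier.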
- At least three aspects of the described method can create significant efficiencies alone or in combination: (1) identification of offspring with a genetic recombination within the genomic region of interest; (2) restriction sites inserted around the genomic region of interest to shorten the length of genome requiring sequencing; and (3) isolation of recombinant tetrads from unsporulated diploids.
- Exemplary data-processing architecture. Aspects of the current disclosure are described in terms of algorithms and/or symbolic representations of operations on data bits and/or binary digital signals stored within a computing system, such as within a computer and/or computing system memory. These algorithmic descriptions and/or representations are the techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art.
- An algorithm is here, and generally, considered to be a self-consistent sequence of operations and/or similar processing leading to a desired result.
- the operations and/or processing may involve physical manipulations of physical quantities. Typically, although not necessarily, these quantities may take the form of electrical and/or magnetic signals capable of being stored, transferred, combined, compared and/or otherwise manipulated.
- FIG. 20 is a high-level diagram showing components of a data-processing system 2001 for analyzing data and performing other analyses described herein, and related components.
- the system 2001 may include a processor 2086, a peripheral system 2020, a user interface system 2030, and a data storage system 2040.
- the peripheral system 2020, the user interface system 2030 and the data storage system 2040 are communicatively connected to the processor 2086.
- Processor 2086 can be communicatively connected to network 2050 (shown in phantom), e.g., the Internet or other communications network, as discussed below.
- the term “device” can refer to any one or more of processor 2086, peripheral system 2020, user interface system 2030, data storage system 2040. Any of these, or other devices, can each connect to one or more network(s) 2050.
- Processor 2086 can include one or more microprocessors, microcontrollers, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), programmable logic devices (PLDs), programmable logic arrays (PLAs), programmable array logic devices (PALs), or digital signal processors (DSPs).
- Processor 2086 can implement processes of various aspects described herein.
- Processor 2086 can be or include one or more device(s) for automatically operating on data, e.g., a central processing unit (CPU), microcontroller (MCU), desktop computer, laptop computer, mainframe computer, personal digital assistant, digital camera, cellular phone, smartphone, or any other device for processing data, managing data, or handling data, whether implemented with electrical, magnetic, optical, biological components, or otherwise.
- the phrase “communicatively connected” includes any type of connection, wired or wireless, for communicating data between devices or processors. These devices or processors can be located in physical proximity or not. For example, subsystems such as peripheral system 2020, user interface system 2030, and data storage system 2040 are shown separately from the data processing system 2086 but can be stored completely or partially within the data processing system 2086.
- the peripheral system 2020 can include or be communicatively connected with one or more devices configured or otherwise adapted to provide digital content records to the processor 2086 or to take action in response to processor 2086.
- the peripheral system 2020 can include digital still cameras, digital video cameras, DNA sequencers, flow cytometers, or other data generating equipment.
- the processor 2086 upon receipt of digital content from a device in the peripheral system 2020, can store such digital content in the data storage system 2040.
- the user interface system 2030 can convey information in either direction, or in both directions, between a user 2038 and the processor 2086 or other components of system 2001.
- the user interface system 2030 can include a mouse, a keyboard, another computer (connected, e.g., via a network or a null-modem cable), or any device or combination of devices from which data is input to the processor 2086.
- the user interface system 2030 also can include a display device, a printer, a processor-accessible memory, or any device or combination of devices to which data is output by the processor 2086.
- the user interface system 2030 and the data storage system 2040 can share a processor-accessible memory.
- processor 2086 includes or is connected to communication interface 2015 that is coupled via network link 2016 (shown in phantom) to network 2050.
- communication interface 2015 can include an integrated services digital network (ISDN) terminal adapter or a modem to communicate data via a telephone line; a network interface to communicate data via a local-area network (LAN), e.g., an Ethernet LAN, or wide-area network (WAN); or a radio to communicate data via a wireless link, e.g., WiFi or GSM.
- Communication interface 2015 sends and receives electrical, electromagnetic or optical signals that carry digital or analog data streams representing various types of information across network link 2016 to network 2050.
- Network link 2016 can be connected to network 2050 via a switch, gateway, hub, router, or other networking device.
- system 2001 can communicate, e.g., via network 2050, with a data processing system 2002, which can include the same types of components as system 2001 but is not required to be identical thereto.
- Systems 2001, 2002 are communicatively connected via the network 2050.
- Each system 2001, 2002 executes computer program instructions to carry out functions disclosed herein.
- Processor 2086 can send messages and receive data, including program code, through network 2050, network link 2016 and communication interface 2015.
- a server can store requested code for an application program (e.g., a JAVA applet) on a tangible non-volatile computer-readable storage medium to which it is connected.
- the server can retrieve the code from the medium and transmit it through network 2050 to communication interface 2015.
- the received code can be executed by processor 2086 as it is received, or stored in data storage system 2040 for later execution.
- Data storage system 2040 can include or be communicatively connected with one or more processor-accessible memories configured or otherwise adapted to store information.
- the memories can be internal, e.g., within a chassis, or as parts of a distributed system.
- processor-accessible memory is intended to include any data storage device to or from which processor 2086 can transfer data (using appropriate components of peripheral system 2020), whether volatile or nonvolatile; removable or fixed; electronic, magnetic, optical, chemical, mechanical, or otherwise.
- processor-accessible memories include but are not limited to: registers, floppy disks, hard disks, tapes, bar codes, Compact Discs, DVDs, read-only memories (ROM), erasable programmable read-only memories (EPROM, EEPROM, or Flash), and random-access memories (RAMs).
- One of the processor-accessible memories in the data storage system 2040 can be a tangible non-transitory computer-readable storage medium, i.e. a non-transitory device or article of manufacture that participates in storing instructions that can be provided to processor 2086 for execution.
- data storage system 2040 includes code memory 2041, e.g., a RAM, and disk 2043, e.g., a tangible computer-readable storage device or medium such as a hard drive.
- Computer program instructions are read into code memory 2041 from disk 2043.
- Processor 2086 then executes one or more sequences of the computer program instructions loaded into code memory 2041, as a result performing process steps described herein. In this way, processor 2086 carries out a computer implemented process. For example, steps of methods 900, 1000, 1100, 1102, 1104, and 1106 described herein, blocks of the flowchart illustrations or block diagrams herein, and combinations of those, can be implemented by computer program instructions.
- Code memory 2041 can also store data or can store only code.
- at least one of code memory 2041 or disk 2043 can be or include a computer-readable medium (CRM), e.g., a tangible non-transitory computer storage medium.
- aspects described herein may be embodied as systems or methods. Accordingly, various aspects herein may take the form of an entirely hardware aspect, an entirely software aspect (including firmware, resident software, micro-code, etc.), or an aspect combining software and hardware aspects. These aspects can all generally be referred to herein as a “service,” “circuit,” “circuitry,” “module,” or “system.”
- various aspects herein may be embodied as computer program products including computer readable program code (“program code”) stored on a computer readable medium, e.g., a tangible non-transitory computer storage medium or a communication medium.
- a computer storage medium can include tangible storage units such as volatile memory, nonvolatile memory, or other persistent or auxiliary computer storage media, removable and nonremovable computer storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
- a computer storage medium can be manufactured as is conventional for such articles, e.g., by pressing a CD-ROM or electronically writing data into a Flash memory.
- communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transmission mechanism.
- computer storage media do not include communication media. That is, computer storage media do not include communications media consisting solely of a modulated data signal, a carrier wave, or a propagated signal, per se.
- the program code includes computer program instructions that can be loaded into processor 2086 (and possibly also other processors), and that, when loaded into processor 2086, cause functions, acts, or operational steps of various aspects herein to be performed by processor 2086 (or other processor).
- Computer program code for carrying out operations for various aspects described herein may be written in any combination of one or more programming language(s), and can be loaded from disk 2043 into code memory 2041 for execution.
- the program code may execute, e.g., entirely on processor 2086, partly on processor 2086 and partly on a remote computer connected to network 2050, or entirely on the remote computer.
- processor(s) 2086 and, if required, data storage system 2040 or portions thereof, are referred to for brevity herein as a “control unit.”
- a control unit can include a CPU or DSP and a computer storage medium or other tangible, non-transitory computer-readable medium storing instructions executable by that CPU or DSP to cause that CPU or DSP to perform functions described herein.
- a control unit can include an ASIC, FPGA, or other logic device(s) wired (e.g., physically, or via blown fuses or logic-cell configuration data) to perform functions described herein.
- a "control unit" as described herein includes processor(s) 2086.
- a control unit can also include, if required, data storage system 2040 or portions thereof.
- a control unit does not include computer-readable media storing executable instructions.
- Computing systems can belong to, or include, a variety of categories or classes of devices such as traditional server-type devices, desktop computer-type devices, mobile-type devices, special purpose-type devices, and/or embedded-type devices.
- any of embodiments 1-3 wherein the fluorescent dye is selected from xanthene dyes, fluorescein dyes, rhodamine dyes, fluorescein isothiocyanate (FITC), 6-carboxyfluorescein (FAM), 6-carboxy-2',4',7',4,7-hexachlorofluorescein (HEX), 6-carboxy-4',5'-dichloro-2',7'-dimethoxyfluorescein (JOE or J), N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA or T), 6-carboxy-X-rhodamine (ROX or R), 5-carboxyrhodamine 6G (R6G5 or G5), 6-carboxyrhodamine 6G (R6G6 or G6), and rhodamine 110; cyanine dyes, e.g. Cy3, Cy5 and Cy7 dyes; Alexa dyes, e.g. Alexa-fluor-555; coumarin, Diethylaminocoumarin, umbelliferone; benzamide dyes, e.g. Hoechst 33258; phenanthridine dyes, e.g. Texas Red; ethidium dyes; acridine dyes; carbazole dyes; phenoxazine dyes; porphyrin dyes; polymethine dyes, BODIPY dyes, quinoline dyes, Pyrene, Fluorescein Chlorotriazinyl, R110, Eosin, Tetramethylrhodamine, Lissamine, or Naphthofluorescein.
- the vital dye is selected from Bis-(1,3-dibutylbarbituric acid) pentamethine oxonol; Anaspec AS-84701, calcein AM, carboxyfluorescein diacetate, copper phthalocyanine tetrasulfonate, DiOC (3,3'-dihexyloxacarbocyanine iodide), Evans blue, gadolinium texaphyrin, indocyanine green monosodium salt, isosulfan, methylene blue, Nile red, patent blue V, patent blue VF, propidium iodide, rhodamine 123, and sulfobromophthalein.
- a method of sorting tetrads from vegetative cells, dyads, and dead cells including: incubating a mixture of tetrads, vegetative cells, dyads, and dead cells in a fluorescent dye solution to produce a stained mixture of cells; and
- the fluorescent dye solution includes a xanthene dye, fluorescein dye, rhodamine dye, FITC, FAM, HEX, JOE, TAMRA, ROX, R6G5, R6G6, rhodamine 110; cyanine dye, Cy3, Cy5, Cy7; Alexa dye, Alexa-fluor-555; coumarin, Diethylaminocoumarin, umbelliferone; benzamide dye, Hoechst 33258; phenanthridine dye, Texas Red; ethidium dye; acridine dye; carbazole dye; phenoxazine dye; porphyrin dye; polymethine dye, BODIPY dye, quinoline dye, Pyrene, Fluorescein Chlorotriazinyl, R110, Eosin, Tetramethylrhodamine, Lissamine, or Naphthofluorescein.
- a method of embodiment 13 wherein the vital dye is selected from Bis-(1,3-dibutylbarbituric acid) pentamethine oxonol; Anaspec AS-84701, calcein AM, carboxyfluorescein diacetate, copper phthalocyanine tetrasulfonate, DiOC (3,3'-dihexyloxacarbocyanine iodide), Evans blue, gadolinium texaphyrin, indocyanine green monosodium salt, isosulfan, methylene blue, Nile red, patent blue V, patent blue VF, propidium iodide, rhodamine 123, and sulfobromophthalein.
- a method of embodiment 13 wherein the vital dye is pentamethine oxonol or propidium iodide.
- a method of embodiment 16 wherein the FACS-based sorting utilizes fluorescence intensity to sort tetrads, dyads, and dead cells away from live vegetative cells.
- a method of embodiment 16 or 17 wherein the FACS-based sorting utilizes 488 nm emission and a 595LP 610/20 filter.
- a method of performing a synthetic lethality screen including:
- a method of embodiment 20 further including sorting tetrads with at least one dead spore from other tetrads, vegetative cells, dyads, and dead cells based on an optical characteristic attributable to the fluorescent dye.
- the fluorescent dye solution includes a xanthene dye, fluorescein dye, rhodamine dye, FITC, FAM, HEX, JOE, TAMRA, ROX, R6G5, R6G6, rhodamine 110; cyanine dye, Cy3, Cy5, Cy7; Alexa dye, Alexa-fluor-555; coumarin, Diethylaminocoumarin, umbelliferone; benzamide dye, Hoechst 33258; phenanthridine dye, Texas Red; ethidium dye; acridine dye; carbazole dye; phenoxazine dye; porphyrin dye; polymethine dye, BODIPY dye, quinoline dye, Pyrene, Fluorescein Chlorotriazinyl, R110, Eosin, Tetramethylrhodamine, Lissamine, or Naphthofluorescein.
- a method of embodiment 23 wherein the vital dye is selected from Bis-(1,3-dibutylbarbituric acid) pentamethine oxonol; Anaspec AS-84701, calcein AM, carboxyfluorescein diacetate, copper phthalocyanine tetrasulfonate, DiOC (3,3'-dihexyloxacarbocyanine iodide), Evans blue, gadolinium texaphyrin, indocyanine green monosodium salt, isosulfan, methylene blue, Nile red, patent blue V, patent blue VF, propidium iodide, rhodamine 123, and sulfobromophthalein.
- a method of embodiment 23 wherein the vital dye is pentamethine oxonol or propidium iodide.
- a method of embodiment 26 wherein the FACS-based sorting utilizes fluorescence intensity to sort tetrads with a dead spore from tetrads having all living spores; dyads; and live vegetative cells.
- a method of embodiment 26 or 27 wherein the FACS-based sorting utilizes 488 nm emission and a 595LP 610/20 filter.
- a method of capturing the tetrad relationship of recombinant progeny from a yeast cross using patterns of natural genetic sequences including:
- a method of embodiment 30 wherein the aspects of the natural genetic sequence include centromere-linked markers; allele presence; and/or location and/or number of recombination events.
- a method of embodiment 30 or 31 wherein the sequencing is whole genome sequencing or restriction-associated DNA (RAD) sequencing.
- a method of embodiment 30 or 31 wherein the sequencing includes sequencing less than 20% of the whole genome; less than 10% of the whole genome; or less than 5% of the whole genome.
- a method of embodiment 30 or 31 wherein the sequencing includes sequencing 3% of the whole genome.
- a method of any of embodiments 30-36 further including assessing and/or refining the grouping utilizing mutual information between two or more of the recombinant progeny.
- a method of any of embodiments 30-38 further including assessing and/or refining the grouping utilizing delta scores.
- a method of any of embodiments 30-38 further including assessing and/or refining the grouping by calculating a pair-wise score.
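The source does not specify how mutual information between progeny is computed. As one possible sketch, the standard Shannon mutual information can be calculated between two progeny whose genotypes are encoded as equal-length sequences of parental allele calls (0 or 1 at each marker); this encoding and the function name are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(calls_a, calls_b):
    """Shannon mutual information (in bits) between two allele-call sequences."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(calls_a)
    joint = Counter(zip(calls_a, calls_b))
    marginal_a = Counter(calls_a)
    marginal_b = Counter(calls_b)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = marginal_a[a] / n
        p_b = marginal_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi
```

Under this encoding, two spores with fully complementary calls score as high as two identical ones, because mutual information captures dependence rather than similarity; how that property would be used to group sister spores is left to the method itself.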
- a computer readable medium encoding computer-readable instructions that, when executed, cause one or more processors to perform the method of any of embodiments 30-40.
- a data-processing system including at least one processor and at least one data storage system, the at least one data storage system including computer-readable instructions that, when executed by the at least one processor, cause the data-processing system to perform the method of any of embodiments 30-40.
- a method of detecting a genetic recombination event or lack thereof in an offspring including:
- a method of embodiment 48 or 49 wherein the first marker and/or the second marker is a drug resistance marker, a fluorescent protein, a cerulenin resistance marker or an auxotrophic marker.
- restriction site is a homing endonuclease restriction site.
- a method of detecting a genetic recombination event or lack thereof in an offspring including:
- a method of any of embodiments 48-64 further including incubating offspring in a fluorescent dye solution.
- a method of embodiment 65 further including separating unsporulated diploids from tetrads having a recombination event.
- Chromosomes of embodiment 70 wherein the marker is a drug resistance marker, a fluorescent protein, a cerulenin resistance marker or an auxotrophic marker.
- Chromosomes of embodiment 70 or 71 wherein different chromosomes include different genetic constructs encoding different markers.
- Chromosomes of embodiment 79 or 80 wherein the genetic constructs include a promoter, a sequence encoding an interaction domain, and optionally a rare or unique restriction site.
- a mating pair from a sporulating organism that utilizes meiosis in sexual reproduction wherein each member of the mating pair includes a chromosome modified with a genetic construct encoding a marker.
- a mating pair of embodiment 90 or 91 wherein different chromosomes include different genetic constructs encoding different markers.
- a mating pair of embodiment 99 or 100 wherein the genetic constructs include a promoter, a sequence encoding an interaction domain, and optionally a rare or unique restriction site.
- a mating pair wherein one member of the mating pair has a chromosome modified to controllably express an aspect of a split marker and the second member of the mating pair has a chromosome modified to controllably express a complementary aspect of the split marker.
- a kit for practicing a use or method of any of the preceding embodiments wherein the kit includes one or more of a fluorescent dye, a chromosome of any of embodiments 70-89, and/or a mating pair of any of embodiments 90-110.
- a kit for genetic mapping including a chromosome of any of embodiments 70-89, and/or a mating pair of any of embodiments 90-110.
- a method of capturing the tetrad relationship of recombinant progeny from a yeast cross using patterns of natural genetic sequences including:
- the method of embodiment 115 further including computing a significance cutoff for at least one of: pairs of recombinant progeny, triplets of recombinant progeny, or tetrads of recombinant progeny, the significance cutoff based on background noise.
- a computer readable medium encoding computer-readable instructions that, when executed, cause one or more processors to perform the method of any of embodiments 115-120.
- a data-processing system including at least one processor and at least one data storage system, the at least one data storage system including computer-readable instructions that, when executed by the at least one processor, cause the data-processing system to perform the method of any of embodiments 115-120.
- each embodiment disclosed herein can comprise, consist essentially of, or consist of its particular stated element, step, ingredient or component.
- the terms “include” or “including” should be interpreted to recite: “comprise, consist of, or consist essentially of.”
- the transition term “comprise” or “comprises” means includes, but is not limited to, and allows for the inclusion of unspecified elements, steps, ingredients, or components, even in major amounts.
- the transitional phrase “consisting of” excludes any element, step, ingredient or component not specified.
- the transition phrase “consisting essentially of” limits the scope of the embodiment to the specified elements, steps, ingredients or components and to those that do not materially affect the embodiment.
- a material effect would cause a statistically significant reduction in the ability to, within the appropriate context, (1) sort tetrads from vegetative cells, dyads and dead cells; (2) identify recombinant progeny originating from spores of the same tetrad; or (3) identify a genetic recombination event in an offspring in a genomic region of interest.
- the term “about” has the meaning reasonably ascribed to it by a person skilled in the art when used in conjunction with a stated numerical value or range, i.e. denoting somewhat more or somewhat less than the stated value or range, to within a range of ±20% of the stated value; ±19% of the stated value; ±18% of the stated value; ±17% of the stated value; ±16% of the stated value; ±15% of the stated value; ±14% of the stated value; ±13% of the stated value; ±12% of the stated value; ±11% of the stated value; ±10% of the stated value; ±9% of the stated value; ±8% of the stated value; ±7% of the stated value; ±6% of the stated value; ±5% of the stated value; ±4% of the stated value; ±3% of the stated value; ±2% of the stated value; or ±1% of the stated value.
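As an illustration of the significance cutoff mentioned in the embodiments above (a cutoff for pairs of recombinant progeny based on background noise), the following is a minimal sketch only. The text here does not specify how the cutoff is computed, so everything in it is an assumption: each progeny is represented by a set of crossover breakpoint positions (hypothetical marker-interval indices), a pair is scored by the number of breakpoints it shares, and the background noise is estimated by drawing random breakpoint sets of the same sizes. The names `pair_score`, `noise_cutoff` and `significant_pairs` are hypothetical.

```python
import random
from itertools import combinations


def pair_score(a, b):
    """Number of breakpoint positions shared by two progeny (assumed score)."""
    return len(a & b)


def noise_cutoff(sizes, n_intervals, quantile=0.99, n_rounds=200, seed=0):
    """Estimate a cutoff from simulated background noise.

    Assumption: unrelated progeny behave like random breakpoint sets drawn
    uniformly from n_intervals positions, with the same set sizes as the data.
    """
    rng = random.Random(seed)
    null_scores = []
    for _ in range(n_rounds):
        fake = [set(rng.sample(range(n_intervals), k)) for k in sizes]
        null_scores.extend(pair_score(a, b) for a, b in combinations(fake, 2))
    if not null_scores:
        return 0
    null_scores.sort()
    index = min(len(null_scores) - 1, int(quantile * len(null_scores)))
    return null_scores[index]


def significant_pairs(breakpoints, n_intervals, **kwargs):
    """Return pairs of progeny whose shared-breakpoint score exceeds the cutoff."""
    sizes = [len(s) for s in breakpoints.values()]
    cutoff = noise_cutoff(sizes, n_intervals, **kwargs)
    return sorted(
        (p, q)
        for p, q in combinations(sorted(breakpoints), 2)
        if pair_score(breakpoints[p], breakpoints[q]) > cutoff
    )


# Made-up example data: s1 and s2 share four breakpoints, the others share none.
progeny = {
    "s1": {10, 200, 350, 600, 900},
    "s2": {10, 200, 350, 600, 42},
    "s3": {5, 120, 480},
    "s4": {77, 310, 820, 950},
}
candidate_sisters = significant_pairs(progeny, n_intervals=1000)
```

The same idea could be extended to triplets or tetrads by scoring groups of three or four progeny against a correspondingly simulated background; the quantile and the number of simulation rounds are tuning choices, not values taken from the source.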
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Physics & Mathematics (AREA)
- Analytical Chemistry (AREA)
- Organic Chemistry (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Genetics & Genomics (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Molecular Biology (AREA)
- Biochemistry (AREA)
- Immunology (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Microbiology (AREA)
- General Engineering & Computer Science (AREA)
- Pathology (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Dispersion Chemistry (AREA)
- Physiology (AREA)
- Ecology (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Biomedical Technology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to systems and methods that facilitate genetic research. The systems and methods can use (1) fluorescent dyes to sort tetrads from vegetative cells, dyads and dead cells; (2) natural genetic sequences to capture the tetrad relationships of recombinant progeny; and (3) markers in parent organisms to identify genetic recombination events in genomic regions of interest.
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP17828589.6A EP3485044A4 (fr) | 2016-07-15 | 2017-07-14 | Systems and methods to facilitate genetic research |
| CA3030968A CA3030968A1 (fr) | 2016-07-15 | 2017-07-14 | Systems and methods to facilitate genetic research |
| US16/317,770 US20190265151A1 (en) | 2016-07-15 | 2017-07-14 | Systems and methods to facilitate genetic research |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201662362708P | 2016-07-15 | 2016-07-15 | |
| US62/362,708 | 2016-07-15 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018014002A1 (fr) | 2018-01-18 |
Family
ID=60953419
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2017/042265 (Ceased) WO2018014002A1 (fr) | 2016-07-15 | 2017-07-14 | Systems and methods to facilitate genetic research |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20190265151A1 (fr) |
| EP (1) | EP3485044A4 (fr) |
| CA (1) | CA3030968A1 (fr) |
| WO (1) | WO2018014002A1 (fr) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110706755A (zh) * | 2019-08-26 | 2020-01-17 | 上海科技发展有限公司 | Tuberculosis drug resistance detection method, apparatus, computer device and storage medium |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120164677A1 (en) * | 2000-07-24 | 2012-06-28 | Genprime, Inc. | Method and apparatus for viable and nonviable prokaryotic and eukaryotic cell quantitation |
| WO2014059370A1 (fr) * | 2012-10-12 | 2014-04-17 | Institute For Systems Biology | Improved high-throughput system for genetic studies |
2017
- 2017-07-14: WO application PCT/US2017/042265, patent WO2018014002A1 (fr), status: Ceased (not active)
- 2017-07-14: EP application EP17828589.6A, patent EP3485044A4 (fr), status: Withdrawn (not active)
- 2017-07-14: CA application CA3030968A, patent CA3030968A1 (fr), status: Abandoned (not active)
- 2017-07-14: US application US16/317,770, patent US20190265151A1 (en), status: Abandoned (not active)
Non-Patent Citations (4)
| Title |
|---|
| ANONYMOUS: "BD LSRFortessa X-20 Cell Analyzer", 30 October 2017 (2017-10-30), pages 1 - 2, XP055592347, Retrieved from the Internet <URL:https://www.bdbiosciences.com/documents/lsr_fortessax20_filter_guide.pdf> * |
| IGNAC ET AL.: "Discovering Pair-Wise Genetic Interactions: An Information Theory-Based Approach", PLOS ONE, vol. 9, no. 3, 26 March 2014 (2014-03-26), pages 1 - 14, XP055455524 * |
| SAKHANENKO ET AL.: "Biological Data Analysis as an Information Theory Problem: Multivariable Dependence Measures and the Shadows Algorithm", JOURNAL OF COMPUTATIONAL BIOLOGY, vol. 22, no. 11, 2015, pages 1005 - 1024, XP055455517 * |
| See also references of EP3485044A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP3485044A4 (fr) | 2020-01-22 |
| US20190265151A1 (en) | 2019-08-29 |
| CA3030968A1 (fr) | 2018-01-18 |
| EP3485044A1 (fr) | 2019-05-22 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Duina et al. | Budding yeast for budding geneticists: a primer on the Saccharomyces cerevisiae model system | |
| Ryan et al. | Selection of chromosomal DNA libraries using a multiplex CRISPR system | |
| Dunlap et al. | Enabling a community to dissect an organism: overview of the Neurospora functional genomics project | |
| Wilkening et al. | An evaluation of high-throughput approaches to QTL mapping in Saccharomyces cerevisiae | |
| Fleiss et al. | Reshuffling yeast chromosomes with CRISPR/Cas9 | |
| Fournier et al. | High-quality de novo genome assembly of the Dekkera bruxellensis yeast using nanopore MinION sequencing | |
| Kalapis et al. | Evolution of robustness to protein mistranslation by accelerated protein turnover | |
| Thacker et al. | Exploiting spore-autonomous fluorescent protein expression to quantify meiotic chromosome behaviors in Saccharomyces cerevisiae | |
| Zadrag-Tecza et al. | Cell size influences the reproductive potential and total lifespan of the Saccharomyces cerevisiae yeast as revealed by the analysis of polyploid strains | |
| Ludlow et al. | High-throughput tetrad analysis | |
| Mozzachiodi et al. | Aborting meiosis allows recombination in sterile diploid yeast hybrids | |
| Schmidt et al. | Evaluation of Saccharomyces cerevisiae wine yeast competitive fitness in enologically relevant environments by barcode sequencing | |
| US7074584B2 (en) | Yeast arrays, methods of making such arrays, and methods of analyzing such arrays | |
| Cachera et al. | CRI-SPA: a high-throughput method for systematic genetic editing of yeast libraries | |
| Goold et al. | Construction and iterative redesign of synXVI a 903 kb synthetic Saccharomyces cerevisiae chromosome | |
| Alvaro et al. | Systematic hybrid LOH: a new method to reduce false positives and negatives during screening of yeast gene deletion libraries | |
| US20190265151A1 (en) | Systems and methods to facilitate genetic research | |
| WO2014059370A1 (fr) | Improved high-throughput system for genetic studies | |
| Berry et al. | Cats: cas9-assisted tag switching. A high-throughput method for exchanging genomic peptide tags in yeast | |
| Serero et al. | Recombination in a sterile polyploid hybrid yeast upon meiotic Return-To-Growth | |
| Salzberg et al. | A widespread inversion polymorphism conserved among Saccharomyces species is caused by recurrent homogenization of a sporulation gene family | |
| Ota et al. | An efficient method for isolating mating‐competent cells from bottom‐fermenting yeast using mating pheromone‐supersensitive mutants | |
| Pačnik et al. | Identification of novel genes involved in neutral lipid storage by quantitative trait loci analysis of Saccharomyces cerevisiae | |
| Yone et al. | Gene mapping methodology powered by induced genome rearrangements | |
| Schell et al. | Genetic architecture of a mutation’s expressivity |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 17828589; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 3030968; Country of ref document: CA |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2017828589; Country of ref document: EP; Effective date: 20190215 |