WO2004111267A2 - Methods for preparing a plated stock library with sub-megabase resolution, and uses thereof - Google Patents
- Publication number
- WO2004111267A2 (PCT/CA2004/000859)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- genomic
- clones
- smrt
- clone
- genome
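Concepts (machine-extracted). Each entry lists: concept ID (InChIKey for chemical compounds, which also carry a SMILES string), name, domain, query-match score, sections in which the concept appears, and occurrence count.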
- 238000000034 method Methods 0.000 title claims abstract description 129
- 238000002360 preparation method Methods 0.000 title description 27
- 239000012634 fragment Substances 0.000 claims abstract description 121
- 239000000523 sample Substances 0.000 claims abstract description 70
- 150000007523 nucleic acids Chemical group 0.000 claims abstract description 48
- 108020004414 DNA Proteins 0.000 claims description 152
- 230000003321 amplification Effects 0.000 claims description 73
- 238000003199 nucleic acid amplification method Methods 0.000 claims description 73
- 210000000349 chromosome Anatomy 0.000 claims description 60
- 238000004458 analytical method Methods 0.000 claims description 51
- 238000009396 hybridization Methods 0.000 claims description 51
- 108090000623 proteins and genes Proteins 0.000 claims description 30
- 201000010099 disease Diseases 0.000 claims description 24
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 claims description 24
- 102000039446 nucleic acids Human genes 0.000 claims description 22
- 108020004707 nucleic acids Proteins 0.000 claims description 22
- 239000007787 solid Substances 0.000 claims description 21
- 238000012545 processing Methods 0.000 claims description 18
- 108091034117 Oligonucleotide Proteins 0.000 claims description 14
- 230000000052 comparative effect Effects 0.000 claims description 14
- JLCPHMBAVCMARE-UHFFFAOYSA-N Oligonucleotide (full IUPAC name and SMILES string omitted) Polymers 0.000 claims description 13
- 238000012300 Sequence Analysis Methods 0.000 claims description 11
- 230000004077 genetic alteration Effects 0.000 claims description 11
- 231100000118 genetic alteration Toxicity 0.000 claims description 11
- 239000003153 chemical reaction reagent Substances 0.000 claims description 9
- 238000001514 detection method Methods 0.000 claims description 7
- 238000011282 treatment Methods 0.000 claims description 7
- 102000052510 DNA-Binding Proteins Human genes 0.000 claims description 5
- 238000000151 deposition Methods 0.000 claims description 5
- 230000008995 epigenetic change Effects 0.000 claims description 5
- 102000004169 proteins and genes Human genes 0.000 claims description 5
- 108010077544 Chromatin Proteins 0.000 claims description 4
- 108700020911 DNA-Binding Proteins Proteins 0.000 claims description 4
- 210000003483 chromatin Anatomy 0.000 claims description 4
- 230000014509 gene expression Effects 0.000 claims description 4
- 239000013612 plasmid Substances 0.000 claims description 4
- 238000007385 chemical modification Methods 0.000 claims description 3
- 238000003745 diagnosis Methods 0.000 claims description 3
- 230000007067 DNA methylation Effects 0.000 claims description 2
- 208000032236 Predisposition to disease Diseases 0.000 claims description 2
- 108020005202 Viral DNA Proteins 0.000 claims description 2
- 238000002487 chromatin immunoprecipitation Methods 0.000 claims description 2
- 230000008711 chromosomal rearrangement Effects 0.000 claims description 2
- 102000054765 polymorphisms of proteins Human genes 0.000 claims description 2
- 230000005945 translocation Effects 0.000 claims description 2
- 102100030569 Nuclear receptor corepressor 2 Human genes 0.000 claims 1
- 101710153660 Nuclear receptor corepressor 2 Proteins 0.000 claims 1
- 238000003499 nucleic acid array Methods 0.000 abstract description 4
- 238000007899 nucleic acid hybridization Methods 0.000 abstract description 3
- 238000002509 fluorescent in situ hybridization Methods 0.000 abstract description 2
- 210000004436 artificial bacterial chromosome Anatomy 0.000 description 104
- 241000282414 Homo sapiens Species 0.000 description 57
- 238000003491 array Methods 0.000 description 52
- 238000006243 chemical reaction Methods 0.000 description 47
- 238000012163 sequencing technique Methods 0.000 description 47
- 239000013615 primer Substances 0.000 description 45
- 206010028980 Neoplasm Diseases 0.000 description 35
- 238000012217 deletion Methods 0.000 description 26
- 230000037430 deletion Effects 0.000 description 25
- 210000004027 cell Anatomy 0.000 description 24
- 238000012360 testing method Methods 0.000 description 23
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N Ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 22
- 239000000243 solution Substances 0.000 description 22
- 201000011510 cancer Diseases 0.000 description 21
- 239000000047 product Substances 0.000 description 21
- 108091008146 restriction endonucleases Proteins 0.000 description 20
- 239000013598 vector Substances 0.000 description 17
- 238000000126 in silico method Methods 0.000 description 16
- 238000003908 quality control method Methods 0.000 description 16
- 230000027455 binding Effects 0.000 description 15
- 230000004075 alteration Effects 0.000 description 14
- 239000002299 complementary DNA Substances 0.000 description 14
- 238000013507 mapping Methods 0.000 description 14
- 239000002773 nucleotide Substances 0.000 description 14
- ZHNUHDYFZUAESO-UHFFFAOYSA-N Formamide Chemical compound NC=O ZHNUHDYFZUAESO-UHFFFAOYSA-N 0.000 description 12
- 125000003729 nucleotide group Chemical group 0.000 description 12
- 238000007854 ligation-mediated PCR Methods 0.000 description 11
- 230000011987 methylation Effects 0.000 description 11
- 238000007069 methylation reaction Methods 0.000 description 11
- 230000008569 process Effects 0.000 description 11
- 238000010200 validation analysis Methods 0.000 description 11
- 239000000872 buffer Substances 0.000 description 10
- 238000000746 purification Methods 0.000 description 10
- 101100384865 Neurospora crassa (strain ATCC 24698 / 74-OR23-1A / CBS 708.71 / DSM 1257 / FGSC 987) cot-1 gene Proteins 0.000 description 9
- 238000002372 labelling Methods 0.000 description 9
- 230000031864 metaphase Effects 0.000 description 9
- 208000024191 minimally invasive lung adenocarcinoma Diseases 0.000 description 9
- 238000011160 research Methods 0.000 description 9
- 239000000758 substrate Substances 0.000 description 9
- 230000002759 chromosomal effect Effects 0.000 description 8
- 238000010276 construction Methods 0.000 description 8
- 238000002474 experimental method Methods 0.000 description 8
- 239000000203 mixture Substances 0.000 description 8
- 238000012546 transfer Methods 0.000 description 8
- 208000009458 Carcinoma in Situ Diseases 0.000 description 7
- 238000013459 approach Methods 0.000 description 7
- 201000004933 in situ carcinoma Diseases 0.000 description 7
- 239000008188 pellet Substances 0.000 description 7
- 230000008707 rearrangement Effects 0.000 description 7
- HEMHJVSKTPXQMS-UHFFFAOYSA-M Sodium hydroxide Chemical compound [OH-].[Na+] HEMHJVSKTPXQMS-UHFFFAOYSA-M 0.000 description 6
- 238000013377 clone selection method Methods 0.000 description 6
- 230000000295 complement effect Effects 0.000 description 6
- 238000007865 diluting Methods 0.000 description 6
- 238000010790 dilution Methods 0.000 description 6
- 239000012895 dilution Substances 0.000 description 6
- 230000004807 localization Effects 0.000 description 6
- 239000012528 membrane Substances 0.000 description 6
- 230000009467 reduction Effects 0.000 description 6
- 108091035539 telomere Proteins 0.000 description 6
- 210000003411 telomere Anatomy 0.000 description 6
- 102000055501 telomere Human genes 0.000 description 6
- 241001465754 Metazoa Species 0.000 description 5
- 210000001106 artificial yeast chromosome Anatomy 0.000 description 5
- 230000029087 digestion Effects 0.000 description 5
- 238000013467 fragmentation Methods 0.000 description 5
- 238000006062 fragmentation reaction Methods 0.000 description 5
- 238000003384 imaging method Methods 0.000 description 5
- 239000007788 liquid Substances 0.000 description 5
- 230000004048 modification Effects 0.000 description 5
- 238000012986 modification Methods 0.000 description 5
- 230000003252 repetitive effect Effects 0.000 description 5
- 238000011012 sanitization Methods 0.000 description 5
- 230000002103 transcriptional effect Effects 0.000 description 5
- QGKMIGUHVLGJBR-UHFFFAOYSA-M (4z)-1-(3-methylbutyl)-4-[[1-(3-methylbutyl)quinolin-1-ium-4-yl]methylidene]quinoline;iodide Chemical compound [I-].C12=CC=CC=C2N(CCC(C)C)C=CC1=CC1=CC=[N+](CCC(C)C)C2=CC=CC=C12 QGKMIGUHVLGJBR-UHFFFAOYSA-M 0.000 description 4
- 206010006187 Breast cancer Diseases 0.000 description 4
- 208000026310 Breast neoplasm Diseases 0.000 description 4
- 108010083123 CDX2 Transcription Factor Proteins 0.000 description 4
- 102000006277 CDX2 Transcription Factor Human genes 0.000 description 4
- IAZDPXIOMUYVGZ-UHFFFAOYSA-N Dimethylsulphoxide Chemical compound CS(C)=O IAZDPXIOMUYVGZ-UHFFFAOYSA-N 0.000 description 4
- 102000004190 Enzymes Human genes 0.000 description 4
- 108090000790 Enzymes Proteins 0.000 description 4
- KFZMGEQAYNKOFK-UHFFFAOYSA-N Isopropanol Chemical compound CC(C)O KFZMGEQAYNKOFK-UHFFFAOYSA-N 0.000 description 4
- 206010058467 Lung neoplasm malignant Diseases 0.000 description 4
- 206010025323 Lymphomas Diseases 0.000 description 4
- 238000002105 Southern blotting Methods 0.000 description 4
- 230000000903 blocking effect Effects 0.000 description 4
- 230000002860 competitive effect Effects 0.000 description 4
- 238000011109 contamination Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 4
- 229960001760 dimethyl sulfoxide Drugs 0.000 description 4
- 239000012154 double-distilled water Substances 0.000 description 4
- 201000005202 lung cancer Diseases 0.000 description 4
- 208000020816 lung neoplasm Diseases 0.000 description 4
- 238000004519 manufacturing process Methods 0.000 description 4
- 239000002987 primer (paints) Substances 0.000 description 4
- 230000037452 priming Effects 0.000 description 4
- 239000011541 reaction mixture Substances 0.000 description 4
- 230000002829 reductive effect Effects 0.000 description 4
- 238000012795 verification Methods 0.000 description 4
- 230000004304 visual acuity Effects 0.000 description 4
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 4
- 206010009944 Colon cancer Diseases 0.000 description 3
- 208000001333 Colorectal Neoplasms Diseases 0.000 description 3
- 108010009392 Cyclin-Dependent Kinase Inhibitor p16 Proteins 0.000 description 3
- 238000001712 DNA sequencing Methods 0.000 description 3
- 206010059866 Drug resistance Diseases 0.000 description 3
- 230000010558 Gene Alterations Effects 0.000 description 3
- PEDCQBHIVMGVHV-UHFFFAOYSA-N Glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 3
- 208000025205 Mantle-Cell Lymphoma Diseases 0.000 description 3
- 108091092878 Microsatellite Proteins 0.000 description 3
- 208000003445 Mouth Neoplasms Diseases 0.000 description 3
- 108091028043 Nucleic acid sequence Proteins 0.000 description 3
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 3
- 229920005654 Sephadex Polymers 0.000 description 3
- 239000012507 Sephadex™ Substances 0.000 description 3
- 208000000102 Squamous Cell Carcinoma of Head and Neck Diseases 0.000 description 3
- 239000007844 bleaching agent Substances 0.000 description 3
- 238000010367 cloning Methods 0.000 description 3
- 238000004132 cross linking Methods 0.000 description 3
- 230000001419 dependent effect Effects 0.000 description 3
- 229960000633 dextran sulfate Drugs 0.000 description 3
- 239000003085 diluting agent Substances 0.000 description 3
- 229940094991 herring sperm dna Drugs 0.000 description 3
- 208000012987 lip and oral cavity carcinoma Diseases 0.000 description 3
- 238000002493 microarray Methods 0.000 description 3
- 230000037361 pathway Effects 0.000 description 3
- 238000007639 printing Methods 0.000 description 3
- 230000001105 regulatory effect Effects 0.000 description 3
- 230000035945 sensitivity Effects 0.000 description 3
- 208000000587 small cell lung carcinoma Diseases 0.000 description 3
- 238000010186 staining Methods 0.000 description 3
- 238000003860 storage Methods 0.000 description 3
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 2
- 102100021569 Apoptosis regulator Bcl-2 Human genes 0.000 description 2
- IJGRMHOSHXDMSA-UHFFFAOYSA-N Atomic nitrogen Chemical compound N#N IJGRMHOSHXDMSA-UHFFFAOYSA-N 0.000 description 2
- 108091012583 BCL2 Proteins 0.000 description 2
- 241000894006 Bacteria Species 0.000 description 2
- 108020004635 Complementary DNA Proteins 0.000 description 2
- 102000012410 DNA Ligases Human genes 0.000 description 2
- 108010061982 DNA Ligases Proteins 0.000 description 2
- 230000004544 DNA amplification Effects 0.000 description 2
- 230000007023 DNA restriction-modification system Effects 0.000 description 2
- 206010058314 Dysplasia Diseases 0.000 description 2
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 2
- 108010003751 Elongin Proteins 0.000 description 2
- 102000004662 Elongin Human genes 0.000 description 2
- 241000196324 Embryophyta Species 0.000 description 2
- 101000614806 Homo sapiens cAMP-dependent protein kinase type II-beta regulatory subunit Proteins 0.000 description 2
- 241000243328 Hydridae Species 0.000 description 2
- 208000026350 Inborn Genetic disease Diseases 0.000 description 2
- 108090001090 Lectins Proteins 0.000 description 2
- 102000004856 Lectins Human genes 0.000 description 2
- 102000003960 Ligases Human genes 0.000 description 2
- 108090000364 Ligases Proteins 0.000 description 2
- 206010025066 Lung carcinoma cell type unspecified stage 0 Diseases 0.000 description 2
- TWRXJAOTZQYOKJ-UHFFFAOYSA-L Magnesium chloride Chemical compound [Mg+2].[Cl-].[Cl-] TWRXJAOTZQYOKJ-UHFFFAOYSA-L 0.000 description 2
- 239000004677 Nylon Substances 0.000 description 2
- 238000012408 PCR amplification Methods 0.000 description 2
- 229920001213 Polysorbate 20 Polymers 0.000 description 2
- 206010041067 Small cell lung cancer Diseases 0.000 description 2
- FAPWRFPIFSIZLT-UHFFFAOYSA-M Sodium chloride Chemical compound [Na+].[Cl-] FAPWRFPIFSIZLT-UHFFFAOYSA-M 0.000 description 2
- 239000007983 Tris buffer Substances 0.000 description 2
- 102100033254 Tumor suppressor ARF Human genes 0.000 description 2
- 230000001594 aberrant effect Effects 0.000 description 2
- 239000011543 agarose gel Substances 0.000 description 2
- XAGFODPZIPBFFR-UHFFFAOYSA-N aluminium Chemical compound [Al] XAGFODPZIPBFFR-UHFFFAOYSA-N 0.000 description 2
- 229910052782 aluminium Inorganic materials 0.000 description 2
- 238000003556 assay Methods 0.000 description 2
- 238000009835 boiling Methods 0.000 description 2
- 102100021205 cAMP-dependent protein kinase type II-beta regulatory subunit Human genes 0.000 description 2
- 238000004422 calculation algorithm Methods 0.000 description 2
- 238000003776 cleavage reaction Methods 0.000 description 2
- 239000000470 constituent Substances 0.000 description 2
- 238000007796 conventional method Methods 0.000 description 2
- 230000002559 cytogenic effect Effects 0.000 description 2
- 238000004925 denaturation Methods 0.000 description 2
- 230000036425 denaturation Effects 0.000 description 2
- 230000008021 deposition Effects 0.000 description 2
- 229960005156 digoxin Drugs 0.000 description 2
- BNIILDVGGAEEIG-UHFFFAOYSA-L disodium hydrogen phosphate Chemical compound [Na+].[Na+].OP([O-])([O-])=O BNIILDVGGAEEIG-UHFFFAOYSA-L 0.000 description 2
- 229910000397 disodium phosphate Inorganic materials 0.000 description 2
- 238000009826 distribution Methods 0.000 description 2
- 239000000975 dye Substances 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000001976 enzyme digestion Methods 0.000 description 2
- 238000001704 evaporation Methods 0.000 description 2
- 230000008020 evaporation Effects 0.000 description 2
- 238000010195 expression analysis Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 239000011888 foil Substances 0.000 description 2
- 230000008014 freezing Effects 0.000 description 2
- 238000007710 freezing Methods 0.000 description 2
- 208000016361 genetic disease Diseases 0.000 description 2
- 230000002068 genetic effect Effects 0.000 description 2
- 238000012268 genome sequencing Methods 0.000 description 2
- 230000037442 genomic alteration Effects 0.000 description 2
- 238000010438 heat treatment Methods 0.000 description 2
- 238000011065 in-situ storage Methods 0.000 description 2
- 230000010354 integration Effects 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 238000002955 isolation Methods 0.000 description 2
- 239000002523 lectin Substances 0.000 description 2
- 239000003446 ligand Substances 0.000 description 2
- 230000000670 limiting effect Effects 0.000 description 2
- 201000004320 lung carcinoma in situ Diseases 0.000 description 2
- 239000006166 lysate Substances 0.000 description 2
- 239000000463 material Substances 0.000 description 2
- 230000001404 mediated effect Effects 0.000 description 2
- -1 micro-deletions Proteins 0.000 description 2
- 238000010369 molecular cloning Methods 0.000 description 2
- 208000002154 non-small cell lung carcinoma Diseases 0.000 description 2
- 238000010606 normalization Methods 0.000 description 2
- 210000004940 nucleus Anatomy 0.000 description 2
- 229920001778 nylon Polymers 0.000 description 2
- 102000002574 p38 Mitogen-Activated Protein Kinases Human genes 0.000 description 2
- 108010068338 p38 Mitogen-Activated Protein Kinases Proteins 0.000 description 2
- 239000000256 polyoxyethylene sorbitan monolaurate Substances 0.000 description 2
- 235000010486 polyoxyethylene sorbitan monolaurate Nutrition 0.000 description 2
- 230000000135 prohibitive effect Effects 0.000 description 2
- 230000005855 radiation Effects 0.000 description 2
- 239000013074 reference sample Substances 0.000 description 2
- 238000005057 refrigeration Methods 0.000 description 2
- 230000002441 reversible effect Effects 0.000 description 2
- 230000007017 scission Effects 0.000 description 2
- 238000012216 screening Methods 0.000 description 2
- 238000013515 script Methods 0.000 description 2
- 239000011780 sodium chloride Substances 0.000 description 2
- 239000001509 sodium citrate Substances 0.000 description 2
- NLJMYIDDQXHKNR-UHFFFAOYSA-K sodium citrate Chemical compound O.O.[Na+].[Na+].[Na+].[O-]C(=O)CC(O)(CC([O-])=O)C([O-])=O NLJMYIDDQXHKNR-UHFFFAOYSA-K 0.000 description 2
- 230000003595 spectral effect Effects 0.000 description 2
- 210000001519 tissue Anatomy 0.000 description 2
- 238000011269 treatment regimen Methods 0.000 description 2
- LENZDBCJOHFCAS-UHFFFAOYSA-N tris Chemical compound OCC(N)(CO)CO LENZDBCJOHFCAS-UHFFFAOYSA-N 0.000 description 2
- 208000029729 tumor suppressor gene on chromosome 11 Diseases 0.000 description 2
- MZOFCQQQCNRIBI-VMXHOPILSA-N (3s)-4-[[(2s)-1-[[(2s)-1-[[(1s)-1-carboxy-2-hydroxyethyl]amino]-4-methyl-1-oxopentan-2-yl]amino]-5-(diaminomethylideneamino)-1-oxopentan-2-yl]amino]-3-[[2-[[(2s)-2,6-diaminohexanoyl]amino]acetyl]amino]-4-oxobutanoic acid Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CC(O)=O)NC(=O)CNC(=O)[C@@H](N)CCCCN MZOFCQQQCNRIBI-VMXHOPILSA-N 0.000 description 1
- 101150096372 1.3 gene Proteins 0.000 description 1
- FWBHETKCLVMNFS-UHFFFAOYSA-N 4',6-Diamino-2-phenylindol Chemical compound C1=CC(C(=N)N)=CC=C1C1=CC2=CC=C(C(N)=N)C=C2N1 FWBHETKCLVMNFS-UHFFFAOYSA-N 0.000 description 1
- 108091093088 Amplicon Proteins 0.000 description 1
- 241000256844 Apis mellifera Species 0.000 description 1
- 241000219194 Arabidopsis Species 0.000 description 1
- 235000007319 Avena orientalis Nutrition 0.000 description 1
- 241000209763 Avena sativa Species 0.000 description 1
- 235000007558 Avena sp Nutrition 0.000 description 1
- 108090001008 Avidin Proteins 0.000 description 1
- 108010001478 Bacitracin Proteins 0.000 description 1
- 108020000946 Bacterial DNA Proteins 0.000 description 1
- 241000244201 Caenorhabditis briggsae Species 0.000 description 1
- 101100152433 Caenorhabditis elegans tat-1 gene Proteins 0.000 description 1
- 108091029523 CpG island Proteins 0.000 description 1
- 108050006400 Cyclin Proteins 0.000 description 1
- 102000016736 Cyclin Human genes 0.000 description 1
- 238000000018 DNA microarray Methods 0.000 description 1
- 239000003155 DNA primer Substances 0.000 description 1
- 239000003298 DNA probe Substances 0.000 description 1
- 230000004568 DNA-binding Effects 0.000 description 1
- 101710096438 DNA-binding protein Proteins 0.000 description 1
- 241000252212 Danio rerio Species 0.000 description 1
- 102000007260 Deoxyribonuclease I Human genes 0.000 description 1
- 108010008532 Deoxyribonuclease I Proteins 0.000 description 1
- 208000012239 Developmental disease Diseases 0.000 description 1
- LTMHDMANZUZIPE-AMTYYWEZSA-N Digoxin Natural products O([C@H]1[C@H](C)O[C@H](O[C@@H]2C[C@@H]3[C@@](C)([C@@H]4[C@H]([C@]5(O)[C@](C)([C@H](O)C4)[C@H](C4=CC(=O)OC4)CC5)CC3)CC2)C[C@@H]1O)[C@H]1O[C@H](C)[C@@H](O[C@H]2O[C@@H](C)[C@H](O)[C@@H](O)C2)[C@@H](O)C1 LTMHDMANZUZIPE-AMTYYWEZSA-N 0.000 description 1
- 241000255581 Drosophila <fruit fly, genus> Species 0.000 description 1
- 102100033925 GS homeobox 1 Human genes 0.000 description 1
- 101710123431 GS homeobox 1 Proteins 0.000 description 1
- 241000287828 Gallus gallus Species 0.000 description 1
- 244000068988 Glycine max Species 0.000 description 1
- 235000010469 Glycine max Nutrition 0.000 description 1
- 108700005087 Homeobox Genes Proteins 0.000 description 1
- 101000984626 Homo sapiens Low-density lipoprotein receptor-related protein 12 Proteins 0.000 description 1
- 101000582254 Homo sapiens Nuclear receptor corepressor 2 Proteins 0.000 description 1
- 241000243251 Hydra Species 0.000 description 1
- 108060003951 Immunoglobulin Proteins 0.000 description 1
- 108091092195 Intron Proteins 0.000 description 1
- ROHFNLRQFUQHCH-YFKPBYRVSA-N L-leucine Chemical compound CC(C)C[C@H](N)C(O)=O ROHFNLRQFUQHCH-YFKPBYRVSA-N 0.000 description 1
- ROHFNLRQFUQHCH-UHFFFAOYSA-N Leucine Natural products CC(C)CC(N)C(O)=O ROHFNLRQFUQHCH-UHFFFAOYSA-N 0.000 description 1
- 102100027120 Low-density lipoprotein receptor-related protein 12 Human genes 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 102100025169 Max-binding protein MNT Human genes 0.000 description 1
- 229910003177 MnII Inorganic materials 0.000 description 1
- 241000699666 Mus <mouse, genus> Species 0.000 description 1
- 108700026495 N-Myc Proto-Oncogene Proteins 0.000 description 1
- 108010057466 NF-kappa B Proteins 0.000 description 1
- 102000003945 NF-kappa B Human genes 0.000 description 1
- 206010061309 Neoplasm progression Diseases 0.000 description 1
- 239000000020 Nitrocellulose Substances 0.000 description 1
- 238000000636 Northern blotting Methods 0.000 description 1
- 108700020796 Oncogene Proteins 0.000 description 1
- 102000043276 Oncogene Human genes 0.000 description 1
- 102000038030 PI3Ks Human genes 0.000 description 1
- 108091007960 PI3Ks Proteins 0.000 description 1
- 108010011536 PTEN Phosphohydrolase Proteins 0.000 description 1
- 102000014160 PTEN Phosphohydrolase Human genes 0.000 description 1
- 241000282577 Pan troglodytes Species 0.000 description 1
- 102100041030 Pancreas/duodenum homeobox protein 1 Human genes 0.000 description 1
- 101710144033 Pancreas/duodenum homeobox protein 1 Proteins 0.000 description 1
- 108091000080 Phosphotransferase Proteins 0.000 description 1
- 102100023884 Probable ribonuclease ZC3H12D Human genes 0.000 description 1
- 102000001253 Protein Kinase Human genes 0.000 description 1
- 241000700159 Rattus Species 0.000 description 1
- 235000014680 Saccharomyces cerevisiae Nutrition 0.000 description 1
- 102000012479 Serine Proteases Human genes 0.000 description 1
- 108010022999 Serine Proteases Proteins 0.000 description 1
- 108010090804 Streptavidin Proteins 0.000 description 1
- 241001441723 Takifugu Species 0.000 description 1
- 108010006785 Taq Polymerase Proteins 0.000 description 1
- 108091023040 Transcription factor Proteins 0.000 description 1
- 102000040945 Transcription factor Human genes 0.000 description 1
- 102000004142 Trypsin Human genes 0.000 description 1
- 108090000631 Trypsin Proteins 0.000 description 1
- 108060008682 Tumor Necrosis Factor Proteins 0.000 description 1
- 102000044209 Tumor Suppressor Genes Human genes 0.000 description 1
- 108700025716 Tumor Suppressor Genes Proteins 0.000 description 1
- 102100040247 Tumor necrosis factor Human genes 0.000 description 1
- 241000700605 Viruses Species 0.000 description 1
- 240000008042 Zea mays Species 0.000 description 1
- 235000016383 Zea mays subsp huehuetenangensis Nutrition 0.000 description 1
- 235000002017 Zea mays subsp mays Nutrition 0.000 description 1
- 230000002159 abnormal effect Effects 0.000 description 1
- 239000008186 active pharmaceutical agent Substances 0.000 description 1
- 208000009956 adenocarcinoma Diseases 0.000 description 1
- 238000013019 agitation Methods 0.000 description 1
- 150000001412 amines Chemical class 0.000 description 1
- 125000003277 amino group Chemical group 0.000 description 1
- 210000003484 anatomy Anatomy 0.000 description 1
- 238000004873 anchoring Methods 0.000 description 1
- 238000000137 annealing Methods 0.000 description 1
- 239000000427 antigen Substances 0.000 description 1
- 108091007433 antigens Proteins 0.000 description 1
- 102000036639 antigens Human genes 0.000 description 1
- 210000004507 artificial chromosome Anatomy 0.000 description 1
- 238000000429 assembly Methods 0.000 description 1
- 230000000712 assembly Effects 0.000 description 1
- 210000003719 b-lymphocyte Anatomy 0.000 description 1
- 229960003071 bacitracin Drugs 0.000 description 1
- 229930184125 bacitracin Natural products 0.000 description 1
- CLKOFPXJLQSYAH-ABRJDSQDSA-N bacitracin A Chemical compound C1SC([C@@H](N)[C@@H](C)CC)=N[C@@H]1C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](CCC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]1C(=O)N[C@H](CCCN)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@H](CC=2C=CC=CC=2)C(=O)N[C@@H](CC=2N=CNC=2)C(=O)N[C@H](CC(O)=O)C(=O)N[C@@H](CC(N)=O)C(=O)NCCCC1 CLKOFPXJLQSYAH-ABRJDSQDSA-N 0.000 description 1
- 230000001580 bacterial effect Effects 0.000 description 1
- PXXJHWLDUBFPOL-UHFFFAOYSA-N benzamidine Chemical compound NC(=N)C1=CC=CC=C1 PXXJHWLDUBFPOL-UHFFFAOYSA-N 0.000 description 1
- 230000033228 biological regulation Effects 0.000 description 1
- 238000001574 biopsy Methods 0.000 description 1
- 229960002685 biotin Drugs 0.000 description 1
- 235000020958 biotin Nutrition 0.000 description 1
- 239000011616 biotin Substances 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 150000001720 carbohydrates Chemical class 0.000 description 1
- 238000005119 centrifugation Methods 0.000 description 1
- 210000002230 centromere Anatomy 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000012512 characterization method Methods 0.000 description 1
- 239000007795 chemical reaction product Substances 0.000 description 1
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
Definitions
- The present invention pertains to the field of genome analysis, and particularly to array-based genome analysis and diagnostics.
- Nucleic acid arrays provide a tool to analyze information about large numbers of genes simultaneously.
- The majority of commercially available nucleic acid arrays are cDNA arrays.
- One type of cDNA array comprises short synthetic oligonucleotides that are either synthesized in situ (e.g. GeneChip® Human Genome U133 from Affymetrix Inc.) or synthesized in vitro and subsequently deposited onto a substrate in a grid-like or arrayed configuration (e.g. MWG-Biotech's Pan® Human array).
- Oligonucleotide-based arrays are limited by the length of the DNA oligonucleotides (which affects hybridization characteristics), by the distance between the markers selected for the array, and by the coverage of the array, which is restricted to certain parts of the genome sequence.
- A second type of cDNA array comprises cDNA generated by reverse transcription of RNA, followed by its ordered deposition onto the substrate. This type of cDNA array, however, represents only the transcriptional state of a selected tissue type under a narrow range of physiological conditions.
- As cDNA is derived only from transcribed regions of the genome, a cDNA array cannot comprise the entire gene set of a given genome. Furthermore, the content of a cDNA array excludes transcriptional regulatory sequences, introns, intergenic sequences and telomeres. Thus, the majority of commercially available arrays are used for gene expression profiling, with very few designed to detect genetic alterations such as gene amplifications and deletions at any locus within the genome. Such genetic alterations are involved in some forms of cancer and other genetic diseases, as well as in development and differentiation.
- genomic arrays comprising genomic DNA, which can help to eliminate the requirement of handling large quantities of individual clones.
- The preparation of genomic arrays, however, is not straightforward, for a number of reasons.
- Multiple commercial-scale array constructions would require re-culturing of the host cells harboring the BAC plasmids in order to replenish supplies.
- The use of purified DNA preparations may also introduce host bacterial DNA contamination into the array, as this DNA is not readily separable from the cloned DNA.
- The Affymetrix Mapping 10K Array provides a representation of the human genome with an average intermarker distance of 210 kb.
- The Vysis GenoSensor™ Array 300 is a genomic array containing 287 probes that include telomeres, microdeletions, oncogenes and tumor suppressor genes. With fewer than 300 probes representing the 3 billion base pairs of the human genome, the Vysis GenoSensor™ Array 300 offers an average resolution of about 10 Mb.
- Spectral Genomics produces a series of BAC arrays for array-based genome profiling. Spectral Genomics genomic arrays are produced at 1-4 Mb resolution.
- Human BAC array 1400 (KTH3-1400) comprises 1400 non-overlapping BAC clones from the RPCI BAC library spotted in duplicate, with a resolution in the 2-4 Mb range, representing 5% or less coverage of the genome.
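The resolution figures quoted for these arrays follow from dividing genome length by probe count. The short Python sketch below reproduces that arithmetic. It assumes a 3.0 × 10^9 bp genome, as stated for the Vysis array, and it ignores gaps, overlaps and uneven probe spacing, so it yields only a rough average spacing.

```python
# Rough average spacing between probes: genome length divided by probe count.
# Assumption: a 3.0e9 bp human genome, as stated in the text; gaps, overlaps
# and uneven probe placement are ignored.
HUMAN_GENOME_BP = 3_000_000_000


def average_resolution_mb(n_probes: int, genome_bp: int = HUMAN_GENOME_BP) -> float:
    """Return the mean genomic distance (in Mb) represented by each probe."""
    if n_probes <= 0:
        raise ValueError("n_probes must be positive")
    return genome_bp / n_probes / 1e6


if __name__ == "__main__":
    # Probe/clone counts quoted in the text for two prior-art arrays.
    for name, n in [("Vysis GenoSensor Array 300", 287), ("KTH3-1400", 1400)]:
        print(f"{name}: ~{average_resolution_mb(n):.1f} Mb per probe")
```

The results, roughly 10.5 Mb and 2.1 Mb, agree with the ~10 Mb and 2-4 Mb figures given above.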
- Genomic arrays prepared to date include whole genome arrays as well as chromosome specific arrays.
- Snijders et al. (Nature Genetics 29:263-264, 2001) describe a whole genome array, prepared according to the method described in U.S. Patent Application No. 20030087231, comprising representations of 2460 human genomic DNA BAC and P1 clones, which is reported to cover the entire genome at a resolution of 1.4 Mb.
- A whole genome array has also been constructed using the DOP-PCR-based method described by Fiegler et al. (ibid).
- This array was generated using a set of approximately 3500 genomic DNA BAC and PAC clones selected to cover the entire genome at a resolution of 1 Mb. As the resolution of this array is not in the submegabase range, it is still unable to detect gene alterations that occur at the submegabase level of resolution.
- A high-resolution chromosome-specific array has also been described (Buckley et al. Human Molecular Genetics (2002) 11:3221-3229) that is reported to provide an average resolution of 75 kb, but covers only 34.7 Mb of the long arm of chromosome 22.
- An object of the present invention is to provide methods for the preparation of a library of submegabase resolution tiling pools and uses thereof.
- A method of preparing a submegabase resolution library comprising a collection of synthetic nucleic acid fragment pools, each of said synthetic nucleic acid fragment pools corresponding to a genomic clone, said method comprising the steps of: selecting a set of genomic clones from at least one library of genomic clones, each of said clones comprising a genomic insert, wherein between about 17 bp and about 1,500 bp of the sequence of the genomic insert of at least 95% of the clones in the set overlaps with the sequence of the genomic insert of an adjacent genomic clone; and preparing a synthetic nucleic acid fragment pool from each genomic clone in the set by fragmenting the genomic clone to produce nucleic acid fragments and amplifying said fragments to generate a synthetic nucleic acid pool.
- A submegabase resolution library comprising a collection of synthetic nucleic acid fragment pools, wherein said library is prepared by a method of the invention.
- An array comprising one or more submegabase resolution libraries of the invention.
- A method of preparing a submegabase resolution tiling set of genomic clones representing at least a portion of a genome, comprising selecting a set of genomic clones from at least one library of genomic clones representing said genome, each of said clones containing a genomic insert, wherein between about 17 bp and about 1,500 bp of the sequence of the genomic insert of at least 95% of the clones in the set overlaps with the sequence of the genomic insert of an adjacent genomic clone.
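Once each clone's insert coordinates are known, the overlap criterion can be checked mechanically. The following is a minimal sketch. It assumes clones are given as hypothetical half-open `(start, end)` base-pair coordinates on a single chromosome, and it counts a clone as compliant if its overlap with either neighbour in genome order falls between 17 bp and 1,500 bp.

```python
from typing import List, Tuple

Clone = Tuple[int, int]  # hypothetical half-open (start, end) insert coordinates, bp


def adjacent_overlaps(clones: List[Clone]) -> List[int]:
    """Overlap in bp between each clone and the next clone in genome order.
    A negative value means there is a gap between the two inserts."""
    ordered = sorted(clones)
    return [min(end1, end2) - start2
            for (_, end1), (start2, end2) in zip(ordered, ordered[1:])]


def meets_overlap_criterion(clones: List[Clone], min_bp: int = 17,
                            max_bp: int = 1500,
                            min_fraction: float = 0.95) -> bool:
    """True if at least `min_fraction` of the clones overlap an adjacent clone
    by between `min_bp` and `max_bp` base pairs (inclusive)."""
    overlaps = adjacent_overlaps(clones)
    n = len(clones)
    compliant = 0
    for i in range(n):
        # Clone i (in sorted order) has left overlap overlaps[i-1], right overlaps[i].
        neighbours = overlaps[max(i - 1, 0):i + 1]
        if any(min_bp <= ov <= max_bp for ov in neighbours):
            compliant += 1
    return n > 0 and compliant >= min_fraction * n


if __name__ == "__main__":
    tiling = [(0, 150_000), (149_000, 300_000), (299_500, 450_000)]
    print(adjacent_overlaps(tiling), meets_overlap_criterion(tiling))
```

For the example tiling, the adjacent overlaps are 1,000 bp and 500 bp, so every clone satisfies the 17-1,500 bp window.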
- A method of preparing a synthetic nucleic acid fragment pool from a genomic clone comprising: (a) preparing genomic clone DNA; (b) fragmenting the genomic clone DNA to produce DNA fragments; and (c) amplifying said DNA fragments to generate a SMRT pool, wherein step (b) or step (c) comprises one or more dilution-processing steps.
- A high-throughput method for determining the identity of a genomic clone having a genomic insert, comprising the steps of: preparing a solution comprising at least 20 fmol of said genomic clone, a primer labelled with a detectable label, and amplification reagents; subjecting said solution to between 65 and 100 cycles of thermal amplification to provide an amplified solution; subjecting said amplified solution to sequence analysis to determine a sequence of at least 17 base pairs in length of said genomic insert; and comparing said sequence to a reference database in order to determine the identity of said genomic clone.
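The final step of this method, comparing a read of at least 17 bp against a reference database, can be sketched as an exact k-mer lookup. This is an illustration only. The reference is assumed to be a hypothetical dictionary that maps clone IDs to known insert sequence (for example, clone-end sequences). Both strands are indexed, and mismatches are not tolerated. A production pipeline would more likely use a dedicated alignment tool.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, Optional, Set

K = 17  # minimum read length named in the method

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int = K) -> Iterator[str]:
    seq = seq.upper()
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(reference: Dict[str, str], k: int = K) -> Dict[str, Set[str]]:
    """Map every k-mer (both strands) to the clone IDs whose sequence contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for clone_id, seq in reference.items():
        for strand in (seq, reverse_complement(seq)):
            for km in kmers(strand, k):
                index[km].add(clone_id)
    return index


def identify_clone(read: str, index: Dict[str, Set[str]], k: int = K) -> Optional[str]:
    """Return the clone sharing the most k-mers with the read; None on no hit or a tie."""
    if len(read) < k:
        raise ValueError(f"read must be at least {k} bp long")
    votes: Counter = Counter()
    for km in kmers(read, k):
        for clone_id in index.get(km, ()):
            votes[clone_id] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: two clones equally supported
    return ranked[0][0]


if __name__ == "__main__":
    reference = {  # hypothetical toy reference sequences
        "CLONE_A": "ACGTACGTTAGCCTAGGATCCGATTACA",
        "CLONE_B": "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAA",
    }
    idx = build_index(reference)
    print(identify_clone("ACGTACGTTAGCCTAGGATCC", idx))
```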
- An array providing a representation of a tiling set of genomic clones, said array comprising a plurality of pools of synthetic nucleic acid fragments deposited on one or more solid supports, wherein each pool is derived from one of said genomic clones and is present at one or more distinct locations on said one or more solid supports, and wherein between 17 bp and 1,500 bp of the sequence of the genomic insert of at least 95% of the clones in said tiling set overlaps with the sequence of the genomic insert of an adjacent genomic clone.
- A method of preparing an array comprising the steps of: selecting a set of genomic clones from at least one library of genomic clones, each of said clones containing a genomic insert, wherein between 17 bp and 1,500 bp of the sequence of the genomic insert of at least 95% of the clones in the set overlaps with the sequence of the genomic insert of an adjacent genomic clone; preparing a synthetic nucleic acid fragment pool from each genomic clone in the set by fragmenting the genomic clone to produce nucleic acid fragments and amplifying said fragments to generate a synthetic nucleic acid pool; and depositing each of said synthetic nucleic acid pools onto a solid support at one or more distinct locations.
- An array of the invention for use in the diagnosis of disease, the determination of predisposition to disease, the determination of resistance to treatment, or the selection of a treatment regime.
- A use of an array of the invention for the analysis of gene expression.
- Figure 1 presents a flow diagram of preparation and identification of SubMegabase Resolution Tiling (SMRT) pools in one embodiment of the invention.
- (A) Multistep process for the conversion of BAC DNA to SMRT pools.
- (B) Target fragments for specific primer extension in SMRT pool analysis.
- Figure 2 depicts the coverage of the sequence assembly provided by the clones in a SMRT set in one embodiment of the invention. For each chromosome, the coverage of adjacent 700 kb regions is plotted according to the legend at the top of the figure. Regions in the assembly without sequence information appear as black areas. Distance scale is in Mb.
- Figure 3 depicts the coverage resolution provided by the clones in a SMRT set in one embodiment of the invention.
- The average clone cover size is colour-coded. Regions in the assembly without sequence information appear as black areas.
- Distance scale is in Mb.
- Figure 4 depicts three SMRT pool sequence products.
- (A) Sequence read of a SMRT pool derived from BAC RP11-124P12, with an MseI restriction site 260 bp downstream of the T7 primer.
- (B) Sequence read of a SMRT pool derived from BAC RP11-125E6, with an MseI restriction site 127 bp downstream of the T7 primer.
- (C) Sequence read of a SMRT pool derived from BAC RP11-124P22, with an MseI restriction site 17 bp downstream of the T7 primer.
- Figure 5 shows the probability of identifying a 96-well plate.
- Increasing the number of SMRT pools sequenced increases the probability of identifying the plate.
- Solid squares denote SP6 primer sequencing.
- Solid diamonds denote T7 primer sequencing.
- Solid triangles denote sequencing of the SMRT pools with both SP6 and T7 primers. 95% confidence intervals are represented by vertical bars on all data points.
- Figure 6 depicts identification of a SMRT pool by Southern analysis. 200 ng RP11-156K13 HindIII digest (lane 1). 200 ng RP11-104F14 HindIII digest (lane 2).
- (A) In silico fingerprint of RP11-156K13 and RP11-104F14.
- (B) Southern transfer hybridized with radiolabeled SMRT pool from BAC clone RP11-156K13 without Cot-1 DNA blocking.
- (C) Southern transfer hybridized with radiolabeled SMRT pool from BAC clone RP11-156K13 with 50 µg Cot-1 DNA blocking.
- Figure 7 depicts identification of a SMRT pool by FISH analysis.
- Red represents a random-primed SMRT pool probe generated from clone RP11-328P22 (locus:
- Figure 8 depicts (A) a whole genome profile of a TAT-I lymphoma cell line versus normal male DNA. Vertical lines represent log2 ratios of 0.5 and -0.5, as labelled. Each dot represents one unique LM-PCR-amplified BAC on the whole genome array.
- (B) Chromosome view of 8q showing MYC amplification between BAC clones RP11-143H8 and RP11-263C20.
- (C) Chromosome view of 18q showing BCL2 amplification between BAC clones RP11-159K14 and RP11-565D23. Vertical lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively.
- Figure 9 presents the results of a SMRT array comparative genome hybridization experiment using lung cancer cell line H526.
- (A) Whole-genome view of H526 versus reference male DNA.
- (B) Enlarged view of the deletion breakpoint at 3p21.1 between BAC clones RP11-632O05 and RP11-594F16, also seen in A. Vertical lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively.
- (C) FISH confirmation of the breakpoint in B, showing single-copy loss of BAC clone RP11-594F16 (green) and normal copy number of BAC clone RP11-632O05 (red).
- Figure 10 presents the results of amplification of chromosome 8q24.12-13 in colorectal cancer cell line COLO320. This 1.9-Mb amplification containing MYC is bounded by BAC clones RP11-810D23 and RP11-294P7. Vertical lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively.
- Figure 11 presents a detailed analysis of microamplifications on chromosome arms 13q, 15q, 16p, and 22q in COLO320 cells.
- Figure 12 depicts the identification of a new microamplification by SMRT array comparative genome hybridization in the cancer cell line COLO320.
- (A) A 300-kb microamplification on chromosome 13q12.2 containing the genes GSH1, CDX2 and IPF1, bounded by BAC clones RP11-153M24 and RP11-152N3. Vertical lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively.
- (B) High copy-number amplification of RP11-153M24 detected by FISH. The amplification was located in a homogeneously staining region.
- Figure 13 depicts the identification of microdeletions.
- (A) Identification of a 1.25-Mb deletion at 9p21.3 containing CDKN2A in a mantle cell lymphoma cell line, bounded by BAC clones RP11-328C2 and RP11-275H17.
- (B) A 240-kb deletion at 7q22.3 containing PRKAR2B and HBP1 in breast cancer cell line BT474, bounded by BAC clones RP11-258L19 and RP11-262G16.
- Vertical lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively.
- Figure 14 depicts the results of a SMRT array comparative genome hybridization profile of HCC 15 showing a two-fold copy number deletion at 6q24.3-ter.
- Figure 15 depicts the results of a SMRT array comparative genome hybridization analysis of a reference male versus reference female hybridization.
- Figure 16 depicts the results of a SMRT array comparative genome hybridization analysis of oral tumors at 8q21-24.
- (A)-(D) CGH plots of four oral tumors. Shaded areas highlight regions of amplification.
- (A) Tumor 211T shows no copy number change within 8q21-24.
- (B) Tumor 199T shows amplification of the entire tiling set (8q21-24).
- (C)-(D) Tumors 528T and 24T show multiple amplifications at 8q22, in addition to that at 8q24.
- Figure 17 depicts the identification of a minimal region of alteration at 8q22.
- (A) Tiling set of BAC clones, selected from the human RPCI-11 BAC library, spanning part of 8q22. Black boxes indicate clones present on the array.
- (B) Regions of segmental copy number increase observed in four samples (566T, 574T, 166T, 573T) aligned with the BAC tiling set. The 5.3-Mb minimal region of alteration (MRA) and the three genes subjected to expression analysis are indicated.
- (C) BAC array CGH profile of sample 574T. Data are displayed as a normalized signal ratio between tumor and reference DNA for each BAC clone. Each data point represents the average of three replicate spots on the array and includes the standard deviation. Shading indicates large regions of copy number increase, containing LRP12 at 8q22 and MYC at 8q24.
- Figure 18 depicts the results of a SMRT array comparative genome hybridization analysis of lung CIS samples at 8q21-24.
- Figure 19 presents a map of chromosome region 6q16-q22.1 showing known deletions. Relative locations of genes, markers, and BAC, PAC and YAC clones suitable for use as FISH probes are shown. Shading indicates the minimal region of deletion.
- Figure 20 depicts the results of using SMRT pool products from BAC clones 127C12 (green) and 619O19 (red) in a standard FISH protocol.
- Figure 21 depicts the detection of an imprinted gene in a 150 kb region on 18q21.1.
- The boxed region shows where Elongin A3, a reported imprinted gene, is hypermethylated in a lymphoblast cell line derived from a normal individual.
- Vertical lines denote log2 signal ratios from -1 to 1, with hypermethylation to the right and hypomethylation to the left of zero.
- Each black line segment represents a single BAC clone.
- Figure 22 presents a schematic for the analysis of methylation in genomic DNA.
- The present invention relates to a method of preparing a library of replenishable synthetic nucleic acid fragment pools from a tiling set of genomic clones.
- Each pool represents one clone derived from a tiling set of genomic clones, wherein the tiling set is distinguishable from other tiling sets commonly used in the art in terms of its representation of a genome.
- The method comprises two steps: preparing a tiling set of genomic clones and preparing a library from the tiling set.
- A tiling set is prepared by selecting appropriate genomic clones from one or more genome-ordered libraries to optimize the size, map coverage and overlap of the genomic inserts.
- A library of synthetic fragment pools is subsequently generated from the tiling set of clones by high-throughput amplification techniques.
- The method of the present invention allows for the amplification of DNA in a tiling set containing more than 4,000 clones by employing a high-throughput, automatable procedure comprising one or more dilution-processing steps.
- This invention provides the capability of processing large numbers of clones, which enables the inclusion of a large number of clones in the tiling set.
- The dilution-based amplification step enables the design of a tiling set comprising a large number of clones with a high degree of overlap. At its greatest extent, an entire genome can be represented by the tiling set with a high degree of overlap, resulting in a resolution of less than 1 Mb.
- The fragment pools generated by this method can be replenished at any time without the need to re-isolate the original genomic clones that were used to create the tiling set.
- The pools can be used, for example, as a source of probes for applications such as nucleic acid hybridization or FISH.
- The library can be used, for example, to prepare a nucleic acid array that spans an entire genome, or a portion of a genome, at submegabase resolution.
Definitions
- "Tiling set" refers to a finite collection of cloned DNAs, wherein each cloned DNA comprises a fragment of the genomic DNA of an organism.
- The collection of cloned DNAs can comprise all or part of the genomic sequence of the organism.
- The individual cloned DNAs are ordered sequentially relative to the genome, so that the genomic insert start point of each cloned DNA follows the start point of the preceding cloned DNA and precedes the start point of the following cloned DNA.
- The clones are in the form of a cloning vector containing the fragment of genomic DNA as an insert.
- "SMRT pool" refers to a pool of synthetic nucleic acid molecules generated by fragmentation of a clone of a tiling set and amplification of the resultant fragments.
- A SMRT pool typically comprises vector DNA fragments and genomic insert DNA fragments.
- "SMRT library" refers to a library comprising a plurality of SMRT pools derived from a tiling set, or the constituent synthetic fragments of the SMRT pools.
- A SMRT library of the invention, therefore, can represent an entire genome or a portion of a genome.
- A "sub-library" refers to a collection of SMRT pools selected from a SMRT library, wherein the number of SMRT pools in the collection is fewer than the number of SMRT pools in the SMRT library.
- "SMRT array" refers to an ordered array comprising the SMRT pools of a SMRT library, a sub-library or a plurality of SMRT libraries, or the constituent synthetic nucleic acid molecules of the SMRT pools, deposited onto a solid support substrate.
- A SMRT set representing a selected genomic sequence comprises a set of genomic clones whose inserts have overlapping sequences. The ends of each region of overlap between two clones define the boundaries of a "clone cover". The length of a clone cover varies according to the extent of the overlap between the clones. The average size of all the clone covers in a SMRT set can be calculated and this value used to define the resolution of the SMRT set.
- The effective resolution of a SMRT set is calculated as the weighted average of the clone cover size, where the weights are given by the fraction of the selected genomic sequence represented in clone covers of a given size. The smaller the weighted average clone cover size, the higher the resolution of the SMRT set.
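Written out, if the selected sequence is divided into clone covers of sizes c_1 to c_n, each cover has weight c_i / Σc_j, so the effective resolution is Σc_i² / Σc_i. The sketch below adopts one reading of the definition: every clone start and end coordinate is treated as a cover boundary, and uncovered gaps are excluded. Coordinates are hypothetical half-open `(start, end)` pairs. This is an interpretation for illustration, not the invention's own software.

```python
from typing import List, Tuple

Clone = Tuple[int, int]  # hypothetical half-open (start, end) insert coordinates, bp


def clone_covers(clones: List[Clone]) -> List[int]:
    """Sizes of the segments delimited by all clone starts and ends,
    keeping only segments spanned by at least one clone (gaps are dropped)."""
    boundaries = sorted({p for start, end in clones for p in (start, end)})
    covers = []
    for left, right in zip(boundaries, boundaries[1:]):
        if any(start <= left and right <= end for start, end in clones):
            covers.append(right - left)
    return covers


def effective_resolution(clones: List[Clone]) -> float:
    """Length-weighted mean cover size: sum(c_i**2) / sum(c_i)."""
    covers = clone_covers(clones)
    total = sum(covers)
    if total == 0:
        raise ValueError("no covered sequence")
    return sum(c * c for c in covers) / total


if __name__ == "__main__":
    tiling = [(0, 200_000), (150_000, 350_000), (300_000, 500_000)]
    print(clone_covers(tiling))          # [150000, 50000, 100000, 50000, 150000]
    print(effective_resolution(tiling))  # 120000.0
```

The unweighted mean cover size in this example is 100 kb. Weighting by sequence fraction raises the value to 120 kb, because the larger covers account for more of the sequence.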
- "Sequence coverage" means the percentage of a genome, or of a selected portion of a genome, that is represented by a SMRT tiling set.
- "Depth of coverage" or "coverage depth," as used herein, refers to the number of times a given genomic segment is represented in a SMRT set.
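Both quantities can be computed directly from clone coordinates. The following is a minimal sketch, assuming hypothetical half-open `(start, end)` insert coordinates. Sequence coverage is taken as the length of the union of clone inserts within a region. Depth is the count of clones that contain a given base.

```python
from typing import List, Tuple

Clone = Tuple[int, int]  # hypothetical half-open (start, end) insert coordinates, bp


def sequence_coverage_percent(clones: List[Clone], region_start: int,
                              region_end: int) -> float:
    """Percentage of [region_start, region_end) covered by at least one clone."""
    covered = 0
    run_start = run_end = None
    for start, end in sorted(clones):
        # Clip each clone to the region of interest.
        start, end = max(start, region_start), min(end, region_end)
        if start >= end:
            continue
        if run_end is None or start > run_end:
            # A new covered run begins: bank the previous run first.
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
        else:
            # An overlapping or abutting clone extends the current run.
            run_end = max(run_end, end)
    if run_end is not None:
        covered += run_end - run_start
    return 100.0 * covered / (region_end - region_start)


def depth_at(clones: List[Clone], position: int) -> int:
    """Coverage depth: number of clone inserts containing `position`."""
    return sum(1 for start, end in clones if start <= position < end)


if __name__ == "__main__":
    clones = [(0, 200_000), (150_000, 350_000), (600_000, 800_000)]
    print(sequence_coverage_percent(clones, 0, 1_000_000))
    print(depth_at(clones, 175_000), depth_at(clones, 400_000))
```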
- "Array-based comparative genome hybridization" refers to a method of identifying genomic alterations, such as gain, loss or rearrangement of chromosomal regions, in a genomic test sample through competitive hybridization of differentially labeled test and reference genomic DNA to an array of probes. The ratio of labeling intensities at each probe array point indicates the copy number of the DNA in the test sample relative to the corresponding copy number in the reference.
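A minimal sketch of the ratio step follows, assuming that background-subtracted test and reference intensities are already available for each array spot. The log2 transform makes gains and losses symmetric about zero. Median-centring is a common normalization but is an assumption here. The ±0.5 cut-offs are borrowed from the scale bars in the figure legends purely for illustration and are not presented as the calling rule of the invention.

```python
import math
from statistics import median
from typing import List


def log2_ratios(test: List[float], reference: List[float],
                normalize: bool = True) -> List[float]:
    """Per-spot log2(test / reference), optionally centred on the median."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same length")
    ratios = [math.log2(t / r) for t, r in zip(test, reference)]
    if normalize and ratios:
        centre = median(ratios)
        ratios = [x - centre for x in ratios]
    return ratios


def call_copy_number(ratio: float, threshold: float = 0.5) -> str:
    """Label a spot as gain, loss or normal using a symmetric log2 cut-off."""
    if ratio >= threshold:
        return "gain"
    if ratio <= -threshold:
        return "loss"
    return "normal"


if __name__ == "__main__":
    test_signal = [1000.0, 1000.0, 2000.0, 500.0, 1000.0]  # toy intensities
    reference_signal = [1000.0] * 5
    for r in log2_ratios(test_signal, reference_signal):
        print(f"{r:+.2f} {call_copy_number(r)}")
```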
- "Probe" refers to one or more nucleic acid molecules of known sequence used in hybridization studies to interrogate a target nucleic acid sequence.
- A SMRT pool of nucleic acid sequences constitutes a probe that may be used to determine whether a sequence that maps to a specific location on the parent genome is present in a test sample.
- A probe may be labeled, for example, when used in fluorescent in situ hybridisation (FISH), or may be unlabeled, for example, when incorporated into an array.
- FISH: fluorescent in situ hybridisation
- A library of the present invention is a collection of synthetic nucleic acid pools, wherein each pool represents one clone from a tiling set of genomic clones.
- The clones in the tiling set are selected from one or more genome-ordered libraries to optimize the size, map coverage and overlap of the inserts.
- When the tiling set confers submegabase resolution, it is referred to herein as a SubMegabase Resolution Tiling (SMRT) set.
- SMRT: SubMegabase Resolution Tiling
- A synthetic nucleic acid fragment pool derived from each clone in the SMRT set is referred to as a SMRT pool, and a library comprising the SMRT pools is referred to as a SMRT library.
- A library is prepared by: (i) preparing a tiling set of genomic clones; and (ii) preparing a pool of synthetic nucleic acid fragments from each clone in the tiling set by fragmentation of the clone and amplification of the fragments.
- Preparing a SMRT tiling set entails selecting overlapping genomic clones from at least one library of clones to form a set that covers a whole genome, or a portion of a genome, at high resolution.
- The resolution is < 1 Mb.
- The portion of the genome can be one or more chromosomes, or one or more regions of a genome that are relevant to the study of a disease, for example cancer.
- A SMRT set can be selected to span a specific region of interest in a genome, one or more chromosomes, or an entire genome.
- The SMRT set spans a human genome sequence that is minimally 35 Mb in length.
- The SMRT set spans two or more chromosomes.
- The SMRT set covers one or more chromosomes.
- The SMRT set covers a region of interest, for example, a region known to be important in the diagnosis of disease.
- The SMRT set can also be selected to cover a minimal percentage of a selected genome, which may represent a fragment of the genome or substantially all of the genome.
- The SMRT set minimally spans about 10% of a selected genome. In another embodiment, the SMRT set minimally spans about 20% of a selected genome. In another embodiment, the SMRT set minimally spans about 30% of a selected genome. In another embodiment, the SMRT set minimally spans about 40% of a selected genome. In another embodiment, the SMRT set minimally spans about 50% of a selected genome. In another embodiment, the SMRT set minimally spans about 60% of a selected genome. In another embodiment, the SMRT set minimally spans about 70% of a selected genome. In another embodiment, the SMRT set minimally spans about 80% of a selected genome.
- The SMRT set minimally spans about 90% of a selected genome. In another embodiment, the SMRT set minimally spans 95% of a selected genome. In other embodiments, the SMRT set minimally spans about 96%, about 97%, about 98% and about 99% of a selected genome.
- The SMRT set can be contiguous over each chromosome and constitute a plurality of tiling subsets equal in number, for example, to the number of different chromosomes in the genome.
- For the human genome, for example, the tiling set can comprise 22 autosome subsets and 2 sex chromosome subsets.
- The SMRT set comprises at least about 4,000 clones. In another embodiment, the SMRT set comprises at least about 6,000 clones. In another embodiment, the SMRT set comprises at least about 8,000 clones. In another embodiment, the SMRT set comprises at least about 10,000 clones. In another embodiment, the SMRT set comprises at least about 15,000 clones. In another embodiment, the SMRT set comprises at least about 20,000 clones. In another embodiment, the SMRT set comprises at least about 25,000 clones.
- The SMRT set comprises at least about 30,000 clones. In another embodiment, the SMRT set comprises at least about 35,000 clones. It will be readily apparent to one skilled in the art that, while the methods provided by the instant invention enable the production of libraries from SMRT sets comprising large numbers of genomic clones, the methods are equally applicable to tiling sets that comprise small numbers of clones.
- A SMRT set can comprise fewer than 4,000 clones and still cover a particular region of the genome with high resolution when that region is fairly small. Such SMRT sets may be useful in detecting changes in one or more regions of the genome that are related to a particular disease state. In one embodiment, therefore, the SMRT set comprises fewer than 4,000 clones. In another embodiment, the SMRT set comprises fewer than 4,000 clones that cover a portion or portions of a genome, wherein the portion(s) of the genome are greater than 35 Mb in size.
- The SMRT set comprises the fewest genomic clones required to span the entire genome, i.e. it constitutes a "minimal SMRT set". In another embodiment, the SMRT set comprises the fewest genomic clones required to span the entire genome, together with additional clones to increase resolution in regions known to be prone to rearrangement events associated with disease states.
- Suitable genomes for the construction of a SMRT set are those of metazoan organisms which undergo programmed developmental changes, and which are prone to aberrant changes in developmental programming such as cancers and other developmental diseases.
- the genome may be a plant genome or an animal genome for which essentially the entire sequence is known and/or for which a physical map is available. Examples of suitable genomes that are currently known in the art include, but are not limited to, human, mouse, rat, chimpanzee, chicken, zebrafish, fugu, honeybee, C. elegans, C. briggsae, Drosophila, Arabidopsis, maize, oat, soybean and yeast.
- the genome from which the SMRT set and corresponding library is prepared is the human genome.
- the genome is from a mammal other than a human.
- the genome is from an agriculturally significant plant or animal.
- a map of a genome of interest is required in order to select clones for the SMRT set.
- the map provides information about the order of individual genomic clones and enables the selection of genome-ordered clones from the one or more genomic libraries.
- the map can be a sequence map or a physical map.
- the map is a base pair coordinates map that allows one skilled in the art to select appropriate genome-ordered clones according to their physical base pair position as determined by sequencing.
- the map is a fingerprint map generated by restriction digestion of genomic clones of one or more genomic libraries to generate a set of bands, unique in number and position, that form the fingerprint for a particular clone.
- the restriction enzyme HindIII may be used to generate the fingerprint map.
- the patterns of bands from multiple clones are analyzed by computer and aligned to determine the amount of overlap.
- Genomic Libraries. The size of a typical metazoan genome is usually on the order of several gigabases (e.g. 3 gigabases for the human genome). Given this size, a more manageable form of the genome, such as a clone library, is required.
- the genomic libraries used to select clones must provide clones that overlap in their coverage of the genome.
- a genomic library may be available in the public domain or may be constructed specifically for the purpose of generating a SMRT library.
- the genomic sequences provided by a library are usually cloned into a suitable vector. A number of vectors are known in the art that are suitable for cloning large genomic DNA fragments.
- the library can reside in, for example, bacterial artificial chromosome (BAC) vectors, yeast artificial chromosome (YAC) vectors, P1-derived artificial chromosome (PAC) vectors, P1 vectors, or combinations thereof.
- a set of overlapping clones is selected from the source(s) to construct a SMRT set.
- the SMRT set can be selected to span a specific region of interest in the genome, one or more chromosomes, or an entire genome.
- the clones for the SMRT set are selected by mapping the location of the clones to a map of the genome.
- sequential genomic clones are selected from an ordered set of canonical clones in the genomic library.
- Selection of sequential clones for the tiling set can be performed manually, in which case at least the ends of the genomic insert must be sequenced in order to situate the clone on a map of the genome.
- the ordering can be performed with the aid of a computer, i.e. in silico.
- various methods of sequencing known in the art can be used to sequence the ends of the genomic inserts of selected clones.
- the clones can be subjected to restriction analysis.
- appropriate parameters can be chosen by one skilled in the art to allow an optimum tiling set to be selected.
- the clones selected for the SMRT set contain genomic inserts that provide overlapping coverage of the selected region of the genome.
- the amount of overlap between the clones will be dependent on the type and number of available genomic clones and the extent of coverage of the selected genome provided by these clones.
- the availability of a large number of different genomic clones, together with information relating to their respective map positions and sequences, allows one skilled in the art to select various tiling sets representing all, or one or more portions of the genome.
- where fewer genomic clones, or less map and sequence information, are available, the possible SMRT sets that can be constructed may be more limited.
- the minimal sequence overlap between the insert of one clone and that of each of its neighbouring clones is about 17 bp. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 20 bp. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 50 bp. In a further embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 75 bp. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 100 bp. In other embodiments, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 150 bp, about 175 bp, about 200 bp, about 250 bp and about 500 bp.
- sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 1,500 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 1,000 bp. In a further embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 750 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 500 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 25 bp to 1,500 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 25 bp to 1,000 bp.
- sequence overlap between the inserts of neighbouring clones ranges from 25 bp to 750 bp. In other embodiments, the sequence overlap between the inserts of neighbouring clones ranges from 50 bp to 1,500 bp, from 75 bp to 1,500 bp, from 100 bp to 1,500 bp and from 250 bp to 1,500 bp.
- the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 10%. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 30%. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 40%. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 50%. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 60%.
- the sequence overlap between the inserts of neighbouring clones ranges from 10% to 90%. In another embodiment, the sequence overlap ranges from 20% to 80%. In another embodiment, the sequence overlap ranges from 30% to 70%. In another embodiment, the sequence overlap ranges from 40% to 60%. In another embodiment, the sequence overlap ranges from 20% to 90%. In another embodiment, the sequence overlap ranges from 30% to 90%. In another embodiment, the sequence overlap ranges from 40% to 90%. In another embodiment, the sequence overlap ranges from 10% to 80%. In one embodiment, the sequence overlap ranges from 10% to 70%. In another embodiment, the sequence overlap ranges from 10% to 60%. In another embodiment, the sequence overlap ranges from 10% to 50%. In another embodiment, the sequence overlap ranges from 10% to 40%. In another embodiment, the sequence overlap ranges from 20% to 40%. In another embodiment, the sequence overlap ranges from 30% to 40%.
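The overlap figures above, in bp or as a percentage, can be computed directly from insert coordinates. The following is a minimal sketch, assuming half-open base-pair coordinates and assuming that a percentage overlap is expressed relative to the shorter of the two inserts (the text does not state which insert length is the denominator). The function names are illustrative only.

```python
def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of base pairs shared by two inserts given as half-open [start, end) intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def overlap_percent(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Overlap as a percentage of the shorter insert (an assumed convention)."""
    shorter = min(a[1] - a[0], b[1] - b[0])
    return 100.0 * overlap_bp(*a, *b) / shorter


def within_range(value: float, low: float, high: float) -> bool:
    """Check a measured overlap against one of the stated ranges, e.g. 10% to 90%."""
    return low <= value <= high


# Two hypothetical inserts of 150 kb and 160 kb that share 50 kb.
pct = overlap_percent((0, 150_000), (100_000, 260_000))
print(round(pct, 1), within_range(pct, 10, 90))  # 33.3 True
```

Because the denominator convention is an assumption, the same pair could be reported as about 31% if the longer insert were used instead.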
- clones that are completely sequenced are selected over unsequenced clones for inclusion in the SMRT library.
- Genomic clones selected for inclusion in the SMRT set can have inserts of varying size.
- the SMRT set comprises clones in which the genomic inserts are between about 15 kb and about 300 kb in length.
- the SMRT set comprises clones in which the genomic inserts are between about 50 kb and about 200 kb in length.
- the SMRT set comprises clones in which the genomic inserts are between about 50 kb and about 300 kb in length.
- the SMRT set comprises clones in which the genomic inserts are between about 100 kb and about 300 kb in length.
- the SMRT set comprises clones in which the genomic inserts are between about 100 kb and about 200 kb in length.
- the genomic insert start points of the selected clones are staggered throughout the genome. In one embodiment, the insert start points are between about 0.07 and 1 Megabase apart. In another embodiment, the insert start points are between about 0.15 and 0.5 Megabase apart. If desired, the clone representation can be denser in those regions of the genome containing loci that are known or suspected to be prone to rearrangement events, in order to provide higher resolution at these points.
- if gaps occur in the genome tiling set when a single source of clones is employed to generate the SMRT set, these can be bridged by selecting genomic clones from alternate libraries in the public domain. Alternatively, if suitable clones do not exist in the public domain or cannot be found in a library constructed for the purpose, the gaps can be filled by chromosome walking or by other methods known to a worker skilled in the art.
- An exemplary method of selecting appropriate clones for a SMRT set, when comprehensive information relating to a genome is publicly available and a large number of genomic clones are accessible, is as follows:
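The staggering criterion for insert start points can be checked on a candidate clone set. This is a minimal sketch, assuming start coordinates in bp on a single chromosome and using the 0.07 to 1 Mb spacing of the first embodiment as hypothetical defaults. In regions deliberately tiled more densely, a caller would relax `min_gap`.

```python
def spacing_violations(starts, min_gap=70_000, max_gap=1_000_000):
    """Return (previous, current, gap) for consecutive start points outside [min_gap, max_gap] bp."""
    ordered = sorted(starts)
    violations = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur - prev
        if not min_gap <= gap <= max_gap:
            violations.append((prev, cur, gap))
    return violations


# Hypothetical insert start coordinates (bp) on one chromosome.
print(spacing_violations([0, 100_000, 150_000, 1_300_000]))
# [(100000, 150000, 50000), (150000, 1300000, 1150000)]
```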
- the base pair positions of all clone inserts relating to the genome can be downloaded from an appropriate source (for example, for the human genome, from the UCSC genome browser website or other similar site).
- a clone at one end of a first chromosome is selected.
- a second clone is then selected that overlaps by an appropriate amount (such as between 17 bp and 1,500 bp) with the end of the first clone as determined by the base pair positions of the two clones.
- a third clone is then selected that overlaps by an appropriate amount with the end of the second clone, as determined by base pair position. This process is continued until sufficient clones have been selected to cover the whole chromosome, genomic region or entire genome. For example, using a UCSC Genome Browser view of a chromosome, a tiling set can be picked manually, or the positional information can be used to arrange the clones into an overlapping clone set.
- the clones for the SMRT set are selected using a physical map generated by restriction enzyme digestion such that neighbouring clones share no fewer than 4 restriction enzyme fragments with respect to the fingerprint map (i.e. if, for example, the physical map being used is a HindIII restriction map, then each clone insert should overlap by at least 4 HindIII fragments); the clone inserts are between about 15 kb and about 300 kb in length; and none of the inserts in the selected clones of a tiling set should share the same 3' or 5' ends (i.e. both ends of the insert of each clone are staggered throughout the genome, or portion of the genome).
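The coordinate-based walk described above can be expressed as a greedy algorithm. The sketch below assumes clone insert coordinates have already been downloaded for one chromosome as (name, start, end) records. The `Clone` type and `pick_tiling_path` function are hypothetical, and choosing the candidate that extends furthest is an illustrative design choice that keeps the clone count low, not necessarily the procedure used in practice.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # insert start, 0-based
    end: int    # insert end, exclusive


def pick_tiling_path(clones, min_overlap=17, max_overlap=1_500):
    """Greedy walk along one chromosome; returns the chosen clones and any uncovered gaps."""
    pool = sorted(clones, key=lambda c: (c.start, -c.end))
    if not pool:
        return [], []
    path, gaps = [pool[0]], []  # start with the clone at one end of the chromosome
    while True:
        cur = path[-1]
        # Clones that extend past the current clone with an overlap in the desired range.
        preferred = [c for c in pool
                     if c.end > cur.end and min_overlap <= cur.end - c.start <= max_overlap]
        # Fallback: any clone that overlaps and extends further, whatever the overlap size.
        fallback = [c for c in pool if c.start < cur.end < c.end]
        candidates = preferred or fallback
        if candidates:
            path.append(max(candidates, key=lambda c: c.end))
            continue
        beyond = [c for c in pool if c.start >= cur.end]
        if not beyond:
            break  # reached the far end of the chromosome
        nxt = min(beyond, key=lambda c: (c.start, -c.end))
        gaps.append((cur.end, nxt.start))  # to be bridged from another library or by walking
        path.append(nxt)
    return path, gaps


clones = [Clone("A", 0, 150_000), Clone("B", 149_000, 300_000),
          Clone("C", 100_000, 320_000), Clone("D", 299_500, 450_000),
          Clone("E", 500_000, 600_000)]
path, gaps = pick_tiling_path(clones)
print([c.name for c in path], gaps)  # ['A', 'B', 'D', 'E'] [(450000, 500000)]
```

Clone C is skipped because its 50 kb overlap with A falls outside the assumed 17 to 1,500 bp window, and the reported gap marks where clones from an alternate library would be needed.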
- each of the selected clones contains more than about 20 restriction enzyme sites with respect to the fingerprint map.
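The fingerprint-based selection criteria above can be checked pairwise. The sketch below assumes each clone's fingerprint is a list of band sizes in bp and treats two bands as shared when their sizes agree within 1%. This tolerance, the use of band count as a stand-in for the number of restriction sites, and all names are assumptions for illustration. Dedicated fingerprint-assembly software typically uses a probabilistic overlap score instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FingerprintedClone:
    name: str
    bands: tuple[int, ...]  # restriction fragment sizes in bp (hypothetical values)
    start: int              # insert start on the map
    end: int                # insert end on the map


def shared_bands(a, b, tolerance=0.01):
    """Count bands of `a` that pair with a distinct band of `b` within a relative size tolerance."""
    unmatched = list(b)
    shared = 0
    for size in a:
        for i, other in enumerate(unmatched):
            if abs(size - other) <= tolerance * max(size, other):
                shared += 1
                del unmatched[i]  # each band of b may be matched only once
                break
    return shared


def acceptable_neighbours(a, b, min_shared=4, min_sites=20,
                          min_len=15_000, max_len=300_000):
    """Apply the selection criteria above to a pair of adjacent clones."""
    for clone in (a, b):
        if len(clone.bands) <= min_sites:  # band count used as a proxy for site count
            return False
        if not min_len <= clone.end - clone.start <= max_len:
            return False
    if a.start == b.start or a.end == b.end:  # insert ends must be staggered
        return False
    return shared_bands(a.bands, b.bands) >= min_shared
```

For example, two hypothetical 25-band clones whose fingerprints share five bands would pass with the default threshold of 4 but fail if `min_shared=6`.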
- a preliminary SMRT set can be validated by fingerprinting or by sequencing to ensure that each genomic clone in the set corresponds to that stored in the genomic reference map. If the genomic clone does not pass the validation step, it is replaced with another clone, or clones, providing equivalent sequence coverage of the genome. In addition, if any gaps in genome sequence coverage are identified, additional genomic clones can be selected to cover the gaps. If a greater depth of coverage is desired for certain regions of the genome, then additional clones that cover the region of interest can also be added at this stage, provided that they otherwise meet the above criteria. The resulting set of clones constitutes the final SMRT set.
- once the final SMRT set is generated, it can, if required, be characterized according to parameters such as clone location on the map, sequence coverage, resolution, coverage depth, gaps and sequence overlap.
- Sequence coverage of the SMRT set can be defined in terms of the percentage of the selected genome or selected genomic region that is represented in the SMRT set.
- the sequence coverage of the final SMRT set is at least 90% of the selected genome or genomic region.
- the sequence coverage of the final SMRT set is at least 95% of the selected genome or genomic region.
- the sequence coverage of the final SMRT set is at least 96% of the selected genome or genomic region.
- the sequence coverage of the final SMRT set is at least 97% of the selected genome or genomic region.
- the sequence coverage of the final SMRT set is at least 98% of the selected genome or genomic region.
- the sequence coverage of the final SMRT set is at least 99% of the selected genome or genomic region.
- the resolution of the final SMRT set is less than 1 Mb. In another embodiment, the resolution of the final SMRT set is less than 0.95 Mb. In a further embodiment, the resolution of the final SMRT set is less than 0.9 Mb. In another embodiment, the resolution of the final SMRT set is less than 0.85 Mb. In another embodiment, the resolution of the final SMRT set is less than 0.8 Mb.
- the average depth of coverage of the final SMRT set should be at least 1X. In one embodiment, the average depth of coverage of the final SMRT set is at least 1.2X. In another embodiment, the average depth of coverage of the final SMRT set is at least 1.3X. In another embodiment, the average depth of coverage of the final SMRT set is at least 1.4X. In another embodiment, the average depth of coverage of the final SMRT set is at least 1.6X.
- the clones that are comprised by the final SMRT set have been selected such that there are minimal gaps in sequence coverage provided by the SMRT set.
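Sequence coverage, average depth and gaps can all be derived from the insert intervals of a candidate set. This is a minimal sketch, assuming half-open bp coordinates on one region and assuming that average depth means total insert length divided by region length. The function name and return keys are hypothetical.

```python
def coverage_stats(region_len, inserts):
    """Sequence coverage (%), average depth and uncovered gaps for inserts on a region."""
    clipped = sorted((max(0, s), min(region_len, e)) for s, e in inserts
                     if min(region_len, e) > max(0, s))
    merged = []
    for s, e in clipped:
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)  # extend the current covered block
        else:
            merged.append([s, e])
    covered = sum(e - s for s, e in merged)
    gaps, pos = [], 0
    for s, e in merged:
        if s > pos:
            gaps.append((pos, s))
        pos = e
    if pos < region_len:
        gaps.append((pos, region_len))
    depth = sum(e - s for s, e in clipped) / region_len
    return {"coverage_pct": 100.0 * covered / region_len, "depth": depth, "gaps": gaps}


stats = coverage_stats(1_000_000, [(0, 300_000), (250_000, 600_000), (650_000, 1_000_000)])
print(stats)  # {'coverage_pct': 95.0, 'depth': 1.0, 'gaps': [(600000, 650000)]}
```

The example shows why depth and gaps are reported separately: a set can reach 1X average depth while still leaving a 50 kb gap.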
- the inserts of the clones selected for the SMRT set have overlapping sequences.
- the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 1,500 bp.
- the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 1,000 bp.
- the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 750 bp.
- sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 500 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 25 bp to 1,500 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 25 bp to 1,000 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 25 bp to 750 bp. In other embodiments, the sequence overlap between the inserts of neighbouring clones ranges from 50 bp to 1,500 bp, from 75 bp to 1,500 bp, from 100 bp to 1,500 bp and from 250 bp to 1,500 bp.
- the average sequence overlap between the inserts of neighbouring clones is between about 10% and about 90%. In another embodiment, the average sequence overlap between the inserts of neighbouring clones is between about 20% and about 80%. In another embodiment, the average sequence overlap between the inserts of neighbouring clones is between about 30% and about 70%. In another embodiment, the average sequence overlap between the inserts of neighbouring clones is between about 30% and about 60%. In another embodiment, the average sequence overlap between the inserts of neighbouring clones is between about 30% and about 50%.
- substantially overlapping means that at least 95% of the clones within the SMRT set contain genomic inserts that overlap. In one embodiment of the invention, at least 96% of the clones within the SMRT set contain genomic inserts that overlap. In another embodiment, at least 97% of the clones within the SMRT set contain genomic inserts that overlap.
- in another embodiment, at least 98% of the clones within the SMRT set contain genomic inserts that overlap. In a further embodiment, at least 99% of the clones within the SMRT set contain genomic inserts that overlap. In another embodiment, the genomic inserts of all the clones within the SMRT set overlap.
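The "substantially overlapping" percentages can be measured with a single sweep over the inserts sorted by start position. A minimal sketch, assuming half-open bp intervals and counting an insert as overlapping if it shares at least one base with any other insert; the function name is hypothetical.

```python
def percent_overlapping(inserts):
    """Percentage of inserts that overlap at least one other insert (half-open intervals)."""
    if not inserts:
        return 0.0
    flags = [False] * len(inserts)
    order = sorted(range(len(inserts)), key=lambda i: inserts[i][0])
    reach = None  # index of the insert seen so far with the furthest end
    for i in order:
        start, end = inserts[i]
        if reach is not None and inserts[reach][1] > start:
            flags[i] = flags[reach] = True  # both members of the overlapping pair count
        if reach is None or end > inserts[reach][1]:
            reach = i
    return 100.0 * sum(flags) / len(inserts)


# Hypothetical set: the last insert is isolated, so 4 of 5 overlap.
print(percent_overlapping([(0, 100), (90, 200), (300, 400), (390, 500), (600, 700)]))  # 80.0
```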
- the procedure for preparing a SMRT library entails preparing a SMRT pool from each genomic clone in the SMRT set using a protocol suitable for high throughput and automated generation of SMRT pools.
- This protocol eliminates selected purification steps by use of at least one dilution-processing step and results in the production of a SMRT pool that is replenishable by further amplification.
- the SMRT pools subsequently become the source material from which greater quantities of SMRT pools can be produced, thereby allowing further amplification and generation of synthetic genomic DNA fragments without the need for preparation of additional amounts of the starting genomic clones.
- the protocol entails a) preparing genomic clone DNA, b) reduction of the genomic clone DNA into fragments, and c) amplifying the fragments to generate a SMRT pool.
- the procedure comprises one or more dilution-processing steps that can be part of step (b) or (c) or both.
- Dilution processing comprises either taking an aliquot of a first reaction and adding it to a subsequent reaction mixture such that the aliquot is diluted, or taking the entire first reaction (or a portion thereof), diluting it with a suitable diluent and using all or an aliquot of the diluted first reaction in the subsequent reaction.
- suitable diluents include water and various solutions that do not contain components that would interfere with subsequent procedures in the method.
- the dilution-processing step comprises diluting a reaction, or an aliquot thereof, between about 1:2 and about 1:500 in either a diluent or a subsequent reaction mixture. In another embodiment of the invention, the dilution-processing step comprises diluting a reaction, or an aliquot thereof, between about 1:2 and about 1:100. In another embodiment, the dilution-processing step comprises diluting a reaction, or an aliquot thereof, between about 1:2 and about 1:50. In another embodiment of the invention, the dilution-processing step comprises diluting a reaction, or an aliquot thereof, between about 1:2 and about 1:40. In another embodiment of the invention, the dilution-processing step comprises diluting a reaction, or an aliquot thereof, between about 1:5 and about 1:40.
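The dilution arithmetic for a single step, and for several steps in series, can be sketched as follows. This assumes that "1:N" means one part of reaction in N total parts (a dilution factor of N); some laboratories instead use "1:N" for one part plus N parts of diluent, so the convention should be confirmed before use. When the aliquot is added to a subsequent reaction mixture, that mixture plays the role of the diluent. Function names are hypothetical.

```python
from math import prod


def dilution_volumes(final_volume_ul, factor):
    """Aliquot and diluent volumes (µl) for a 1:factor dilution to final_volume_ul."""
    if not 2 <= factor <= 500:
        raise ValueError("factor outside the 1:2 to 1:500 range described above")
    aliquot = final_volume_ul / factor
    return aliquot, final_volume_ul - aliquot


def cumulative_factor(factors):
    """Overall dilution after several dilution-processing steps in series."""
    return prod(factors)


print(dilution_volumes(50, 40))     # (1.25, 48.75): 1.25 µl of reaction into 48.75 µl of mix
print(cumulative_factor([40, 40]))  # 1600: e.g. one dilution after ligation, one between PCR rounds
```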
- dilution-processing is conducted after fragmentation of the genomic clone DNA, i.e. the fragmented clone DNA is diluted prior to amplification.
- the dilution-processing is conducted as part of the amplification step.
- the amplification step comprises more than one amplification reaction and the dilution-processing is conducted between amplifications.
- the procedure comprises more than one amplification reaction and more than one dilution-processing step.
- Preparation of DNA from the genomic clones of the SMRT set can be carried out by various methods known in the art (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed. (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; Ausubel et al. (eds.), Current Protocols in Molecular Biology, J. Wiley & Sons, New York, NY). A variety of commercial kits are also available for this purpose. Such methods include high-throughput, automatable methods for isolating clone DNA from bacteria. An exemplary method is described in the Examples section.
- Reduction of the genomic clones into fragments can be achieved by a number of techniques known in the art, for example, by digestion with one or more restriction enzymes, by exposure to UV light or gamma radiation, or by physical methods, such as sonication or mechanical shearing. Limited DNase I digestion may also be employed to generate fragments of the genomic clones.
- the method of preparing a SMRT library includes the step of reduction of genomic clones into fragments by restriction digestion. Once restriction digest of the genomic clone DNA is complete, the restriction enzyme can be inactivated, for example, by heating the restriction enzyme digest mixture, precipitation of the DNA, or use of a commercial kit.
- the average length of the fragments is between about 50 and about 5000 nucleotides. In another embodiment, the average length of the fragments is between about 50 and about 1500 nucleotides. In another embodiment, the average length of the restriction fragments is between about 100 and about 1000 nucleotides. In yet another embodiment, the average length of restriction fragments is between about 100 and about 2000 nucleotides. In a further embodiment, the average length of the fragments is between about 100 and about 500 nucleotides.
- the final length of the fragments will be influenced by the selection of the restriction enzyme(s).
- the use of a restriction enzyme that has a four-nucleotide recognition sequence is expected to generate DNA fragments having an average length of about 4^4, or 256, nucleotides.
- the use of a restriction enzyme having a five-nucleotide recognition sequence is expected to generate DNA fragments having an average length of about 4^5, or 1,024, nucleotides.
- the base composition of a genome, which is specific to a given organism, will also influence the average length of fragments generated by restriction enzyme digestion.
- a genome rich in GC content will contain more sites for a restriction enzyme whose recognition sequence is also GC rich, than sites for a restriction enzyme whose recognition sequence is AT rich.
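The 4^n estimates and the effect of GC content can be combined in one expected-length calculation. This sketch assumes a random sequence with independent bases, a given GC fraction, and a palindromic recognition site (so that scanning one strand counts every site once). Real genomes deviate from this model; for example, CpG depletion in mammalian DNA makes CG-containing sites rarer than predicted. The function name is hypothetical.

```python
def expected_fragment_length(site, gc=0.5):
    """Mean spacing (bp) between occurrences of a palindromic recognition site in random sequence."""
    base_p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2, "N": 1.0}
    p_site = 1.0
    for base in site.upper():
        p_site *= base_p[base]  # probability that a given position starts this site
    return 1 / p_site


print(expected_fragment_length("GATC"))          # 256.0, a 4-base site at 50% GC
print(expected_fragment_length("GGCC", gc=0.6))  # GC-rich site in a GC-rich genome: shorter
print(expected_fragment_length("AATT", gc=0.6))  # AT-rich site in the same genome: longer
```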
- Selection of suitable restriction enzyme(s) to generate appropriate length fragments is considered to be within the ordinary skills of a worker in the art.
- the selected restriction enzyme(s) may generate 5' overhangs, blunt ends, or 3' overhangs.
- Restriction enzymes suitable for use typically have recognition sites of at least 4 bases. Enzymes having 5- or 6-base recognition sites are also suitable.
- Non-limiting examples of suitable restriction enzymes include the following 4-base cutters: CvDI, MnlI, AM, BsuFI, HapII, HpaII, MseI, MspI, AccII, BstUI, BsuEI, FnuDII, ThaI, Bce243I, BsaPI, Bsp67I, BspAI, BspPII, BsrPII, BssGII, BstEIII, BstXII, CpaI, CviAI, DpnII, FnuAII, FnuCI, FnuEI, MboI, MmeII, MnoIII, MosI, MthI, NdeII, NflI, NlaII, NsiAI, NsuI, PfaI, Sau3AI, SinMI, HhaI, HinP1I, BsuRI, HaeIII, NgoII, CviQI, RsaI, TaqI, and TthHBI.
- Amplification of the DNA fragments to provide SMRT pools can be achieved using one of a number of amplification techniques known in the art that are suitable for high throughput generation of amplified fragments. Both strands of the fragment may be amplified, or one strand only may be amplified. In one embodiment of the invention, amplification of the DNA fragments is achieved using a PCR-based method.
- Suitable PCR-based methods include, but are not limited to, degenerate oligonucleotide-primed PCR (DOP-PCR, which uses a partially degenerate primer sequence, with 6 of 21 positions degenerate, and repeated thermocycling; see Telenius et al., Genomics 13(3):718-25, 1992), primer-extension preamplification (PEP; Zhang et al., Proc. Natl. Acad. Sci. USA 89:5847-5851, 1992), random primer PCR and ligation-mediated PCR (LM-PCR) methods.
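The degeneracy of a primer such as the DOP-PCR primer determines how many distinct sequences the primer mixture contains. A minimal sketch using the standard IUPAC nucleotide codes; the 21-mer in the example is hypothetical and only mimics the "6 of 21 positions degenerate" layout, so it is not the published DOP-PCR primer.

```python
from math import prod

# Number of bases represented by each IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_variants(seq):
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(IUPAC_DEGENERACY[base] for base in seq.upper())


# Hypothetical 21-mer with 6 fully degenerate (N) positions.
print(primer_variants("ACGTACGTACNNNNNNACGTA"))  # 4096
```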
- One or more rounds of amplification can be conducted.
- the products of a first amplification reaction can be used as a template for a second round of amplification to increase the yield of the SMRT pools. Further rounds of amplification can be conducted if desired.
- amplification of the restriction fragments is conducted by a LM-PCR protocol, in which a known sequence (either an adaptor or a synthetic linker) is attached to the ends of the DNA fragments, thus providing primer binding sites for PCR amplification.
- the DNA can then be amplified by PCR using primers that are complementary to the sequence of the adaptor or linker.
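Designing a primer complementary to a linker reduces to taking the reverse complement of the linker strand. The sketch below does that and adds a rough Wallace-rule melting temperature, 2 °C per A/T plus 4 °C per G/C, which is only a coarse guide for short oligonucleotides. The linker sequence is hypothetical and not taken from the source; an actual adaptor design would dictate which strand the primer must match.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(seq):
    """Rough melting temperature (°C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


linker = "GATTACAGGCTTACGA"  # hypothetical linker strand, not from the source
primer = reverse_complement(linker)
print(primer, wallace_tm(primer))
```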
- a variety of methods known in the art for attachment of linker oligonucleotides to the genomic DNA fragments can be used.
- the linkers are attached to the genomic DNA fragments using a DNA ligase according to procedures known in the art or provided by the manufacturer of the ligase enzyme.
- the linking oligonucleotides that are attached to the ends of each fragment may comprise the same or different sequences and can be ligated to the fragments in separate reactions, or simultaneously. If desired, the linking oligonucleotides can contain one, or a plurality of, restriction enzyme recognition sequences.
- the linking oligonucleotides may also be modified, such as by the inclusion of a moiety which is a first member of a binding pair, to allow binding of the fragment by a second member of the binding pair coated on the surface of a solid support.
- the term "binding pair", as used herein, means a pair of moieties whose physicochemical properties are known and can be exploited to allow each member to bind specifically to, or interact with, the other member of the pair.
- PCR is used to amplify the DNA fragments after ligation of two different linker oligonucleotides to the fragments.
- one of the primers for the amplification reaction is modified by inclusion of one member of a binding pair.
- the primer oligonucleotides are complementary to one or both linker oligonucleotides attached to the ends of each DNA fragment.
- the primer oligonucleotides may be modified, such as by the inclusion of a moiety which is a first member of a binding pair, to allow binding of the DNA by a second member of the binding pair coated on the surface of a solid support as described above.
- LM-PCR is employed to prepare the SMRT pools and a dilution-processing step is included after ligation of the linkers to the fragments and prior to the amplification step.
- an additional round of amplification is included and a second dilution-processing step is employed in which the products of the first round of amplification are diluted prior to the second round of amplification.
- one or more detectable labels may be incorporated into the DNA fragments during or after the amplification reaction.
- Detectable labels are molecules or moieties having a property or characteristic that can be detected directly or indirectly; they are chosen such that the ability of the nucleic acid molecule to hybridise with its target sequence is not affected. Methods of labelling nucleic acid sequences are well known in the art (see, for example, Ausubel et al. (1997 & updates), Current Protocols in Molecular Biology, Wiley & Sons, New York).
- Labels contemplated by the present invention include directly detectable labels, such as radioisotopes, fluorophores, chemiluminophores, enzymes, colloidal particles, fluorescent microparticles, intercalating dyes such as SYBR green or ethidium bromide and the like.
- some directly detectable labels may require additional components, such as substrates, triggering reagents, light and the like, to enable detection of the label.
- the present invention also contemplates the use of labels that are detected indirectly.
- Indirectly detectable labels are typically pairs of binding members, one of which is attached or coupled to a directly detectable label. Non-limiting examples of suitable binding pairs are provided above.
- the SMRT pools generated by the method of the present invention comprise a quantity of DNA fragments of varying size, depending on the way the original clone was reduced into fragments.
- the SMRT pools comprise DNA fragments from about 50 bp to about 5000 bp.
- the SMRT pools comprise DNA fragments from about 100 to about 2000 bp.
- the quantity of DNA fragments produced by the methods of the invention will depend on the starting concentration of genomic clone DNA and on the number of amplification reactions that are conducted.
- the methods described herein provide for the generation of SMRT pools comprising between about 20 µg and about 100 µg of DNA.
- the SMRT pool comprises between about 40 µg and about 50 µg of DNA.
- the SMRT pools of the invention may be stored in a multi-well format to facilitate the preparation of SMRT arrays therefrom.
- SMRT pools representing different chromosomes can be stored in separate multi-well plates such that certain plates correspond to a particular chromosome.
- SMRT pools representing regions of the genome involved in a particular disease or condition can be stored together in one or more multi-well plates.
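Storing pools so that each plate corresponds to one chromosome is a simple layout problem. This sketch assumes 96-well plates filled row by row (A1, A2, ... H12) and starts a fresh plate for every chromosome; the pool identifiers and function names are hypothetical. Passing `rows=16, cols=24` gives the same layout for 384-well plates.

```python
import string


def well_names(rows=8, cols=12):
    """Row-major well labels for a plate, e.g. A1..H12 for a 96-well plate."""
    return [f"{string.ascii_uppercase[r]}{c + 1}" for r in range(rows) for c in range(cols)]


def assign_wells(pools_by_chromosome, rows=8, cols=12):
    """Map each SMRT pool to (chromosome, plate number, well); every chromosome starts a new plate."""
    wells = well_names(rows, cols)
    plate = 0
    layout = {}
    for chromosome, pools in pools_by_chromosome.items():
        for i, pool in enumerate(pools):
            if i % len(wells) == 0:
                plate += 1  # first pool of a chromosome, or current plate is full
            layout[pool] = (chromosome, plate, wells[i % len(wells)])
    return layout


# Hypothetical pool IDs: 100 pools on chr1 need two plates; chr2 starts plate 3.
layout = assign_wells({"chr1": [f"c1_{i}" for i in range(100)], "chr2": ["c2_0"]})
print(layout["c1_95"], layout["c1_96"], layout["c2_0"])
# ('chr1', 1, 'H12') ('chr1', 2, 'A1') ('chr2', 3, 'A1')
```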
- the SMRT pools can be prepared for storage by conventional techniques, for example by refrigeration, by freezing (either directly or after addition of a suitable cryoprotectant, such as glycerol or dimethyl sulphoxide (DMSO)), by lyophilisation, or by a similar procedure.
- Storage can be at room temperature, under refrigeration (for example, at 4 °C), or under freezing conditions (for example, at -20 °C, -70 °C or -80 °C). Selection of appropriate storage techniques and conditions is considered to be within the skills of a worker in the art.
- preparation of SMRT pools requires multiple steps and generally takes place in a multi-well format to facilitate high throughput. Quality control procedures can therefore be used to confirm sequence identity, detect any plate exchanges or mislabelling, and/or assess any well-to-well contamination.
- suitable quality control procedures include DNA restriction digest fingerprint analysis, fluorescence in situ hybridization (FISH) and DNA sequencing.
- Protocols for these methods are known to a skilled worker (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed. (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; Ausubel et al. (eds.), Current Protocols in Molecular Biology, J. Wiley & Sons, New York, NY).
- quality control of SMRT pools using DNA restriction digest fingerprint analysis can be achieved by digesting the genomic clone corresponding to that used to generate the SMRT pool with an appropriate restriction enzyme and running the digest on an agarose gel.
- the SMRT pool is then labelled with a detectable label and used as a hybridization probe in a Southern hybridization procedure. If the labelled SMRT pool is able to detect all the digest fragments of the genomic clone by hybridization and does not hybridize substantially to negative control samples, the identity of the SMRT pool is considered to be verified.
- Quality control using FISH analysis involves labelling a SMRT pool with a detectable label and subsequent hybridization of the labelled SMRT pool to a metaphase chromosome preparation. This procedure allows the mapping of a SMRT pool to a chromosomal region and, therefore, provides a crude quality control step. FISH analysis does not, however, provide positive verification that a SMRT pool is derived from a specific genomic clone.
- DNA sequencing can be employed for quality control by removing a sample of a SMRT pool and subjecting it to standard DNA sequencing techniques using end primers designed to be complementary to the vector sequences flanking the genomic insert in the clone used to generate the SMRT pool.
- This approach is successful because within the large collection of amplified products from one SMRT pool, a subset of fragments contain vector sequence followed by a short stretch of unique sequence terminating at the most proximal site of fragmentation.
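- To illustrate why such reads are informative, the sketch below extracts the stretch of insert sequence that follows a known vector junction in an end-sequence read. It keeps the stretch only if it reaches the 17 bp minimum given later in this section. This is a minimal Python sketch under stated assumptions. The junction sequence and the read are arbitrary placeholders, and `extract_insert_end` is a hypothetical helper. Exact string matching stands in for the error-tolerant alignment that real reads would need.

```python
from typing import Optional


def extract_insert_end(read: str, vector_junction: str, min_len: int = 17) -> Optional[str]:
    """Return the insert sequence that follows vector_junction in read.

    Returns None if the junction is not found, or if fewer than min_len bases
    of insert sequence follow it (17 bp is the minimum read given in the text).
    """
    read_u = read.upper()
    junction_u = vector_junction.upper()
    pos = read_u.find(junction_u)
    if pos == -1:
        return None  # read does not contain the expected vector sequence
    insert = read_u[pos + len(junction_u):]
    return insert if len(insert) >= min_len else None


# Placeholder data: an arbitrary "vector junction" followed by 20 bp of "insert".
junction = "GGATCCTCTAGAGTCGAC"
read = "ttgca" + junction + "ACGTTGCAAGCTTGGCATGC"
print(extract_insert_end(read, junction))
```

The example prints the 20 bp placeholder insert. A read that lacks the junction, or that has fewer than 17 insert bases after it, returns None.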
- An appropriate primer is selected for the sequencing reaction based on the type of vector contained in the original genomic clones. The primers are selected to anneal to a region of the vector that is proximal to the insert contained in the clone such that it can be used to generate sequence data relating to the insert.
- the primer sequences are complementary to a region in close proximity to the multiple cloning site of the vector.
- Methods of identifying and preparing such primers are known to a worker skilled in the art.
- Many suitable primers are also available commercially. Examples of suitable commercially available primers include, but are not limited to, T7 primer, SP6 primer, M13 forward primer and M13 reverse primer.
- Standard sequencing protocols and kits based on chain-termination methods employing radioactive or fluorescent labels can be employed to carry out the sequencing step.
- Cycle sequencing, a modification of this procedure in which the chain terminators are incorporated using PCR, is also suitable for use as a quality control procedure in accordance with the present invention.
- kits for sequencing DNA, including cycle sequencing kits, are commercially available. Examples include, but are not limited to, the Applied Biosystems BigDye cycle sequencing kit.
- the methods for separation and detection of labelled DNA fragments generated during the sequencing reaction are typically automated and can be performed using DNA sequencers and analyzers that are also commercially available. These methods generally require purification of the template prior to sequencing in order to remove any nucleotides or primers that may be carried over from prior manipulations of the template.
- sequence information regarding a SMRT pool can be analyzed against the appropriate genome sequence, for example, using the tools provided by the National Center for Biotechnology Information (NCBI) such as BLASTN. Scripts can be written for use with BLAST to allow multiple sequences to query BLAST simultaneously.
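- Here is a minimal sketch of the batching idea. It assumes that the NCBI BLAST+ `blastn` program is run once on a multi-FASTA query file (`-query`) with tabular output (`-outfmt 6`). The default 12 columns of that output are listed in `OUTFMT6`. The pool IDs, subject IDs and `expected` mapping are hypothetical. The helper functions are illustrations and are not part of BLAST.

```python
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def to_multi_fasta(reads: dict, width: int = 60) -> str:
    """Write {pool_id: sequence} as one multi-FASTA string for a single BLAST run."""
    lines = []
    for pool_id, seq in reads.items():
        lines.append(f">{pool_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def best_hits(tabular: str) -> dict:
    """Keep the highest-bitscore subject for each query in -outfmt 6 output."""
    best = {}
    for line in tabular.splitlines():
        if not line.strip():
            continue
        row = dict(zip(OUTFMT6, line.split("\t")))
        query, subject, score = row["qseqid"], row["sseqid"], float(row["bitscore"])
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


def mismatched_pools(best: dict, expected: dict) -> list:
    """Pools whose best hit differs from the expected subject (possible swap or mislabel)."""
    return sorted(q for q, (subject, _) in best.items() if expected.get(q) != subject)


# Hypothetical tabular output for two pools.
hits = ("P1\tctgA\t100.0\t40\t0\t0\t1\t40\t501\t540\t1e-15\t75.8\n"
        "P1\tctgB\t95.0\t40\t2\t0\t1\t40\t88\t127\t1e-10\t60.0\n"
        "P2\tctgC\t100.0\t30\t0\t0\t1\t30\t10\t39\t1e-9\t56.0\n")
best = best_hits(hits)
print(mismatched_pools(best, {"P1": "ctgA", "P2": "ctgD"}))
```

A run such as `blastn -query pools.fa -db <genome_db> -outfmt 6` would produce the kind of text that `best_hits` parses. The database name is a placeholder.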
- Quality control procedures can be used to verify all of the SMRT pools generated in the production of an SMRT library.
- alternatively, only selected SMRT pools can be verified, which allows detection of plate switches and flips that may occur during handling. This can be achieved, for example, by verifying one sample from each archived row or selected samples from each plate.
- quality control can be accomplished by sequencing as few as three samples from each 96-well plate.
- all of the SMRT pools in an SMRT library are verified by quality control procedures.
- selected SMRT pools from each multi-well plate in the library are verified by quality control procedures.
- 8 SMRT pools from each multi-well plate in the library are verified by quality control procedures.
- 3 SMRT pools from each multi-well plate in the library are verified by quality control procedures.
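- The reason a few sampled wells per plate are enough to catch a plate flip can be shown in a short sketch. This Python sketch assumes a standard 8 x 12 layout and models a flip as a 180° rotation, which sends well A1 to H12. On an 8 x 12 plate no well maps onto itself under that rotation. A sequence check at a few sampled wells can therefore distinguish a correctly oriented plate, a rotated one, and a swapped one. The function names and the layout dictionary are hypothetical.

```python
ROWS = "ABCDEFGH"  # 96-well plate: rows A-H, columns 1-12


def rotated_180(well: str) -> str:
    """Position a well ends up in if the plate is turned around (A1 <-> H12)."""
    row = ROWS.index(well[0].upper())
    col = int(well[1:])
    return f"{ROWS[7 - row]}{13 - col}"


def classify_plate(layout: dict, observed: dict) -> str:
    """Compare sequence-verified IDs at the sampled wells with the expected layout."""
    if all(observed[w] == layout[w] for w in observed):
        return "ok"
    if all(observed[w] == layout[rotated_180(w)] for w in observed):
        return "rotated 180"
    return "mismatch"  # e.g. plate swap, mislabel or contamination


# Hypothetical layout, with three sampled wells per plate.
layout = {f"{r}{c}": f"pool_{r}{c}" for r in ROWS for c in range(1, 13)}
sampled = ["A1", "D6", "H11"]
flipped = {w: layout[rotated_180(w)] for w in sampled}
print(classify_plate(layout, flipped))
```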
- a high-throughput end sequence analysis protocol is employed as a quality control procedure.
- the protocol uses cycle sequencing and involves the end sequence analysis of less than 200 bp of the genomic insert in order to confirm insert identity.
- the first factor is that less than 200 bp, and in general less than 50 bp, of the end sequence of the insert are required to be analyzed in order to confirm insert identity, whereas typical sequence reads are between about 200 bp and about 600 bp. The minimum required sequence read is 17 bp.
- the second factor is that, rather than conduct a purification of the template to be amplified to remove interfering excess nucleotide and primers, primer annealing is conducted in the presence of contaminating amplification primers.
- the third factor is that a higher amount of template is used, and the fourth factor is that a greater number of thermal cycles is employed in the sequencing reaction (for example, about 65 or more cycles versus the standard of 25 to 35 cycles).
- sequence analysis of SMRT pools is enabled by the fact that within each SMRT pool is a subset of amplification products that contain vector sequence followed by a short stretch of unique sequence corresponding to the genomic insert and terminating at the most proximal site of fragmentation.
- the targeted sequence read ranges from at least 17 bp to about 200 bp in size.
- the targeted sequence read ranges from 17 bp to about 100 bp in size. In another embodiment, the targeted sequence read ranges from 17 bp to about 50 bp in size.
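- A back-of-the-envelope calculation suggests why reads of about 17 bp can be enough to place an insert end. This rationale is an assumption offered for illustration; the source does not state it. A random k-mer is expected to occur about once when 4^k reaches the number of positions searched, which is roughly 2G for a genome of G bp read on both strands. The sketch ignores repeats and base-composition bias, and the 3.1 Gb genome size is an assumed round figure.

```python
def min_unique_kmer(genome_bp: float, both_strands: bool = True) -> int:
    """Smallest k with 4**k >= number of k-mer positions searched.

    At that k a given random k-mer is expected to occur about once or less,
    ignoring repeats and base-composition bias.
    """
    positions = genome_bp * (2 if both_strands else 1)
    k = 1
    while 4 ** k < positions:
        k += 1
    return k


print(min_unique_kmer(3.1e9))  # assumed ~3.1 Gb haploid human genome
```

For G = 3.1 × 10^9 bp on both strands, 4^16 ≈ 4.3 × 10^9 falls short of 6.2 × 10^9, while 4^17 ≈ 1.7 × 10^10 exceeds it. The function therefore returns 17.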
- the high-throughput end sequence analysis protocol of the present invention employs unpurified template for the sequencing reaction. The sequence analysis is carried out using an aliquot of the selected SMRT pool, without the need to remove excess nucleotides and primers remaining in the pool from the one or more amplification steps.
- the SMRT pools can be sequenced after resuspension in spotting solution just prior to spotting on the array, provided that the spotting solution does not contain components that interfere with the sequencing reaction.
- a SMRT pool can be sequenced after one round of amplification, or after a second, or subsequent, round of amplification.
- the use of unpurified template is also facilitated by modifications made to the cycle sequencing protocol employed in the sequencing reaction. These modifications comprise increasing the amount of template used in the reaction and increasing the number of cycles in the reaction.
- the number of PCR cycles in the extension reaction is increased substantially over the number of cycles described in protocols found in the art, which typically range from about 25 to about 50.
- the number of cycles used in the sequencing reaction is between about 65 and about 100.
- the number of cycles used in the reaction is between about 65 and about 95.
- the number of cycles used in the reaction is between about 65 and about 90.
- the number of cycles used in the reaction is between about 75 and about 100.
- the number of cycles used in the reaction is between about 85 and about 100.
- the number of cycles used in the reaction is between about 85 and about 95, or between about 85 and about 90.
- the amount of DNA template used in the sequencing reaction can vary from 10 fmol to 75 fmol. In one embodiment, the amount of DNA template added to the sequencing reaction is between about 20 fmol and about 50 fmol. In another embodiment, it is between about 30 fmol and about 40 fmol. In an alternative embodiment, the amount of DNA template added to the sequencing reaction is between about 1 ng and about 50 ng. In one embodiment, it is between about 5 ng and about 40 ng. In another embodiment, it is between about 15 ng and about 30 ng.
- the high-throughput end sequence analysis employs about 4% of each unpurified SMRT pool as the DNA template in the sequencing reaction, together with a cycle sequencing kit. The protocol supplied by the manufacturer of the kit is followed, but the number of PCR cycles in the extension reaction is increased to about 85. In another embodiment of the invention, the high-throughput end sequence analysis employs an amount of each unpurified SMRT pool that corresponds to approximately 20 fmol of DNA. In a further embodiment, the number of PCR cycles in the extension reaction is increased to about 95.
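- The template amounts above are given both in fmol and in ng, and converting between them requires an average fragment length. This sketch assumes about 660 g/mol per base pair of double-stranded DNA. It also assumes a hypothetical 1 kb average fragment size for the amplified SMRT pool, since the section does not state one. The numbers are therefore illustrative only.

```python
BP_MASS_G_PER_MOL = 660.0  # approximate mass of one double-stranded base pair


def fmol_to_ng(fmol: float, mean_length_bp: float) -> float:
    """Mass in ng of `fmol` femtomoles of double-stranded DNA fragments."""
    return fmol * 1e-15 * mean_length_bp * BP_MASS_G_PER_MOL * 1e9


def ng_to_fmol(ng: float, mean_length_bp: float) -> float:
    """Inverse conversion: ng of double-stranded DNA to femtomoles."""
    return ng * 1e-9 / (mean_length_bp * BP_MASS_G_PER_MOL) * 1e15


# Hypothetical 1 kb mean fragment length for an amplified SMRT pool.
for amount in (10, 20, 50, 75):
    print(amount, "fmol ->", round(fmol_to_ng(amount, 1_000), 1), "ng")
```

Under that assumption, 20 fmol is about 13 ng and 50 fmol is about 33 ng. Both values fall within the 5 ng to 40 ng range listed above.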
- a sub-library comprises a sub-set of SMRT pools that make up the SMRT library.
- the selected SMRT pools can be, for example, pools that correspond to a specific region of the genome that is of interest, such as one or more chromosomes, an arm of a chromosome, or particular regions of the genome known to be involved in disease, drug resistance or susceptibility, or the like.
- If the specific SMRT pools required for the sub-library are known, then they can readily be removed from the SMRT library and transferred to an alternative multi-well container, or spotted onto a solid support to provide an array (see below). Similarly, if the location of the region(s) or chromosome(s) of interest on the physical map of the genome is known, then the appropriate genomic DNA clone(s) in the tiling set can be identified, and the SMRT pools derived from these clones can then be selected from the library. Alternatively, a library can be screened for SMRT pools corresponding to a region of interest, for example by hybridization techniques, and the appropriate SMRT pools selected.
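- Selecting a sub-library by map position amounts to an interval-overlap query. The following is a minimal sketch. It assumes each pool is annotated with the chromosome and base-pair coordinates of its source clone. The `SmrtPool` record and the well naming are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmrtPool:
    well: str    # hypothetical "plate:well" label
    chrom: str
    start: int   # map coordinates of the source clone, bp
    end: int


def select_sublibrary(pools, chrom: str, start: int, end: int) -> list:
    """Pools whose source clone overlaps chrom:start-end, in map order."""
    hits = [p for p in pools if p.chrom == chrom and p.start < end and p.end > start]
    return sorted(hits, key=lambda p: p.start)


pools = [
    SmrtPool("001:A01", "chr17", 100_000, 280_000),
    SmrtPool("001:A02", "chr17", 250_000, 420_000),
    SmrtPool("001:A03", "chr8", 0, 150_000),
]
print([p.well for p in select_sublibrary(pools, "chr17", 300_000, 400_000)])
```

The selected wells could then drive a re-array or a printer pick list.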
- SMRT arrays
- Once the library of SMRT pools has been generated, it can be used for the preparation of a SMRT array, or for the selection of probes for various applications, for example Southern hybridization or FISH analysis.
- 1.0 SMRT arrays
- a SMRT array can be prepared by spotting each member of the library, or libraries, of SMRT pools onto one or more solid support in an arrayed configuration, using standard techniques known in the art, wherein each point of the array corresponds to a SMRT pool.
- the SMRT pools are typically precipitated, for example, with ethanol and then resuspended directly in a suitable spotting solvent, for example, 20% DMSO, 50% formamide.
- the SMRT pools of the SMRT library can be deposited onto the solid support.
- the SMRT pools can be deposited in random order, or in a specific order, for example, according to their map position. Methods of deposition in array construction are known in the art.
- an array is constructed by binding nucleic acid molecules of the SMRT pools to a solid support in an ordered spatial arrangement so that each SMRT pool is present at a specified location on the support.
- the solid support can be a membrane, such as a nylon membrane, activated nylon membrane or nitrocellulose membrane, a filter, a chip, a glass slide, or other suitable solid support [see, for example, U.S. Pat. No. 5,837,832; PCT application WO95/11995; Lockhart, D. J., et al. (1996) Nat. Biotech. 14:1675-1680; Schena, M., et al. (1996) Proc. Natl. Acad. Sci. USA 93:10614-10619; U.S. Pat. No. 5,807,522].
- the SMRT arrays can also cover an entire genome or portion of a genome.
- a subset of pools from a SMRT library (a sub-library) may be used to generate the SMRT array.
- the SMRT arrays can contain more than one SMRT library if desired, and thus can cover more than one genome.
- SMRT arrays can comprise combinations of one or more SMRT libraries and other types of nucleic acids.
- examples of other types of nucleic acids include, for example, viral DNA, plasmid DNA, and oligonucleotides.
- a SMRT array can comprise a SMRT library representing a genome or portion of a genome in addition to a series of oligonucleotides designed to increase the resolution of an array in a particular region of the genome, or a SMRT library representing a genome or portion of a genome in addition to genomic DNA of a relevant virus, bacteria or organelle(s) associated with a particular disease state.
- a SMRT array can comprise a SMRT library in combination with one or more control or reference DNA sequences that allow for identification, orientation, or normalization of the results generated using the array.
- the present invention also contemplates genome-wide high-resolution arrays, which can be used to analyze specific regions of interest in the genome, either by blocking portions of the array from exposure to hybridization solution or by analyzing only the portions of the array that are of interest.
- SMRT arrays may be constructed in a low or high density format.
- the term "high density", as used herein with reference to an array, means that the array comprises more than about 60 different SMRT pools per cm².
- a high density SMRT array comprises more than about 100 different SMRT pools per cm².
- a high density SMRT array comprises more than about 600 different SMRT pools per cm².
- a high density SMRT array comprises more than about 1,000 different SMRT pools per cm².
- a high density SMRT array comprises more than about 5,000 different SMRT pools per cm².
- a high density SMRT array comprises more than about 10,000 different SMRT pools per cm².
- a high density array provides for rapid, essentially simultaneous, evaluation of a number of hybridizations in a single test.
- High density arrays can be prepared using robotic spotters to deposit DNA samples onto one or more solid support. Such robotic spotters use high-grade stainless steel pins to pick up samples and then deposit them in the correct locations on the support. Robotic spotters are commercially available, for example, from Virtek Biotech, or Telechem. Each SMRT library can be spotted on the array once or in multiplicate. In one embodiment the SMRT library is spotted in triplicate. In another embodiment, the SMRT library is spotted in duplicate.
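- The density thresholds above can be related to spot pitch with simple arithmetic. The sketch assumes a square grid with a fixed centre-to-centre pitch in µm. It counts each SMRT pool once, regardless of how many replicate spots it has. The pitch values are illustrative and are not taken from the source.

```python
def spots_per_cm2(pitch_um: float) -> float:
    """Spots per cm^2 on a square grid with the given centre-to-centre pitch."""
    per_cm = 10_000 / pitch_um  # 1 cm = 10,000 um
    return per_cm ** 2


def pools_per_cm2(pitch_um: float, replicates: int) -> float:
    """Distinct SMRT pools per cm^2 when every pool is spotted `replicates` times."""
    return spots_per_cm2(pitch_um) / replicates


for pitch in (300, 150, 100):  # illustrative pitches in um
    print(pitch, round(spots_per_cm2(pitch)), round(pools_per_cm2(pitch, 3)))
```

For example, a 100 µm pitch gives 10,000 spots per cm². When each library is spotted in triplicate, this corresponds to about 3,333 distinct SMRT pools per cm².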
- One embodiment of the invention provides for a high-density array comprising one or more SMRT libraries.
- the resolution of the high-density array is between 0.03 and 1 Mb.
- the resolution of the high-density array is between 0.05 and 0.08 Mb.
- the resolution of the high-density array is 77 kb (0.077 Mb).
- a sub-Megabase resolution SMRT array representing a complete minimal tiling set across the sequenced human genome is provided.
- the present invention also provides for a two-stage SMRT array system.
- the first stage of such a system comprises a low-resolution, genome-wide SMRT array, and the second stage comprises one or more chromosome-specific or region-specific (for example, disease-specific) high-resolution SMRT arrays.
- the first, low-resolution array can be used for the initial localization of genetic alterations by gross mapping of altered regions, and typically comprises between about 50 and about 200 array points per chromosome arm.
- the second high-density (high resolution) array(s) comprise about 500 to 10,000 array points. Once the altered regions have been mapped using the low resolution array, an appropriate high-resolution array is selected and used to facilitate fine mapping of the altered regions.
- the present invention further contemplates SMRT arrays that are disease-specific and comprise a tiling subset, wherein each point comprises a SMRT pool covering a region of interest.
- a subsequent SMRT array can be generated comprising SMRT pools determined to be in a region of interest. This allows the high density array to be employed to discover novel regions or patterns of genetic alterations and then generate a refined SMRT array which can reduce costs for commercial applications.
- SMRT arrays can also be used for preparation of sub-libraries as described above. Arrays of a sub-library of SMRT pools could be spotted directly onto a solid substrate using the original SMRT library housed in a multi-well container by programming the array printer to select samples only from those wells containing the desired SMRT pools.
- SMRT arrays of the present invention are useful in a variety of clinical and research settings.
- SMRT arrays can be used to analyze genetic alterations (polymorphisms, chromosomal rearrangements and translocations), DNA copy number changes, epigenetic changes such as changes in methylation, changes in gene expression and the discovery of novel genes relevant to disease as well as the identification of known genes that are related to specific diseases, and evolutionary genomic changes.
- the SMRT arrays can also be used in combination with chromatin immunoprecipitation to locate chemical modifications of chromatin, identify chromosome targets of proteins involved in DNA binding or chromatin remodelling, and to identify the sites throughout the genome at which DNA binding proteins interact. This type of information can be used to determine patterns of genomic alteration that may be indicative of disease.
- the SMRT arrays can be used as tools to diagnose diseases, enable the selection of treatment regimens, and predict resistance to particular treatments.
- Array-based CGH detects gain or loss of chromosomal regions through competitive hybridization of test and reference samples to the array.
- the test and reference samples are distinctly labelled, for example by chromophores or fluorophores of different colours or with different emission spectra, so as to be distinguishable from each other.
- the signal ratio from the chromophores or fluorophores indicates levels of hybridization to the array, and indicates an increase or reduction in the copy number of the corresponding test DNA sequence. For example, duplication in a region of the test sample will result in that sample 'competing out' the corresponding reference sample in a competitive hybridization, while deletion in a region of the test sample will result in that sample being 'competed out' by the corresponding reference sample in a competitive hybridization.
- An increase in the relative signal strength from the test sample at a given array point indicates a duplication in the corresponding region of the sample DNA, while a decrease in, or absence of, test signal indicates a deletion.
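- The following is a minimal sketch of per-spot interpretation in array CGH. It assumes background-corrected test and reference intensities and uses a log2 ratio threshold of ±0.3. That threshold is chosen only for illustration, since the source gives none. For a pure diploid sample, a single-copy gain would be expected to give log2(3/2) ≈ 0.58, and a single-copy loss log2(1/2) = -1. In real data, contamination by normal cells and noise compress these values.

```python
import math
from typing import Optional


def call_spot(test: float, ref: float, gain: float = 0.3, loss: float = -0.3) -> Optional[str]:
    """Classify one array spot from background-corrected intensities."""
    if ref <= 0:
        return None          # no usable reference signal
    if test <= 0:
        return "loss"        # absence of test signal
    ratio = math.log2(test / ref)
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "no change"


print([call_spot(t, 1000) for t in (2000, 1000, 500, 0)])
```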
- the resolution of the array determines the accuracy with which the rearrangement
- this technique has been used to determine DNA copy number variation in a test genomic DNA sample.
- This type of information can be used to identify and characterize genes that are related to human disease.
- the SMRT arrays of the present invention allow high resolution mapping of amplifications and deletions in the genome, and are thus able to detect microdeletions and microamplifications.
- The SMRT arrays can, therefore, be used in CGH-based genomic profiling techniques to diagnose diseases, select treatment regimens, predict the occurrence of drug resistance or radiation resistance, or predict side-effects of treatments.
- the SMRT array-based CGH experiments can be used to correlate pharmacogenomic and toxicogenomic studies with an individual's genetic profile, thus allowing clinicians to optimize response to therapies. For example, it has been determined (van 't Veer et al., Nature 415(6871):530-536) that breast cancer patients with the same disease state responded differently to treatment, depending on their genetic profile.
- the SMRT arrays of the present invention can also be used to identify genes that are related to specific diseases thus allowing the selection of specific gene targets for drug discovery.
- the SMRT arrays of the present invention can also be used as a tool for detecting epigenetic changes through DNA methylation of CpG islands, which may be a separate, causative mechanism for various diseases, including tumor progression.
- a variety of previously published protocols are capable of distinguishing genes with abnormal methylation within total genomic DNA, for example methylation-sensitive AP-PCR (Huang et al. 1997 Cancer Res. 57(6):1030-1034; Gonzalgo et al. 1997 Cancer Research 57(4):594-599; Liang et al. 1998 Genomics 53(3):260-8), Genomic Mismatch Scanning (GMS), and other subtractive techniques (Nelson et al. Nat Genet. 1993 May;4(1):11-8).
- methylation sensitive AP-PCR probes can be generated from AP-PCR products of disease samples and normal samples and competitively hybridized to a human SMRT array. Ratio differences between these probe populations would indicate regions of altered methylation.
- GMS subtracted non-mismatched heterohybrids can be competitively hybridized with normal DNA to a SMRT array. In either case, use of a SMRT array in combination with the techniques described above, provides a template from which to comprehensively assess methylation differences in multiple human diseases.
- A schematic diagram of an exemplary method of analyzing methylation differences between samples is depicted in Figure 25 (FIG. 25). Variations of this method known to the skilled worker are contemplated within the scope of the invention.
- SMRT arrays can also be used to analyze chemical modifications of chromatin and the sites at which DNA-binding proteins interact within the genome.
- the sites within the genome at which a DNA-binding protein of interest interacts can be determined as follows. In vivo protein-DNA interactions are preserved by cross-linking, and the chromatin is then fragmented to reduce the genome to DNA fragments of manageable size. Immunoprecipitation is then carried out using an antibody against the protein of interest, which co-immunoprecipitates the DNA cross-linked to it. The DNA fragments are isolated and the cross-linking is reversed. The fragments are then amplified, labelled, and hybridized to the SMRT array in order to locate the sites at which the protein of interest binds.
- the SMRT arrays of the present invention can be used to analyze transcriptional activity at the genomic level. It is possible to simultaneously analyze the transcription level of all the genes in a genome.
- the target RNA transcripts of all the genes in a test sample (collectively called the 'transcriptome') are first transcribed into corresponding cDNAs, which are labelled with a first label.
- the target cDNA library represents the full transcriptome.
- a corresponding cDNA library is prepared from a reference transcriptome, and labelled with a second, distinguishable label.
- the test and reference cDNA libraries are then competitively hybridized against the SMRT array library.
- the ratio of labelling intensities at each array point indicates the transcriptional activity of the corresponding region of genomic DNA in the test sample, relative to the reference. In this way, the global transcriptional activity of the sample can be mapped to specific regions of the genome.
- test cDNA can be hybridised to the SMRT array in the absence of a reference cDNA.
- a software tool called SeeGH can be used to view and analyze array CGH data.
- the software gives users the ability to view the data in an overall genomic view as well as magnify specific chromosomal regions facilitating the precise localization of genetic alterations.
- the application translates spot signal ratio data from array CGH experiments to displays of high resolution chromosome profiles. Data is imported from a simple tab delimited text file obtained from standard microarray image analysis software. SeeGH processes the signal ratio data and graphically displays it in a conventional CGH karyotype diagram with the added features of magnification and DNA segment annotation.
- SeeGH imports the data into a database, calculates the average ratio and standard deviation for each replicate spot, and links them to chromosome regions for graphical display.
- users have the option of hiding or flagging DNA segments based on user-defined criteria, and of retrieving annotation information such as clone name, NCBI sequence accession number, ratio, base pair position on the chromosome, and standard deviation.
- Detailed information regarding this software is found in Chi et al. (2004) BMC Bioinformatics 5:13.
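- The replicate-summarizing step described for SeeGH can be illustrated generically. This is not SeeGH code, and it does not reproduce SeeGH's file format. The sketch assumes a hypothetical tab-delimited file with `clone` and `ratio` columns. It averages the replicate spots for each clone and flags clones whose replicate standard deviation exceeds a user-chosen cut-off.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize_replicates(tsv_text: str, max_sd: float = 0.2) -> dict:
    """Mean and SD of the ratio for each clone's replicate spots, with a QC flag."""
    ratios = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        ratios[row["clone"]].append(float(row["ratio"]))
    summary = {}
    for clone, values in ratios.items():
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[clone] = {
            "mean": statistics.mean(values),
            "sd": sd,
            "n": len(values),
            "flagged": sd > max_sd,  # user-defined criterion
        }
    return summary


# Hypothetical triplicate and duplicate spots.
data = "clone\tratio\nclone_1\t1.0\nclone_1\t1.2\nclone_1\t1.1\nclone_2\t0.5\nclone_2\t1.5\n"
for clone, s in summarize_replicates(data).items():
    print(clone, round(s["mean"], 2), round(s["sd"], 2), s["flagged"])
```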
- the present invention comprises software for the analysis of data, such as signal intensity and array co-ordinate data generated by use of the SMRT array of the present invention.
- FISH (fluorescence in situ hybridization)
- FISH analysis using SMRT pools as probes comprises the steps of 1) labeling selected SMRT pools with a chromophore or fluorophore, for example using the random primer extension method; 2) hybridizing the labelled DNA with a metaphase nucleus preparation; and 3) detecting the presence or absence of probe hybridization to the metaphase nucleus preparation.
- Hybridization of the probe to an inappropriate chromosome or inappropriate region of the chromosome indicates a rearrangement event, while the lack of hybridization indicates a deletion event.
- the FISH technique is routinely used to identify genomic rearrangement events associated with cancers and other genetic diseases.
- the SMRT pools can be used to prepare probes for nucleic acid hybridization techniques. Examples of such hybridization techniques include Southern, Northern and Southwestern blot hybridization. Procedures for labelling such probes are known in the art.
- the present invention additionally provides for kits comprising the SMRT sets,
- kits may further comprise reagents for preparing and/or re-amplifying the SMRT pools.
- kits can optionally include amplification reagents, reaction components and/or reaction vessels.
- One or more of the reagents provided in the kit can incorporate a detectable label, or the kit may include reagents for labelling target sequences.
- One or more of the components of the kit may be lyophilised and the kit may further comprise reagents suitable for the reconstitution of the lyophilised components.
- the present invention further provides for kits comprising one or more SMRT arrays. These can be provided in a form ready to be applied to a solid substrate or they can be provided as pre-assembled arrays.
- the kits may additionally contain buffers, labels, and other reagents to facilitate the preparation or use of the arrays, including, for example, buffers and solutions for the preparation of a test sample, extraction of nucleic acids, purification of nucleic acids and the like.
- kits can additionally contain instructions for use, which may be provided in paper form or in computer-readable form, such as a disc, CD, DVD or the like.
- kits described above may be provided as part of a package that includes computer software to analyse data generated from the use of the kit.
- EXAMPLE 1 CLONE SELECTION FOR CONSTRUCTION OF A BAC CLONE SMRT SET
- the human BAC fingerprint-based physical map (15 Nov 2001, McPherson, J.D. et al. A physical map of the human genome. Nature 409, 934-41 (2001)) generated at Washington University Genome Sequencing Centre was used for the selection of BAC clones for the tiling set.
- the fingerprint map is a manually curated and mature data set which covers 96% of the genome at an average depth of 15X. The redundancy of clone coverage was used to specify desired clone overlap and size characteristics with the goal of achieving representation of every region in the map.
- BAC clones were chosen from each of the 726 contigs to provide maximum coverage of the fingerprint map.
- Clones not assigned to contigs, as well as non-canonical clones, were excluded from candidacy. Clone selection was restricted to the readily available RPCI-11 and RPCI-13 (Osoegawa, K. et al. A bacterial artificial chromosome library for sequencing the complete human genome. Genome Res 11, 483-96 (2001)) and Caltech D1/D2 (Lander, E.S. et al.
- GigAssembler Genome Res 11, 1541-8 (2001) were selected preferentially. Clones with BES hit coordinates that were inconsistent with their position in the fingerprint map were avoided in cases where equivalent map coverage could be obtained by selecting another clone. During the selection process, an attempt was also made to enrich the clone set with clones having either existing FISH information (Cheung, V. G. et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 409, 953-8 (2001)) or sequence data.
- the BAC clones were selected as follows. Starting from the left end of each map contig, the first canonical clone from the ordered set was always selected. The next pick was chosen to share as close to, but no fewer than, 4 conserved bands with the previous pick. Conserved bands are defined as bands present in the HindIII fingerprints of two overlapping clones and in the fingerprints of all clones located between them. Conserved bands emanate from the same DNA, and their use minimizes false positives in determining clone overlap, since bands found in multiple adjacent intermediate clones in the ordered map represent the same digest fragment.
- Clones smaller than 100 kb or larger than 200 kb, or having fewer than 20 or more than 50 HindIII fragments, were chosen only where map coverage could not be provided by other eligible clones. No clones smaller than 15 kb, or with fewer than 5 sanitized HindIII fragments, were chosen. Clones with unique end-sequence hits to the assembly were preferred, and clones with both ends aligning to the sequence assembly were given priority.
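- The next-pick rule described above can be sketched as a greedy walk along a contig. In this hypothetical Python model, each clone is reduced to an interval in cbmap units. The overlap between two intervals stands in for the number of conserved bands; real conserved bands are computed from the HindIII fingerprints themselves. The size and fragment-count filters are omitted, and `MapClone` and `tile_contig` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapClone:
    name: str
    left: int   # left end in cbmap (consensus band) units
    right: int  # right end in cbmap units, exclusive


def shared_bands(a: MapClone, b: MapClone) -> int:
    # Stand-in for the conserved-band count: overlap measured in cbmap units.
    return max(0, min(a.right, b.right) - max(a.left, b.left))


def tile_contig(clones: list, min_shared: int = 4) -> list:
    ordered = sorted(clones, key=lambda c: (c.left, -c.right))
    picks = [ordered[0]]  # always take the leftmost clone
    while True:
        cur = picks[-1]
        extending = [c for c in ordered if c.right > cur.right]
        if not extending:
            return picks  # reached the right end of the contig
        ok = [c for c in extending if shared_bands(cur, c) >= min_shared]
        if ok:
            # As close to, but no fewer than, min_shared bands; ties reach further right.
            nxt = min(ok, key=lambda c: (shared_bands(cur, c), -c.right))
        else:
            # No clone meets the overlap rule: take the nearest one (weak join or gap).
            nxt = min(extending, key=lambda c: c.left)
        picks.append(nxt)


contig = [MapClone("A", 0, 20), MapClone("B", 10, 30), MapClone("C", 16, 40),
          MapClone("D", 25, 45), MapClone("E", 36, 60)]
print([c.name for c in tile_contig(contig)])
```

In this toy contig, B and D are skipped because C and E each share exactly 4 units with the previous pick. The loop always terminates because every pick extends further to the right than the last.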
- the first round of map-based clone selection, as described in Example 1, yielded 29,035 clones representing 99% of the map. All clones were digested using HindIII, and fingerprints were generated as described in Marra, M. A. et al. High throughput fingerprint analysis of large-insert clones. Genome Res 7, 1072-84 (1997). All validation fingerprints were compared in an automated fashion to those stored in the physical map. The fingerprints in the map are sanitized: all fragments closer than 7 standard mobility units (this length unit corresponds to a size tolerance of 0.5% at 5 kb, 3% at 20 kb and 5% at 25 kb) have been replaced with a single fragment.
- Clones with rank n had n-1 map clones that were more similar than the corresponding map clone (i.e., that had a lower Sulston score, reflecting a smaller probability of coincidental overlap). Fingerprints of clones with rank >3 were visually examined (5,272 clones, including 2,784 clone fingerprints identified by automated analysis as potential mismatches). Fingerprints for 1,978 clones in the set did not match their corresponding fingerprints in the physical map. The discrepancies could be attributed either to clone tracking errors during generation of the fingerprint map or of the rearrayed clone set (1,143 clones), or to cross-well contamination or failure of the fingerprinting process (835 clones).
- a second round of clone selection was performed to maintain the coverage represented by the 1,978 failed clones. For each failed clone, neighbouring clones were selected from the map to provide equivalent coverage. In total, 4,531 clones were selected as replacements. These clones were sampled from RPCI-11, RPCI-13 and Caltech-D in roughly the same proportions as for the final set (87%:8%:5%). An additional 1,258 clones were selected to close gaps larger than 10 kb, based on the June 2002 UCSC assembly. Approximately 755 of these clones were not in the physical map. A second round of fingerprint verification performed on the replacement clones identified 413 clones that did not match their map fingerprints, and these clones were rejected from the set. The final tiling set contained 32,433 clones.
- Clones for each library were first coordinated in 96-well format in the order of plate, row and column. BACs were inoculated, grown in 96-deep-well blocks and kept at -80 °C until all clones were picked. The BACs were then condensed into 384-well plates using 96-pin tools.
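- Condensing four 96-well plates into one 384-well plate with a 96-pin tool is commonly done by interleaving the source plates as quadrants. The source does not specify which convention was used. The mapping below is one widespread convention, assumed here for illustration: source plate q (0 to 3) lands at row offset q // 2 and column offset q % 2. The function name is hypothetical.

```python
ROWS_96 = "ABCDEFGH"           # 8 rows x 12 columns
ROWS_384 = "ABCDEFGHIJKLMNOP"  # 16 rows x 24 columns


def well_96_to_384(quadrant: int, well: str) -> str:
    """Destination of `well` from source plate `quadrant` (0-3) in a 384-well plate."""
    if quadrant not in (0, 1, 2, 3):
        raise ValueError("quadrant must be 0, 1, 2 or 3")
    row = ROWS_96.index(well[0].upper())
    col = int(well[1:]) - 1
    if not 0 <= col < 12:
        raise ValueError(f"bad 96-well column in {well!r}")
    return f"{ROWS_384[2 * row + quadrant // 2]}{2 * col + quadrant % 2 + 1}"


print([well_96_to_384(q, "A1") for q in range(4)])
```

Each well of the four source plates gets its own destination well, so no clone is overwritten during condensation.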
- of the 32,433 clones in the final set, 31,678 are in the fingerprint map, with 31,676 localized to contigs.
- the remaining 755 clones not found in the fingerprint map were selected to provide coverage of the sequence assembly based on the sequence assembly coordinates of their BES matches.
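- The clone counts reported above can be checked against each other. The bookkeeping below is one reading of the text: the 1,978 failed clones are dropped, the replacement and gap-closing clones are added, and the 413 rejected replacements are removed again. Under that reading, the counts reproduce the final total and agree with the split into 31,678 mapped and 755 unmapped clones.

```python
first_round = 29_035           # clones from the first map-based selection
failed_total = 1_978           # fingerprints not matching the map
tracking_errors, other_failures = 1_143, 835
replacements = 4_531           # neighbours picked to cover failed clones
gap_fillers = 1_258            # picked to close gaps > 10 kb (June 2002 UCSC assembly)
rejected = 413                 # replacements failing second-round verification
in_map, not_in_map = 31_678, 755

assert tracking_errors + other_failures == failed_total
final_set = first_round - failed_total + replacements + gap_fillers - rejected
assert final_set == in_map + not_in_map
print(final_set)
```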
- the majority of the clones are from RPCI-11 (92%), with the remaining 2% from RPCI-13 and 6% from Caltech libraries.
- the average clone size and HindIII fragment count for each library are 189 kb/46 (RPCI-11), 160 kb/37 (RPCI-13) and 146 kb/35 (Caltech-D).
- The average sizes of the tiling set members based on BES data are 176 kb for RPCI-11 clones and 140 kb for Caltech-D clones, indicating that the validation fingerprints overestimate clone size by 4-7%. This difference is in part due to vector-insert junction fragments present in the fingerprints.
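The 4-7% range follows from comparing the fingerprint-based and BES-based average sizes for each library. A quick check using the averages quoted above (the helper name is illustrative):

```python
def overestimate_pct(fingerprint_kb: float, bes_kb: float) -> float:
    """Percent by which a fingerprint-derived size exceeds the BES-derived size."""
    return (fingerprint_kb / bes_kb - 1.0) * 100.0

# (validation-fingerprint average, BES-derived average) in kb
sizes = {"RPCI-11": (189, 176), "Caltech-D": (146, 140)}
for library, (fp_kb, bes_kb) in sizes.items():
    print(f"{library}: {overestimate_pct(fp_kb, bes_kb):.1f}%")
# RPCI-11: 7.4%
# Caltech-D: 4.3%
```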
- GenBank sequence records (January 2003 release; Benson, D.A. et al. GenBank. Nucleic Acids Res 30, 17-20 (2002)) were available for 8,018 clones in the set. The records indicated 4,967 clones categorized as finished, 2,069 working draft clones, 365 in-progress clones and 569 low-pass clones. BES coordinates were available for 10,213 of the clones (31% of clones in the set), providing a localization scaffold. A total of 1,134 clones in the set had previously generated FISH data (Cheung, V.G. et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 409, 953-8 (2001)) available through the Cancer Genome Anatomy Project (CGAP) (Strausberg, R.L., Buetow, K.H., Greenhut, S.F., Grouse, L.H. & Schaefer, C.F.
- Map coverage was determined by analyzing the overlap distribution between map-adjacent clone picks and the total number and depth of consensus band (cbmap) units covered by the rearray set.
- Any map clone can therefore be associated with one or more clones from the clone set which provide equivalent map coverage.
- Each unit on the cbmap unit scale corresponds to a single detected fingerprint fragment and cbmap distances cannot be directly related to a distance in sequence coordinate space.
- Each clone in a fingerprint map contig is positioned relative to all other clones using cbmap coordinates. Regions in the fingerprint map that do not have representation in the clone tiling set may indicate gaps in coverage.
- The ratio of cbmap units to sequence digest fragments is not unity because the map was generated using sanitized fingerprints and because fragments are only reliably sized in the range of 0.6-30 kb. Using this ratio, it was estimated that the clone tiling set did not cover approximately 11,000 restriction fragments (1.4%) out of a total of about 800,000 fragments in the sequence. This is likely to be an overestimate of the actual gaps, for three reasons: about 20% of fragments are outside the reliable detection range of the fingerprinting method; the fingerprints stored in the map were sanitized, so an insert may extend over more fragments than appear in its fingerprint; and there are artifactual gaps in the fingerprint map where joins between contigs have been recognized by sequence data but not confirmed by fingerprinted clones.
- The depth of clone tiling set coverage in the fingerprint map can be approximated by the cbmap unit coverage, assuming that the ratio of cbmap units to sequence digest fragments is relatively constant over large map distances.
- Of all cbmap units, 40% were covered by only one clone in the clone tiling set and another 44% were covered by the overlapping region of two clones. The remaining 15% were covered by 3 or more clones, and 1.3% of the cbmap units were not covered.
- The average coverage depth based on map unit calculations was determined to be 1.8X. A similar analysis of coverage depth carried out with sequence coordinates corroborates this result (see Example 4).
- EXAMPLE 4 DETERMINATION OF SEQUENCE COVERAGE FOR THE SMRT SET
- Sequence coverage was calculated by first determining precise sequence coordinates for as many clones as possible.
- The validation fingerprints of the clones were used to localize them to the genome as follows: for each clone, the region of the sequence from which the clone was derived was determined using the clone's map neighbours with BES hits. Five left and five right map neighbours were identified, and their BES hits were used to demarcate a region of the genome. Only neighbours whose BES hits landed on the same chromosome as the majority of the clones in the map contig were used.
- The clone's own BES hits were not used in determining the sequence region, in case the clone's map position was incorrect, or the BES hit coordinates did not reflect the actual position of the clone or were actually associated with another clone (Zhao, S. et al. Human BAC ends quality assessment and sequence analyses. Genomics 63, 321-32 (2000)). Approximately 250 clones were located in 66 contigs that did not contain BES- or assembly-anchored clones. The clone neighbourhood was enlarged by 1 Mb in both directions to minimize the effect of local inconsistencies.
- The neighbourhood assembly was digested in silico, and the fingerprint of a sliding window of 100 fragments, created every 10 fragments, was matched to the clone's fingerprint.
- A sliding subwindow with 5 more fragments than the clone was created every 2 fragments.
- Fingerprint matching was performed with a tolerance of 2% for fragments <15 kb, 3% for fragments 15-25 kb and 4% for fragments >25 kb in size. This tolerance profile approximates a standard mobility tolerance of 7, the cutoff used to generate the fingerprint map.
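The size-dependent tolerance can be illustrated with a simple matcher. This is a minimal sketch, not the FPC matching algorithm: it assumes the tolerance is taken relative to the in silico fragment size, treats the 15-25 kb band as inclusive at 25 kb, and pairs fragments greedily on sorted size lists. All names and sizes are hypothetical.

```python
from typing import List

def tolerance(size_kb: float) -> float:
    """Relative sizing tolerance by fragment size (2% / 3% / 4%)."""
    if size_kb < 15:
        return 0.02
    if size_kb <= 25:
        return 0.03
    return 0.04

def count_matches(clone_kb: List[float], window_kb: List[float]) -> int:
    """Greedy one-to-one matching of two fragment-size lists.

    A clone fragment matches a window fragment if their sizes differ by no
    more than the tolerance for the window (in silico) fragment's size.
    """
    a, b = sorted(clone_kb), sorted(window_kb)
    i = j = matched = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance(b[j]) * b[j]:
            matched += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # clone fragment too small to match anything remaining
        else:
            j += 1  # window fragment too small to match anything remaining
    return matched

clone = [1.0, 5.05, 20.5, 30.0]
window = [1.01, 5.0, 21.0, 28.5, 40.0]
print(count_matches(clone, window))  # 3 (30.0 vs 28.5 exceeds the 4% band)
```

The subwindow with the highest count would then be the one used to place the clone.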
- The subwindow that matched the most fragments was used to determine the clone coordinates.
- The clone was determined to start (or end) at the first fragment that was part of a run of 6 matched fragments with no more than 3 unmatched fragments in between.
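The boundary rule can be sketched on a per-fragment list of match flags ordered along the subwindow. This assumes "no more than 3 unmatched fragments in between" limits each gap between consecutive matched fragments, and finds the end by scanning the reversed list; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

def run_start(matched: List[bool], min_run: int = 6, max_gap: int = 3) -> Optional[int]:
    """Index of the first matched fragment that begins a run of min_run matched
    fragments whose consecutive matches are separated by at most max_gap
    unmatched fragments. Returns None if no such run exists."""
    hits = [i for i, m in enumerate(matched) if m]
    run_begin = 0  # position in hits where the current run starts
    for k in range(1, len(hits) + 1):
        # close the current run at the end of the list or at a gap that is too large
        if k == len(hits) or hits[k] - hits[k - 1] - 1 > max_gap:
            if k - run_begin >= min_run:
                return hits[run_begin]
            run_begin = k
    return None

def clone_bounds(matched: List[bool]) -> Tuple[Optional[int], Optional[int]]:
    """Start from a forward scan, end from the same rule applied in reverse."""
    start = run_start(matched)
    rev = run_start(matched[::-1])
    end = None if rev is None else len(matched) - 1 - rev
    return start, end

flags = [c == "x" for c in "x....xxx.x.xx..x"]  # x = matched fragment
print(clone_bounds(flags))  # (5, 15)
```

The isolated match at index 0 is excluded because a 4-fragment gap separates it from the next match.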
- These in silico anchors were accepted only in cases where (i) over 80% of the fragments in the anchor matched the clone's fingerprint, (ii) over 40% of the fragments in the clone matched fragments in the subwindow (in regions of the assembly where sequence is incomplete only a fraction of the insert may match), (iii) the anchor was at least 10 fragments in size and (iv) the anchor could not be larger than the clone by more than 50% of the clone's map length.
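The four acceptance criteria translate directly into a predicate. This sketch assumes the clone's map length is expressed in the same fragment units as the anchor length, which the text does not state; the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AnchorStats:
    anchor_len: int       # fragments in the in silico anchor
    anchor_matched: int   # anchor fragments matching the clone fingerprint
    clone_len: int        # fragments in the clone's validation fingerprint
    clone_matched: int    # clone fragments matching the subwindow
    clone_map_len: float  # clone map length, assumed to be in fragment units

def accept_anchor(s: AnchorStats) -> bool:
    if s.anchor_len < 10 or s.clone_len <= 0:          # (iii), plus a division guard
        return False
    return (
        s.anchor_matched / s.anchor_len > 0.80          # (i) anchor mostly matches clone
        and s.clone_matched / s.clone_len > 0.40        # (ii) enough of the clone matches
        and s.anchor_len <= 1.5 * s.clone_map_len       # (iv) not >50% larger than clone
    )

print(accept_anchor(AnchorStats(47, 45, 50, 42, 46)))  # True
print(accept_anchor(AnchorStats(47, 45, 50, 42, 30)))  # False: fails (iv)
```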
- The average in silico anchor size was 47 ± 11 fragments, with 95 ± 4% matched fragments.
- The average number of fragments shared between the clone and its anchor was 84 ± 8% (151 ± 26 kb).
- The fingerprint-based method provided an increase in the precision of localization for 29,539 clones.
- The final sequence coordinates for a clone were taken from its BES coordinates, where available, in preference to the fingerprint-derived coordinates. Where BES coordinates were not available, fingerprint-derived coordinates were used. Finally, if neither could be determined, Golden Path clone assembly coordinates (Kent, W.J. & Haussler, D. Assembly of the working draft of the human genome with GigAssembler. Genome Res 11, 1541-8 (2001)) were used, if available. Clone assembly coordinates were chosen only if other coordinates were not available because they may significantly underestimate the size of the clone in cases where the entire insert was not sequenced. Similarly, but to a lesser extent, the in silico coordinates were generated conservatively and may underestimate the extent of the clone insert.
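The coordinate precedence (BES, then fingerprint-derived, then Golden Path assembly) amounts to a fallback chain. A minimal sketch; the tuple layout and function name are assumptions for illustration.

```python
from typing import Optional, Tuple

Coords = Tuple[str, int, int]  # (chromosome, start, end): assumed layout

def final_coords(bes: Optional[Coords],
                 fingerprint: Optional[Coords],
                 assembly: Optional[Coords]) -> Optional[Tuple[str, Coords]]:
    """Return (source, coordinates) from the highest-priority available source."""
    for source, coords in (("BES", bes), ("fingerprint", fingerprint), ("assembly", assembly)):
        if coords is not None:
            return source, coords
    return None  # clone could not be placed

print(final_coords(None, ("chr8", 100, 250), ("chr8", 120, 230)))
# ('fingerprint', ('chr8', 100, 250))
```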
- The fingerprint map was used as follows: for every fingerprint map contig, an undirected graph of overlapping clones was created, separating the contig into connected components. Two clones shared an edge in the same component if they had at least 4 conserved bands and overlapped by more than 5 cbmap units; clones that did not overlap by more than 5 cbmap units had to share at least 6 conserved bands. For each connected component, the leftmost and rightmost clones with sequence coordinates were used to locate the component within the assembly, and this region was considered covered by the tiling set.
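The component construction can be sketched with union-find. Union-find is an implementation choice made here, not something the text specifies; the inputs (pairwise shared-band counts and cbmap overlaps, cbmap left positions, sequence coordinates) and clone names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def has_edge(shared_bands: int, overlap_cb: float) -> bool:
    """>5 cbmap units of overlap needs >=4 conserved bands; otherwise >=6."""
    return shared_bands >= (4 if overlap_cb > 5 else 6)

def components(clones: List[str],
               pairs: Dict[Tuple[str, str], Tuple[int, float]]) -> List[List[str]]:
    """Connected components; pairs maps (a, b) -> (shared bands, cbmap overlap)."""
    parent = {c: c for c in clones}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), (bands, overlap) in pairs.items():
        if has_edge(bands, overlap):
            parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for c in clones:
        groups.setdefault(find(c), []).append(c)
    return list(groups.values())

def covered_region(component: List[str], cb_left: Dict[str, float],
                   seq: Dict[str, Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Span between the leftmost and rightmost clones (by cbmap position) that
    have sequence coordinates; None if no clone in the component is placed."""
    placed = sorted((cb_left[c], c) for c in component if c in seq)
    if not placed:
        return None
    first, last = seq[placed[0][1]], seq[placed[-1][1]]
    return min(first[0], last[0]), max(first[1], last[1])

comps = components(["A", "B", "C", "D"],
                   {("A", "B"): (5, 10.0), ("B", "C"): (5, 3.0), ("C", "D"): (7, 2.0)})
print(comps)  # [['A', 'B'], ['C', 'D']]: B-C has only 5 bands over a short overlap
```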
- The average sequence overlap between neighbouring clones is 83 ± 44 kb (mean ± s.d.), which corresponds to 50 ± 26% of the length of the clones.
- The average coverage depth of the clone set based on sequence coordinates is 1.8X.
- The ratio of 1X to 2X coverage is approximately 1:1, with 1.077 Gb of the assembly covered by regions spanned by a single clone and 1.141 Gb by regions spanned by two overlapping clones.
- Coverage at 3X spans 0.350 Gb, and deep coverage of 4X or more spans 0.151 Gb.
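The 1.8X figure can be cross-checked from the per-depth spans above as a length-weighted mean over covered sequence. Because the 4X+ bin is treated as exactly 4X here, the result is a slight lower bound.

```python
spans_gb = {1: 1.077, 2: 1.141, 3: 0.350, 4: 0.151}  # depth -> Gb (4 means 4X or more)

def mean_depth(spans: dict) -> float:
    """Length-weighted mean coverage depth over the covered sequence."""
    covered = sum(spans.values())
    return sum(depth * gb for depth, gb in spans.items()) / covered

print(f"{mean_depth(spans_gb):.2f}X over {sum(spans_gb.values()):.3f} Gb")
# 1.84X over 2.719 Gb
```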
- Coverage at high depth in the tiling set occurs in regions where additional clones were added to the set to replace clones that failed validation, as it was not always possible to find a single clone providing equivalent coverage during the replacement process.
- There are gaps in sequence coverage in the clone set totalling 24 Mb.
- The gaps are formed by regions for which coverage cannot be achieved using RPCI-11, RPCI-13 and Caltech-D clones. Some of these gaps were formed by removing from the set the 413 clones that failed the second round of validation. Replacements for these clones can be added to minimise the gaps in sequence coverage.
- Telomere regions were evaluated using 164 unique BAC telomeric markers from RPCI-11/13 and Caltech-D from the Human Telomere Sequencing and Mapping Project (Ning, Y. et al. A complete set of human telomeric probes and their clinical application. National Institutes of Health and Institute of Molecular Medicine collaboration. Nat Genet 14, 86-9 (1996)).
- The SMRT set contains 45 of these telomeric BACs, and the remaining BACs overlap with their best match in the clone set by an average of 100 kbp (22 shared fingerprint fragments).
- Telomere regions are known in the art to be difficult to isolate and sequence; therefore, precise determination of telomere representation is difficult. As the SMRT set represents full coverage of the fingerprint map, the issue of telomere and centromere coverage may be mitigated, although some genomic regions that are unclonable in BACs or absent from the map may not be represented in this particular SMRT set.
- BAC DNA corresponding to the 32,433 BAC clones in the SMRT set, assembled as described in the previous Examples, was prepared as follows:
- Bacterial cultures containing the BAC genomic clones were grown in 96-well blocks according to standard protocols. Solutions were prepared as follows for each 96-well block. For solution I (GET/RNaseA, 150 µg/ml), 330 µl of 10 mg/ml RNaseA was mixed with 21.67 ml of cold GET: the GET was measured, the RNaseA was added, and the cylinder was covered with parafilm, inverted several times to mix and stored on ice. Solution II (1.0% SDS/0.2 N NaOH) was prepared by adding 2.2 ml of 10% w/v SDS to an appropriately sized bottle, followed by 19.4 ml of water and 0.44 ml of NaOH. The bottle was capped and inverted to mix.
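The stated working concentrations follow from C1V1 = C2V2. A quick check for RNaseA and SDS; the NaOH stock concentration is not given in the text, so it is not computed here. The helper name is illustrative.

```python
def final_conc(stock_conc: float, stock_vol_ml: float, total_vol_ml: float) -> float:
    """C1*V1 = C2*V2, solved for C2 (same concentration units as the stock)."""
    return stock_conc * stock_vol_ml / total_vol_ml

# Solution I: 330 µl of 10 mg/ml (10,000 µg/ml) RNaseA into 21.67 ml GET
rnase_ug_per_ml = final_conc(10_000, 0.330, 0.330 + 21.67)
# Solution II: 2.2 ml of 10% SDS in 2.2 + 19.4 + 0.44 ml total
sds_pct = final_conc(10.0, 2.2, 2.2 + 19.4 + 0.44)
print(round(rnase_ug_per_ml), round(sds_pct, 2))  # 150 1.0
```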
- Solution II was stored at room temperature. The 96-well blocks containing the BAC preparations were thawed for 30 minutes, 200 µl of cold GET/RNaseA was added, and the plates were sealed with Edge Biosystems clear tape. Pellets were resuspended completely, 200 µl of solution II was added, and the mixture was allowed to stand for 5 minutes. Then 200 µl of cold 3 M KOAc, pH 5.5, was added, and the plates were resealed and shaken at 1100 rpm for 3 minutes. Plates were centrifuged for 45 minutes at 5250 g.
- The Hydras were precleaned with ddH2O, and the needles were gently scrubbed with a brush to remove the white, gummy residue.
- The Hydras were washed with 2% bleach, then 4 times with ddH2O.
- Beckman blocks were washed by removing the pellet with the block washer, then scrubbing each well three times with a test-tube brush using 2% bleach solution.
- The blocks were rinsed thoroughly with tap water, then with ddH2O.
- The isopropanol and lysate mixture was removed from the collection plate by rapidly inverting the plate over a sink, then gently shaking and blotting on a paper towel. Then 200 µl of 80% ethanol (freshly made from 95% ethanol stock) was added to the DNA pellet and removed immediately by gentle shaking.
- The plate was blotted briefly on a paper towel, then the towel was changed and the plate was allowed to air dry until no drops were present.
- The plates were spin-dried at medium heat and the pellets resuspended in 60 µl sterile ddH2O. Plates were sealed and incubated for 10 minutes at 37 °C. Plates were stored at 4 °C.
- DNA transfer steps involved the use of a HydraPP (Matrix Technologies).
- DNA transfer steps employed a 12-channel pipette. For each transfer, the syringes were washed by pipetting 2% bleach 3 times, followed by 2 washes with ddH2O. At each stage of preparation, 3 clones from each 96-well plate were spot-checked by running 2 µl of the sample on a gel to ensure proper DNA concentration and size. In all steps involving heating above room temperature, Microseal B (MJ Research) sealing pads were used to control evaporation.
- Each BAC DNA sample (prepared as in Example 5) was transferred to a 96-well plate and digested for approximately eight hours with 5 U of MseI (New England Biolabs) in a 40 µl reaction. The reaction was heat-inactivated at 65 °C for 10 min. Ten percent of the product was transferred to a new plate and ligated to the primer-linkers.
- The ligation mixture consisted of the digested DNA, 0.2 µM each of the MseI long primer (5'-AGTGGGATTCCGCATGCTAGT-3', SEQ ID NO:1) and the MseI short primer (5'-TAACTAGCATGC-3', SEQ ID NO:2) (Alpha DNA, Quebec), and 80 U of T4 DNA ligase in NEB ligase buffer (New England Biolabs). The primers were allowed to anneal for 5 minutes at room temperature before addition to the ligation mix. The ligation was performed overnight (12-16 h) at 16 °C.
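The linker design can be checked in silico. The 3' ten bases of the short primer are the reverse complement of the 3' end of the long primer, so the annealed duplex leaves a 5'-TA overhang on the short strand, compatible with the TA overhang left by MseI cutting at T^TAA. A small sketch; the function names and the minimum pairing length are illustrative choices.

```python
COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]

def five_prime_overhang(long_primer: str, short_primer: str, min_pair: int = 6) -> str:
    """5' overhang of the short primer when its 3' portion pairs with the
    3' end of the long primer (smallest overhang tried first)."""
    for o in range(len(short_primer) - min_pair + 1):
        paired = short_primer[o:]
        if revcomp(paired) == long_primer[-len(paired):]:
            return short_primer[:o]
    raise ValueError("primers do not form the expected duplex")

LONG = "AGTGGGATTCCGCATGCTAGT"  # SEQ ID NO:1
SHORT = "TAACTAGCATGC"          # SEQ ID NO:2
print(five_prime_overhang(LONG, SHORT))  # TA
```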
- The first round of PCR (PCR1) was performed as follows.
- The reaction mixture contained the linker-ligated DNA template, 8 mM MgCl2, 1 mM each dNTP (Promega), 0.4 µM MseI long primer (modified at the 5' end with an amino group) and 5 U of Taq polymerase (Promega, storage buffer B) in Promega PCR buffer.
- The PCR was cycled at 95 °C for 1 minute, 55 °C for 1 minute and 72 °C for 3 minutes, for 30 cycles.
- A 10-minute extension at 72 °C completed the protocol.
- The second round of PCR (PCR2) was initiated using 2.5 µl of the diluted PCR1 product under the same conditions, for 35 cycles.
- The final PCR product yield was typically 40-50 µg, with products approximately 100-2000 bp in length.
- The PCR products were precipitated by adding 2.5 volumes of 100% ethanol and incubating for ½ hour at room temperature; the plates were sealed with Microseal A (MJ Research), mixed by inverting 3 times and pulse spun. The PCR products were then centrifuged at 2750 g for 45 minutes at 4 °C.
- The pellets were washed with 70% ethanol for 15 minutes, then centrifuged at 2750 g for 15 minutes at 4 °C. The pellets were air dried at 55 °C for about 30 minutes, resuspended in 25 µl distilled water and incubated overnight at room temperature. The final DNA concentration was quantified using an ND-1000 spectrophotometer (NanoDrop, Delaware). Typical yield for LMPCR was 40-50 µg. All products were subsequently sealed with aluminum foil tape and stored at -20 °C or below to inhibit evaporation. Thawing the plates required resealing the aluminum foil tape to ensure consistent volume, then pulse spinning at 800 g.
- The method is a modification of terminal end sequencing and can be practiced because, within each large collection of PCR products in one SMRT pool, there is a pair of fragments that contain BAC vector sequence starting at the MseI cut site and continuing up to the genomic DNA cloning site, followed by a short stretch of unique genomic sequence terminating at the most proximal MseI cut site (see Figure 1B).
- The SMRT pools were identified by sequencing as follows.
- Half (468) of the SMRT pools yielded sequences, and 448 of these were matched to specific BAC clone sequences. Twenty matched repetitive sequences, representing multiple GenBank entries.
- The MseI restriction site (MseI recognizes the sequence TTAA) must be a significant distance from the sequencing primer, preferably more than 50 nucleotides.
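Whether a given vector/primer combination meets the distance requirement can be checked by scanning for the first TTAA site downstream of the primer. This is a sketch that scans only the strand given. The vector and primer sequences below are synthetic placeholders, not the actual BAC vector or Sp6/T7 primer sequences.

```python
import re
from typing import List, Optional

def ttaa_sites(seq: str) -> List[int]:
    """0-based start positions of every MseI recognition site (TTAA)."""
    return [m.start() for m in re.finditer(r"(?=TTAA)", seq.upper())]

def first_site_distance(vector: str, primer: str) -> Optional[int]:
    """Distance (nt) from the primer's 3' end to the first downstream TTAA
    site on the given strand, or None if there is no downstream site."""
    p = vector.upper().find(primer.upper())
    if p < 0:
        raise ValueError("primer not found in vector sequence")
    primer_end = p + len(primer)
    downstream = [s for s in ttaa_sites(vector) if s >= primer_end]
    return downstream[0] - primer_end if downstream else None

primer = "ACGTACGTAC"                          # placeholder primer
vector = "GG" + primer + "C" * 60 + "TTAA" + "G" * 5  # placeholder vector
d = first_site_distance(vector, primer)
print(d, d is not None and d > 50)  # 60 True
```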
- A total of 83 SMRT pools were sequenced using the protocol outlined above. Of these, 64 returned usable sequences, and 60 of these were matched to a specific BAC; four matched repetitive sequences representing multiple GenBank entries. Combining the results from the Sp6 and T7 sequence reads, it was possible to identify 76 of the 83 SMRT pools (91%). High-throughput SMRT pool sequencing therefore allows identification of 91% of the clones in a clone set when both the Sp6 and T7 primers are used.
- EXAMPLE 8 IDENTIFICATION OF SMRT POOLS BY SOUTHERN HYBRIDIZATION
- EXAMPLE 9 IDENTIFICATION OF SMRT POOLS BY FLUORESCENCE IN SITU HYBRIDIZATION (FISH)
- Selected SMRT pools were identified by FISH using metaphase chromosomes. Two microlitres of each SMRT pool (~2 µg) was labelled by random priming overnight in the presence of 2 nmol of Cy3-dCTP, Cy5-dCTP (Perkin Elmer), FITC-dUTP or Texas Red-dUTP using the BioPrime kit (Invitrogen) as per the manufacturer's directions. The labelled probe was purified using a Sephadex G-50 column, combined with 21 µg of Cot-1 DNA and precipitated with ethanol.
- The labelled probe was then resuspended in 80 µl of hybridization buffer (50% formamide, 2X SSC, 10% dextran sulfate, 0.1% Tween-20, 10 mM Tris pH 7.4) and denatured for 5 min at 100 °C.
- The metaphase slide was dehydrated through a series of 70%, 80% and 100% ethanol washes for 2 min each, denatured in 70% formamide in 0.6X SSC for 2 min at 70 °C, processed through the same ethanol series at -20 °C and allowed to dry. Thirty-five microlitres of probe was then added to the slide and hybridized overnight at 37 °C. Images were processed with Qcapture (Q-imaging, Vancouver) with a Zeiss Axioscope microscope.
- A SMRT array was constructed consisting of the 32,433 BAC clone SMRT set prepared as described in Examples 1-4.
- The SMRT pools to be spotted on the array were prepared from the SMRT set as described in Example 5 and identified as described in Example 6.
- Each of the final SMRT pools was redissolved in 75 µl of 0.83X MSP printing solution (Telechem), denatured by boiling for 5 minutes in a PCR thermocycler, and rearrayed for robotic printing in triplicate using a VersArray Chip Writer Pro (BioRad).
- This arrayer used a 12 x 4 array of SMP2.5 Stealth Micro Spotting Pins (Telechem/ArrayIt), depositing DNA spots of 0.8 nl at approximately 1 µg µl⁻¹ at 133-µm spacing. The entire set of 32,433 SMRT pool solutions was spotted in triplicate onto two aldehyde-coated slides.
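Some rough layout arithmetic for the printed array. This sketch assumes the 133 µm spacing applies in both directions and ignores the space between the 48 pin subgrids, so the area is an idealized lower bound.

```python
import math

clones, replicates, slides = 32_433, 3, 2
pitch_mm = 0.133                      # 133 µm centre-to-centre, assumed in both axes
spot_vol_nl, conc_ug_per_ul = 0.8, 1.0

total_spots = clones * replicates
spots_per_slide = math.ceil(total_spots / slides)
area_mm2 = spots_per_slide * pitch_mm ** 2        # idealized square grid
dna_ng_per_spot = spot_vol_nl * conc_ug_per_ul    # 1 nl at 1 µg/µl = 1 ng

print(total_spots, spots_per_slide, round(area_mm2), dna_ng_per_spot)
# 97299 48650 861 0.8
```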
- EXAMPLE 11 ARRAY SENSITIVITY OF WHOLE GENOME SMRT ARRAY
- DNA from the TAT-I lymphoma cell line was prepared as follows. Test and reference DNA (400 ng each) were labelled separately with cyanine-3 and cyanine-5 dCTPs according to a previously described random priming protocol (Garnis, C., Baldwin, C., Zhang, L., Rosin, M.P. & Lam, W.L. Use of complete coverage array CGH to define copy number alterations on chromosome 3p in oral squamous cell carcinomas. Cancer Res. 63, 8582-8585 (2003)).
- The DNA probes were combined and purified using ProbeQuant Sephadex G-50 columns (Amersham) to remove unincorporated nucleotides. Human Cot-1 DNA (200 µg; Invitrogen) was added, and the mixture was precipitated and resuspended in 100 µl of DIG Easy hybridization solution (Roche) containing sheared herring sperm DNA (Sigma-Aldrich) and yeast tRNA (Calbiochem). The probe was denatured at 85 °C for 10 min, and repetitive sequences were blocked at 45 °C for 1 h before hybridization. Prehybridization was carried out in the same buffer.
- The probe mixture was applied to the slide surface, the coverslips were fixed, and the slides were incubated at 42 °C for 36 h.
- The arrays were washed five times for 5 min each in 0.1X saline sodium citrate, 0.1% SDS at room temperature with agitation. Each array was rinsed repeatedly in 0.1X saline sodium citrate and dried by centrifugation.
- Genomic regions containing BCL2 (18q21) and MYC (8q24) in TAT-I were previously shown by FISH analysis to have a twofold copy-number increase.
- EXAMPLE 12 RESOLUTION OF SMRT ARRAY COMPARED TO CONVENTIONAL CGH
- Test DNA: H526 cell line genomic DNA; reference DNA: normal male genomic DNA.
- The results are shown in Figure 9 (see Figure 9a). All patterns of gains and losses were matched, including large changes (e.g., the amplification of 7q and 8q and loss of the entire chromosome 10) and complex changes (e.g., the multiple amplifications on chromosome 1 and the multiple deletions on chromosome 4).
- Conventional chromosomal CGH identified a highly amplified region on the telomeric end of chromosome arm 2p, apparently covering approximately one-fourth of the whole chromosome.
- EXAMPLE 13 COMPARISON OF SMRT ARRAY WITH PREVIOUS ARRAY CGH
- The colorectal cancer cell line COLO320 (Quinn, L.A., Moore, G.E., Morgan, R.T. & Woods, L.K. Cell lines from human colon carcinoma with unusual cell products, double minutes, and homogeneously staining regions. Cancer Res. 39, 4914-4924 (1979)) was profiled. This cell line has been characterized in two previous array CGH studies (Snijders, A.M. et al. Assembly of microarrays for genome-wide measurement of DNA copy number. Nat. Genet. 29, 263-264 (2001); Wessendorf, S. et al.
- EXAMPLE 14 IDENTIFICATION OF MINUTE REGIONS OF ALTERATION BY SMRT ARRAY ANALYSIS
- Submegabase-sized microdeletions can be accurately mapped in a single whole-genome array CGH experiment. This is made possible by the overlapping clone coverage and the distribution of the clones on the array.
- A notable example is a 240-kb deletion at 7q22.3 in the breast cancer cell line BT474, containing PRKAR2B, a protein kinase A regulatory subunit, and HBP1, a G1 inhibitory transcriptional repressor regulated by p38 MAP kinase (Xiu, M. et al. The transcriptional repressor HBP1 is a target of the p38 mitogen-activated protein kinase pathway in cell cycle regulation. Mol. Cell. Biol. 23, 8890-8901 (2003)).
- Figure 14 shows a further example of a microdeletion detected using the whole-genome tiling resolution array: a microdeletion at 6q24.3-ter in the HCC15 adenocarcinoma cell line. The average array log2 ratio for this deletion was -0.85, versus an ideal expected ratio of -1.0 for the loss of one of two copies. Previous data confirm a deletion at this locus (Girard, L. et al. Genome-wide allelotyping of lung cancer identifies new regions of allelic loss, differences between small cell lung cancer and non-small cell lung cancer, and loci clustering. Cancer Res. 60, 4894-4906 (2000)).
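The ideal log2 ratios follow from log2(test copies / reference copies) against a diploid reference. The sketch below also inverts an observed ratio to an implied copy number under the same idealization, ignoring factors such as normal-cell admixture. Note that a homozygous (two-copy) deletion has no finite ideal ratio.

```python
import math

def expected_log2_ratio(test_copies: float, ref_copies: float = 2.0) -> float:
    """Ideal log2(test/reference) for a pure sample; test_copies must be > 0."""
    return math.log2(test_copies / ref_copies)

def implied_copies(log2_ratio: float, ref_copies: float = 2.0) -> float:
    """Copy number implied by an observed log2 ratio under the same idealization."""
    return ref_copies * 2 ** log2_ratio

print(expected_log2_ratio(1))           # -1.0 (one of two copies lost)
print(expected_log2_ratio(4))           # 1.0 (twofold gain)
print(round(implied_copies(-0.85), 2))  # 1.11
```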
- Examples 12-14 show how small, previously unidentified alterations that have the potential to contribute to disease may easily be identified in a single whole-genome SMRT array experiment.
- Hybridized slides were imaged using a CCD-based imaging system (Arrayworx eAuto, Applied Precision) and analyzed with SoftWoRx Tracker Spot Analysis software.
- The ratios of the triplicate spots were averaged and standard deviations (s.d.) were calculated. All spots with s.d. >0.075 or signal-to-noise ratios <20 were removed from the analysis. Custom viewing software (SeeGH) was used to visualize all data as log2 ratio plots in which each dot represents one BAC.
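The triplicate filtering step can be sketched as follows. This makes several assumptions not stated in the text: the filters are applied per clone, so a clone is dropped if any replicate has SNR <20 or if the sample s.d. of its triplicate log2 ratios exceeds 0.075; the averaged values are log2 ratios; and the input has already been exported as (log2 ratio, SNR) pairs. Clone IDs are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

def summarize_clones(spots: Dict[str, List[Tuple[float, float]]],
                     max_sd: float = 0.075, min_snr: float = 20.0) -> Dict[str, float]:
    """spots: clone ID -> [(log2_ratio, snr), ...] for its replicate spots.
    Returns clone ID -> mean log2 ratio for clones passing both filters."""
    kept = {}
    for clone, reps in spots.items():
        ratios = [r for r, _ in reps]
        if len(ratios) < 2 or any(snr < min_snr for _, snr in reps):
            continue  # too few replicates, or a replicate with low signal-to-noise
        if statistics.stdev(ratios) > max_sd:
            continue  # replicates disagree too much
        kept[clone] = statistics.mean(ratios)
    return kept

spots = {
    "clone1": [(0.02, 45), (0.05, 50), (-0.01, 40)],  # s.d. 0.03 -> kept
    "clone2": [(0.40, 60), (0.10, 55), (0.30, 70)],   # s.d. ~0.15 -> dropped
    "clone3": [(0.00, 15), (0.01, 30), (0.02, 30)],   # SNR 15 -> dropped
}
print(summarize_clones(spots))
```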
- EXAMPLE 16 GENERATION OF A HIGH RESOLUTION SMRT SET, SMRT LIBRARY AND SMRT ARRAY REPRESENTING THE 8q21-24 REGION OF THE HUMAN GENOME
- A total of 166 BAC clones were selected from the RPCI-11 library. All BAC clones were mapped to the tiling assembly of The International Human Genome Mapping Consortium (2001). In addition, clones were referenced to the Nov 2002 assembly of the UCSC Genome Browser based on their end sequences. The centromeric position was estimated by a BLAST search, against the UCSC sequence assembly, of the termini of the BAC sequences retrieved from GenBank, which may or may not reach the clone ends.
- To construct a BAC array, a contig map of the 8q21-24 region was built using FPC (fingerprinted contigs) software (available from the Sanger Institute website) and the fingerprint data of ~400,000 human BAC clones (The International Human Genome Mapping Consortium, 2001). Relative positions of the BACs were confirmed by cross-referencing sequence-based BAC contig assemblies [the Ensembl Genome Browser (Clamp et al., 2003. Nucleic Acids Research 31:38-42), the NCBI Map Viewer (Wheeler D, et al. 2003. Nucleic Acids Research 31:28-33) and the UCSC Genome Browser (Kent et al., 2002.
- The amplified DNA, dissolved in a 20% DMSO solution, was denatured by boiling for 10 min and rearrayed for robotic printing in triplicate using a VersArray Chip Writer Pro (BioRad, Mississauga, ON, Canada) with Stealth Micro Spotting Pins.
- Arrays were prehybridized at 42 °C with DIG Easy hybridization solution (Roche, Mississauga, ON) containing 1% BSA and 2 µg/µL sheared herring sperm DNA. Denatured probes in hybridization buffer containing 6 µg/µL yeast tRNA were applied to the array and hybridized at 42 °C for 36 hours. Arrays were washed repeatedly with 0.1X SSC/0.1% SDS in the dark at room temperature. Hybridized arrays were imaged using a CCD-based imaging system and analyzed using the Softworx array analysis program (Arrayworx eAuto, API, Issaquah, WA).
- Spot signal data for each channel were normalized by applying a scale factor that balanced the signal intensities of the human genomic DNA control spots. Additionally, 96 randomly selected clones scattered throughout the genome were included as control spots. The average signal ratio and standard deviation for each triplicate spot set were calculated and displayed as a plot of the normalized Cy5/Cy3 log2 signal ratio versus relative tiling path position. A log2 signal ratio of zero at a spot represents equivalent copy number between the sample and reference DNA. An amplicon was defined as a region of clones with a local average signal ratio above the baseline.
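A minimal sketch of the normalization and amplicon definition. The scale factor is taken here as the ratio of summed control-spot intensities, and the local average is a centred moving average along the tiling path. The window size and the baseline threshold are hypothetical parameters, since the text does not give them.

```python
import math
from typing import List, Tuple

def normalized_log2(cy5: List[float], cy3: List[float], control_idx: List[int]) -> List[float]:
    """Scale Cy3 so the control spots have equal total intensity in both
    channels, then return log2(Cy5/Cy3) for every spot."""
    scale = sum(cy5[i] for i in control_idx) / sum(cy3[i] for i in control_idx)
    return [math.log2(a / (b * scale)) for a, b in zip(cy5, cy3)]

def amplicons(log2_ratios: List[float], window: int = 3,
              threshold: float = 0.3) -> List[Tuple[int, int]]:
    """(first, last) clone indices, in tiling-path order, of runs whose
    centred moving average exceeds threshold (hypothetical parameters)."""
    n, half = len(log2_ratios), window // 2
    above = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        above.append(sum(log2_ratios[lo:hi]) / (hi - lo) > threshold)
    runs, start = [], None
    for i, flag in enumerate(above + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs

print(normalized_log2([200, 400, 200], [100, 100, 100], [0, 2]))  # [0.0, 1.0, 0.0]
print(amplicons([0, 0, 0, 0.6, 0.7, 0.65, 0, 0, 0]))             # [(3, 5)]
```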
- EXAMPLE 17 USE OF THE HIGH RESOLUTION SMRT ARRAY
- FIG. 16 illustrates three patterns of alteration: no amplification (Fig. 16B); amplification of the entire tiling set (8q21-24) (Fig. 16C); and multiple amplifications at 8q22, separate from 8q24 (Fig. 16D and E). Alterations at 8q22 were confirmed by microsatellite analysis.
- Array CGH analysis detected genetic alterations in 18 cases. No alteration in copy number was observed in four of the cases, while five cases showed an increase in copy number for the entire region (all 166 BACs). The remaining 13 cases showed two distinct regions of amplification, at both 8q22 and 8q24. The amplification at 8q24 contains the MYC oncogene.
- EXAMPLE 18 USE OF A HIGH RESOLUTION SMRT ARRAY
- Array construction, hybridization and analysis were carried out as described in Example 16.
- One hundred nanograms of microdissected bronchial carcinoma in situ (CIS) sample and reference (normal diploid male) genomic DNA were separately labeled using cyanine 3 and cyanine 5 dCTPs, respectively, and used to probe the arrays.
- CIS 59 shows two distinct regions, one at 8q21 (centered at BAC clone RP11-575C14) and the other at 8q24 (centered at BAC clone RP11-382A18).
- CIS 60 shows at least three distinct regions of alteration, two of which align with those observed in CIS 59, in addition to a separate region at 8q22 (centered at BAC clone RP11-35014).
- The region observed at 8q22 coincides with that observed as an early event in oral cancer progression (Garnis C, et al. Novel regions of amplification on 8q distinct from the MYC locus and frequently altered in oral dysplasia and cancer. Genes Chromosomes Cancer 2004 Jan;39(1):93-8).
- Amplification on chromosome 8q is a common event in cancer.
- High-resolution array CGH as described above allows the delineation of multiple regions of alteration on 8q in oral tumors.
- EXAMPLE 19 PREPARATION OF A SMRT SET REPRESENTING THE 6q16-q21 REGION OF THE HUMAN GENOME
- A comprehensive BAC clone map spanning the region 6q16.2 through 6q21 was constructed, and a minimal tiling set of 43 RP-11 BACs was identified from the fingerprinted contigs database (International Human Genome Mapping Consortium, 2001) (see Table 2). This map was confirmed using the April 2003 version of the sequence-based UCSC genome browser. Markers used in previous studies for the identification of regions of loss have been mapped onto this resource. A series of 13 BAC clones spaced at intervals across this contig were used as FISH probes to determine the approximate size and location of 6q deletions in the selected follicular lymphoma cases.
- BAC, PAC (P1-derived artificial chromosome) and YAC (yeast artificial chromosome) clones from additional libraries were also used as FISH probes, for a total of 17 probes.
- Figure 19 shows the relative locations of all FISH probes.
- The 43 RP-11 BAC clones can be used as a SMRT set for the generation of SMRT pools using the method described in Example 6; these pools are suitable for use as FISH probes (see Henderson et al., Genes, Chromosomes & Cancer 40:60-65 (2004)).
- EXAMPLE 20 USE OF SMRT POOLS FOR FISH ANALYSIS
- The SMRT pools generated using the methods of the invention can be used as FISH probes (see Figure 20, which shows the results of a standard FISH hybridization protocol (GibcoBRL Part # YO 1393) using SMRT pool products from BAC clones 127C12 (green) and 619019 (red)). Briefly, two microlitres of each SMRT pool (~2 µg) was labelled by random priming overnight in the presence of 2 nmol of Cy3-dCTP, Cy5-dCTP (Perkin Elmer), FITC-dUTP or Texas Red-dUTP using the BioPrime kit (Invitrogen) as per the manufacturer's directions.
- The labelled probe was purified using a Sephadex G-50 column, combined with 21 µg of Cot-1 DNA and precipitated with ethanol. The labelled probe was then resuspended in 80 µl of a hybridization buffer containing 50% formamide, 2X SSC, 10% dextran sulfate, 0.1% Tween-20 and 10 mM Tris pH 7.4, and denatured for 5 min at 100 °C.
- The metaphase slide was dehydrated through a series of 70%, 80% and 100% ethanol washes for 2 min each, denatured in 70% formamide in 0.6X SSC for 2 min at 70 °C, processed through the same ethanol series at -20 °C and allowed to dry. Thirty-five microlitres of probe was then added to the slide and hybridized overnight at 37 °C. Images were processed with Qcapture (Q-imaging, Vancouver) with a Zeiss Axioscope microscope.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CA002570068A CA2570068A1 (fr) | 2003-06-12 | 2004-06-14 | Methodes de preparation d'une bibliotheque de reserves pourvues de plaques ayant une resolution inferieure a une megabase et utilisations |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US32026603P | 2003-06-12 | 2003-06-12 | |
| US60/320,266 | 2003-06-12 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2004111267A2 (fr) | 2004-12-23 |
| WO2004111267A3 (fr) | 2005-06-09 |
Family
ID=33551152
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CA2004/000859 WO2004111267A2 (fr) | 2003-06-12 | 2004-06-14 | Methodes de preparation d'une bibliotheque de reserves pourvues de plaques ayant une resolution inferieure a une megabase et utilisations |
Country Status (2)
| Country | Link |
|---|---|
| CA (1) | CA2570068A1 (fr) |
| WO (1) | WO2004111267A2 (fr) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB2481783A (en) * | 2010-06-04 | 2012-01-11 | Anton Petrov | Data presentation and annotation software for genetic copy number analysis. |
| US8232055B2 (en) | 2002-12-23 | 2012-07-31 | Agilent Technologies, Inc. | Comparative genomic hybridization assays using immobilized oligonucleotide features and compositions for practicing the same |
| US8321138B2 (en) | 2005-07-29 | 2012-11-27 | Agilent Technologies, Inc. | Method of characterizing quality of hybridized CGH arrays |
| CN114842911A (zh) * | 2022-06-21 | 2022-08-02 | 深圳市睿法生物科技有限公司 | 基于精准医疗的基因检测流程的优化方法及装置 |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| AU3875200A (en) * | 1999-03-11 | 2000-09-28 | Orion Genomics, Llc | Genome chips and optical transcript mapping |
| US20030087231A1 (en) * | 2000-05-19 | 2003-05-08 | Albertson Donna G. | Methods and compositions for preparation of a polynucleotide array |
2004
- 2004-06-14: CA application CA002570068A, patent CA2570068A1 (fr); status: not active (abandoned)
- 2004-06-14: WO application PCT/CA2004/000859, patent WO2004111267A2 (fr); status: active (application filing)
Also Published As
| Publication number | Publication date |
|---|---|
| WO2004111267A3 (fr) | 2005-06-09 |
| CA2570068A1 (fr) | 2004-12-23 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CA2804450C (fr) | Strategies de sequencage de la region genomique d'interet v3-d | |
| JP2004524044A (ja) | 制限部位タグ付きマイクロアレイを用いたハイスループットゲノム解析方法 | |
| US20110045462A1 (en) | Digital analysis of gene expression | |
| US20110003301A1 (en) | Methods for detecting genetic variations in dna samples | |
| KR20140024378A (ko) | 조직 샘플 중의 핵산의 국지적 또는 공간적 검출을 위한 방법 및 그 방법의 생성물 | |
| WO2001086003A2 (fr) | Sondes d'acide nucleique a un seul exon derivees du genome humain utiles pour analyser l'expression genique dans le poumon humain | |
| JP2001521754A (ja) | Dna識別のためのプローブアレイ及びプローブアレイの使用方法 | |
| EP2121977A2 (fr) | Capture des chromosomes avec une conformation circulaire | |
| WO2000047767A1 (fr) | Ensemble d'oligonucléotides et ses méthodes d'utilisation | |
| JP2004504059A (ja) | 転写された遺伝子を分析、及び同定するための方法、及びフインガープリント法 | |
| JP2003245072A (ja) | シグナル伝達経路の決定 | |
| US20040023237A1 (en) | Methods for genomic analysis | |
| Khorasani et al. | A first generation physical map of the medaka genome in BACs essential for positional cloning and clone-by-clone based genomic sequencing | |
| AU2005238489A1 (en) | Kits and reagents for use in diagnosis and prognosis of genomic disorders | |
| AU2003275377A1 (en) | Subtelomeric dna probes and method of producing the same | |
| WO2004111267A2 (fr) | Methodes de preparation d'une bibliotheque de reserves pourvues de plaques ayant une resolution inferieure a une megabase et utilisations | |
| US20090263798A1 (en) | Method For Identification Of Novel Physical Linkage Of Genomic Sequences | |
| US20030124542A1 (en) | Methods for mapping the chromosomal loci of genes expressed by a cell | |
| JP2002532070A (ja) | 核酸配列を解析するためのアレイおよび方法 | |
| WO2005079357A2 (fr) | Representations d'acides nucleiques mettant en oeuvre des produits de clivage d'endonucleases de restriction de type iib | |
| US20040029161A1 (en) | Methods for genomic analysis | |
| HK1259754A1 (en) | 3-d genomic region of interest sequencing strategies | |
| JP2004512494A (ja) | ゲノム配列から導き出された機能情報を推定、確認および表示する方法および装置 | |
| HK1259754B (en) | 3-d genomic region of interest sequencing strategies | |
| AU2002307594A1 (en) | Methods for high throughput genome analysis using restriction site tagged microarrays |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| | 122 | EP: PCT application non-entry in European phase | |
| | WWE | WIPO information: entry into national phase | Ref document number: 2570068. Country of ref document: CA |