AU2007214360A1 - Whole genome based genetic evaluation and selection process
- Publication number
- AU2007214360A1
- Authority
- AU
- Australia
- Prior art keywords
- information
- individual
- population
- individuals
- merit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Concepts
- 238000000034 method Methods 0.000 title claims description 465
- 230000002068 genetic effect Effects 0.000 title claims description 210
- 230000008569 process Effects 0.000 title claims description 47
- 238000011156 evaluation Methods 0.000 title description 20
- 230000006870 function Effects 0.000 claims description 122
- 239000003550 marker Substances 0.000 claims description 112
- 230000001488 breeding effect Effects 0.000 claims description 110
- 238000009395 breeding Methods 0.000 claims description 104
- 241000283690 Bos taurus Species 0.000 claims description 92
- 230000009467 reduction Effects 0.000 claims description 85
- 238000004458 analytical method Methods 0.000 claims description 75
- 238000000513 principal component analysis Methods 0.000 claims description 70
- 108090000623 proteins and genes Proteins 0.000 claims description 70
- 238000004422 calculation algorithm Methods 0.000 claims description 54
- 239000002773 nucleotide Substances 0.000 claims description 54
- 125000003729 nucleotide group Chemical group 0.000 claims description 54
- 235000013336 milk Nutrition 0.000 claims description 48
- 210000004080 milk Anatomy 0.000 claims description 48
- 239000008267 milk Substances 0.000 claims description 47
- 244000309464 bull Species 0.000 claims description 44
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 claims description 42
- 201000010099 disease Diseases 0.000 claims description 36
- 235000013365 dairy product Nutrition 0.000 claims description 35
- 102000004169 proteins and genes Human genes 0.000 claims description 35
- 108020004414 DNA Proteins 0.000 claims description 30
- 210000000481 breast Anatomy 0.000 claims description 27
- 230000004044 response Effects 0.000 claims description 25
- 230000036961 partial effect Effects 0.000 claims description 23
- 210000000988 bone and bone Anatomy 0.000 claims description 19
- 230000035558 fertility Effects 0.000 claims description 19
- 210000001082 somatic cell Anatomy 0.000 claims description 18
- 238000012706 support-vector machine Methods 0.000 claims description 15
- 230000004083 survival effect Effects 0.000 claims description 15
- 230000007613 environmental effect Effects 0.000 claims description 11
- 102000054765 polymorphisms of proteins Human genes 0.000 claims description 11
- 238000013528 artificial neural network Methods 0.000 claims description 10
- 239000003814 drug Substances 0.000 claims description 10
- 238000012217 deletion Methods 0.000 claims description 9
- 238000007894 restriction fragment length polymorphism technique Methods 0.000 claims description 9
- 108091092878 Microsatellite Proteins 0.000 claims description 8
- 230000001973 epigenetic effect Effects 0.000 claims description 8
- 238000012163 sequencing technique Methods 0.000 claims description 8
- 210000000038 chest Anatomy 0.000 claims description 7
- 230000037430 deletion Effects 0.000 claims description 7
- 210000003041 ligament Anatomy 0.000 claims description 7
- 238000003780 insertion Methods 0.000 claims description 6
- 230000037431 insertion Effects 0.000 claims description 6
- 230000035935 pregnancy Effects 0.000 claims description 6
- 239000012634 fragment Substances 0.000 claims description 5
- 230000004049 epigenetic modification Effects 0.000 claims description 4
- 238000007834 ligase chain reaction Methods 0.000 claims description 4
- 238000007621 cluster analysis Methods 0.000 claims description 3
- 239000003053 toxin Substances 0.000 claims description 2
- 231100000765 toxin Toxicity 0.000 claims description 2
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 claims description 2
- 241000237519 Bivalvia Species 0.000 claims 1
- 235000020639 clam Nutrition 0.000 claims 1
- 239000004576 sand Substances 0.000 claims 1
- 241001465754 Metazoa Species 0.000 description 185
- 230000000694 effects Effects 0.000 description 65
- 244000144972 livestock Species 0.000 description 58
- 241000282414 Homo sapiens Species 0.000 description 57
- 238000012360 testing method Methods 0.000 description 51
- 235000015278 beef Nutrition 0.000 description 44
- 239000000047 product Substances 0.000 description 44
- 238000012549 training Methods 0.000 description 39
- 239000011159 matrix material Substances 0.000 description 38
- 108700028369 Alleles Proteins 0.000 description 35
- 210000000349 chromosome Anatomy 0.000 description 35
- 235000013372 meat Nutrition 0.000 description 32
- 102000054766 genetic haplotypes Human genes 0.000 description 30
- 239000000523 sample Substances 0.000 description 27
- 238000004519 manufacturing process Methods 0.000 description 25
- 206010028980 Neoplasm Diseases 0.000 description 24
- 201000011510 cancer Diseases 0.000 description 23
- 235000019197 fats Nutrition 0.000 description 23
- 108020004707 nucleic acids Proteins 0.000 description 23
- 102000039446 nucleic acids Human genes 0.000 description 23
- 150000007523 nucleic acids Chemical class 0.000 description 23
- 238000010200 validation analysis Methods 0.000 description 23
- 241000283073 Equus caballus Species 0.000 description 22
- 210000004027 cell Anatomy 0.000 description 22
- 241000894007 species Species 0.000 description 22
- 238000013459 approach Methods 0.000 description 21
- 241000282472 Canis lupus familiaris Species 0.000 description 19
- 241000283086 Equidae Species 0.000 description 19
- 230000000996 additive effect Effects 0.000 description 19
- 238000002790 cross-validation Methods 0.000 description 19
- 230000000875 corresponding effect Effects 0.000 description 18
- 238000005259 measurement Methods 0.000 description 18
- 241000196324 Embryophyta Species 0.000 description 17
- 230000012010 growth Effects 0.000 description 17
- 241000699670 Mus sp. Species 0.000 description 15
- 239000000090 biomarker Substances 0.000 description 15
- 239000013598 vector Substances 0.000 description 15
- 238000010586 diagram Methods 0.000 description 14
- 241001494479 Pecora Species 0.000 description 13
- 230000006872 improvement Effects 0.000 description 13
- 238000010367 cloning Methods 0.000 description 12
- 206010012601 diabetes mellitus Diseases 0.000 description 12
- 238000009826 distribution Methods 0.000 description 12
- 238000003205 genotyping method Methods 0.000 description 12
- 230000006651 lactation Effects 0.000 description 12
- 238000003752 polymerase chain reaction Methods 0.000 description 12
- 230000001850 reproductive effect Effects 0.000 description 12
- 241000282887 Suidae Species 0.000 description 11
- 230000035611 feeding Effects 0.000 description 11
- 244000144980 herd Species 0.000 description 11
- 230000001965 increasing effect Effects 0.000 description 11
- 239000000654 additive Substances 0.000 description 10
- 238000001514 detection method Methods 0.000 description 10
- 241000287828 Gallus gallus Species 0.000 description 9
- 230000003321 amplification Effects 0.000 description 9
- 206010002026 amyotrophic lateral sclerosis Diseases 0.000 description 9
- 235000013330 chicken meat Nutrition 0.000 description 9
- 235000005911 diet Nutrition 0.000 description 9
- 230000037213 diet Effects 0.000 description 9
- 238000003199 nucleic acid amplification method Methods 0.000 description 9
- 238000007619 statistical method Methods 0.000 description 9
- 108091028043 Nucleic acid sequence Proteins 0.000 description 8
- 229940079593 drug Drugs 0.000 description 8
- 230000036541 health Effects 0.000 description 8
- 230000003993 interaction Effects 0.000 description 8
- 210000003205 muscle Anatomy 0.000 description 8
- 238000010187 selection method Methods 0.000 description 8
- 210000001519 tissue Anatomy 0.000 description 8
- 241000699666 Mus <mouse, genus> Species 0.000 description 7
- 241000282898 Sus scrofa Species 0.000 description 7
- 150000001413 amino acids Chemical class 0.000 description 7
- 238000005516 engineering process Methods 0.000 description 7
- 244000309465 heifer Species 0.000 description 7
- 238000011458 pharmacological treatment Methods 0.000 description 7
- 230000002829 reductive effect Effects 0.000 description 7
- 238000005070 sampling Methods 0.000 description 7
- 241000282412 Homo Species 0.000 description 6
- 208000026350 Inborn Genetic disease Diseases 0.000 description 6
- 230000008901 benefit Effects 0.000 description 6
- 230000007423 decrease Effects 0.000 description 6
- 208000022602 disease susceptibility Diseases 0.000 description 6
- 208000016361 genetic disease Diseases 0.000 description 6
- 238000011068 loading method Methods 0.000 description 6
- 210000001161 mammalian embryo Anatomy 0.000 description 6
- 239000000463 material Substances 0.000 description 6
- 230000013011 mating Effects 0.000 description 6
- 239000000203 mixture Substances 0.000 description 6
- 230000035772 mutation Effects 0.000 description 6
- 238000012856 packing Methods 0.000 description 6
- 238000011160 research Methods 0.000 description 6
- 208000002491 severe combined immunodeficiency Diseases 0.000 description 6
- 241000972773 Aulopiformes Species 0.000 description 5
- 208000018737 Parkinson disease Diseases 0.000 description 5
- 210000004369 blood Anatomy 0.000 description 5
- 239000008280 blood Substances 0.000 description 5
- 238000010888 cage effect Methods 0.000 description 5
- 230000003047 cage effect Effects 0.000 description 5
- 239000011575 calcium Substances 0.000 description 5
- 230000008859 change Effects 0.000 description 5
- 238000003745 diagnosis Methods 0.000 description 5
- 208000035475 disorder Diseases 0.000 description 5
- 210000002683 foot Anatomy 0.000 description 5
- 230000001976 improved effect Effects 0.000 description 5
- 238000012423 maintenance Methods 0.000 description 5
- 208000004396 mastitis Diseases 0.000 description 5
- 238000010238 partial least squares regression Methods 0.000 description 5
- 230000003234 polygenic effect Effects 0.000 description 5
- 230000006798 recombination Effects 0.000 description 5
- 238000005215 recombination Methods 0.000 description 5
- 235000019515 salmon Nutrition 0.000 description 5
- 238000012216 screening Methods 0.000 description 5
- 210000003371 toe Anatomy 0.000 description 5
- 238000012546 transfer Methods 0.000 description 5
- 241000271566 Aves Species 0.000 description 4
- OYPRJOBELJOOCE-UHFFFAOYSA-N Calcium Chemical compound [Ca] OYPRJOBELJOOCE-UHFFFAOYSA-N 0.000 description 4
- 208000035240 Disease Resistance Diseases 0.000 description 4
- 206010020772 Hypertension Diseases 0.000 description 4
- XSQUKJJJFZCRTK-UHFFFAOYSA-N Urea Chemical compound NC(N)=O XSQUKJJJFZCRTK-UHFFFAOYSA-N 0.000 description 4
- 239000003674 animal food additive Substances 0.000 description 4
- 239000003242 anti bacterial agent Substances 0.000 description 4
- 229940088710 antibiotic agent Drugs 0.000 description 4
- 229910052791 calcium Inorganic materials 0.000 description 4
- 238000010276 construction Methods 0.000 description 4
- 230000002596 correlated effect Effects 0.000 description 4
- 230000008021 deposition Effects 0.000 description 4
- 230000001079 digestive effect Effects 0.000 description 4
- 210000002257 embryonic structure Anatomy 0.000 description 4
- 235000012631 food intake Nutrition 0.000 description 4
- 230000037406 food intake Effects 0.000 description 4
- 239000005556 hormone Substances 0.000 description 4
- 229940088597 hormone Drugs 0.000 description 4
- 238000009396 hybridization Methods 0.000 description 4
- 238000000338 in vitro Methods 0.000 description 4
- 208000015181 infectious disease Diseases 0.000 description 4
- 230000000670 limiting effect Effects 0.000 description 4
- 230000002503 metabolic effect Effects 0.000 description 4
- 230000004048 modification Effects 0.000 description 4
- 238000012986 modification Methods 0.000 description 4
- 244000144977 poultry Species 0.000 description 4
- 235000013594 poultry meat Nutrition 0.000 description 4
- 108090000765 processed proteins & peptides Proteins 0.000 description 4
- 235000020989 red meat Nutrition 0.000 description 4
- 206010039073 rheumatoid arthritis Diseases 0.000 description 4
- 238000003860 storage Methods 0.000 description 4
- 230000035882 stress Effects 0.000 description 4
- 208000001072 type 2 diabetes mellitus Diseases 0.000 description 4
- 238000002604 ultrasonography Methods 0.000 description 4
- 229960005486 vaccine Drugs 0.000 description 4
- 102100035037 Calpastatin Human genes 0.000 description 3
- 241000238557 Decapoda Species 0.000 description 3
- 101000693993 Homo sapiens Sodium channel protein type 4 subunit alpha Proteins 0.000 description 3
- 208000007599 Hyperkalemic periodic paralysis Diseases 0.000 description 3
- 238000003657 Likelihood-ratio test Methods 0.000 description 3
- 241000124008 Mammalia Species 0.000 description 3
- 241000700159 Rattus Species 0.000 description 3
- 241000607142 Salmonella Species 0.000 description 3
- 102100027195 Sodium channel protein type 4 subunit alpha Human genes 0.000 description 3
- 241000209140 Triticum Species 0.000 description 3
- VREFGVBLTWBCJP-UHFFFAOYSA-N alprazolam Chemical compound C12=CC(Cl)=CC=C2N2C(C)=NN=C2CN=C1C1=CC=CC=C1 VREFGVBLTWBCJP-UHFFFAOYSA-N 0.000 description 3
- 238000009360 aquaculture Methods 0.000 description 3
- 244000144974 aquaculture Species 0.000 description 3
- 230000005540 biological transmission Effects 0.000 description 3
- 108010044208 calpastatin Proteins 0.000 description 3
- ZXJCOYBPXOBJMU-HSQGJUDPSA-N calpastatin peptide Ac 184-210 Chemical compound C([C@@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N1[C@@H](CCC1)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(N)=O)NC(=O)[C@@H](NC(=O)[C@H](CO)NC(=O)[C@H](CO)NC(=O)[C@H](CCSC)NC(=O)[C@H]1N(CCC1)C(=O)[C@H](CC(O)=O)NC(C)=O)[C@@H](C)O)C1=CC=C(O)C=C1 ZXJCOYBPXOBJMU-HSQGJUDPSA-N 0.000 description 3
- 230000001413 cellular effect Effects 0.000 description 3
- 238000004590 computer program Methods 0.000 description 3
- 230000001186 cumulative effect Effects 0.000 description 3
- 238000013461 design Methods 0.000 description 3
- 235000021045 dietary change Nutrition 0.000 description 3
- 235000014113 dietary fatty acids Nutrition 0.000 description 3
- 210000002969 egg yolk Anatomy 0.000 description 3
- 230000002922 epistatic effect Effects 0.000 description 3
- 229930195729 fatty acid Natural products 0.000 description 3
- 239000000194 fatty acid Substances 0.000 description 3
- 150000004665 fatty acids Chemical class 0.000 description 3
- 235000021050 feed intake Nutrition 0.000 description 3
- 238000001914 filtration Methods 0.000 description 3
- 235000019688 fish Nutrition 0.000 description 3
- 230000007614 genetic variation Effects 0.000 description 3
- 238000003306 harvesting Methods 0.000 description 3
- 230000009027 insemination Effects 0.000 description 3
- 230000002452 interceptive effect Effects 0.000 description 3
- 201000010901 lateral sclerosis Diseases 0.000 description 3
- 210000004072 lung Anatomy 0.000 description 3
- 230000008774 maternal effect Effects 0.000 description 3
- 239000003607 modifier Substances 0.000 description 3
- 238000010369 molecular cloning Methods 0.000 description 3
- 201000006417 multiple sclerosis Diseases 0.000 description 3
- 235000016709 nutrition Nutrition 0.000 description 3
- 210000000287 oocyte Anatomy 0.000 description 3
- 230000016087 ovulation Effects 0.000 description 3
- 210000004681 ovum Anatomy 0.000 description 3
- 238000012628 principal component regression Methods 0.000 description 3
- 238000011946 reduction process Methods 0.000 description 3
- 230000000241 respiratory effect Effects 0.000 description 3
- 210000000582 semen Anatomy 0.000 description 3
- 238000004088 simulation Methods 0.000 description 3
- 238000003307 slaughter Methods 0.000 description 3
- 238000010374 somatic cell nuclear transfer Methods 0.000 description 3
- 230000009897 systematic effect Effects 0.000 description 3
- 230000000007 visual effect Effects 0.000 description 3
- 239000002023 wood Substances 0.000 description 3
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 2
- GVJHHUAWPYXKBD-UHFFFAOYSA-N (±)-α-Tocopherol Chemical compound OC1=C(C)C(C)=C2OC(CCCC(C)CCCC(C)CCCC(C)C)(C)CCC2=C1C GVJHHUAWPYXKBD-UHFFFAOYSA-N 0.000 description 2
- 241000251468 Actinopterygii Species 0.000 description 2
- 241000272517 Anseriformes Species 0.000 description 2
- 241000723736 Black beetle virus Species 0.000 description 2
- 102000007590 Calpain Human genes 0.000 description 2
- 108010032088 Calpain Proteins 0.000 description 2
- 241000282832 Camelidae Species 0.000 description 2
- 241000283707 Capra Species 0.000 description 2
- CURLTUGMZLYLDI-UHFFFAOYSA-N Carbon dioxide Chemical compound O=C=O CURLTUGMZLYLDI-UHFFFAOYSA-N 0.000 description 2
- 241000238424 Crustacea Species 0.000 description 2
- 230000007067 DNA methylation Effects 0.000 description 2
- 108010000912 Egg Proteins Proteins 0.000 description 2
- 102000002322 Egg Proteins Human genes 0.000 description 2
- 108090000790 Enzymes Proteins 0.000 description 2
- 102000004190 Enzymes Human genes 0.000 description 2
- 241001331845 Equus asinus x caballus Species 0.000 description 2
- 241000282326 Felis catus Species 0.000 description 2
- WQZGKKKJIJFFOK-GASJEMHNSA-N Glucose Natural products OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-GASJEMHNSA-N 0.000 description 2
- 238000007476 Maximum Likelihood Methods 0.000 description 2
- 102000004472 Myostatin Human genes 0.000 description 2
- 108010056852 Myostatin Proteins 0.000 description 2
- 108091034117 Oligonucleotide Proteins 0.000 description 2
- 102000043276 Oncogene Human genes 0.000 description 2
- 108700020796 Oncogene Proteins 0.000 description 2
- 241000209094 Oryza Species 0.000 description 2
- 235000007164 Oryza sativa Nutrition 0.000 description 2
- 235000008331 Pinus X rigitaeda Nutrition 0.000 description 2
- 235000011613 Pinus brutia Nutrition 0.000 description 2
- 241000018646 Pinus brutia Species 0.000 description 2
- 102000029797 Prion Human genes 0.000 description 2
- 108091000054 Prion Proteins 0.000 description 2
- RJKFOVLPORLFTN-LEKSSAKUSA-N Progesterone Chemical compound C1CC2=CC(=O)CC[C@]2(C)[C@@H]2[C@@H]1[C@@H]1CC[C@H](C(=O)C)[C@@]1(C)CC2 RJKFOVLPORLFTN-LEKSSAKUSA-N 0.000 description 2
- 241000277331 Salmonidae Species 0.000 description 2
- 108700025695 Suppressor Genes Proteins 0.000 description 2
- IQFYYKKMVGJFEH-XLPZGREQSA-N Thymidine Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](CO)[C@@H](O)C1 IQFYYKKMVGJFEH-XLPZGREQSA-N 0.000 description 2
- 235000021307 Triticum Nutrition 0.000 description 2
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 2
- 240000008042 Zea mays Species 0.000 description 2
- 235000016383 Zea mays subsp huehuetenangensis Nutrition 0.000 description 2
- 235000002017 Zea mays subsp mays Nutrition 0.000 description 2
- 230000032683 aging Effects 0.000 description 2
- 238000003975 animal breeding Methods 0.000 description 2
- 230000003466 anticipated effect Effects 0.000 description 2
- 238000003491 array Methods 0.000 description 2
- 238000003556 assay Methods 0.000 description 2
- 230000000386 athletic effect Effects 0.000 description 2
- 230000037147 athletic performance Effects 0.000 description 2
- 235000021052 average daily weight gain Nutrition 0.000 description 2
- 230000036772 blood pressure Effects 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 239000004202 carbamide Substances 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 2
- 230000019113 chromatin silencing Effects 0.000 description 2
- 238000010372 cloning stem cell Methods 0.000 description 2
- 230000000295 complement effect Effects 0.000 description 2
- 238000000205 computational method Methods 0.000 description 2
- 238000005094 computer simulation Methods 0.000 description 2
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 2
- 230000006378 damage Effects 0.000 description 2
- 230000003247 decreasing effect Effects 0.000 description 2
- 230000001419 dependent effect Effects 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000018109 developmental process Effects 0.000 description 2
- 238000013401 experimental design Methods 0.000 description 2
- 210000003414 extremity Anatomy 0.000 description 2
- 210000003608 fece Anatomy 0.000 description 2
- 230000004720 fertilization Effects 0.000 description 2
- 239000012530 fluid Substances 0.000 description 2
- 239000007850 fluorescent dye Substances 0.000 description 2
- 238000012252 genetic analysis Methods 0.000 description 2
- 239000008103 glucose Substances 0.000 description 2
- UYTPUPDQBNUYGX-UHFFFAOYSA-N guanine Chemical compound O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 2
- 210000004124 hock Anatomy 0.000 description 2
- 210000000003 hoof Anatomy 0.000 description 2
- 210000000987 immune system Anatomy 0.000 description 2
- 239000007943 implant Substances 0.000 description 2
- 238000007918 intramuscular administration Methods 0.000 description 2
- 238000002955 isolation Methods 0.000 description 2
- 208000017169 kidney disease Diseases 0.000 description 2
- 238000012886 linear function Methods 0.000 description 2
- 239000010871 livestock manure Substances 0.000 description 2
- 235000009973 maize Nutrition 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 208000030159 metabolic disease Diseases 0.000 description 2
- 239000002207 metabolite Substances 0.000 description 2
- 230000001537 neural effect Effects 0.000 description 2
- 210000004940 nucleus Anatomy 0.000 description 2
- 235000015097 nutrients Nutrition 0.000 description 2
- 230000035764 nutrition Effects 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 210000000056 organ Anatomy 0.000 description 2
- 244000052769 pathogen Species 0.000 description 2
- 208000029308 periodic paralysis Diseases 0.000 description 2
- 229920001184 polypeptide Polymers 0.000 description 2
- 230000032361 posttranscriptional gene silencing Effects 0.000 description 2
- 102000004196 processed proteins & peptides Human genes 0.000 description 2
- 210000002307 prostate Anatomy 0.000 description 2
- 238000003908 quality control method Methods 0.000 description 2
- 230000003938 response to stress Effects 0.000 description 2
- 230000000284 resting effect Effects 0.000 description 2
- 238000012552 review Methods 0.000 description 2
- 235000009566 rice Nutrition 0.000 description 2
- 201000000306 sarcoidosis Diseases 0.000 description 2
- 230000003248 secreting effect Effects 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 210000000130 stem cell Anatomy 0.000 description 2
- 238000006467 substitution reaction Methods 0.000 description 2
- 230000009182 swimming Effects 0.000 description 2
- 238000012033 transcriptional gene silencing Methods 0.000 description 2
- 238000011282 treatment Methods 0.000 description 2
- 210000003462 vein Anatomy 0.000 description 2
- 239000011782 vitamin Substances 0.000 description 2
- 229930003231 vitamin Natural products 0.000 description 2
- 235000013343 vitamin Nutrition 0.000 description 2
- 229940088594 vitamin Drugs 0.000 description 2
- 150000003722 vitamin derivatives Chemical class 0.000 description 2
- 210000002268 wool Anatomy 0.000 description 2
- SNICXCGAKADSCV-JTQLQIEISA-N (-)-Nicotine Chemical compound CN1CCC[C@H]1C1=CC=CN=C1 SNICXCGAKADSCV-JTQLQIEISA-N 0.000 description 1
- IIZPXYDJLKNOIY-JXPKJXOSSA-N 1-palmitoyl-2-arachidonoyl-sn-glycero-3-phosphocholine Chemical compound CCCCCCCCCCCCCCCC(=O)OC[C@H](COP([O-])(=O)OCC[N+](C)(C)C)OC(=O)CCC\C=C/C\C=C/C\C=C/C\C=C/CCCCC IIZPXYDJLKNOIY-JXPKJXOSSA-N 0.000 description 1
- IJJWOSAXNHWBPR-HUBLWGQQSA-N 5-[(3as,4s,6ar)-2-oxo-1,3,3a,4,6,6a-hexahydrothieno[3,4-d]imidazol-4-yl]-n-(6-hydrazinyl-6-oxohexyl)pentanamide Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)NCCCCCC(=O)NN)SC[C@@H]21 IJJWOSAXNHWBPR-HUBLWGQQSA-N 0.000 description 1
- 208000004998 Abdominal Pain Diseases 0.000 description 1
- 229930024421 Adenine Natural products 0.000 description 1
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 1
- 206010067484 Adverse reaction Diseases 0.000 description 1
- 108010088751 Albumins Proteins 0.000 description 1
- 102000009027 Albumins Human genes 0.000 description 1
- 208000007848 Alcoholism Diseases 0.000 description 1
- 241001136792 Alle Species 0.000 description 1
- 235000002198 Annona diversifolia Nutrition 0.000 description 1
- 201000001320 Atherosclerosis Diseases 0.000 description 1
- 208000023275 Autoimmune disease Diseases 0.000 description 1
- 108090001008 Avidin Proteins 0.000 description 1
- 208000035143 Bacterial infection Diseases 0.000 description 1
- 238000000846 Bartlett's test Methods 0.000 description 1
- DWRXFEITVBNRMK-UHFFFAOYSA-N Beta-D-1-Arabinofuranosylthymine Natural products O=C1NC(=O)C(C)=CN1C1C(O)C(O)C(CO)O1 DWRXFEITVBNRMK-UHFFFAOYSA-N 0.000 description 1
- 101100452236 Caenorhabditis elegans inf-1 gene Proteins 0.000 description 1
- 101100180402 Caenorhabditis elegans jun-1 gene Proteins 0.000 description 1
- 208000024172 Cardiovascular disease Diseases 0.000 description 1
- 241000282994 Cervidae Species 0.000 description 1
- 108091006146 Channels Proteins 0.000 description 1
- 208000006545 Chronic Obstructive Pulmonary Disease Diseases 0.000 description 1
- 208000002881 Colic Diseases 0.000 description 1
- 241001605679 Colotis Species 0.000 description 1
- 241000777300 Congiopodidae Species 0.000 description 1
- 241000699800 Cricetinae Species 0.000 description 1
- 241000938605 Crocodylia Species 0.000 description 1
- 208000011231 Crohn disease Diseases 0.000 description 1
- 241000252233 Cyprinus carpio Species 0.000 description 1
- 102000053602 DNA Human genes 0.000 description 1
- 230000006820 DNA synthesis Effects 0.000 description 1
- 241000252212 Danio rerio Species 0.000 description 1
- 206010073767 Developmental hip dysplasia Diseases 0.000 description 1
- 208000002249 Diabetes Complications Diseases 0.000 description 1
- 201000009273 Endometriosis Diseases 0.000 description 1
- 241000588724 Escherichia coli Species 0.000 description 1
- 241000289695 Eutheria Species 0.000 description 1
- 208000007882 Gastritis Diseases 0.000 description 1
- 108700039691 Genetic Promoter Regions Proteins 0.000 description 1
- 241000699694 Gerbillinae Species 0.000 description 1
- GVGLGOZIDCSQPN-PVHGPHFFSA-N Heroin Chemical compound O([C@H]1[C@H](C=C[C@H]23)OC(C)=O)C4=C5[C@@]12CCN(C)[C@@H]3CC5=CC=C4OC(C)=O GVGLGOZIDCSQPN-PVHGPHFFSA-N 0.000 description 1
- 208000007446 Hip Dislocation Diseases 0.000 description 1
- 101000599951 Homo sapiens Insulin-like growth factor I Proteins 0.000 description 1
- 101000693844 Homo sapiens Insulin-like growth factor-binding protein complex acid labile subunit Proteins 0.000 description 1
- 101000650863 Homo sapiens SH2 domain-containing protein 1A Proteins 0.000 description 1
- 206010020751 Hypersensitivity Diseases 0.000 description 1
- 206010061218 Inflammation Diseases 0.000 description 1
- 208000006877 Insect Bites and Stings Diseases 0.000 description 1
- 102100037852 Insulin-like growth factor I Human genes 0.000 description 1
- 241000861223 Issus Species 0.000 description 1
- 241001527806 Iti Species 0.000 description 1
- 208000012659 Joint disease Diseases 0.000 description 1
- 241001288024 Lagascea mollis Species 0.000 description 1
- 241000282838 Lama Species 0.000 description 1
- 241000269779 Lates calcarifer Species 0.000 description 1
- 241000186779 Listeria monocytogenes Species 0.000 description 1
- 206010024641 Listeriosis Diseases 0.000 description 1
- 206010024652 Liver abscess Diseases 0.000 description 1
- 206010058467 Lung neoplasm malignant Diseases 0.000 description 1
- 235000007688 Lycopersicon esculentum Nutrition 0.000 description 1
- 241000282560 Macaca mulatta Species 0.000 description 1
- 208000036626 Mental retardation Diseases 0.000 description 1
- 108020005196 Mitochondrial DNA Proteins 0.000 description 1
- 208000007101 Muscle Cramp Diseases 0.000 description 1
- 208000029578 Muscle disease Diseases 0.000 description 1
- 208000012902 Nervous system disease Diseases 0.000 description 1
- 208000025966 Neurological disease Diseases 0.000 description 1
- 206010057852 Nicotine dependence Diseases 0.000 description 1
- 108091093105 Nuclear DNA Proteins 0.000 description 1
- 208000008589 Obesity Diseases 0.000 description 1
- 241000277275 Oncorhynchus mykiss Species 0.000 description 1
- 241000283973 Oryctolagus cuniculus Species 0.000 description 1
- 241000237502 Ostreidae Species 0.000 description 1
- 241000283898 Ovis Species 0.000 description 1
- 241000282577 Pan troglodytes Species 0.000 description 1
- 241001504519 Papio ursinus Species 0.000 description 1
- 206010033799 Paralysis Diseases 0.000 description 1
- 241000286209 Phasianidae Species 0.000 description 1
- 235000008566 Pinus taeda Nutrition 0.000 description 1
- 241000218679 Pinus taeda Species 0.000 description 1
- 241000219000 Populus Species 0.000 description 1
- 206010060862 Prostate cancer Diseases 0.000 description 1
- 208000000236 Prostatic Neoplasms Diseases 0.000 description 1
- 101710130181 Protochlorophyllide reductase A, chloroplastic Proteins 0.000 description 1
- 108700005075 Regulator Genes Proteins 0.000 description 1
- 108091081062 Repeated sequence (DNA) Proteins 0.000 description 1
- 238000012952 Resampling Methods 0.000 description 1
- 206010039020 Rhabdomyolysis Diseases 0.000 description 1
- 102100027720 SH2 domain-containing protein 1A Human genes 0.000 description 1
- 206010039438 Salmonella Infections Diseases 0.000 description 1
- 108010052164 Sodium Channels Proteins 0.000 description 1
- 102000018674 Sodium Channels Human genes 0.000 description 1
- 240000003768 Solanum lycopersicum Species 0.000 description 1
- 208000005392 Spasm Diseases 0.000 description 1
- 201000002661 Spondylitis Diseases 0.000 description 1
- 241000862969 Stella Species 0.000 description 1
- 208000007107 Stomach Ulcer Diseases 0.000 description 1
- 238000000692 Student's t-test Methods 0.000 description 1
- 241001441724 Tetraodontidae Species 0.000 description 1
- 208000024799 Thyroid disease Diseases 0.000 description 1
- 241000276707 Tilapia Species 0.000 description 1
- 208000025569 Tobacco Use disease Diseases 0.000 description 1
- 238000011497 Univariate linear regression Methods 0.000 description 1
- 208000012886 Vertigo Diseases 0.000 description 1
- 241001416177 Vicugna pacos Species 0.000 description 1
- 241000282840 Vicugna vicugna Species 0.000 description 1
- 108700005077 Viral Genes Proteins 0.000 description 1
- 229930003427 Vitamin E Natural products 0.000 description 1
- 208000027418 Wounds and injury Diseases 0.000 description 1
- 101710086987 X protein Proteins 0.000 description 1
- 238000010521 absorption reaction Methods 0.000 description 1
- 239000002253 acid Substances 0.000 description 1
- 150000007513 acids Chemical class 0.000 description 1
- 229960000643 adenine Drugs 0.000 description 1
- 210000000577 adipose tissue Anatomy 0.000 description 1
- 230000002411 adverse Effects 0.000 description 1
- 230000006838 adverse reaction Effects 0.000 description 1
- 201000007930 alcohol dependence Diseases 0.000 description 1
- 208000026935 allergic disease Diseases 0.000 description 1
- 230000004075 alteration Effects 0.000 description 1
- 210000004102 animal cell Anatomy 0.000 description 1
- 238000010171 animal model Methods 0.000 description 1
- 230000003042 antagnostic effect Effects 0.000 description 1
- 239000000427 antigen Substances 0.000 description 1
- 108091007433 antigens Proteins 0.000 description 1
- 102000036639 antigens Human genes 0.000 description 1
- 206010003246 arthritis Diseases 0.000 description 1
- 238000002820 assay format Methods 0.000 description 1
- 101150036080 at gene Proteins 0.000 description 1
- QVGXLLKOCUKJST-UHFFFAOYSA-N atomic oxygen Chemical compound [O] QVGXLLKOCUKJST-UHFFFAOYSA-N 0.000 description 1
- 230000001363 autoimmune Effects 0.000 description 1
- 208000033460 autosomal dominant susceptibility to Parkinson disease 11 Diseases 0.000 description 1
- 208000022362 bacterial infectious disease Diseases 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- WQZGKKKJIJFFOK-VFUOTHLCSA-N beta-D-glucose Chemical compound OC[C@H]1O[C@@H](O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-VFUOTHLCSA-N 0.000 description 1
- IQFYYKKMVGJFEH-UHFFFAOYSA-N beta-L-thymidine Natural products O=C1NC(=O)C(C)=CN1C1OC(CO)C(O)C1 IQFYYKKMVGJFEH-UHFFFAOYSA-N 0.000 description 1
- 239000013060 biological fluid Substances 0.000 description 1
- 230000033228 biological regulation Effects 0.000 description 1
- 239000012472 biological sample Substances 0.000 description 1
- 238000001574 biopsy Methods 0.000 description 1
- 229960002685 biotin Drugs 0.000 description 1
- 235000020958 biotin Nutrition 0.000 description 1
- 239000011616 biotin Substances 0.000 description 1
- 230000000740 bleeding effect Effects 0.000 description 1
- 229910002092 carbon dioxide Inorganic materials 0.000 description 1
- 239000001569 carbon dioxide Substances 0.000 description 1
- 239000000969 carrier Substances 0.000 description 1
- 241001233037 catfish Species 0.000 description 1
- 238000010370 cell cloning Methods 0.000 description 1
- 210000003855 cell nucleus Anatomy 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- 201000001883 cholelithiasis Diseases 0.000 description 1
- 230000002759 chromosomal effect Effects 0.000 description 1
- 210000001072 colon Anatomy 0.000 description 1
- 239000013065 commercial product Substances 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 230000002860 competitive effect Effects 0.000 description 1
- 230000001276 controlling effect Effects 0.000 description 1
- 238000010411 cooking Methods 0.000 description 1
- 229940104302 cytosine Drugs 0.000 description 1
- 230000034994 death Effects 0.000 description 1
- 238000000354 decomposition reaction Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000002950 deficient Effects 0.000 description 1
- 230000001934 delay Effects 0.000 description 1
- 238000009795 derivation Methods 0.000 description 1
- 230000032671 dosage compensation Effects 0.000 description 1
- 230000008482 dysregulation Effects 0.000 description 1
- 230000002526 effect on cardiovascular system Effects 0.000 description 1
- 230000037149 energy metabolism Effects 0.000 description 1
- 230000008995 epigenetic change Effects 0.000 description 1
- 230000007608 epigenetic mechanism Effects 0.000 description 1
- 210000003743 erythrocyte Anatomy 0.000 description 1
- 230000003090 exacerbative effect Effects 0.000 description 1
- 230000005284 excitation Effects 0.000 description 1
- 230000007717 exclusion Effects 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 231100000502 fertility decrease Toxicity 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 239000000796 flavoring agent Substances 0.000 description 1
- 235000019634 flavors Nutrition 0.000 description 1
- 244000144992 flock Species 0.000 description 1
- 238000000684 flow cytometry Methods 0.000 description 1
- 238000009472 formulation Methods 0.000 description 1
- 239000013505 freshwater Substances 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 230000005021 gait Effects 0.000 description 1
- WIGCFUFOHFEKBI-UHFFFAOYSA-N gamma-tocopherol Natural products CC(C)CCCC(C)CCCC(C)CCCC1CCC2C(C)C(O)C(C)C(C)C2O1 WIGCFUFOHFEKBI-UHFFFAOYSA-N 0.000 description 1
- 239000007789 gas Substances 0.000 description 1
- 230000002496 gastric effect Effects 0.000 description 1
- 230000030279 gene silencing Effects 0.000 description 1
- 238000012268 genome sequencing Methods 0.000 description 1
- 230000000762 glandular Effects 0.000 description 1
- 210000003780 hair follicle Anatomy 0.000 description 1
- 210000003128 head Anatomy 0.000 description 1
- 208000014829 head and neck neoplasm Diseases 0.000 description 1
- 230000003862 health status Effects 0.000 description 1
- 208000019622 heart disease Diseases 0.000 description 1
- 238000005534 hematocrit Methods 0.000 description 1
- 238000003898 horticulture Methods 0.000 description 1
- 230000009610 hypersensitivity Effects 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 210000001822 immobilized cell Anatomy 0.000 description 1
- 208000026278 immune system disease Diseases 0.000 description 1
- 238000000126 in silico method Methods 0.000 description 1
- 238000009399 inbreeding Methods 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 230000004054 inflammatory process Effects 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 230000000977 initiatory effect Effects 0.000 description 1
- 208000014674 injury Diseases 0.000 description 1
- 229940079322 interferon Drugs 0.000 description 1
- UEXQBEVWFZKHNB-UHFFFAOYSA-N intermediate 29 Natural products C1=CC(N)=CC=C1NC1=NC=CC=N1 UEXQBEVWFZKHNB-UHFFFAOYSA-N 0.000 description 1
- 238000012432 intermediate storage Methods 0.000 description 1
- 150000002500 ions Chemical class 0.000 description 1
- 230000009916 joint effect Effects 0.000 description 1
- 210000003734 kidney Anatomy 0.000 description 1
- 238000011005 laboratory method Methods 0.000 description 1
- 208000030175 lameness Diseases 0.000 description 1
- 235000020997 lean meat Nutrition 0.000 description 1
- 229940067606 lecithin Drugs 0.000 description 1
- 239000000787 lecithin Substances 0.000 description 1
- 235000010445 lecithin Nutrition 0.000 description 1
- 210000003141 lower extremity Anatomy 0.000 description 1
- 201000005202 lung cancer Diseases 0.000 description 1
- 208000020816 lung neoplasm Diseases 0.000 description 1
- 230000036244 malformation Effects 0.000 description 1
- 208000026037 malignant tumor of neck Diseases 0.000 description 1
- 210000004962 mammalian cell Anatomy 0.000 description 1
- 239000004579 marble Substances 0.000 description 1
- 238000000691 measurement method Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 230000001404 mediated effect Effects 0.000 description 1
- 230000011987 methylation Effects 0.000 description 1
- 238000007069 methylation reaction Methods 0.000 description 1
- 238000002493 microarray Methods 0.000 description 1
- 238000012775 microarray technology Methods 0.000 description 1
- 208000005264 motor neuron disease Diseases 0.000 description 1
- 238000000491 multivariate analysis Methods 0.000 description 1
- 238000002703 mutagenesis Methods 0.000 description 1
- 231100000350 mutagenesis Toxicity 0.000 description 1
- 230000002107 myocardial effect Effects 0.000 description 1
- JOUIQRNQJGXQDC-AXTSPUMRSA-N namn Chemical compound O1[C@@H](COP(O)([O-])=O)[C@H](O)[C@@H](O)[C@@H]1[N+]1=CC=CC(C(O)=O)=C1 JOUIQRNQJGXQDC-AXTSPUMRSA-N 0.000 description 1
- 210000003739 neck Anatomy 0.000 description 1
- 229960002715 nicotine Drugs 0.000 description 1
- SNICXCGAKADSCV-UHFFFAOYSA-N nicotine Natural products CN1CCCC1C1=CC=CN=C1 SNICXCGAKADSCV-UHFFFAOYSA-N 0.000 description 1
- QJGQUHMNIGDVPM-UHFFFAOYSA-N nitrogen group Chemical group [N] QJGQUHMNIGDVPM-UHFFFAOYSA-N 0.000 description 1
- 238000010449 nuclear transplantation Methods 0.000 description 1
- 238000007899 nucleic acid hybridization Methods 0.000 description 1
- 238000001668 nucleic acid synthesis Methods 0.000 description 1
- 235000020824 obesity Nutrition 0.000 description 1
- 238000002515 oligonucleotide synthesis Methods 0.000 description 1
- 239000005416 organic matter Substances 0.000 description 1
- 229910052760 oxygen Inorganic materials 0.000 description 1
- 239000001301 oxygen Substances 0.000 description 1
- 239000003973 paint Substances 0.000 description 1
- 244000045947 parasite Species 0.000 description 1
- 230000008775 paternal effect Effects 0.000 description 1
- 230000001717 pathogenic effect Effects 0.000 description 1
- 238000010647 peptide synthesis reaction Methods 0.000 description 1
- 210000004976 peripheral blood cell Anatomy 0.000 description 1
- 206010034674 peritonitis Diseases 0.000 description 1
- 239000008177 pharmaceutical agent Substances 0.000 description 1
- 230000035790 physiological processes and functions Effects 0.000 description 1
- 210000002826 placenta Anatomy 0.000 description 1
- 230000004983 pleiotropic effect Effects 0.000 description 1
- 208000028280 polygenic inheritance Diseases 0.000 description 1
- 235000015277 pork Nutrition 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 239000000186 progesterone Substances 0.000 description 1
- 229960003387 progesterone Drugs 0.000 description 1
- 230000000135 prohibitive effect Effects 0.000 description 1
- 230000018883 protein targeting Effects 0.000 description 1
- 235000018102 proteins Nutrition 0.000 description 1
- 230000002685 pulmonary effect Effects 0.000 description 1
- 238000000746 purification Methods 0.000 description 1
- 230000002285 radioactive effect Effects 0.000 description 1
- 230000000306 recurrent effect Effects 0.000 description 1
- 238000000611 regression analysis Methods 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 230000010076 replication Effects 0.000 description 1
- 230000024977 response to activity Effects 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 238000012502 risk assessment Methods 0.000 description 1
- 206010039447 salmonellosis Diseases 0.000 description 1
- 150000003839 salts Chemical class 0.000 description 1
- 235000021003 saturated fats Nutrition 0.000 description 1
- 238000010845 search algorithm Methods 0.000 description 1
- 230000028327 secretion Effects 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000009394 selective breeding Methods 0.000 description 1
- 210000003491 skin Anatomy 0.000 description 1
- 238000001374 small-angle light scattering Methods 0.000 description 1
- 239000008279 sol Substances 0.000 description 1
- 238000000638 solvent extraction Methods 0.000 description 1
- 230000000392 somatic effect Effects 0.000 description 1
- 238000012066 statistical methodology Methods 0.000 description 1
- 238000013179 statistical model Methods 0.000 description 1
- 150000003431 steroids Chemical class 0.000 description 1
- BGRJTUBHPOOWDU-UHFFFAOYSA-N sulpiride Chemical compound CCN1CCCC1CNC(=O)C1=CC(S(N)(=O)=O)=CC=C1OC BGRJTUBHPOOWDU-UHFFFAOYSA-N 0.000 description 1
- 239000000725 suspension Substances 0.000 description 1
- 208000024891 symptom Diseases 0.000 description 1
- 208000011580 syndromic disease Diseases 0.000 description 1
- 238000012353 t test Methods 0.000 description 1
- 238000002560 therapeutic procedure Methods 0.000 description 1
- 229940104230 thymidine Drugs 0.000 description 1
- 208000021510 thyroid gland disease Diseases 0.000 description 1
- 238000013518 transcription Methods 0.000 description 1
- 230000035897 transcription Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 238000000844 transformation Methods 0.000 description 1
- 230000009261 transgenic effect Effects 0.000 description 1
- UFTFJSFQGQCHQW-UHFFFAOYSA-N triformin Chemical compound O=COCC(OC=O)COC=O UFTFJSFQGQCHQW-UHFFFAOYSA-N 0.000 description 1
- 201000008827 tuberculosis Diseases 0.000 description 1
- 238000009827 uniform distribution Methods 0.000 description 1
- 235000021081 unsaturated fats Nutrition 0.000 description 1
- 230000002485 urinary effect Effects 0.000 description 1
- 210000002700 urine Anatomy 0.000 description 1
- 238000002255 vaccination Methods 0.000 description 1
- 231100000889 vertigo Toxicity 0.000 description 1
- 230000003612 virological effect Effects 0.000 description 1
- 238000011179 visual inspection Methods 0.000 description 1
- 238000012800 visualization Methods 0.000 description 1
- 229940046009 vitamin E Drugs 0.000 description 1
- 235000019165 vitamin E Nutrition 0.000 description 1
- 239000011709 vitamin E Substances 0.000 description 1
- 210000003905 vulva Anatomy 0.000 description 1
- 235000020990 white meat Nutrition 0.000 description 1
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Description
31, AUG. 2007 17:48 SPRUSON FERGUSON 92615486 NO. 1556 P. 4 S&FRef: 819230
AUSTRALIA
PATENTS ACT 1990
COMPLETE SPECIFICATION
FOR A STANDARD PATENT
Name and Address of Applicant: Innovative Dairy Products Pty Ltd, an Australian company, ACN 098 382 784, of Level 1, 84 William Street, Melbourne, Victoria, 3000, Australia
Actual Inventor(s): Herman Raadsma; Bruce Tier; Alexander Frederick Woolaston; Gerhard Christian Moser
Address for Service: Spruson & Ferguson, St Martins Tower Level 31 Market Street, Sydney NSW 2000 (CCN 3710000177)
Invention Title: Whole genome based genetic evaluation and selection process
Associated Provisional Application Details:
[31] Appln No(s): 2007901355; [32] Application Date: 15 Mar 2007; [33] Country: AU
[31] Appln No(s): 2007901501; [32] Application Date: 20 Mar 2007; [33] Country: AU
The following statement is a full description of this invention, including the best method of performing it known to me/us:-
5845c(933774_1) COMS ID No: ARCS-159283 Received by IP Australia: Time 18:19 Date 2007-08-31
WHOLE GENOME BASED GENETIC EVALUATION AND SELECTION PROCESS
TECHNICAL FIELD
[0001] Disclosed herein are methods for predicting genetic and phenotypic merit in individuals on the basis of genome-wide marker information. Also disclosed are methods for determining the fitness or predisposition of an individual for a desired purpose, or the susceptibility of the individual to an outcome, such as a disease. It should be recognized that the invention has a broad range of applicability.
BACKGROUND
[0002] All references, including any patents or patent applications, cited in this specification are hereby incorporated by reference. No admission is made that any reference constitutes prior art. The discussion of the references states what their authors assert, and the applicants reserve the right to challenge the accuracy and pertinence of the cited documents. It will be clearly understood that, although a number of prior art publications are referred to herein, this reference does not constitute an admission that any of these documents forms part of the common general knowledge in the art, in Australia or in any other country.
[0003] Genetic progress, for example in a herd, flock, group, crop, etc., depends on choices made as to the best individuals to use as breeding stock, on the basis of predictions of the superior performance of offspring yet to be born. The basis of such predictions is generally an estimate of genetic merit obtained by statistical analysis of performance or phenotypic data of an individual and that of its relatives, where the data are analysed using statistical approaches such as best linear unbiased prediction (BLUP). This is a well-accepted procedure, and is the basis of genetic improvement schemes for several species of livestock in a number of countries. For example, such schemes have been used for dairy cattle in Australia, New Zealand, Canada and Holland, for sheep in Australia, New Zealand and the United Kingdom, and for poultry and pigs in a number of countries.
874124_7
[0004] Although phenotypic measurements of a biological or performance trait can be recorded for an individual within a population, there is little or no useful phenotypic information available until the individual enters the productive phase of its life, which is normally adulthood. In the case of the dairy cow, this is its first lactation; for meat-producing animals such as beef cattle, pigs and sheep, it is harvesting, i.e. slaughter; for racing animals, it is when the animal commences training or actual racing. In the pre-production phase, predictions of genetic merit for an individual rely entirely on the data on relatives of that individual. This lack of information on individuals within a population at an early stage reduces the ability to make decisions about the potential future use of such individuals, especially with respect to their use in breeding. Consequently the rate of genetic gain in the biological or performance trait of the population under selection is less than that which would be achievable with such data.
[0005] Some performance traits are expressed in only one sex; such traits are known as sex-limited traits, with one example being milk production. However, the genetic merit of the sire for any heritable trait is very important in achieving genetic progress, in that an individual inherits around one-half of its genotype from each parent. Therefore it is advantageous to assess the genetic merit of an individual sire in order to define its value for breeding the next generation of progeny/descendants. This has led to progeny testing of young sires, which are then generally selected on the basis of Estimated Breeding Value, which is an estimate of their genetic merit.
[0006] In many commercially-important species, artificial breeding techniques such as artificial insemination, in vitro fertilization, embryo transfer and the like are permissible and practicable. In such species, following progeny testing, the semen of the best (proven) sires is then made available for use in the wider population by artificial insemination. Even though progeny testing delays the use of sires in the wider population, the cost-benefit is sufficiently great that artificial breeding companies invest a considerable amount in progeny testing each year. For example, the cost of progeny testing per young dairy or beef bull is around $A20,000 per head, and depending on the size of the company it is not uncommon for first year team size to be around 150 bulls.
[0007] The use of quantitative genetics in individual breeding programs is a powerful and important tool. For example, it has been a major driver of profitability and international competitiveness within the dairy industry in Australia and other countries. However, until recently the use of large-scale gene-marker technology to identify premium individuals and favourable traits has been immature, cumbersome and expensive. Some preliminary attempts at genome-wide analysis of data for dairy cattle have been described, but these used artificial simulated data sets in which both marker spacing and genetic (so-called quantitative trait loci, QTL) effects were known, and which do not reflect naturally complex biological systems (Meuwissen et al., 2001; Gianola et al., 2006). Furthermore, in these studies the number and density of markers was relatively low compared to the quantity of genotypic data now becoming available, which could contain a full genome sequence of each individual, thus exacerbating problems which are overcome by this invention. Despite these limitations, the hypothetical and as yet unproven advantages of using extensive marker information are highly prospective in both livestock (Schaffer, 2006) and plants (Bernardo and Yu, 2007), although once again these were shown in artificial, simulated, unnatural populations. Also, there are examples of attempts to apply neural network and genetic algorithm approaches to a variety of predictive applications, but these are based upon gene-hunting techniques that seek the particular genes responsible for the desired outcome, and are not applicable to a whole-genome approach. Therefore, despite previous attempts at gene analysis for predictive capabilities and the availability of genomic information for many species, the methods have hitherto not been widely applied because of difficulties in predicting correlation between gene markers such as single nucleotide polymorphisms (SNPs) and beneficial phenotypic traits. Even with the availability of validated SNPs or other markers and high-throughput genotyping methods, there is no generally accepted methodology for analysis of genotype data at the whole genome level.
[0008] Therefore, an improved system and method for analysing genotype data is desired.
SUMMARY
[0009] The inventors have now devised a method for estimation of breeding values and phenotypic performance from SNP data, in which genome-wide variation in the SNP data is used to account for the variation in breeding values or phenotype by integrating dimension reduction and SNP selection to reduce the number of dimensions in the original SNP data and optimize model selection for maximum predictive accuracy (minimal prediction error). In one arrangement, using this method enables the breeding value of an individual to be predicted without knowing the actual location of the SNP in the genome, and without having knowledge of the pedigree of the individual. Knowledge of the pedigree is helpful, but is not essential to the method. Likewise, knowledge of marker locations for a particular trait may be helpful, but again is not necessary for the prediction of merit using the present method(s).
[0010] The methods and systems disclosed herein cover aspects of gene marker and trait analyses and the building of predictive diagnostic tools. A process of dimension reduction is used that preserves the information in fewer dimensions without loss of information and without explicitly modeling relationships between genotype and phenotype. This is achieved by, but is not limited to, the use of PLS, PCA and SVM combined with optional cross validation. Furthermore, the prediction equations derived may use a subset of markers which capture a large proportion of the original information. This is accomplished by combining dimension reduction and marker selection. Furthermore, the prediction equations (i.e. predictor function(s)) and marker selection may be derived by using a genetic algorithm or similar method.
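As an illustration only, the combination of dimension reduction with cross validation described above might be sketched as follows. The sketch uses principal component regression on simulated genotypes coded 0, 1 or 2; the function names (`pcr_fit`, `pcr_predict`, `cv_error`), the simulated data and the candidate component counts are assumptions made for this example and are not taken from the specification.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Fit a principal component regression using k components."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Rows of Vt are the principal axes of the centred genotype matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                       # (markers, k) loadings
    T = Xc @ V                         # (individuals, k) scores
    coef, *_ = np.linalg.lstsq(T, y - y.mean(), rcond=None)
    return mu, V, coef, y.mean()

def pcr_predict(model, X):
    """Apply a fitted model to new genotypes."""
    mu, V, coef, y_mean = model
    return (X - mu) @ V @ coef + y_mean

def cv_error(X, y, k, folds=5):
    """Mean squared prediction error under simple interleaved k-fold CV."""
    idx = np.arange(len(y))
    mse = []
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        model = pcr_fit(X[train], y[train], k)
        mse.append(np.mean((pcr_predict(model, X[test]) - y[test]) ** 2))
    return float(np.mean(mse))

# Simulated example: 60 individuals, 500 markers (far more markers than individuals).
rng = np.random.default_rng(0)
n, p = 60, 500
X = rng.integers(0, 3, size=(n, p)).astype(float)
beta = np.zeros(p)
beta[:10] = 1.0                        # hypothetical marker effects
y = X @ beta + rng.normal(0, 1.0, n)

# Choose the number of components with the smallest cross-validated error.
errors = {k: cv_error(X, y, k) for k in (1, 5, 10, 20)}
best_k = min(errors, key=errors.get)
```

The number of retained dimensions is treated here as the model-complexity setting, and cross validation is used to pick the value with the smallest prediction error on held-out individuals.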
[0011] The use of extensive genome-wide genetic marker technologies allows many thousands, if not soon millions, of markers to be measured in an individual. It is forecast that it will be technically possible to obtain the whole genome sequence for individuals at a reasonable price in the next decade. However, now and in the foreseeable future, in most cases many more marker observations are present than individuals measured (i.e. 50 to 500 million marker observations in 1000 individuals are common data structures). This presents the following problems. Not all markers can be explicitly fitted, which renders the usual methods for marker subset selection, such as ordinary regression methods (stepwise, least angle regression) or QTL screening methods, useless. Furthermore, there are many thousands of model combinations possible (theoretically an exponential increase in model combinations over the number of markers tested, with different models being fitted to the data, where the total number of possible models is SUM(k=1 to N) and the total number of specific models is SUM(k=1 to ndw,), as fitting more than d SNP is redundant). Furthermore, the close relationship between multiple markers in linkage disequilibrium means that many alternate markers may be used to account for the same trait-marker relationship, therefore making finite model selection to maximise prediction of merit almost impossible. (The ambiguity in interpretation of multiple marker models arises as a consequence of collinearity between the explanatory variables.) Finally, the addition of multiple isolated genetic effects in conventional QTL mapping solutions or marker associations presents problems in accurately predicting total genetic merit, since each effect is subject to error and the sum total of all effects may be grossly overestimated, thus limiting prediction and utility of high density marker applications in diagnostic applications in humans, plants and animals. This invention describes means to handle all these problems in an integrated and systematic manner to maximize ascertainment of predictive functions between genome-wide marker information and merit in populations to which the marker information applies.
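The growth in the number of candidate models can be made concrete with a short calculation. The sketch below assumes that a "model" is simply a choice of which markers to fit, so that the count of models of size 1 up to some maximum is a sum of binomial coefficients; that reading, and the function name `n_models`, are assumptions for illustration, since the summand of the sums quoted above is not legible in the source.

```python
from math import comb

def n_models(n_markers, max_size=None):
    """Count distinct non-empty marker subsets of size 1..max_size."""
    if max_size is None:
        max_size = n_markers
    return sum(comb(n_markers, k) for k in range(1, max_size + 1))

# With no size limit, the count is 2**N - 1, i.e. exponential in N.
all_subsets_10 = n_models(10)

# Even when at most 3 of 1000 markers are fitted, the count is very large.
small_models_1000 = n_models(1000, 3)
```

Under this reading, exhaustive comparison of marker subsets quickly becomes infeasible, which is the motivation given for dimension reduction rather than explicit subset enumeration.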
[0012] The methods disclosed herein demonstrate that a subset of markers may be used to explain a large proportion of the variation in a given trait in a population. The methods of the invention enable the identification of the minimum number of SNPs which explains the maximum variation of a trait. This can be established using the "training set" described herein. The selected set of SNPs is then used on the population of interest. The method can be used to design a panel, e.g. of SNPs, for each trait in a desired set of traits. It is expected that there may be some redundancy between the sets of SNPs for different traits.
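One possible way to derive a reduced SNP panel from a training set is sketched below. It back-projects a regression on principal components onto per-marker weights and keeps the markers with the largest absolute weight. This ranking heuristic, the function names (`marker_weights`, `select_markers`) and the simulated data are assumptions made for illustration; the specification does not state that markers are ranked in this way.

```python
import numpy as np

def marker_weights(X, y, k):
    """Back-project a k-component regression onto per-marker weights."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T
    coef, *_ = np.linalg.lstsq(Xc @ V, y - y.mean(), rcond=None)
    return V @ coef                    # one weight per marker

def select_markers(X, y, k, m):
    """Return sorted indices of the m markers with the largest |weight|."""
    w = marker_weights(X, y, k)
    return np.sort(np.argsort(-np.abs(w))[:m])

# Simulated training set: 80 individuals, 300 markers, 3 hypothetical causal markers.
rng = np.random.default_rng(1)
n, p = 80, 300
X = rng.integers(0, 3, size=(n, p)).astype(float)
causal = np.array([3, 50, 120])
y = X[:, causal] @ np.array([2.0, -1.5, 1.0]) + rng.normal(0, 0.5, n)

# A 30-marker panel chosen from the training set.
panel = select_markers(X, y, k=20, m=30)
```

A panel chosen this way would then be genotyped in the population of interest, and the predictor refitted on the reduced marker set.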
[0013] According to an arrangement of a first aspect there is provided a method for the prediction of the merit of at least one individual in a population, the method comprising the steps of:
in the population, where information on individuals is known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables;
utilising the explanatory variables to generate a predictor function with respect to merit; and
utilising the predictor function to predict the merit of the individual.
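The three steps above (project to a low dimensional space, build a predictor function, apply it to an individual) can be sketched with partial least squares, one of the dimension reduction techniques named in the summary. The implementation below is a minimal PLS1 (single-trait) sketch using the NIPALS iteration; the function names, the simulated genotypes and the choice of five components are assumptions for illustration and are not taken from the specification.

```python
import numpy as np

def pls1_fit(X, y, k):
    """Fit PLS1 with k latent components; returns (x_mean, y_mean, B)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(k):
        w = Xk.T @ yk                  # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                     # scores: the explanatory variable
        tt = t @ t
        p_load = Xk.T @ t / tt         # X loadings
        c = yk @ t / tt                # regression of y on the scores
        Xk = Xk - np.outer(t, p_load)  # deflate X and y
        yk = yk - c * t
        W.append(w)
        P.append(p_load)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)   # per-marker coefficients
    return x_mean, y_mean, B

def pls1_predict(model, X):
    """Predictor function: map genotypes to predicted merit."""
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean

# Step 1 and 2: a population with known information (simulated here).
rng = np.random.default_rng(2)
X_train = rng.integers(0, 3, size=(50, 400)).astype(float)
effects = rng.normal(0, 0.1, 400)      # hypothetical marker effects
y_train = X_train @ effects + rng.normal(0, 0.5, 50)
model = pls1_fit(X_train, y_train, k=5)

# Step 3: predict merit for one further individual from its genotype alone.
new_individual = rng.integers(0, 3, size=(1, 400)).astype(float)
predicted_merit = pls1_predict(model, new_individual)[0]
```

Note that the prediction uses only the genotype of the new individual; no pedigree or marker location is needed by this sketch.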
[0014] According to another arrangement of the first aspect, there is provided a method for the prediction of the merit of at least one individual, the method comprising the steps of:
in a first population, where genotype and phenotype information of individuals in the first population are known, using dimension reduction on the genotype and phenotype information to determine the complexity of the genotype and phenotype information to minimise prediction error for at least one marker in the first population and thereby generate a set of explanatory variables with respect to the at least one marker;
applying the explanatory variables to the first population to generate a predictor function with respect to merit;
generating a genotype for the at least one marker in at least one individual of interest from a second population; and
utilising the predictor function and the genotype of the at least one individual of interest to determine the genetic merit of the individual of interest with respect to the at least one marker.
[0015] According to a further arrangement of the first aspect, there is provided a method for the prediction of the merit of at least one individual in a population, the method comprising the steps of:
in the population, where information on individuals is known, using a genetic algorithm process on the information to generate a set of explanatory variables for all the information, the explanatory variables comprising weighted averages for components of the information;
utilising the explanatory variables to generate a predictor function with respect to merit; and
utilising the predictor function to predict the merit of the individual.
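A genetic algorithm of the kind referred to above might look like the following sketch. Each candidate solution is a vector of marker weights, the explanatory variable is the resulting (signed) weighted average of an individual's marker genotypes, and fitness is the squared correlation of that variable with the phenotype. The fitness measure, population size, crossover and mutation settings and all names are assumptions for illustration only.

```python
import numpy as np

def fitness(w, X, y):
    """Squared correlation between the weighted marker average and y."""
    z = X @ w / (np.abs(w).sum() + 1e-12)
    if z.std() == 0:
        return 0.0
    return float(np.corrcoef(z, y)[0, 1] ** 2)

def evolve(X, y, pop_size=40, generations=30, mut_sd=0.1, seed=0):
    """Evolve marker weights; returns the best weights and fitness history."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    pop = rng.normal(size=(pop_size, p))
    history = []
    for _ in range(generations):
        scores = np.array([fitness(w, X, y) for w in pop])
        order = np.argsort(-scores)
        pop = pop[order]
        history.append(float(scores[order][0]))
        elite = pop[: pop_size // 4]           # survivors are kept unchanged
        children = []
        while len(children) < pop_size - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]
            mask = rng.random(p) < 0.5         # uniform crossover
            children.append(np.where(mask, a, b) + rng.normal(0, mut_sd, p))
        pop = np.vstack([elite, np.array(children)])
    scores = np.array([fitness(w, X, y) for w in pop])
    history.append(float(scores.max()))
    return pop[np.argmax(scores)], history

# Simulated data: 60 individuals, 40 markers, 5 hypothetical causal markers.
rng = np.random.default_rng(4)
X = rng.integers(0, 3, size=(60, 40)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(0, 1.0, 60)
best_weights, history = evolve(X, y)
```

Because the best quarter of each generation is carried forward unchanged, the best fitness in `history` never decreases from one generation to the next.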
[0016] In any one of the arrangements of the first aspect, the step of utilising the explanatory variables may comprise generating a plurality of predictor functions for the individuals of the population. The information may comprise information for at least one marker. The information may comprise information for a plurality of markers.
[0017] In any one of the arrangements of the first aspect, or in any arrangement of the following aspects, the information may be selected from the group of genotype, phenotype, or genotype and phenotype information on individuals in the population. For a plurality of individuals of interest from the population where information is unknown, the method may further comprise generating a genotype for at least one individual of interest from the population.
[0018] In still further arrangements, the method may further comprise the steps of:
(f) determining additional information on the explanatory variables for the at least one individual;
(g) combining the additional information for the at least one individual with the information on the explanatory variables for the individuals of the population; and
repeating steps and for at least one further individual to predict the merit of the further individual.
[0019] Step (f) may comprise determining additional information on the explanatory variables on a plurality of individuals.
[0020] In any one of the arrangements, the utilisation of the predictor function may be performed on the basis of a desired outcome.
[0021] The genotype information may comprise genetic markers or bio-markers or epigenetic markers.
[0022] The merit may be a genetic merit selected from the group of a molecular breeding value, a quantitative trait locus, or a quantitative trait nucleotide.
[0023] The sampling in step may be random or it may be targeted. The targeted sampling may comprise sampling the first population on the basis of an outcome of interest.
[0024] Step of the method may comprise defining a plurality of predictors for the sampled individuals of the first population. Step may comprise determining the genotype for a plurality of markers. Step may comprise determining the genotype for a plurality of individuals of interest.
[0025] The genotype may comprise genetic markers, bio-markers and/or epigenetic markers. The merit may be in the form of genetic merit. The genetic merit may be one or more of a molecular breeding value, the isolation and/or identification of a quantitative trait locus (QTL), a quantitative trait nucleotide (QTN), or other genotypic information. The merit may alternatively be in the form of the fitness of the individual of interest for a desired outcome. The merit may also be in the form of a diagnosis of a condition or susceptibility to a condition in the individual of interest.
[0026] The prediction of merit of the individual may involve only genotypes available for at least one of the predictor functions.
[0027] According to a second aspect there is provided a method for predicting trait performance for at least one individual of interest, the method comprising the steps of: in the population, where information of individuals is known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and utilising the explanatory variables to generate a predictor function with respect to merit; utilising the predictor function to predict the trait performance for the individual.
[0028] The method may further comprise the steps of: for an individual of interest from the population where information is unknown, generating genotype for at least one individual of interest from the population; and applying the predictor function to the genotype of the at least one individual of interest to predict the trait performance for the individual.
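The steps of the second aspect can be illustrated with a minimal sketch. It assumes, purely for illustration and not from the specification, that genotypes are coded as allele counts (0, 1 or 2), that the dimension reduction is a single principal component found by power iteration, and that the predictor function is a simple linear regression of phenotype on that component. The function name `fit_predictor` and the toy data are hypothetical.

```python
import math

def fit_predictor(genotypes, phenotypes, iterations=200):
    """Fit on individuals with known information; return a predictor function."""
    n, p = len(genotypes), len(genotypes[0])
    # Centre each marker column so the component describes variation, not the mean.
    means = [sum(col) / n for col in zip(*genotypes)]
    centred = [[g - m for g, m in zip(row, means)] for row in genotypes]

    # Power iteration on X^T X: approximates the leading principal axis.
    axis = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        scores = [sum(x * a for x, a in zip(row, axis)) for row in centred]
        raw = [sum(s * row[j] for s, row in zip(scores, centred)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in raw))
        axis = [v / norm for v in raw]

    # Explanatory variable: each individual's score on the leading axis.
    scores = [sum(x * a for x, a in zip(row, axis)) for row in centred]
    y_mean = sum(phenotypes) / n
    slope = (sum(s * (y - y_mean) for s, y in zip(scores, phenotypes))
             / sum(s * s for s in scores))

    def predict(genotype):
        """Project a new genotype onto the axis and apply the fitted regression."""
        score = sum((g - m) * a for g, m, a in zip(genotype, means, axis))
        return y_mean + slope * score

    return predict

# Toy data (hypothetical): 5 individuals x 4 markers, allele counts 0/1/2.
GENOTYPES = [[0, 1, 2, 0], [1, 1, 2, 0], [2, 0, 1, 1], [2, 2, 0, 1], [0, 0, 2, 2]]
PHENOTYPES = [10.0, 11.5, 12.0, 14.0, 9.5]

predict = fit_predictor(GENOTYPES, PHENOTYPES)
print(predict([1, 2, 1, 0]))  # predicted trait value for an individual with genotype only
```

A realistic analysis would retain many components rather than one and would choose their number to minimise prediction error; the single component here only keeps the sketch short.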
[0029] According to a third aspect there is provided a method for selecting at least one individual of interest, wherein said method comprises: a) in a first population, where genotype and phenotype information of individuals in the first population is known, using dimension reduction on the genotype and phenotype information to determine the complexity of the genotype and phenotype information to minimise prediction error for at least one marker in the first population and thereby generate a set of explanatory variables with respect to the at least one marker; applying the explanatory variables to the first population to generate a predictor function; generating genotype for the at least one marker in at least one individual of interest from a second population; applying the predictor function to the genotype of the at least one individual of interest to select the individual.
[0030] According to a fourth aspect there is provided a method of diagnosing a condition in at least one individual of interest in a population, the method comprising the steps of: in the population, where information of individuals is known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and utilising the explanatory variables to generate a predictor function; utilising the predictor function to diagnose a condition in the individual. The method of diagnosing may further comprise the steps of: for an individual of interest from the population where information is unknown, generating genotype for at least one individual of interest from the population; and applying the predictor function to the genotype of the at least one individual of interest to diagnose a condition in the individual of interest.
[0031] The method includes drawing an inference regarding a trait of the subject for the health condition, from a nucleic acid sample of the subject. The inference is drawn by identifying at least one nucleotide occurrence of a SNP in the nucleic acid sample, wherein the nucleotide occurrence is associated with the trait.
[0032] According to a fifth aspect, there is provided a method of prediction of a susceptibility to an outcome of at least one individual of interest in a population, the method comprising the steps of: in the population, where information of individuals is known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and utilising the explanatory variables to generate a predictor function; utilising the predictor function to predict the susceptibility of the individual to an outcome.
[0033] The prediction of a susceptibility to an outcome may further comprise the steps of: (d) for an individual of interest from the population where information is unknown, generating genotype for at least one individual of interest from the population; and applying the predictor function to the genotype of the at least one individual of interest to predict the susceptibility of the individual to an outcome.
[0034] The outcome may be the susceptibility of the individual of interest to a disease. The outcome may be the susceptibility of the individual of interest to a response to a stimulus. The stimulus may be selected from the group of a medicament, toxin, or an environmental condition. The environmental condition may comprise water shortage, feed shortage, stress, sunlight, or other environmental condition.
[0035] According to a sixth aspect, there is provided a method of breeding at least one individual in a population, the method comprising the steps of: in the population, where information of individuals is known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and utilising the explanatory variables to generate a predictor function with respect to merit of the individual; utilising the predictor function to predict the merit of the individual and breeding from the individual of interest on the basis of the merit of the individual.
[0036] The method of breeding may further comprise the steps of: determining information for the descendants of the at least one individual; correlating the information for the descendants of the at least one individual to the predictor function; and selecting descendants of said individual on the basis of the relationship between the information for the descendants and the predictor function.
[0037] It will be appreciated that while methods of breeding cannot ethically be utilized with humans, there are situations in which a couple may be at significantly increased risk of having a child which suffers from a genetically-determined disease or condition. For example, genetic counselling is widely used to help couples to decide whether to have children or to proceed with a pregnancy. However, few conditions are determined by a single gene, and unless a relative of one of the couple is known to have a genetically-determined disease or condition, the couple may not be aware that there is any risk. This aspect of the invention is applicable to determination of risk and assisting a couple to arrive at an informed decision in the context of genetic counselling.
[0038] According to a seventh aspect there is provided a system for the prediction of merit of an individual in a population, the system comprising: in the population, where information of individuals is known, means for using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and means for utilising the explanatory variables to generate a predictor function with respect to merit; means for utilising the predictor function to predict the merit of the individual. According to an eighth aspect there is provided a system for predicting trait performance of at least one individual in a population, the system comprising: in the population, where information of individuals is known, means for using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and means for utilising the explanatory variables to generate a predictor function; and means for utilising the predictor function to predict performance of said trait for the individual of interest.
[0039] The trait may be a quantitative trait.
[0040] According to a ninth aspect there is provided a system for selecting at least one individual in a population, the system comprising: a) in the population, where information of individuals is known, means for using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and means for utilising the explanatory variables to generate a predictor function; and means for utilising the predictor function to select the individual.
[0041] According to a tenth aspect, there is provided a system for diagnosing a condition in at least one individual of interest in a population, the system comprising: in the population, where information of individuals is known, means for using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and means for utilising the explanatory variables to generate a predictor function; means for utilising the predictor function to diagnose a condition in the individual.
[0042] According to an eleventh aspect there is provided a system for prediction of a susceptibility to an outcome of at least one individual of interest in a population, the system comprising: in the population, where information of individuals is known, means for using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and means for utilising the explanatory variables to generate a predictor function; means for utilising the predictor function to predict the susceptibility of the at least one individual of interest to an outcome.
[0043] According to a twelfth aspect there is provided a system for breeding at least one individual in a population, the system comprising: in the population, where information of individuals is known, means for using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and means for utilising the explanatory variables to generate a predictor function with respect to merit of the individual; means for utilising the predictor function to predict the merit of the individual and means for breeding from the individual of interest on the basis of the merit of the individual.
[0044] The system may further comprise the steps of: means for determining information for the descendants of the at least one individual; means for correlating the information for the descendants of the at least one individual to the predictor function; and (h) means for selecting descendants of said individual on the basis of the relationship between the information for the descendants and the predictor function.
[0045] In the fourth and tenth aspects, the diagnosis may be diagnosis of a disease or condition. For example, the disease may be any disease which affects productivity, performance or fertility. For example in dairy cattle these include metabolic disorder, mastitis, and wasting. The condition may be resistance to disease or infection, or susceptibility to infection with and shedding of pathogens such as E. coli, Salmonella species, Listeria monocytogenes, prions and other organisms potentially pathogenic to humans, regulation of immune status and response to antigens, susceptibility to conditions such as bloat, Johne's disease, or liver abscess, previous exposure to infection or parasites, or other health or respiratory and digestive problems.
[0046] In the fifth and eleventh aspects, the susceptibility may be susceptibility to a disease or condition. For example, the disease may be a metabolic disorder, mastitis, or wasting.
[0047] According to any one of the first to twelfth aspects, the information may comprise genetic information consisting essentially of marker genotypes. The genetic markers may be distributed substantially across the genome. The number of genetic markers genotyped may be greater than 1000, greater than 1500, greater than 2500, greater than 5000, greater than 10000, greater than 15000, greater than 20000, greater than 25000, greater than 30000, greater than 35000, greater than 40000, greater than 45000, greater than 50000, greater than 100000, greater than 250000, greater than 500000, greater than 1000000, greater than 5000000, greater than 10000000 or greater than 15000000.
[0048] The genetic markers may be selected from the group consisting of single nucleotide polymorphism (SNP), tag SNP, microsatellite (simple tandem repeat STR, simple sequence repeat SSR), restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), insertion-deletion polymorphism (INDEL), random amplified polymorphic DNA (RAPD), ligase chain reaction, insertion/deletions and direct sequencing of the gene or a simple sequence conformation polymorphism (SSCP). The genetic marker may be a SNP.
[0049] The information may comprise at least one of the pedigree of the individual; an estimated breeding value of the individual; data on genetic markers across the genome for the individual or for relatives of the individual; at least one index of phenotype for the individual or for relatives of the individual; at least one marker predictive of phenotype for the individual or for relatives of the individual; and at least one index of epigenetic modification or status for the individual, or a combination thereof.
[0050] The individual may be a dairy cow or bull, and the quantitative trait may be selected from the group consisting of APR, ASI, protein kg, protein percent, milk yield, fat kg, fat percent, overall type, mammary system, stature, udder texture, bone quality, angularity, muzzle width, body depth, chest width, pin set, pin sign, foot angle, set sign, rear leg view, udder depth, fore attachment, rear attachment height, rear attachment width, centre ligament, teat placement, teat length, loin strength, milking speed, temperament, like-ability, survival, calving ease, somatic cell count, cow fertility, and gestation length, or a combination of one or more of these traits.
[0051] The dimension reduction may be a technique selected from the group consisting of principal component analysis (PCA), a genetic algorithm, a neural network, partial least squares (PLS), inverse least squares, kernel PCA, LLE, Hessian LLE, Laplacian Eigenmaps, LTSA, Isomap, maximum variance unfolding, Boltzmann machines, projection pursuit, a hidden Markov model, support vector machines, kernel regression, discriminant analysis and classification, k-nearest-neighbour analysis, fuzzy neural networks, Bayesian networks, or cluster analysis.
[0052] The dimension reduction technique may be principal component analysis. The dimension reduction technique may be supervised principal component analysis. The number of principal components in the principal component analysis may be between about 10 and about 40. The number of principal components may be about
[0053] The dimension reduction technique may be partial least squares analysis. The number of latent components in the partial least squares analysis may be between about 4 and about 10. The number of latent components may be about 6.
[0054] The dimension reduction technique may be support vector machine analysis.
[0055] In any one of the above aspects the information may not include the pedigree of the individual.
[0056] In one form of the above aspects, the training population sasbe ftets Population. It is from these individuals that the relationships between the marker variants and the trait variation are ultimately established. The genotypes of other individuals can be determined for subsets and used with the predictor functions to determine any type of merit of those individuals.
[0057] The information may comprise either genotypic or phenotypic information, or a combination thereof, for the individuals in the population. The at least one individual may or may not have corresponding explanatory variables.
[0058] The information may comprise one, two, three or more of: the pedigree of the individual; an estimated breeding value of the individual; data on genetic markers across the genome for the individual or for one or more of its relatives; at least one index of phenotype for the individual or for one or more of its relatives; at least one bio-marker predictive of phenotype for the individual or for one or more of its relatives; at least one index of epigenetic modification or status for the individual, and any other information which is indicative of, or potentially indicative of, genetic differences between individuals in the population, or a combination thereof. For example, other important explanatory variables for phenotypes may include any systematic effects which affect the data, such as age, age of dam, management group, herd, year, season, sex, maternal effects (genetic and environmental), and treatments of the animal, such as vaccination. At the phenotypic level comparison can only be made of 'like' with 'like'.
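One simple way of comparing 'like' with 'like' is to express each phenotype as a deviation from the mean of its own contemporary group (for example herd or management group) before it is used in the analysis. The sketch below is an assumption made for illustration only, since the specification does not prescribe a particular adjustment; the record layout and the name `adjust_for_group` are hypothetical.

```python
from collections import defaultdict

def adjust_for_group(records):
    """records: (animal_id, group, phenotype) tuples.

    Returns {animal_id: phenotype minus the mean phenotype of its group}.
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for _, group, value in records:
        totals[group][0] += value
        totals[group][1] += 1
    means = {group: s / n for group, (s, n) in totals.items()}
    return {animal: value - means[group] for animal, group, value in records}

# Toy records (hypothetical): milk yield per day for two herds.
RECORDS = [
    ("a", "herd1", 20.0),
    ("b", "herd1", 24.0),
    ("c", "herd2", 30.0),
    ("d", "herd2", 36.0),
]
print(adjust_for_group(RECORDS))
```

After adjustment, animal "c" (30.0 in a herd averaging 33.0) ranks below animal "b" (24.0 in a herd averaging 22.0), although its raw record is higher.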
[0059] The prediction of merit, the process of selection or the process of breeding for at least one individual, and systems involving same, may involve a predictor function or functions. The predictor functions may be genetic predictors, and may be derived from genetic markers, phenotypic information or other genetic information such as pedigree, correlated EBVs, genetic parameters such as heritabilities, variances and correlations, or a combination thereof. However, in some arrangements, the pedigree and/or map locations (with respect to marker positions of a particular trait) may not be required for the prediction of merit.
[0060] The markers may be genetic markers, and may be selected from, but are not restricted to, the group consisting of single nucleotide polymorphism (SNP), tag SNPs, haplotype, microsatellite (simple tandem repeat STR, simple sequence repeat SSR), restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), insertion-deletion polymorphism (INDEL), random amplified polymorphic DNA (RAPD), ligase chain reaction, insertion/deletion and direct sequencing of the gene or a simple sequence conformation polymorphism (SSCP). For example, the genetic marker may be a single nucleotide polymorphism (SNP). The markers may be distributed substantially across the genome.
[0061] The predictors are chosen using a dimension reduction technique. The dimension reduction technique may be selected from a variety of methods, including but not limited to, principal component analysis (PCA), genetic algorithms, neural networks, partial least squares (PLS), inverse least squares, kernel PCA, locally linear embedding (such as LLE, Hessian LLE, Laplacian Eigenmaps, LTSA), Isomap, Maximum Variance Unfolding, Boltzmann machines, projection pursuit, a hidden Markov model, support vector machines, kernel regression, discriminant analysis and classification, k-nearest-neighbour analysis, fuzzy neural networks, Bayesian networks, cluster analysis or other known dimension reduction techniques, or may be a combination of a number of dimension reduction techniques, for example partial least squares reduction in combination with a genetic algorithm process. Other examples are also listed in "A survey of dimension reduction techniques" (US DOE Office of Scientific and Technical Information, 2002). The dimension reduction technique may be a supervised dimension reduction technique such as supervised partial least squares analysis or supervised principal component analysis among others. Different methods give similar results, but vary in speed of computation. Neural networks and genetic algorithms are methods for reducing dimensions, and thus they could be used either directly or indirectly. For example PCA will transform 15000 SNP into N principal components, where N is the number of individuals; a genetic algorithm or a neural network could be used to choose among the principal components.
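The bound on the number of principal components follows from the rank of the genotype matrix. As a brief worked statement, assume the N individuals are stored as the rows of a matrix X with p marker columns and that N is smaller than p; the symbols X, U, S, V and T are introduced here for illustration and do not appear in the specification.

```latex
\begin{align}
X &\in \mathbb{R}^{N \times p}, \qquad p = 15000, \quad N < p \\
X &= U S V^{\top} \qquad \text{(singular value decomposition)} \\
T &= X V = U S, \qquad \operatorname{rank}(T) = \operatorname{rank}(X) \le \min(N, p) = N
\end{align}
```

The score matrix T therefore has at most N columns with non-zero variance however many markers are genotyped, and at most N - 1 if each marker column is first centred on its mean.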
[0062] The dimension reduction technique may be partial least squares analysis. The dimension reduction technique may be logistic partial least squares analysis. The dimension reduction technique may be generalised partial least squares analysis. In other arrangements, the dimension reduction technique may be selected from the group of principal component analysis (PCA), neural networks, or projection pursuit.
[0063] The dimension reduction technique may be principal component analysis, and the number of principal components may be selected using a genetic algorithm, wherein the principal components may form the inputs to the genetic algorithm. In one embodiment the dimension reduction technique is supervised principal component analysis. The number of principal components is less than the number of data points. In one embodiment the number of principal components is about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40. The number of principal components may be about 20. The trait may be any quantitative trait. The trait may relate to any aspect relating to the group consisting of agricultural, livestock, performance and aquaculture animals, and plants used in agriculture, agronomy, forestry and horticulture.
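Using principal components as the inputs to a genetic algorithm can be sketched as a search over on/off masks, one bit per component. Everything specific below is an assumption made for illustration and is not taken from the specification: the fitness is the squared prediction error on a held-out validation set, the score columns are taken to be centred and mutually orthogonal, and the population size, one-point crossover, mutation rate and elitism are arbitrary choices. The names `make_error` and `select_components` and the toy data are hypothetical.

```python
import random

def make_error(train_scores, train_y, valid_scores, valid_y):
    """Return error_of(mask): squared validation error using the masked components."""
    y_mean = sum(train_y) / len(train_y)
    slopes = []
    for j in range(len(train_scores[0])):
        col = [row[j] for row in train_scores]
        # Centred, mutually orthogonal score columns let each slope be fitted alone.
        slopes.append(sum(t * (y - y_mean) for t, y in zip(col, train_y))
                      / sum(t * t for t in col))

    def error_of(mask):
        total = 0.0
        for row, y in zip(valid_scores, valid_y):
            pred = y_mean + sum(m * b * t for m, b, t in zip(mask, slopes, row))
            total += (y - pred) ** 2
        return total

    return error_of

def select_components(n_components, error_of, generations=30, pop_size=12, seed=0):
    """Genetic search over 0/1 masks; returns the lowest-error mask found."""
    rng = random.Random(seed)
    # Start from the 'use every component' mask plus random masks.
    population = [[1] * n_components]
    population += [[rng.randint(0, 1) for _ in range(n_components)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=error_of)
        next_gen = [population[0][:]]               # elitism: keep the best mask
        while len(next_gen) < pop_size:
            # Parents are drawn from the better half of the population.
            mum, dad = rng.sample(population[:pop_size // 2], 2)
            cut = rng.randrange(1, n_components)    # one-point crossover
            child = mum[:cut] + dad[cut:]
            if rng.random() < 0.3:                  # occasional single-bit mutation
                flip = rng.randrange(n_components)
                child[flip] = 1 - child[flip]
            next_gen.append(child)
        population = next_gen
    return min(population, key=error_of)

# Toy scores (hypothetical): 4 training individuals x 3 orthogonal components.
TRAIN_SCORES = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
TRAIN_Y = [5.0, 3.0, 2.0, 0.0]
VALID_SCORES = [[1, -1, 1], [-1, 1, 1]]
VALID_Y = [4.0, 1.5]

ERROR_OF = make_error(TRAIN_SCORES, TRAIN_Y, VALID_SCORES, VALID_Y)
best = select_components(3, ERROR_OF)
print(best, ERROR_OF(best))
```

Because the all-components mask is in the starting population and the best mask is always carried forward, the returned subset can never have a higher validation error than using every component.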
[0064] It is understood that the methods described herein may be applied to any species for which both genomic information and phenotypic information is available. Genomic information can include DNA sequences and data relating to single nucleotide polymorphisms (SNPs), haplotypes, and the like. Phenotypic information can include performance data, for example for dairy or beef cattle, sheep produced for wool or meat, or for animals used for racing. Phenotypic data also includes information regarding morbidity and disease susceptibility. As a result of the various genome projects, genomic data such as SNPs, haplotypes etc. are widely available. In addition to the human genome, partial or complete genome maps have been published for mammals, including chimpanzee, cattle, horse, dog, chicken, rat, mouse, Rhesus macaque, cat, other vertebrates, including zebrafish, medakafish, blowfish, and African clawed toad, and plants, including rice, wheat, maize, tomato, loblolly pine, and poplar. Some sequence data are also available for crustaceans such as shrimp; see for example US Patent No. 5,712,091.
[0065] Information about genome projects and links to their databases can be found on the World Wide Web, for example at the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/Genomes/index.html), which includes the databases for Online Mendelian Inheritance in Man (www.ncbi.nlm.nih.gov/Omim/) and the International HapMap Project (www.hapmap.org). Other sources include the Genomes OnLine database (www.genomesonline.org) and the Institute for Genomic Research (www.tigr.org/tdb).
[0066] Performance data for livestock animals such as dairy cattle have been extensively recorded in countries such as Australia, Canada, New Zealand and Holland; similar data are available for beef cattle, pigs, chickens, and sheep. Performance data for thoroughbred racehorses, quarterhorses, standardbred trotting horses and pacers, endurance horses and Arab horses are available, in the case of thoroughbreds going back well over 100 years.
[0067] Thus the invention is particularly applicable to, but not limited to, the following types of individual:
a) Cattle: dairy and beef breeds;
b) Horses: racing breeds, eg thoroughbreds, standardbreds, quarterhorses, endurance horses, and Arabs;
c) Sheep: wool, meat and milk breeds;
d) Other fibre, meat and milk-producing animals, such as goats, alpacas, vicunas and llamas;
e) Other racing animals, such as camels;
f) Poultry, such as chickens, turkeys, geese and ducks;
g) Fish: farmed genera or species such as salmonids, including salmon, ocean trout, and freshwater trout; barramundi, tilapia and carp;
h) Crustaceans: farmed genera or species, such as prawns and shrimp;
i) Humans: prediction of sporting performance, especially for athletics events involving running and/or endurance, swimming, rowing and kayaking, and football codes (eg Australian Rules Football, rugby, American football, soccer), baseball, basketball and ice hockey; identification of markers useful in diagnosis of disease, estimation of risk of multifactorial genetic disorders; and identification of pharmacogenomic markers;
j) Plants: genera or species used in agriculture (crop or pasture), forestry or horticulture.
[0068] The quantitative trait may be one or more traits associated with dairy production, which may be selected from, but is not restricted to, the group consisting of Australian Profit Ranking (APR), ASI, protein kg, protein per cent, milk yield, fat kg, fat percent, overall type, mammary system, stature, udder texture, bone quality, angularity, muzzle width, body depth, chest width, pin set, pin sign, foot angle, set sign, rear leg view, udder depth, fore attachment, rear attachment height, rear attachment width, centre ligament, teat placement, teat length, loin strength, milking speed, temperament, like-ability, survival, calving ease, somatic cell count, cow fertility, and gestation length, or a combination thereof. Any trait which is under genetic control in part and for which there is genetic variability can be used.
[0069] According to a thirteenth aspect there is provided a breeders product comprising at least one gamete with a high prediction of merit for at least one marker, the breeders product selected by a method for the prediction of the merit of at least one individual, the method comprising the steps of: in a first population, where genotype and phenotype information of individuals in the first population is known, using dimension reduction on the genotype and phenotype information to determine the complexity of the genotype and phenotype information to minimise prediction error for at least one marker in the first population and thereby generate a set of explanatory variables with respect to the at least one marker; applying the explanatory variables to the first population to generate a predictor function; generating genotype for the at least one marker in at least one individual of interest from a second population; applying the predictor function to the genotype of the at least one individual of interest to determine the genetic merit of the individual of interest with respect to the at least one marker.
[0070] According to a fourteenth aspect there is provided a computer system comprising a computer processor and memory, the memory comprising software code stored therein for execution by the computer processor of a method for the prediction of the merit of at least one individual in a population, the method comprising the steps of: in a database comprising information about the population, where information of individuals is known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; utilising the explanatory variables to generate a predictor function with respect to merit; and utilising the predictor function to predict the merit of the individual.
[0071] In a fifteenth aspect there is provided a computer readable medium, having a program recorded thereon, where the program is configured to make a computer execute a procedure for the prediction of the merit of at least one individual in a population, the software product comprising: in a database comprising information about the population, where information of individuals is known, code for using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; code for utilising the explanatory variables to generate a predictor function with respect to merit; and code for utilising the predictor function to predict the merit of the individual.
[0072] According to an eighteenth aspect, there is provided an information database product comprising information for individuals of a population, the information database for use with a method for the selection of at least one individual in the population, the method comprising the steps of: in the population, where information of individuals is known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and utilising the explanatory variables to generate a predictor function with respect to merit; utilising the predictor function to predict the merit of the individual.
[0073] According to a nineteenth aspect, there is provided an information database product for use with a breeding program, the database comprising information for individuals of a population and a prediction of the merit of the individuals in the population.
[0074] The individuals of interest from the population may be selected for use in a breeding program based upon the prediction of merit for the at least one marker.
[0075] According to a twentieth aspect, there is provided an information database product for use with a breeding program, the database comprising information for individuals of a population and a prediction of the merit of the individuals in the population.
[0076] The prediction of a merit of the individuals in the population is provided by a dimension reduction method on the genotype and phenotype information of individuals in the population comprising the steps of: using a dimension reduction method, determining the complexity of genotype and phenotype information of individuals in the population to minimise prediction error and thereby generate a set of explanatory variables; applying the explanatory variables to the first population to generate a predictor function; generating genotype for the at least one marker in at least one individual of interest from a second population; applying the predictor function to the genotype of the individuals of the second population thereby to determine the genetic merit of individuals in the second population with respect to the at least one marker.
[0077] Individuals of interest from the population may be selected for use in a breeding program based upon the prediction of merit for the at least one marker.
[0078] A system or method as claimed in any of the preceding claims wherein the predictor function is a predictor function having minimal prediction error.
[0079] The method of any one or more of the first to twelfth aspects may be implemented using a computer system 1000, such as that shown in Figure 15, wherein the processes of Figures 1A to 1D may be implemented as software, such as one or more application programs executable within the computer system 1000. Figure 15 is merely an example which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. In particular the steps of the method of the prediction of merit and/or selection of at least one individual of interest are effected by instructions in the software that are carried out within the computer system 1000.
The instructions may be formed as one or more code modules, each for erforming one or more particular tasks, The software may also be divided jnto two separate parts, in which a first part and the corresponding code modules performs the prediction of merit and/or selection methods and a second part and the corresponding code modules manage a user interface between the first part and the user. The software may be stored in a computer readable medium, including the storage devices described below, for example, The software is loaded into the computer system 1000 from the computer readable medium, and then executed by the computer system 1000, A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 1000 preferably effects an advantageous apparatus for prediction of merit and/or selection of at least one individual of interest.
[0080] As seen in Figure 15, the computer system 1000 is formed by a computer module 1001, input devices such as a keyboard 1002 and a mouse pointer device 1003, and output devices including a printer 1015, a display device 1014 and loudspeakers 1017. An external Modulator-Demodulator (Modem) transceiver device 1016 may be used by the computer module 1001 for communicating to and from a communications network 1020 via a connection 1021. The network 1020 may be a wide-area network (WAN), such as the Internet or a private WAN. Where the connection 1021 is a telephone line, the modem 1016 may be a traditional "dial-up" modem. Alternatively, where the connection 1021 is a high capacity (eg: cable) connection, the modem 1016 may be a broadband modem. A wireless modem may also be used for wireless connection to the network 1020.
[0081] The computer module 1001 typically includes at least one processor unit 1005, and a memory unit 1006 for example formed from semiconductor random access memory (RAM) and read only memory (ROM). The module 1001 also includes a number of input/output (I/O) interfaces including an audio-video interface 1007 that couples to the video display 1014 and loudspeakers 1017, an I/O interface 1013 for the keyboard 1002 and mouse 1003 and optionally a joystick (not illustrated), and an interface 1008 for the external modem 1016 and printer 1015. In some implementations, the modem 1016 may be incorporated within the computer module 1001, for example within the interface 1008. The computer module 1001 also has a local network interface 1011 which, via a connection 1023, permits coupling of the computer system 1000 to a local computer network 1022, known as a Local Area Network (LAN). As also illustrated, the local network 1022 may also couple to the wide network 1020 via a connection 1024, which would typically include a so-called "firewall" device or similar functionality. The interface 1011 may be formed by an Ethernet™ circuit card, a wireless Bluetooth™ or an IEEE 802.11 wireless arrangement.
[0082] The interfaces 1008 and 1013 may afford both serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 1009 are provided and typically include a hard disk drive (HDD) 1010. Other devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 1012 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (eg: CD-ROM, DVD), USB-RAM, and floppy disks for example, may then be used as appropriate sources of data to the system 1000.
[0083] The components 1005 to 1013 of the computer module 1001 typically communicate via an interconnected bus 1004 and in a manner which results in a conventional mode of operation of the computer system 1000 known to those in the relevant art. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun Sparcstations, Apple Mac™ or alike computer systems evolved therefrom.
[0084] Typically, the application programs discussed above are resident on the hard disk drive 1010 and read and controlled in execution by the processor 1005. Intermediate storage of such programs and any data fetched from the networks 1020 and 1022 may be accomplished using the semiconductor memory 1006, possibly in concert with the hard disk drive 1010. In some instances, the application programs may be supplied to the user encoded on one or more CD-ROM and read via the corresponding drive 1012, or alternatively may be read by the user from the networks 1020 or 1022. Still further, the software can also be loaded into the computer system 1000 from other computer readable media. Computer readable media refers to any storage medium that participates in providing instructions and/or data to the computer system 1000 for execution and/or processing. Examples of such media include floppy disks, magnetic tape, CD-ROM, a hard disk drive, a ROM or integrated circuit, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 1001. Examples of computer readable transmission media that may also participate in the provision of instructions and/or data include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
[0085] The second part of the application programs and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 1014. Through manipulation of the keyboard 1002 and the mouse 1003, a user of the computer system 1000 and the application may manipulate the interface to provide controlling commands and/or input to the applications associated with the GUI(s).
[0086] The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
BRIEF DESCRIPTION OF THE FIGURES
[0087] Figure 1A is a simplified diagram showing a flow diagram of an aspect of a method for the prediction of merit of an individual;
[0088] Figure 1B is a simplified diagram showing a flow diagram of an aspect of a method for selection of an individual based on genetic merit;
[0089] Figure 1C is a simplified diagram showing a flow diagram of an aspect of a method for the prediction of merit and/or selection of at least one individual based on genetic merit;
[0090] Figure 1D is a simplified diagram showing a flow diagram of an alternate aspect of a method for selection of an individual;
[0091] Figure 1E is a simplified diagram showing a schematic outline of an arrangement of a method for obtaining a prediction for a characteristic of an individual of interest;
[0092] Figure 1F is a simplified diagram showing a schematic outline of an arrangement of a validation technique for feature (eg. SNP) selection and assessment;
[0093] Figure 2 shows a graph showing molecular breeding values for kilograms of protein plotted against BLUP EBV for kilograms of protein. The MBV were weighted estimates from a genetic algorithm (GA) run modelling 500 SNP simultaneously;
[0094] Figure 3 is a graph showing the correlation between the MBV and EBV for the bulls included in the analyses of Figure 1, on the basis of the number of SNPs fitted in the analysis;
[0095] Figure 4 is a graph showing the cumulative proportion of variance accounted for by the PCs when: (i) PCA is used, (ii) SPCA is used with 2, and (iii) SPCA is used with 8 3;
[0096] Figure 5 is a series of exploratory plots of the BVs and the first 3 PCs for animals born before 1995 and 1995 or later. Plots above the diagonal are for the reduced data when PCA is used and plots below the diagonal are for the reduced data when SPCA is used, 2;
[0097] Figure 6 is a simplified diagram showing a schematic diagram for the propagation of the simulated population;
[0098] Figures 7(a) to 7(c) are graphs showing the mean correlation between BV and simulated breeding value using Principal Component Analysis techniques, where there are f chromosomes in the initial population, and the number of SNPs which have an additive effect is 10, 100 and 1000 respectively and n_sa is the number of SNPs with an additive effect: n_sa = 10, n_sa = 100 and n_sa = 1000 SNPs over 100 iterations;
[0099] Figures 7(d) to 7(f) are graphs showing the mean correlation between EBV and simulated breeding value using Principal Component Analysis techniques, where there are 200 chromosomes in the initial population, and the number of SNPs which have an additive effect is 10, 100 and 1000 respectively;
[00100] Figure 8 is a graph showing the mean correlation between predicted breeding value and observed breeding value for real SNP data using Principal Component Analysis techniques for individuals separated into two subsets: those in the training set with known EBVs, and those in the test set whose EBVs are treated as unknown;
[00101] Figures 9A and 9B are graphs showing the correlation between predicted and true breeding values of a first generation of individuals, calculated using BLUP techniques and principal component techniques respectively;
[00102] Figures 10A and 10B are graphs showing the correlation between predicted and true breeding values of the next generation of individuals, calculated using BLUP techniques and principal component techniques respectively;
[00103] Figure 11 is a simplified diagram showing an example of the effect of prediction bias in SNP selection;
[00104] Figures 12A and 12B show the SNP weight distribution (VIM values) using an arrangement of the second feature selection methods;
[00105] Figures 13A and 13B show examples of the results from the SNP selection process;
[00106] Figures 14A to 14D show comparative examples of the correlation between MBV and EBV for the PLS and SVM methods of dimension reduction; and
[00107] Figure 15 shows a schematic depiction of an example apparatus for the implementation of the methods for prediction of merit and/or selection of at least one individual of interest as described herein.
DETAILED DESCRIPTION
[00108] Definitions
[00109] In the claims of this application and in the description of the invention, except where the context requires otherwise due to express language or necessary implication, the word "comprise" or variations such as "comprises" or "comprising" is used in an inclusive sense, i.e. to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the invention. As used herein, the singular forms "a", "an" and "the" include the corresponding plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "a marker" includes a plurality of such markers, and a reference to "a SNP" is a reference to one or more SNPs.
[00110] It is to be clearly understood that this invention is not limited to the particular materials and methods described herein, as these may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and it is not intended to limit the scope of the present invention, which will be limited only by the appended claims.
[00111] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any materials and methods similar or equivalent to those described herein can be used to practise or test the present invention, the preferred materials and methods are described.
[00112] Where a range of values is expressed, it will be clearly understood that this range encompasses the upper and lower limits of the range, and all values in between these limits.
[00113] The term "ADHIS" relates to the Australian Dairy Herd Improvement Scheme.
[00114] The term "Advanced Phenotypic Value" (APV) refers to a combination of two or more phenotypic measures that are used together in an appropriate analysis to provide a prediction of the value of a specific individual for a specific end-use, such as the production of a specific component of milk.
[0100] The term "Advanced Phenotypic and Genotypic Value" (APGV) refers to a combination of the APV above with additional information such as the predicted genetic merit of the said individual for the trait in question.
[0101] The terms "animal", "subject" and "individual" are used interchangeably to refer to an individual at any stage of life, or after death. This includes an entity prior to birth such as a fertilised ovum, either before fusion of the male and female pro-nucleus or after the pro-nuclei have fused to form a zygote, an embryo created by any means, including in vitro fertilization or somatic cell nuclear transfer, or an individual cell of haploid (N), diploid (2N) or greater ploidy. This term also includes a cell or a cluster of cells, including stem cells and stem cell-like cells and cell lines derived therefrom, haploid gametes, and products resulting from the gametes, including embryos.
[0102] The term "allele" or "allelic" or "marker variant" refers to variation present at a defined position within a marker or specific marker sequence: in the case of a SNP this is the actual nucleotide which is present; for a SSR, it is the number of repeat sequences; for a peptide sequence, it is the actual amino acid present (see bio-marker); in the case of a marker haplotype, it is the combination of two or more individual marker variants in a specific combination (see haplotype). An "associated allele" refers to an allele at a polymorphic locus which is associated with a particular phenotype of interest, e.g. a characteristic used in assessment of livestock, a predisposition to a disorder or a particular drug response.
[0103] The term "base pair" means a pair of nitrogenous bases, each in a separate nucleotide, in which each base is present on a separate strand of DNA and the bonding of these bases joins the component DNA strands. Typically a DNA molecule contains four bases: A (adenine), G (guanine), C (cytosine), and T (thymine).
[0104] The term "bio-marker" refers to a biological or physical characteristic at molecular, cellular or whole organism level used to describe the phenotype or physiological state of an individual, either as a diagnostic application of current state at time of measurement (in response to stress, disease, injury, environment, age, drug treatment, or other stimulus or factor), or as a prognostic tool to predict the most likely future performance/health status of an individual. For example, the bio-marker may be an epigenetic modification.
[0105] The term "Best Linear Unbiased Prediction" (BLUP) refers to a statistical technique which is widely used to provide prediction of genetic merit, such as estimated breeding value (EBV). The BLUP method was originally described in Henderson C.R. (1973) Sire Evaluation and Genetic Trends. In Proc. Anim. Breed. Genet. Symp. in Honor of Dr. J. L. Lush. Am. Soc. Anim. Sci. and Am. Dairy Sci. Assoc., Champaign, Illinois, 10-41.
[0106] The term "Breeding Value" (BV) or "Estimated Breeding Value" (EBV) refers to any prediction of the genetic merit of an individual on the basis of phenotypic observations and quantitative genetic theory.
[0107] The term "centiMorgan" (cM) refers to the genetic distance between two loci; for example the genetic distance between two loci is 1 cM if their statistically-adjusted recombination frequency is 1%; the genetic distance in cM is numerically equal to the recombination frequency (adjusted for double crossovers, interference, etc.) expressed as a percentage. Typically in mammals, a genetic distance of 1 cM can be regarded as corresponding to a physical distance of roughly one million base pairs, although this varies both between species and within the genome of an individual. However, map distance is equivalent to recombination rate only for very closely-linked loci.
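The definition above does not name a particular statistical adjustment. As an illustration only, the following sketch assumes Haldane's mapping function, which corrects for multiple crossovers under the assumption of no interference; the function names are hypothetical and are not taken from the specification.

```python
import math


def haldane_cm(recombination_fraction: float) -> float:
    """Map distance in cM from a recombination fraction (Haldane, no interference)."""
    if not 0.0 <= recombination_fraction < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    # d (Morgans) = -0.5 * ln(1 - 2r); multiply by 100 to express in centiMorgans.
    return -50.0 * math.log(1.0 - 2.0 * recombination_fraction)


def haldane_recombination(distance_cm: float) -> float:
    """Inverse mapping: expected recombination fraction for a distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


close_loci = haldane_cm(0.01)    # closely linked: about 1.01 cM, near the raw 1%
distant_loci = haldane_cm(0.20)  # loosely linked: about 25.5 cM, well above 20%
```

For closely linked loci the adjusted distance stays close to the raw recombination percentage, whereas for loosely linked loci the two diverge, which is the behaviour the final sentence of the definition describes.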
[0108] The term "companion animal" refers to animals which are commonly domesticated by people and used as pets or for companionship. This includes dogs and cats, but may also include more exotic pets such as various fish, reptiles, birds, horses, rabbits, hamsters, gerbils, mice, rats and the like.
[0109] The term "epigenetic" refers to a mechanism which changes the phenotype without altering the genotype. Epigenetic changes involve mitotically heritable changes in DNA other than changes in nucleotide sequence. Genetic information provides the blueprint for the manufacture of all the proteins necessary to create a living organism, whereas epigenetic information provides additional instructions on how, where, and when the genetic information will be used. Epigenetic controls can become dysregulated in cancer cells. Such dysregulation can affect a variety of gene types, including tumour suppressor genes, oncogenes, and cancer-associated viral genes, all of which are subject to regulation by epigenetic mechanisms. A key component of epigenetic information in mammalian and other cells is DNA methylation, mostly in the promoter region. For example, tumour suppressor genes are inactivated by hypermethylation, whereas oncogenes are activated by methylation. Epigenetic markers for bladder, colon, cervical, head and neck, lung, and prostate cancer have been identified, and can be used for early detection and risk assessment of cancer. Microarray technology such as MethylScope™ (described in US patent publication No. 20040132048; available from Orion Genomics, St Louis, Missouri) can be used to detect DNA methylation. Other epigenetic phenomena are known, including genomic imprinting in placental mammals and X-chromosome dosage compensation, post-transcriptional gene silencing (PTGS) or RNA interference and transcriptional gene silencing (TGS) seen in plants, and RNA-mediated silencing.
[0110] The term "epistasis" refers to the interaction between genes at different loci, and an epistatic variation is a variation arising from epistasis.
[0111] The term "information" refers to information which is indicative of, or potentially indicative of, genetic differences between individuals in the population. The information is represented by the different types of data sets, such as sex, age, SNPs, genotypes and haplotypes, used in the generation of the explanatory variables as defined below and a predictor function or functions. The information is generally parameters which can be measured in a population, and may vary independently, or may vary according to the sex and age of the individual.
[0112] The term "explanatory variables" refers to either products of a dimension reduction process or algorithm, for example latent components in a PLS analysis or principal components in a PCA analysis, or assigned weights or products of a genetic algorithm process.
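To make the notion of a principal component as an explanatory variable concrete, the following is a minimal sketch in plain Python. It assumes genotypes coded as allele counts (0, 1 or 2) per SNP, which is an illustrative coding rather than one prescribed here, and it extracts only the leading component by power iteration instead of a full eigendecomposition. All names are hypothetical.

```python
import math


def centre_columns(rows):
    """Subtract each column mean so every variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def project(rows, loadings):
    """Score of each (centred) row on one component."""
    return [sum(x * w for x, w in zip(row, loadings)) for row in rows]


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the leading principal component.

    Uses power iteration on X^T X, where X is the column-centred data matrix.
    """
    centred = centre_columns(rows)
    p = len(centred[0])
    loadings = [1.0 / math.sqrt(p)] * p  # deterministic starting vector
    for _ in range(iterations):
        scores = project(centred, loadings)
        # X^T (X w): one multiplication by X^T X without forming the matrix.
        updated = [sum(s * row[j] for s, row in zip(scores, centred))
                   for j in range(p)]
        norm = math.sqrt(sum(u * u for u in updated))
        if norm == 0.0:
            break  # no variance along the direction being followed
        loadings = [u / norm for u in updated]
    return loadings, project(centred, loadings)


# Hypothetical data: rows are individuals, columns are SNP allele counts.
genotypes = [[0, 0, 1], [1, 1, 1], [2, 2, 1]]
loadings, scores = first_principal_component(genotypes)
```

In this toy data set the first two SNPs vary together and the third does not vary at all, so a single score per individual carries the variation of all three columns. Power iteration can stall if the starting vector happens to be orthogonal to the leading component; a production implementation would normally rely on a numerical library instead.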
[0113] The term "fitness" refers to an evolutionary measure, and relates to how many descendants an individual leaves in the next generations. Fitter individuals contribute more than less fit ones. Fitness in the genetic algorithm is the relative measure of the functions.
[0114] The term "genetic algorithm" refers to a class of function optimisation algorithms. Genetic algorithms are search algorithms that are based on natural selection and genetics. Generally speaking, they combine the concept of survival of the fittest with a randomized exchange of information. In each genetic algorithm generation there is a population composed of individuals. Those individuals can be seen as candidate solutions to the problem being solved. In each successive generation, a new set of individuals is created using portions of the fittest of the previous generation. However, randomized new information is also occasionally included so that important data are not lost and overlooked. A basic characteristic of a genetic algorithm is that it defines possible solutions to a problem in terms of individuals in a population.
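The generational loop described above can be sketched as follows. This is an illustrative toy, not the genetic algorithm used in the examples of this specification: candidate solutions are assumed to be bit strings (for instance, which SNPs to include), and the operators chosen here (truncation selection, single-point crossover, bit-flip mutation, elitism) together with every name and parameter value are assumptions.

```python
import random


def evolve(fitness, genome_length, population_size=20, generations=40,
           mutation_rate=0.05, seed=0):
    """Return (best_individual, best_fitness_per_generation)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:population_size // 2]  # survival of the fittest
        children = [ranked[0][:]]                # elitism: carry the best forward
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)  # randomized exchange of information
            child = mother[:cut] + father[cut:]
            # Occasional random new information, so useful bits are not lost.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, history + [fitness(best)]


# Hypothetical problem: recover a hidden inclusion mask over 12 SNPs.
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]


def matches(candidate):
    """Toy fitness: number of positions agreeing with the hidden mask."""
    return sum(c == t for c, t in zip(candidate, target))


best, history = evolve(matches, len(target))
```

Because the best individual of each generation is copied unchanged into the next, the recorded best fitness never decreases from one generation to the next.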
[0115] The term "genetic merit" reflects the genetic or breeding worth of an individual with respect to its own performance, and is based on the cumulative effects of all relevant gene/genetic variants within its genome or as an assessment of the ability of the individual to transmit its genetic superiority or inferiority to its progeny/descendants.
[0116] The term "genotype" refers to the genetic constitution of an organism. It may be considered in total, or with respect to the alleles of a single gene, i.e. at a given genetic locus.
[0117] The term "haplotype" refers to a specific set or specific combination of markers at two or more markers or sites within a DNA sequence inherited together from the same individual. A haplotype may be a grouping of two or more SNPs which are physically present on the same chromosome, and which tend to be inherited together except when recombination occurs. The haplotype provides information regarding an allele of the gene, regulatory regions or other genetic sequences affecting a trait. The linkage disequilibrium and, thus, association of a SNP or a haplotype allele(s) and a trait can be strong enough to be detected using simple genetic approaches, or can require more sophisticated statistical approaches to be identified.
[0118] Some embodiments are based, in part, on a determination that SNPs, including haploid or diploid SNPs, and haplotype alleles, including haploid or diploid haplotype alleles, allow an inference to be drawn as to the trait of a subject, particularly a livestock subject. Accordingly, the methods can involve determining the nucleotide occurrence of at least 2, 3, 4, 5, 10, 20, 30, 40, 50, or more SNPs. The SNPs can form all or part of a haplotype, wherein the method can identify a haplotype allele which is associated with the trait. Furthermore, the method can include identifying a diploid pair of haplotype alleles.
[0119] Numerous methods for identifying haplotype alleles in nucleic acid samples are known in the art. In general, nucleic acid occurrences for the individual SNPs are determined, and then combined to identify haplotype alleles. The Stephens and Donnelly algorithm (Am. J. Hum. Genet. 68: 978-989, 2001, which is incorporated herein by reference) can be applied to the data generated regarding individual nucleotide occurrences in SNP markers of the subject, in order to determine alleles for each haplotype in a subject's genotype. Other methods can be used to determine alleles for each haplotype in the subject's genotype, for example Clark's algorithm, and an EM algorithm described by Raymond and Rousset (Raymond et al. 1994. GenePop. Ver 3.0. Institut des Sciences de l'Evolution, Université de Montpellier, France. 1994).
[0120] The term "heterozygote" refers to an organism in which different alleles are found at a given locus on homologous chromosomes.
[0121] The term "homozygote" refers to an organism which has identical alleles at a given locus on homologous chromosomes.
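As a small illustration of the two definitions above, the following sketch classifies a diploid genotype at one locus and summarises a group of individuals. Representing a genotype as a pair of allele labels, and the function names used, are assumptions made for the example.

```python
def zygosity(allele_a, allele_b):
    """Classify a diploid genotype at a single locus."""
    return "homozygote" if allele_a == allele_b else "heterozygote"


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at the locus."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    heterozygous = sum(1 for a, b in genotypes if a != b)
    return heterozygous / len(genotypes)


# Hypothetical SNP with alleles A and G, typed in four individuals.
sample = [("A", "G"), ("A", "A"), ("G", "G"), ("G", "A")]
labels = [zygosity(a, b) for a, b in sample]
share_heterozygous = observed_heterozygosity(sample)
```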
[0122] The term "IBISS" refers to the Interactive Bovine In Silico SNP database (CSIRO Livestock Industries; www.livestockgenomics.csiro.au).
[0123] The term "infer" or "inferring", when used in reference to a trait, means drawing a conclusion about a trait using a process of analyzing, individually or in combination, nucleotide occurrence(s) of one or more SNP(s), which can be part of one or more haplotypes, in a nucleic acid sample of the subject, and comparing the individual nucleotide occurrence(s) of the SNP(s), or combination thereof, to known relationships of nucleotide occurrence(s) of the SNP(s) and the trait. As disclosed herein, the nucleotide occurrence(s) can be identified directly by examining nucleic acid molecules, or indirectly by examining a polypeptide encoded by a particular genomic region where the polymorphism is associated with an amino acid change in the encoded polypeptide.
[0124] The term "introgression" means the process of taking a gene from one population and introducing it to another, and then increasing its frequency in the new population.
[0125] The term "low dimensional space" refers, for a database of information with many variables or unknowns, to a subset of the information database with a reduced number of variables or unknowns; however, the low dimensional space retains substantially all the information or substantially all the relationships between the information in the information database.
[0126] The term "marker" refers to an identifiable DNA sequence which is variable (polymorphic) for different individuals within a population, and facilitates the study of inheritance of a trait or a gene. A marker at the DNA sequence level is linked to a specific chromosomal location unique to an individual's genotype and inherited in a predictable manner, and may be measured directly as a DNA sequence polymorphism, such as a single nucleotide polymorphism (SNP), restriction fragment length polymorphism (RFLP) or short tandem repeat (STR), or indirectly as a DNA sequence variant, such as a single-strand conformation polymorphism (SSCP). A marker can also be a variant at the level of a DNA-derived product, such as an RNA polymorphism/abundance, a protein polymorphism or a cell metabolite polymorphism, or any other biological characteristic which has a direct relationship with the underlying DNA variant or gene product.
[0127] The term "merit" encompasses at least merit, of which genetic merit is but one type; fitness for purpose; and susceptibility and/or predisposition to an outcome such as a disease.
[0128] The term "minimal prediction error" refers to maximising the accuracy of a prediction, for example in terms of the deviation of a true value from a predicted value.
[0129] The term "Molecular Breeding Value" (MBV) refers to an estimate of breeding value or genetic merit obtained from marker information, especially for DNA-based markers, but not restricted to DNA-based markers, for example the predicted performance derived using marker information with or without auxiliary information such as pedigree and estimated breeding values from relatives.
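Two common ways of quantifying how far predictions deviate from reference values are the root-mean-square error and the Pearson correlation, the latter being the kind of MBV-to-EBV correlation reported in the figures. The sketch below is illustrative only; the choice of these two measures and the function names are assumptions, and the numbers are invented for the example.

```python
import math


def rmse(reference, predicted):
    """Root-mean-square deviation of predictions from reference values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented values standing in for EBVs (reference) and MBVs (predictions).
ebv = [10.0, 12.0, 15.0, 9.0]
mbv = [11.0, 12.5, 14.0, 9.5]
error = rmse(ebv, mbv)
agreement = pearson(ebv, mbv)
```

A predictor can correlate strongly with the reference values while still being biased in scale or offset, so the two measures are usually read together.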
[0130] The term "phenotype" refers to any visible, detectable or otherwise measurable property of an organism, such as protein content of milk produced by a dairy cow, or symptoms of, or susceptibility to, a disorder.
[0131] The term "polygenic breeding value" refers to an EBV arising from a genetic evaluation in which the effects of large numbers of genes, each of which has a small effect, are analysed as a single joint effect.
[0132] The term "polymorphism" refers to the presence in a population of two or more allelic variants. Such allelic variants include sequence variation at a single base, for example a single nucleotide polymorphism (SNP). A polymorphism can be a single nucleotide difference present at a locus, or can be an insertion or deletion of one, a few or many consecutive nucleotides. It will be recognized that while the methods of the invention are exemplified primarily by the detection of SNPs, these methods or others known in the art can similarly be used to identify other types of polymorphisms, which typically involve more than one nucleotide.
[0133] The term "primer" refers to a single-stranded oligonucleotide capable of acting as a point of initiation of template-directed DNA synthesis. An "oligonucleotide" is a single-stranded nucleic acid, typically ranging in length from 2 to about 500 bases. The precise length of a primer will vary according to the particular application, but typically ranges from to 30 nucleotides. A primer need not reflect the exact sequence of the template, but must be sufficiently complementary to hybridize to the template.
[0134] The term "predictor function" refers to the matrix of coefficients which have been established for each of the marker variants in the training population. The coefficients essentially represent the relationships between the marker variants (alleles) and the variation observed in the trait. To utilize the relationship, it is necessary to identify and use a marker which has a defined relationship to the coefficient.
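The following is a minimal sketch of how a predictor function of this kind could be applied to a candidate, assuming an additive model in which each marker allele has one coefficient and a genotype is recorded as a count of 0, 1 or 2 copies. The marker names, the coefficient values and the function name `molecular_breeding_value` are hypothetical and are not taken from the specification.

```python
from typing import Dict

# Hypothetical predictor function: one coefficient per marker allele, as might
# be estimated in a training population. The values are illustrative only.
PREDICTOR: Dict[str, float] = {"snp_1": 0.5, "snp_2": -0.2, "snp_3": 0.1}


def molecular_breeding_value(allele_counts: Dict[str, int],
                             predictor: Dict[str, float]) -> float:
    """Sum coefficient * allele count over markers that have a coefficient."""
    mbv = 0.0
    for marker, count in allele_counts.items():
        # A marker contributes only if it has a defined coefficient.
        if marker in predictor:
            mbv += predictor[marker] * count
    return mbv


# Allele counts for one candidate; "snp_9" has no coefficient and is ignored.
candidate = {"snp_1": 2, "snp_2": 1, "snp_9": 2}
print(molecular_breeding_value(candidate, PREDICTOR))
```

Markers without a coefficient are skipped in this sketch, which mirrors the requirement that a marker must have a defined relationship to a coefficient before it can be used.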
[0135] The term "quantitative trait" refers to a phenotypic characteristic which varies in degree, and can be attributed to the interactions between two or more genes and their environment (also called polygenic inheritance).
[0136] The term "quantitative trait locus (QTL)" refers to stretches of DNA which are closely linked to the genes which underlie the quantitative trait in question. QTLs can be identified by methods such as PCR to help map regions of the genome which contain genes involved in specifying a quantitative trait. This can be an early step in identifying and sequencing these genes. A QTL affects a quantitative trait incompletely. Eye colour in humans is a qualitative trait, and the locus provides the complete effect, whereas fat yield is a quantitative trait which is affected by many loci, all of which could be considered QTL, but most of which would be too small to locate.
[0137] The term "Quantitative Trait Nucleotide" (QTN) refers to the actual variant which is responsible for the defined variation in a trait of interest.
[0138] The term "sampling" refers to choosing individual items from a larger set of items. Sampling may be random or non-random, or may be performed on the basis of a rule. The sampling may be conducted on the basis of a desired outcome, such as an improvement in a trait.
[0139] The term "single nucleotide polymorphism" (SNP) refers to common DNA sequence variations among individuals. The DNA sequence variation is typically a single base change or point mutation which results in genetic variation between individuals. The single base change can be an insertion or deletion of a base. Thus a SNP is characterized by the presence in a population of one or two, three or four nucleotides, typically less than all four nucleotides, at a particular locus in a genome.
[0140] A "trait" is a characteristic of an organism which manifests itself in a phenotype, and refers to a biological, performance or any other measurable characteristic(s), which can be any entity which can be quantified in, or from, a biological sample or organism, which can then be used either alone or in combination with one or more other quantified entities. Many traits are the result of the expression of a single gene, but some are polygenic, i.e. result from simultaneous expression of more than one gene. A "phenotype" is an outward appearance or other visible characteristic of an organism. Many different traits can be inferred by the methods disclosed herein. For any trait, a "relatively high" characteristic indicates greater than average, and a "relatively low" characteristic indicates less than average. For example, "relatively high marbling" indicates more abundant marbling in meat than average marbling for a bovine population. Conversely, "relatively low marbling" indicates less abundant marbling than average marbling for a bovine population. Furthermore, in certain aspects, methods of the present invention infer that a bovine subject has a significant likelihood of having a value for a trait which is within the 5th, 10th, 20th, 25th, 30th, 40th, 50th, 60th, 80th, 90th, or 95th percentile of bovine subjects for a given trait.
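As an illustration of the "relatively high" and "relatively low" terminology and of the percentile language above, the sketch below compares one animal's trait value with a population average and computes a simple percentile rank. The function names, the marbling scores and the "at or below" percentile convention are assumptions made for the example only.

```python
from statistics import mean
from typing import Sequence


def relative_level(value: float, population: Sequence[float]) -> str:
    """Label a trait value relative to the population average."""
    average = mean(population)
    if value > average:
        return "relatively high"
    if value < average:
        return "relatively low"
    return "average"


def percentile_rank(value: float, population: Sequence[float]) -> float:
    """Percentage of population values at or below the given value."""
    at_or_below = sum(1 for v in population if v <= value)
    return 100.0 * at_or_below / len(population)


# Hypothetical marbling scores for a small bovine population.
marbling_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
print(relative_level(4.0, marbling_scores))
print(percentile_rank(4.0, marbling_scores))
```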
[0141] "Trait performance" is a phenotypic measure, such as milk yield, or a phenotypic score in the case of type traits.
[0142] The term "tag SNP" refers to a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium.
[0143] Technical and scientific terms used herein have the meanings commonly understood by one of ordinary skill in the art to which the present invention pertains, unless otherwise defined. Reference is made herein to various methodologies known to those of skill in the art. Publications and other materials setting forth such known methodologies to which reference is made are incorporated herein by reference in their entireties as though set forth in full. Standard reference works setting forth the general principles of recombinant DNA technology include J. Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; P. B. Kaufman et al., (eds), 1995, Handbook of Molecular and Cellular Methods in Biology and Medicine, CRC Press, Boca Raton; M. J. McPherson, 1991, Directed Mutagenesis: A Practical Approach, IRL Press, Oxford; J. Jones, 1992, Amino Acid and Peptide Synthesis, Oxford Science Publications, Oxford; B. M. Austen and O. M. R. Westwood, 1991, Protein Targeting and Secretion, IRL Press, Oxford; D. N. Glover, 1985, DNA Cloning, Volumes I and II; M. J. Gait, 1984, Oligonucleotide Synthesis; B. D. Hames and S. J. Higgins (eds), 1984, Nucleic Acid Hybridization; Quirke and Taylor (eds), 1991, PCR-A Practical Approach; Hames and Higgins (eds), 1984, Transcription and Translation; R. I. Freshney, 1986, Animal Cell Culture; Immobilized Cells and Enzymes, 1986, IRL Press; Perbal, 1984, A Practical Guide to Molecular Cloning; J. H. Miller and M. P. Calos (eds), 1987, Gene Transfer Vectors for Mammalian Cells, Cold Spring Harbor Laboratory Press; M. J. Bishop, 1998, Guide to Human Genome Computing, 2d Ed., Academic Press, San Diego, CA; L. F. Peruski and A. H. Peruski, 1997, The Internet and the New Biology: Tools for Genomic and Molecular Research, American Society for Microbiology, Washington, D.C. Standard reference works setting forth the general principles of immunology include S. Sell, 1996, Immunology, Immunopathology & Immunity, 5th Ed., Appleton & Lange, Stamford, CT; D. Male et al., 1996, Advanced Immunology, 3d Ed., Times Mirror Int'l Publishers Ltd., London; D. P. Stites and A. I. Terr, 1991, Basic and Clinical Immunology, 7th Ed., Appleton & Lange, Norwalk, CT; and A. K. Abbas et al., 1991, Cellular and Molecular Immunology, W. B. Saunders Co., Philadelphia, PA.
[0144] Any suitable materials and/or methods known to those of skill in the art can be utilized in carrying out the present invention; however, preferred materials and/or methods are described. Materials, reagents and the like to which reference is made in the following description and examples are generally obtainable from commercial sources.
[0145] The methods of the invention identify animals which have superior traits, which can be used to identify parents of the next generation. The invention provides methods for determining the optimum male and female parent to maximize the genetic components of dominance and epistasis, thus maximizing heterosis and hybrid vigour in the progeny animals.
[0147] An objective of any genetic improvement program is to ascertain the genetic potential of individuals for a broad range of economically important traits at a very early age.
While the classical breeding approach has produced steady genetic improvement in livestock species, it is limited by the fact that accurate prediction of an individual's genetic potential can only be achieved when the animal reaches adulthood (fertility and production traits), is harvested (meat quality traits), or commences training or racing (performance traits). This is particularly problematic for meat animals, since harvested animals obviously cannot enter the breeding pool. Furthermore, it is difficult to utilize the classical breeding approach for traits which are difficult or costly to measure, such as disease resistance and meat tenderness respectively.
[0148] In some aspects, the invention provides methods which use analysis of livestock genetic variation to improve the genetics of the population to produce animals with consistent desirable characteristics, such as animals which yield a high percentage of lean meat and a low percentage of fat efficiently. Thus the invention provides a method for selection and breeding of livestock subjects for a trait. The method includes inferring the genetic potential for a trait or a series of traits in a group of livestock candidates for use in breeding programs from a nucleic acid sample of the livestock candidates. The inference is made by a method which includes identifying the nucleotide occurrence of at least one SNP, wherein the nucleotide occurrence is associated with the trait or traits. Individuals are then selected from the group of candidates with a desired performance for the trait or traits for use in breeding programs. Progeny resulting from mating of selected parents would contain the optimum combination of traits, thus creating an enduring genetic pattern and line of animals with specific traits. These premium lines may be monitored for purity using the original SNP markers, which may be used to identify them from the entire population of livestock and protect them from genetic theft.
[0149] Under the current standards established by the United States Department of Agriculture (USDA), beef from bulls, steers, and heifers is classified into eight different quality grades. Beginning with the highest and continuing to the lowest, the eight quality grades are prime, choice, select, standard, commercial, utility, cutter and canner. The characteristics which are used to classify beef include age, colour, texture, firmness, and marbling, a term which is used to describe the relative amount of intramuscular fat of the beef. Well-marbled beef from bulls, steers, and heifers, i.e. beef which contains substantial amounts of intramuscular fat relative to muscle, tends to be classified as prime or choice, whereas beef which is not marbled tends to be classified as select. Beef of a higher quality grade is typically sold at higher prices than a lower grade beef. For example, beef which is classified as "prime" or "choice" is typically sold at higher prices than beef which is classified into the lower quality grades.
[0150] Classification of beef into different quality grades occurs at the packing facility and involves visual inspection of the ribeye on a beef carcass which has been cut between the 12th and 13th rib prior to grading. However, the visual appraisal of a beef carcass cannot occur until the animal is harvested. Ultrasound can be used to give an indication of marbling prior to slaughter, but accuracy is low if ultrasound is done at a time significantly prior to harvest.
[0151] Another characteristic of beef which is desired by consumers is tenderness of the cooked product. Currently there are no procedures for identifying live animals whose beef would be tender if cooked properly. Currently there are two types of procedures which are used by researchers to assess the tenderness of meat samples after they have been aged and subsequently cooked. The first involves a subjective analysis by a panel of trained testers. The second type is characterized by methods used to cut or shear meat samples which have been removed from an animal and aged. One such method is the Warner-Bratzler shear force procedure, which involves an instrumental measurement of the force required to shear core samples of whole muscle after cooking. Neither of these procedures can be used to any practical effect in a fabrication setting, as the need to age product prior to testing would lead to maintenance of inventory of fabricated product which would be cost prohibitive. Consequently, the methods are used at research facilities but not at packing plants. Accordingly, it is desirable to have new methods which can be used to identify carcasses and live cattle which have the potential to provide beef which will be tender if cooked properly.
[0152] Currently there are no cost-effective methods for identifying live cattle which give accurate prediction of the genetic potential to produce beef which is well-marbled. Such information could be used by feedlot operators to identify animals for purchase prior to finishing, to identify animals under contract for one or more premium programs administered by a packer, by feedlot managers to make management decisions regarding individual animals within a lot (including nutrition programs and sale dates), by cow-calf producers in marketing their animals to various feedlots or in making decisions regarding which animals will be sold on various carcass evaluation grids. Such information could also be used to identify cattle which are good candidates for breeding. Thus it is desirable to have a method which can be used to assess the beef marbling potential of live cattle, particularly young cattle, well in advance of the arrival of the animal at the packing house.
[0153] Feedlots in the United States generally contain pens which typically have a capacity of about 200 animals, and market to packers pens of cattle which are fed to an average endpoint. The endpoint is calculated as a number of days on feed estimated from biological type, sex, weight, and frame score. Animals are initially sorted to a pen based on the estimated number of days on feed and incoming group. However, sorting is done by a series of subjective and suboptimal parameters, as discussed herein. The cattle are fed to an endpoint in order to maximize the percentage of animals from which Grade USDA Choice beef can be obtained at slaughter without developing cattle which are too fat, and thus are discounted for insufficient red meat yield. The present invention provides a method for maximizing a physical characteristic of a bovine subject, including optimizing the percentage of bovine subjects which produce Grade USDA Choice and Prime beef in the most efficient manner.
[0154] While many visual and automated methods of measurement and selection of cattle in feedlots have been tried, such as ultrasound, none has been successful in accomplishing the desired end result, namely the ability to identify and select cattle with superior genetic potential for desirable characteristics, and then manage a given animal with known genetic potential for shipment at the optimum time, considering the animal's condition, performance and market factors, the ability to grow the animal to its optimum individual potential of physical and economic performance, and the ability to record and preserve each animal's performance history in the feedlot and carcass data from the packing plant for use in cultivating and managing current and future animals for meat production. The beef industry is extremely concerned with its decreasing market share relative to pork and poultry. However, to date it has been unable to devise a system or method to accomplish on a large scale what is needed to manage the current diversity of cattle (at least about 100 different breeds and comingled breeds) to improve the beef product quality and uniformity fast enough to remain competitive in the race for the consumer dollar spent on meat.
[0155] Beef cattle traits which may be analyzed include, but are not limited to, marbling, tenderness, quality grade, quality yield, muscle content, fat thickness, feed efficiency, red meat yield, average daily weight gain, disease resistance, disease susceptibility, feed intake, protein content, bone content, maintenance energy requirement, mature size, amino acid profile, fatty acid profile, milk production, hide quality, susceptibility to the buller syndrome, stress susceptibility and response, temperament, digestive capacity, production of calpain, calpastatin and myostatin, pattern of fat deposition, ribeye area, fertility, ovulation rate, conception rate, heat tolerance, environmental adaptability, robustness, susceptibility to infection with and shedding of pathogens such as E. coli, Salmonella or Listeria species.
[0156] It has been difficult for the livestock industry to combine genetics for red meat yield and marbling and/or tenderness. In fact, conventional measurement techniques indicate that marbling and red meat yield tend to be antagonistic. Hence, there is a need for tools which identify superior genetic potential for the combination of red meat yield, tenderness and marbling. Another trait of interest is live cattle growth rate (average daily gain). Currently cattle producers do not have tools to identify animals with superior genetic potential for rapid growth prior to purchase. In addition, there are no methods currently available to identify animals which combine capability for superior growth rate with desirable carcass characteristics.
[0157] The invention further provides methods for selecting a given animal for shipment at the optimum time, considering the animal's genetic potential, performance and market factors, the ability to grow the animal to its optimum individual potential of physical and economic performance, and the ability to record and preserve each animal's performance history in the feedlot and carcass data from the packing plant for use in cultivating and managing current and future animals for meat production. These methods allow management of the current diversity of cattle to improve beef product quality and uniformity, thus improving revenue generated from beef sales.
[0158] The invention allows the identification of animals which have superior traits which can be used to identify parents of the next generation through selection. These methods can be imposed at the nucleus or elite breeding level, where the improved traits would, through time, flow to the entire population of animals, or could be implemented at the multiplier or foundation parent level to sort parents into most genetically desirable. The optimum male and female parent can then be identified to maximize the genetic components of dominance and epistasis, thus maximizing heterosis and hybrid vigour in the market animals.
[0159] The methods and systems of the invention are particularly well suited for managing, selecting or mating bovine subjects of dairy or beef breeds. They allow for the ability to identify and monitor key characteristics of individual animals and manage those individual animals to maximize their individual potential performance and milk or edible meat value. Therefore, the methods, systems, and compositions provided herein allow the identification and selection of cattle with superior genetic potential for desirable characteristics.
[0160] In certain embodiments, the subject is a member of a cattle breed used in beef production, such as Angus, Charolais, Limousin, Hereford, Brahman, Simmental or Gelbvieh. The methods and systems of the present invention are especially well-suited for implementation in a feedlot environment. They allow for the ability to identify and monitor key characteristics of individual animals and manage those individual animals to maximize their individual potential performance and edible meat value. Furthermore, the invention provides systems for collecting, recording and storing such data by individual animal identification so that it is usable to improve future animals bred by the producer and managed by the feedlot. The systems can utilize computer models to analyze information regarding nucleotide occurrences of SNPs and their association with traits, to predict an economic value for a bovine subject.
[0161] In certain aspects, the method further includes managing at least one of food intake, diet composition, administration of feed additives or pharmacological treatments such as vaccines, antibiotics, hormones and other metabolic modifiers, age and weight at which diet changes or pharmacological treatments are imposed, days fed specific diets, castration, feeding methods and management, imposition of internal or external measurements and environment of the bovine subject based on the inferred trait. This management results in improved, and in some examples, a maximization of physical characteristic of a bovine subject, for example to obtain a maximum amount of high grade beef from a bovine subject, and/or to increase the chances of obtaining grade USDA Choice or Prime beef, optimize tenderness, and/or maximize retail yield from the bovine subject taking into account the inputs required to reach those endpoints.
[0162] The method can be used to discriminate among those animals where interventions such as growth implants or vitamin E could provide the greatest value. For example, animals which do not have the traits to reach high choice or prime quality grades may be given growth implants until the end of the feeding period, thus maximizing feed efficiency, while animals with a propensity to marble may not be implanted at the final stages of the feeding period to ensure maximum fat deposition intramuscularly.
[0163] The method also allows a feedlot and processor to predict the quality and yield grades of cattle in the system to optimize marketing of the fed animal or the product to meet target market specification. The method also provides information to the feedlot for purchase decisions based on the predicted economic returns from a specific supplier. Furthermore, the method allows the creation of integrated programs spanning breeders, producers, feedlots, packers and retailers.
[0164] Examples of feed additives used in the United States in beef production include antibiotics, flavours and metabolic modifiers. Information from SNPs could influence use of these additives and other pharmacological treatments, depending on cattle genetic potential and stage of growth relative to expected carcass composition. Examples of feeding methods include ad libitum versus restricted feeding, feeding in confined or non-confined conditions and number of feedings per day. Information from SNPs relative to cattle health, immune status or stress response could be used to influence choice of optimum feeding methods for individual cattle. These methods allow management of the current diversity of cattle to improve the beef product quality and uniformity, thus improving revenue generated from beef sales.
[0165] In another embodiment, methods are provided for selecting a given animal for shipment at the optimum time, considering the animal's condition, performance and market factors, the ability to grow the animal to its optimum individual potential of physical and economic performance, and the ability to record and preserve each animal's performance history in the feedlot and carcass data from the packing plant for use in cultivating and managing current and future animals for meat production.
[0166] Similar problems to those experienced with beef cattle and dairy cattle have been encountered with other livestock animals, such as pigs and poultry, which are intensively farmed.
[0167] In some embodiments the subject is a pig. In these embodiments, the trait can be age at puberty, reproductive potential, number of pigs farrowed alive, birth weight of pigs farrowed, longevity, weight of subject at a target time point, number of pigs weaned, percent of pigs weaned, pigs marketed/sow/year, average weaning weight of pigs, rate of gain, days to a target weight, meat quality, feed efficiency, manure characteristic, muscle content, fat content (leanness), disease resistance, disease susceptibility, feed intake, protein content, bone content, maintenance energy requirement, mature size, amino acid profile, fatty acid profile, stress susceptibility and response, digestive capacity, production of calpain, calpastatin activity and myostatin activity, pattern of fat deposition, fertility, ovulation rate, optimal diet, or conception rate. Manure characteristics include quantity, organic matter, plant nutrients, or salts.
[0168] In certain embodiments, the subject is a bird or avian species. For example, the bird or avian species can be a chicken or a turkey. In these embodiments, the trait can be egg production, feed efficiency, livability, meat yield, longevity, white meat yield, dark meat yield, disease resistance, disease susceptibility, optimal diet, time to maturity, time to a target weight, weight at a target timepoint, average daily weight gain, meat quality, muscle content, fat content, feed intake, protein content, bone content, maintenance energy requirement, mature size, amino acid profile, fatty acid profile, stress susceptibility and response, digestive capacity, production of calpain, calpastatin activity and myostatin activity, pattern of fat deposition, fertility, ovulation rate, or conception rate. In one embodiment, the trait is resistance to Salmonella infection, ascites, and Listeria infection.
[0169] The egg characteristic can be quality, size, shape, shelf-life, freshness, cholesterol content, colour, biotin content, calcium content, shell quality, yolk colour, lecithin content, number of yolks, yolk content, white content, vitamin content, nutrient density, protein content, albumen content, protein quality, avidin content, fat content, saturated fat content, unsaturated fat content, interior egg quality, number of blood spots, air cell size, grade, a bloom characteristic, chalaza prevalence or appearance, ease of peeling, likelihood of being a restricted egg, or Salmonella content.
[0170] Methods according to the invention can be used to infer more than one trait. For example, a method of the present invention can be used to infer a series of traits. As used herein, a phenotype and a trait may be used interchangeably in some instances. Accordingly, a method of the present invention can infer, for example, quality grade, muscle content, and feed efficiency. This inference can be made using one SNP or a series of SNPs. Thus, a single SNP can be used to infer multiple traits; multiple SNPs can be used to infer multiple traits; or a single SNP can be used to infer a single trait.
[0171] In another aspect, the invention provides a method for improving profits related to selling meat from a livestock subject. The method includes drawing an inference regarding a trait of the livestock subject from a nucleic acid sample of the livestock subject. The method is typically performed by a method which includes identifying a nucleotide occurrence for at least one SNP, wherein the nucleotide occurrence is associated with the trait, and wherein the trait affects the value of the animal or its products. Furthermore, the method includes managing at least one of food intake, diet composition, administration of feed additives or pharmacological treatments such as vaccines, antibiotics, hormones and other metabolic modifiers, age and weight at which diet changes or pharmacological treatments are imposed, days fed specific diets, castration, feeding methods and management, imposition of internal or external measurements and environment of the livestock subject based on the inferred trait. Then at least one livestock commercial product, typically meat or milk, is obtained from the livestock subject.
[0172] Methods according to this aspect of the invention can utilize a bioeconomic model, such as a model which estimates the net value of one or more livestock subjects on the basis of one or more traits. By this method, one trait or a series of traits are inferred, for example an inference regarding several characteristics of meat which will be obtained from the subject.
The inferred trait information then can be entered into a model which uses the information to estimate a value for the livestock subject, or a product from the subject, based on the traits.
The model is typically a computer model. Values for the traits can be used to segregate the animals. Furthermore, various parameters which can be controlled during maintenance and growth of the subjects can be input into the model in order to affect the way the animals are raised in order to obtain maximum value for the livestock subject when it is harvested.
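A bioeconomic model of the kind described above can be sketched as follows. This is only an illustrative toy model, assuming that value equals carcass revenue at a grade-dependent price minus feed cost; the class `InferredTraits`, the price table and all numeric values are hypothetical and are not part of the specification.

```python
from dataclasses import dataclass


@dataclass
class InferredTraits:
    """Traits inferred for one livestock subject (hypothetical fields)."""
    quality_grade: str        # e.g. "prime", "choice" or "select"
    carcass_weight_kg: float  # predicted carcass weight at harvest
    feed_per_day_kg: float    # predicted daily feed intake


# Hypothetical price per kg of carcass for each quality grade.
PRICE_PER_KG = {"prime": 7.0, "choice": 6.0, "select": 5.0}


def net_value(traits: InferredTraits, days_on_feed: int,
              feed_cost_per_kg: float) -> float:
    """Estimate net value as carcass revenue minus total feed cost."""
    revenue = PRICE_PER_KG[traits.quality_grade] * traits.carcass_weight_kg
    feed_cost = traits.feed_per_day_kg * days_on_feed * feed_cost_per_kg
    return revenue - feed_cost


animal = InferredTraits("choice", carcass_weight_kg=350.0, feed_per_day_kg=10.0)
# Days on feed is a controllable management parameter that can be varied.
for days in (120, 150, 180):
    print(days, net_value(animal, days, feed_cost_per_kg=0.25))
```

Varying controllable inputs such as `days_on_feed` and comparing the resulting values is one simple way to reflect the idea that management parameters can be entered into the model to seek maximum value at harvest.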
[0173] In certain embodiments, meat or milk can be obtained at a time point which is affected by the inferred trait and one or more of the food intake, diet composition, and management of the livestock subject. For example, where the inferred trait of a livestock subject is high feed efficiency, which can be identified in quantitative or qualitative terms, meat or milk can be obtained at a time point which is sooner than a time point for a livestock subject with low feed efficiency. As another example, livestock subjects with different feed efficiencies can be separated, and those with lower feed efficiencies can be implanted with growth promotants or fed metabolic partitioning agents in order to maximize the profitability of a single livestock subject.
[0174] In another aspect, the invention provides methods which allow effective measurement and sorting of animals individually, accurate and complete record keeping of genotypes and traits or characteristics for each animal, and production of an economic end point determination for each animal using growth performance data. Accordingly, the present invention provides a method for sorting livestock subjects. The method includes inferring a trait for both a first livestock subject and a second livestock subject from a nucleic acid sample of the first livestock subject and the second livestock subject. The inference is made by a method which includes identifying the nucleotide occurrence of at least one SNP, wherein the nucleotide occurrence is associated with the trait. The method further includes sorting the first livestock subject and the second livestock subject based on the inferred trait.
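The sorting step can be illustrated with a short sketch, assuming each livestock subject already has a single numeric inferred trait value (for example a predicted marbling score). The subject identifiers, the scores and the function name `sort_subjects` are hypothetical.

```python
from typing import Dict, List


def sort_subjects(inferred_trait: Dict[str, float]) -> List[str]:
    """Return subject IDs ordered from highest to lowest inferred trait value.

    Ties are broken by subject ID so that the ordering is reproducible.
    """
    return sorted(inferred_trait, key=lambda sid: (-inferred_trait[sid], sid))


# Hypothetical inferred trait values for three subjects.
scores = {"animal_A": 0.2, "animal_B": 0.9, "animal_C": 0.5}
print(sort_subjects(scores))
```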
[0175] The method can further include measuring a physical characteristic of the first livestock subject and the second livestock subject, and sorting the first livestock subject and the second livestock subject based on both the inferred trait and the measured physical characteristic. The physical characteristic can be, for example, weight, breed, type or frame size, and can be measured using many methods known in the art.
[0176] In another aspect, the invention provides a method for cloning a livestock subject, such as a cow or bull, which has a specific trait or series of traits. The method includes identifying nucleotide occurrences of at least one or at least two SNPs for the livestock subject, isolating a progenitor cell from the livestock subject, and generating a cloned livestock from the progenitor cell. The method can further include, before identifying the nucleotide occurrences, identifying the trait of the livestock subject, wherein the livestock subject has a desired trait and wherein the SNPs affect the trait.
[0177] Methods of cloning livestock are known in the art and can be used for the present invention. For example, methods of cloning pigs have been reported (See Carter, D. B., et al., "Phenotyping of transgenic cloned piglets," Cloning Stem Cells 4: 131-45 (2002)). For methods involving beef, milk and dairy product traits, known methods for cloning cattle can be used (See Bondioli, "Commercial cloning of cattle by nuclear transfer", In: Symposium on Cloning Mammals by Nuclear Transplantation, Seidel, pp. 35-38 (1994); Willadsen, "Cloning of sheep and cow embryos," Genome, 31: 956 (1989); Wilson et al., "[...] transfer (cloning), embryo transfer and natural mating", Animal Reprod. Sci., 38: 73-83 (1995); and Barnes et al., "Embryo cloning in cattle: The use of in vitro matured oocytes", J. Reprod. Fert., 97: 317-323 (1993)). These methods include somatic cell cloning (See e.g., Enright, B. P., et al., "Reproductive characteristics of cloned heifers derived from adult somatic cells," Biol. Reprod., 66: 291-6 (2002); Bruggerhoff et al., "Bovine somatic cell nuclear transfer using recipient oocytes recovered by ovum pick-up: effect of maternal lineage of oocyte donors," Biol. Reprod., 66: 367-73 (2002); Wilmut et al., "Somatic cell nuclear transfer," Nature, 419: 583 (2002); Galli et al., "Bovine embryo technologies," Theriogenology, 59: 599 (2003); Heyman et al., "Novel approaches and hurdles to somatic cloning in cattle," Cloning Stem Cells, 4: 47 (2002)).
[0178] In another aspect, the invention provides a livestock subject resulting from the selection and breeding aspect or the cloning aspect of the invention, discussed above.
[0179] In another aspect, the invention provides a method of tracking a product of a livestock subject. The method includes identifying nucleotide occurrences for a series of genetic markers of the livestock subject, identifying the nucleotide occurrences for the series of genetic markers for a product sample, and determining whether the nucleotide occurrences of the livestock subject are the same as the nucleotide occurrences of the product sample. In this method, identical nucleotide occurrences indicate that the product sample is from the livestock subject. The tracking method provides, for example, a method for historical and epidemiological tracking of the location of an animal from embryo to birth, through its growth period, to harvest, and finally to the retail product after it has reached the consumer. The series of genetic markers can be a series of single nucleotide polymorphisms (SNPs). The method can further include comparing the results of the above determination with a determination of whether the meat is from the livestock subject made using another tracking method. In this embodiment, the present invention provides quality control information which improves the accuracy of tracking the source of meat over that achieved by a single method alone.
[0180] The nucleotide occurrence data for the livestock subject can be stored in a computer readable form, such as a database. Therefore, in one example, an initial nucleotide occurrence determination can be made for the series of genetic markers for a young livestock subject and stored in a database along with information identifying the livestock subject. Then, after meat from the livestock subject is obtained, possibly months or years after the initial nucleotide occurrence determination, and before and/or after the meat is shipped to a customer such as, for example, a wholesale distributor, a sample can be obtained from the meat, and nucleotide occurrence information determined using methods discussed herein. The database can then be queried, using a user interface as discussed herein, with the nucleotide occurrence data from the meat sample to identify the livestock subject.
[0181] The invention in another aspect provides a method for inferring a trait of a subject from a nucleic acid sample of the subject, which includes identifying, in the nucleic acid sample, at least one nucleotide occurrence of a SNP. The nucleotide occurrence is associated with the trait, thereby allowing an inference of the trait.
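By way of illustration only, the database query described in paragraphs [0179] and [0180] can be sketched as follows. This is a minimal sketch and not the implementation of the invention: the in-memory dictionary standing in for the database, the function name find_source_animal, the animal identifiers and the four-marker panel are all hypothetical, and the sketch assumes that an exact match at every marker is required.

```python
from typing import Dict, Optional, Tuple

# One nucleotide occurrence (e.g. "AG") per marker in a fixed SNP panel.
Genotype = Tuple[str, ...]


def find_source_animal(database: Dict[str, Genotype], sample: Genotype) -> Optional[str]:
    """Return the ID of the single animal whose stored genotype matches the
    product sample at every marker, or None if there is no unique match."""
    matches = [animal_id for animal_id, stored in database.items() if stored == sample]
    # Zero matches or an ambiguous match are both treated as "not identified".
    return matches[0] if len(matches) == 1 else None


# Hypothetical records stored when the animals were young.
database = {
    "animal_001": ("AG", "CC", "TT", "AG"),
    "animal_002": ("AA", "CT", "TT", "GG"),
}

# Hypothetical genotype determined later from a meat sample.
meat_sample = ("AA", "CT", "TT", "GG")
print(find_source_animal(database, meat_sample))
```

A practical system would presumably also need to tolerate genotyping errors and missing calls, which this sketch does not attempt.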
[0182] In another aspect, the invention provides a method for identifying a livestock genetic marker which influences a trait. The method includes analyzing livestock genetic markers for association with the trait. The genetic marker can be a SNP or can be at least two SNPs which influence the trait. Because the method can identify at least two SNPs, and in some embodiments many SNPs, the method can identify not only additive genetic components, but also non-additive genetic components such as dominance (i.e. the dominating trait of an allele of one gene over an allele of another gene) and epistasis (i.e. interaction between genes at different loci). Furthermore, the method can uncover pleiotropic effects of SNP alleles (i.e. the effects of SNP alleles or haplotypes on many different traits), because many traits can be analyzed for their association with many SNPs using methods disclosed herein.
[0183] Performance animals
[0184] In certain embodiments, the subject is a horse. Horses of various breeds are used in racing, and management and breeding of horses for this purpose are very substantial industries. In addition to thoroughbreds, which are used in horse racing in many countries, standardbreds are used in trotting and pacing races, and quarterhorses and Arab horses are also used in racing. Horse bloodstock breeders currently rely on biomechanical, geometric, and physiological criteria to evaluate young adult horses (14 months and older) for their inherited racing and breeding potential. The size and relative positions of major muscles in the fore and hind limbs are measured to estimate stride power. Slow-motion videography is utilized to evaluate the efficiency of a horse's gait. Blood pressure and ultrasound are used to determine heart size, thickness, and stroke volume.
[0185] However, because the phenotype of an adult horse depends on the interaction of its genotype and environment, an adult phenotype does not provide an accurate prediction of the horse's genetic potential. In addition, parental phenotype is a poor predictor of offspring genotype. Phenotypically superior horses often produce below average foals, demonstrating the limitations of phenotypic analysis and performance or pedigree records such as stud books or race results in predicting breeding potential. Thoroughbreds for racing are normally selected and sold as yearlings, i.e. approximately 12-16 months old. In the absence of performance records, prospective purchasers rely largely on pedigree and physical conformation to select animals which they consider to have potential for racing success. However, because at this age a horse is still growing and developing, its physical conformation may not accurately predict its adult physical capacity and its performance.
[0186] A variety of phenotypes may be measured, especially those related to traits of interest, including those related or thought to relate to performance characteristics, physical structure or disease susceptibility. These measurements may include, but are not limited to, physiological parameters such as limb length, limb angle, muscle volume, resting heart rate, time to resting heart rate after physical exertion, blood pressure, maximum oxygen uptake (VO2max), maximum carbon dioxide production (VCO2max), blood volume at rest and exercise, rebreathing measurements of lung volumes, maximum sprint speed, heart size, and health parameters such as history of joint, skin, and diseases or conditions such as cardiovascular disease, orthopaedic diseases, chronic obstructive pulmonary disease, pulmonary "bleeding" during extreme exertion, muscle diseases like exertional rhabdomyolysis, immune system disorders causing sarcoid tumours, and insect bite hypersensitivity. The condition may comprise normal, apparently normal, pre-clinical disease, overt disease, progress and/or stage of disease, undiagnosed or unclassified conditions, presence of drugs, response to exercise, response to vaccines, therapies, nutritional states and response to environmental conditions. The disease may comprise inflammation or involvement of the immune system, and conditions affecting respiratory, musculoskeletal, urinary, gastrointestinal and adnexal, cardiovascular, reticuloendothelial, nervous, special senses, reproductive, and integument systems. Such conditions in the horse include laminitis, lameness, viral or bacterial disease, colic, gastritis, gastric ulcers, respiratory ailments, epistaxis, fractures, musculoskeletal damage or disorders and joint disease.
[0187] Variables chosen for phenotypic determination may have a numerical format or can be grouped into ranges to form categorical variables. For example, a continuous variable such as a horse's maximum sprint speed can be grouped into several categories, such as fastest horses, having a sprint speed of over 17.5 metres/second; fast horses, having a sprint speed of between about 16 and 17.5 metres/second; and average horses, having a sprint speed of between 15 and 16 metres/second. As will be apparent to one of skill in the art of statistical analysis, the segmentation of such variables can be chosen through groups of categorical variables according to the distribution of the continuous variable.
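As an illustration only, the grouping of a continuous variable into categories described in paragraph [0187] could be sketched as below. The function name sprint_category, the treatment of the boundary values 16 and 15 metres/second, and the "unclassified" label for speeds below 15 metres/second are assumptions made for the sketch; the description gives only the three example ranges.

```python
def sprint_category(speed_m_per_s: float) -> str:
    """Map a maximum sprint speed (metres/second) to an example category."""
    if speed_m_per_s > 17.5:
        return "fastest"   # over 17.5 m/s
    if speed_m_per_s >= 16.0:
        return "fast"      # between about 16 and 17.5 m/s
    if speed_m_per_s >= 15.0:
        return "average"   # between 15 and 16 m/s
    return "unclassified"  # assumption: the example does not cover this range


# Hypothetical measurements for four horses.
speeds = {"horse_A": 18.0, "horse_B": 16.5, "horse_C": 15.2, "horse_D": 14.0}
categories = {name: sprint_category(speed) for name, speed in speeds.items()}
print(categories)
```

In practice the cut points would be chosen from the observed distribution of the variable rather than fixed in advance.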
[0188] Horses can be screened for two genetic disorders, hyperkalaemic periodic paralysis (HYPP) and severe combined immunodeficiency disease (SCID). HYPP is a genetic disorder affecting quarterhorses which results in muscle spasms and paralysis (Rudolph, Spier, S. et al. (1992), "Periodic paralysis in quarter horses--a sodium-channel mutation disseminated by selective breeding," Nature Genetics 144-147). A PCR-based genetic test is available to identify horses with the HYPP disease allele. Breeders use this information to minimize the prevalence of HYPP in their stock or to identify animals needing treatment. SCID is a genetic disease of the immune system affecting Arabian horses (Don-vanIt Slot, H. and J. van der Kolk (2000), "Severe Combined Immunodeficiency Disease (SCID) in the Arabian horse: a review," Tijdschrift Voor Diergeneeskunde 125(19): 577-581; Shin, L. Perryman et al. (1997), "Evaluation of a test for identification of Arabian horses heterozygous for the severe combined immunodeficiency trait," J. American Veterinary Medical Association 211(10)). Horses carrying the SCID disease allele have dysfunctional immune systems. As with HYPP, a genetic test is available to identify carriers of the defective SCID gene.
[0189] It will be appreciated that similar performance and physical parameters and criteria to those used in the evaluation and selection of horses are also applicable to other animals used in racing, such as mules, camels and dogs. While mules are sterile, the methods and systems of the invention other than those relating to breeding can be applied to these animals. Similar performance and physical parameters and criteria may also be used in prediction of human athletic performance, particularly for sports which involve running and/or endurance, including but not limited to athletics events, swimming, rowing, kayaking, football codes (Australian Rules Football, rugby, American football, soccer), baseball, basketball and ice hockey.
[0190] In one embodiment the animal is a dog. The methods of the invention can be used to predict performance for racing dogs such as greyhounds, for dogs to be used in dog shows and breed club shows, or for working dogs such as guide dogs or other dogs used for assisting disabled people, sheep dogs, police dogs, and drug or quarantine detection dogs. The methods of the invention can also be used to predict performance for other companion animals, including those to be used for show. For example, the inference can be drawn regarding a coat or conformational characteristic or a health characteristic, for example, susceptibility to hip dysplasia, arthritis, diabetes, hypertension, atherosclerosis, autoimmune disorders, kidney disease and neurological disease. The invention is also useful for assessing complex traits such as energy metabolism, aging and breed-specific traits.
[0191] Methods according to the invention may be used in companion animal management, for example management in breeding, and typically include managing at least one of food intake, diet composition, administration of feed additives or pharmacological treatments such as vaccines, antibiotics, age and weight at which diet changes or pharmacological treatments are imposed, days fed specific diets, castration, feeding methods and management, imposition of internal or external measurements and environment of the companion animal subject based on the inferred trait.
[0192] Methods according to the invention may be used to improve profits related to selling a companion animal subject; to manage companion animal subjects; to improve the genetics of a companion animal population by selecting and breeding of companion animal subjects; to clone a companion animal subject with a specific genetic trait, a combination of genetic traits, or a combination of SNP markers which predict a genetic trait; to track a companion animal subject or offspring; and to diagnose or determine susceptibility to a health condition of a companion animal subject.
[0193] In another aspect, the invention provides a method for identifying a companion animal genetic marker which influences a phenotype of a genetic trait. The method includes analyzing companion animal genetic markers for association with the genetic trait. The genetic markers can be single nucleotide polymorphisms (SNPs). Preferably, nucleotide occurrences of at least two SNPs are identified which influence the genetic trait or a group of traits.
[0194] The following table gives references for sets of markers in a variety of animal species, which may be used in the methods of the invention (refer to Table 12 for examples of marker and genome data sets within a variety of families and genera which may be directly utilised by the methods and systems disclosed herein). In most cases the reference is to sets of markers which have been used to create linkage maps for that species.
Sheep: Crawford et al. (1995) Genetics 140: 703-724.
Beef cattle: Barendse et al. (1997) Mammalian Genome 8: 21-28.
Pig: Archibald et al. (1995) Mammalian Genome 6: 157-175.
Goat: Vaiman et al. (1996) Genetics 144: 279-305.
Deer: Slate et al. (2002) Genetics 160: 1587-97.
Horse: Guérin et al. (1999) Animal Genetics 30: 341-54.
Chicken: Levin et al. (1994) Journal of Heredity 85: 79-85.
Turkey: Burt et al. (2003) Animal Genetics 34: 399-409.
Mouse: Dietrich et al. (1994) Nature Genetics 7: 220-245.
Rat: Yamada et al. (1994) Mammalian Genome 5: 63-83.
Cat: Menotti-Raymond et al. (1999) Genomics 57: 9-23.
Dog: Werner et al. (1999) Mammalian Genome 10: 814-823.
Baboon: Rogers et al. (2000) Genomics 67: 237-247.
Salmon: Naish and Park (2002) Animal Genetics 33: 316-318; Beacham et al. (2003) Fishery Bulletin 101: 243-259.
Rainbow trout: Sakamoto et al. (2000) Genetics 155: 1331-1345.
Catfish: Waldbieser et al. (2001) Genetics 158: 727-734.
[0195] Nucleotide occurrences can be determined for essentially all, or all, of the SNPs of a high-density, whole genome SNP map. This approach has the advantage over traditional approaches that, since it encompasses the whole genome, it identifies potential interactions of genomic products expressed from genes located anywhere on the genome, without requiring preexisting knowledge regarding a possible interaction between the genomic products. An example of a high-density, whole genome SNP map is a map of at least about 1 SNP per 10,000 kb, at least 1 SNP per 500 kb or about 10 SNPs per 500 kb, or at least about [...] SNPs or more per 500 kb. Definitions of densities of markers may change across the genome and are determined by the degree of linkage disequilibrium within a genome region.
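Purely as an illustration of how a marker density such as "at least 1 SNP per 500 kb" in paragraph [0195] might be checked for one chromosome, a minimal sketch follows. The function names snps_per_window and meets_density, the use of fixed non-overlapping windows, and the example positions are assumptions for the sketch; the description itself notes that suitable densities may vary with linkage disequilibrium across the genome.

```python
from collections import Counter
from typing import Dict, List


def snps_per_window(positions_bp: List[int], window_kb: int = 500) -> Dict[int, int]:
    """Count SNPs in consecutive, non-overlapping windows along one chromosome.

    Keys are window indices (0 covers the first window_kb kilobases, and so on).
    """
    window_bp = window_kb * 1000
    return dict(Counter(pos // window_bp for pos in positions_bp))


def meets_density(positions_bp: List[int], chrom_length_bp: int,
                  min_snps: int = 1, window_kb: int = 500) -> bool:
    """True if every window of the chromosome holds at least min_snps SNPs."""
    window_bp = window_kb * 1000
    n_windows = -(-chrom_length_bp // window_bp)  # ceiling division
    counts = snps_per_window(positions_bp, window_kb)
    return all(counts.get(w, 0) >= min_snps for w in range(n_windows))


# Hypothetical SNP positions (base pairs) on a short example chromosome.
positions = [12_000, 250_000, 499_999, 500_000, 1_400_000]
counts = snps_per_window(positions)
print(counts)
print(meets_density(positions, chrom_length_bp=1_500_000))
print(meets_density(positions, chrom_length_bp=2_000_000))
```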
[0196] Thus in embodiments where SNPs which affect the same trait and which are located in different genes are identified, the method can further include analyzing expression products of genes near the identified SNPs, to determine whether the expression products interact. Thus the present invention provides methods to detect epistatic genetic interactions. Laboratory methods for determining whether genomic products interact are well known in the art.
[0197] Where the trait is overall quality, the method can infer an overall average quality grade for a product obtained from the subject. Alternatively, the method can infer the best or the worst quality grade expected for a product obtained from the subject. Additionally, as indicated above, the trait can be a characteristic used to classify the product.
[0198] The methods of the present invention which infer a trait can be used instead of present methods used to determine the trait, or can be used to provide further substantiation of a classification of milk, meat or another product using present methods.
[0199] It will also be appreciated that the methods of the invention are useful in the identification of markers useful in determination of physiological parameters, diagnosis of disease, estimation of risk of multifactorial genetic disorders, and identification of pharmacogenomic markers, in both humans and non-human animals such as livestock and performance animals. Prior art methods for analysis of genome-wide associations have been used to identify markers for conditions such as Crohn's disease (see for example /2007/025085) and diabetes (Sladek et al., Nature doi:10.1038/nature05616; 2007), and markers for longevity (WO/2006/138696). However, these studies have tended to search for markers for just one condition or disease at a time, using known disease-affected kindreds.
[0200] The invention is further described in detail by way of reference only to the following examples and drawings. These are provided by way of reference only, and are not intended to be limiting. Thus the invention encompasses any and all variations which become evident from the teaching provided herein.
[0201] The methods disclosed herein have been developed primarily for use as a computational method for prediction of the genetic and phenotypic merit of individuals based on the use of molecular breeding values (MBVs), and will be described hereinafter particularly with reference to this application. However, it will be appreciated that the methods are not limited to this particular field of use.
[0202] True breeding worth or true genetic merit of an individual cannot be measured, but is usually estimated statistically as Estimated Breeding Value (EBV), which is generally based on a statistical analysis of the performance of the individual itself and of progeny or relatives of the individual, using statistically-based analytical systems such as BLUP. However, there is a need in the art for selection methods which enable accurate selection of individuals for breeding prior to the availability of data which can only be obtained once the individual, or its relatives, have entered their productive phase. For example, this may be used to enable accurate selection of young sires for progeny testing.
[0203] A variety of potential methods for such selection, for example PCA and regression using a genetic algorithm, involve the use of both DNA-based genotypic information and indirect predictors of genotype and therefore phenotype, directly based on DNA markers as a source of biomarkers. These can be used either separately or together, and with or without statistical information, to assess individuals for their genetic merit. For example, biomarkers such as hormone levels can be used together with DNA markers to predict phenotypes. In this context the nature of genetic merit can be assessed on the basis of single or multiple genetic markers, which rank the individual for breeding worth on the basis of Molecular Breeding Values (MBV). The MBV can be obtained in addition to the pedigree information and BLUP-based information discussed above.
[0204] In accordance with at least some of the methods disclosed herein, the MBV may be derived without the need for direct pedigree or relationship information, i.e. as a function of relationships between markers, genotypes and EBV.
[0205] As will be appreciated, such genetic assay-assisted selection for individual breeding may allow selections to be made without the need for generation and phenotypic testing of progeny/descendants. In particular, such tests allow selections to be made among related individuals which do not necessarily exhibit the trait in question, and which can be used in introgression strategies to select both for the trait to be introgressed and against undesirable background traits.
[0206] In this context, the present methods relate to the use of the relationship between BLUP genetic merit and MBV genetic merit to predict the underlying true genetic merit.
[0207] Prediction of genetic merit
[0208] The present invention relates to methods and systems for the prediction of genetic and phenotypic merit on the basis of genome-wide marker information, and example methods are exemplified in Figures 1A to 1F. Figures 1A to 1F merely provide examples, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. Performance records of individuals and marker genotype data from which to derive prediction equations are combined with dimension reduction techniques to make predictions of merit on the basis of marker information alone, or in combination with information from other sources.
[0209] Figure 1A shows an example arrangement of a method to predict the merit of an individual comprising the steps of: creating 1 a first population P1, where genotypic and phenotypic information on the individuals in the first population are known; selecting an individual 2 or set of individuals forming a second population P2, where only genotypic information on the individual(s) in P2 is known; determining 3 a set of explanatory variables for at least one marker for individuals in the first population; defining 4 a predictor function for the at least one marker; applying 5 the predictor function to an individual of interest from P2; and determining 6 the merit (e.g. genetic merit) of the individual of interest with respect to the marker. In an alternative arrangement, as shown in Figure 1B, the predictor function may be applied to all individuals in the second population P2 and the merit of all individuals in P2 determined, and then, depending on the merit of each of the individuals, a particular individual of interest is selected 7 from P2 for a purpose.
[0210] Figure 1C shows a further arrangement of the methods disclosed herein for determining the merit of and/or selecting an individual of interest from a second population having known genotype information, based upon genotype and phenotype information of individuals in a first population. Again, first and second populations are created (10 and 11 respectively), wherein the first population has known genotype and phenotype information and the second population has known genotype information only. A trait of interest is selected 12 on which a particular individual of interest from the second population will be assessed and/or selected, and a dimension reduction process as described hereunder is performed 13 on the genotype and phenotype information of individuals in the first population. As a part of the dimension reduction procedure, a subset P1,A is selected 14 with respect to the selected trait and the prediction error is determined 15 for the subset P1,A with respect to the number of explanatory variables used to describe the genetic data (e.g. the number of principal components for PCA or the number of latent components for PLS etc.), and the prediction error is then determined for the remaining subset P1,B of individuals in P1 with respect to the number of variables, from which the model complexity is determined which minimises the prediction error for individuals in P1,B. Next, a new subset P1,A of the first population is selected and steps 14 through 18 are repeated 19 to determine the optimal number of explanatory variables for all individuals of the first population P1 with respect to the selected trait. Once the optimal number of explanatory variables is determined 20, a predictor (e.g. a predictor function) is defined 21 for the trait of interest from the explanatory variables. Once the predictor has been determined, an individual of interest is selected 22 from the second population P2 and the predictor is applied 23 to the genotype data on the selected individual to obtain a prediction of the characteristics of the individual of interest with respect to the selected trait. Optionally, the steps of selection and prediction (22 and 23 respectively) may be repeated 24 for all individuals in P2 to obtain a prediction of the characteristics of all individuals in P2 with respect to the selected trait, from which a particular individual may be selected 25 on the basis of their predicted merit with respect to the selected trait.
[0211] Figure 1D is a further arrangement of the prediction and selection process described herein, where for two populations P1 and P2 (32 and 33 respectively) selected from individuals of a common family 31 (for example any one of the bovine, ovine, porcine, avian, human or any other family as would be appreciated by the skilled addressee, or even a particular genus or breed within the family, for example the Holstein-Friesian breed of the bovine family, or, for the human genus, individuals of a common race, geographic location etc.) the following steps are taken to select a particular individual: a dimension reduction procedure such as those described herein is performed 35 on known genotypic and phenotypic information of the individuals of P1 with respect to a selected trait, and a set of explanatory variables is determined 36 with respect to that trait. A predictor function is then defined 37, and the predictor function is applied 38 to known genotype information on the individuals of P2. From the application of the predictor function, the merit of the individuals of P2 is determined with respect to the selected trait, and one or more individuals with a high predicted merit for the selected trait may then be selected 40 for a particular purpose.
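The final steps of Figures 1B and 1D (applying a predictor function to every individual of P2 and selecting on predicted merit) can be sketched, by way of example only, as follows. The sketch assumes a linear predictor in which each marker genotype is coded as 0, 1 or 2 copies of a reference allele and multiplied by an estimated marker effect; that coding, the effect values, the individual names and the function names predict_merit and select_top are all hypothetical and are not taken from the description.

```python
from typing import Dict, List, Tuple


def predict_merit(genotype: List[int], marker_effects: List[float],
                  intercept: float = 0.0) -> float:
    """Apply a linear predictor function to one individual's genotype."""
    return intercept + sum(g * b for g, b in zip(genotype, marker_effects))


def select_top(p2: Dict[str, List[int]], marker_effects: List[float],
               n_select: int = 1) -> Tuple[List[str], Dict[str, float]]:
    """Predict merit for every individual in P2 and return the best n_select."""
    merit = {name: predict_merit(g, marker_effects) for name, g in p2.items()}
    ranked = sorted(merit, key=merit.get, reverse=True)
    return ranked[:n_select], merit


# Hypothetical marker effects, standing in for a predictor derived from P1.
marker_effects = [0.5, -0.2, 1.0]

# Hypothetical second population P2: genotype information only.
p2 = {
    "bull_A": [0, 1, 2],
    "bull_B": [2, 2, 0],
    "bull_C": [1, 0, 1],
}

selected, merit = select_top(p2, marker_effects, n_select=1)
print(selected, merit)
```

With a dimension reduction method such as PCA or PLS, the predictor would be expressed through the retained components rather than directly as per-marker effects, although a linear model can generally be rewritten in the per-marker form used here.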
[0212] An arrangement 50 of the process of determining the predictor function of the arrangements of Figures 1A to 1B is exemplified in Figure 1E, wherein trait, phenotype or observational data 51 and marker data 52 are obtained 53 for a plurality of individuals of a common family/genus/breed. It will be appreciated that, due to the nature of such information, a filtering or preprocessing 54 of the data obtained in 53 may be required, i.e. quality control of the data, for example exclusion of DNA or SNP data according to particular criteria such as data duplication or low frequency (see for example Zenger et al. (2007)). Examples of such filtering are described below, although other methods of filtering the data as would be appreciated by the skilled addressee may also be employed, to obtain a working data set 55 on which the predictor function is determined. A cross-validation procedure 56 is carried out to obtain the optimal model complexity of the working data for a particular reduction method (for example the optimal number of principal components for PCA or the optimal number of latent components for PLS, or other alternate methods), and the working data 55 is then analysed 57 using the optimal model complexity to obtain a predictor function 58 which may, for example (depending on the chosen method), comprise a matrix or regression components 59. In Figure 1F an example arrangement of the application of the predictor function 58 is described for a selected individual 81. In this example the predictor function is applied to predict the MBV of the selected individual 81. A marker assay 82 is obtained 83 to determine the genotype of the individual 81, and the predictor function 58 is then applied 85 to the genotype information 84, thereby to obtain a prediction of the individual's MBV 86 (or other assessment of merit of the individual as required).
[0213] Figure 1G shows an arrangement of the dimension reduction process 56 of Figure 1E incorporating a PLS methodology with cross-validation 64, as described in more detail below. The working data 55 is iterated for a suitable number of times (e.g. 10). On each iteration, different groups of data sets 61 are selected. Each data set 61 is divided into a randomly chosen 'test set' 62 (e.g. 10%) and a residual set 63. A dimension reduction methodology 65 is applied using PLS 66 across the residual set 63 to obtain a set of 1 to n latent component models 67 (e.g. Models M1 to Mn as described in more detail below). The prediction capability of the latent component models 67 is then assessed 68 on the test set 62, and the performance of each Model 1 to n is recorded to obtain a plurality of Model performance variables/functions Mp1 to Mpn 69, from which the prediction error 70 is calculated for each Mpn and each of the data sets 61. The average prediction error 71 is then calculated for each of the models with corresponding (i.e. the same) latent variables, and the optimal number of latent components 72 is chosen on the basis of the minimal (i.e. the smallest) prediction error observed. A PLS regression model comprising the latent components of the minimal prediction error 72 is then fitted to the working data 55, from which the predictor function 58 is derived.
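The cross-validation loop of Figure 1G can be illustrated with the following sketch, which is offered under stated assumptions rather than as the method of the invention. It uses a plain single-response PLS (the PLS1 variant fitted by the NIPALS procedure), repeated random 10% test splits, and mean squared error as the prediction error; the function names, the 0/1/2 genotype coding and the simulated data are all hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Model = Tuple[Vector, float, List[Tuple[Vector, Vector, float]]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X: Matrix, y: Vector, n_components: int) -> Model:
    """Fit PLS1 by NIPALS; returns column means, y mean and (w, p, q) per component."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    comps = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:  # no covariance left to model
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]           # latent component scores
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = dot(f, t) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                            # deflate y
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def predict_pls1(model: Model, x: Vector, n_components: int) -> float:
    """Predict y for one individual using the first n_components latent components."""
    x_mean, y_mean, comps = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps[:n_components]:
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


def choose_n_components(X: Matrix, y: Vector, max_components: int,
                        n_iterations: int = 10, test_fraction: float = 0.1,
                        seed: int = 0) -> Tuple[int, List[float]]:
    """Average test-set squared error per model size; return the best size."""
    rng = random.Random(seed)
    n = len(X)
    n_test = max(1, round(test_fraction * n))
    total = [0.0] * max_components
    for _ in range(n_iterations):
        idx = list(range(n))
        rng.shuffle(idx)
        test, residual = idx[:n_test], idx[n_test:]
        model = fit_pls1([X[i] for i in residual], [y[i] for i in residual], max_components)
        for k in range(1, max_components + 1):
            total[k - 1] += sum((predict_pls1(model, X[i], k) - y[i]) ** 2 for i in test)
    mean_sq_err = [e / (n_iterations * n_test) for e in total]
    best = min(range(max_components), key=mean_sq_err.__getitem__) + 1
    return best, mean_sq_err


# Simulated working data: 40 individuals, 8 markers coded 0/1/2, one trait.
rng = random.Random(1)
X = [[float(rng.choice([0, 1, 2])) for _ in range(8)] for _ in range(40)]
y = [2.0 * row[0] - 1.0 * row[3] + rng.gauss(0, 0.5) for row in X]

best_k, cv_error = choose_n_components(X, y, max_components=5)
final_model = fit_pls1(X, y, best_k)  # refit on all working data with the chosen size
print(best_k, [round(e, 3) for e in cv_error])
```

Because NIPALS extracts components sequentially, one fit with the maximum number of components yields all of the smaller models at once, which is why each iteration fits only a single model.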
[0214] It will be appreciated by the skilled addressee that, for the arrangements exemplified in Figures 1A to 1G, where the merit of an individual is determined for a particular trait and/or marker, the process may be repeated for any number of traits and/or markers, or potentially a particular combination of at least two to any number (for example 2 to 100 traits/markers).
[0215] The method relates to the use of genetic markers, including genetic markers distributed across the genome, in a process capable of efficiently combining marker and quantitative or phenotypic information in order to produce more accurate breeding values for quantitative or qualitative traits, particularly those traits which are difficult to estimate conventionally. This process is interchangeably referred to as Genome Wide Scanning or Genome Wide Selection or by the collective abbreviation "GWS".
[0216] The method provides a screening tool to capture as much of the additive genetic variation in production traits as possible in order to develop molecular breeding values (MBV) as a foundation for EBVs, and may also be used to capture epistatic variations in performance or to rank individuals for specific environments. This will then provide the basis to consider new advanced breeding opportunities by the creation of individuals with elite genetic profiles in combination with advanced reproductive technologies to reduce generation interval and increase selection intensity.
[0217] The method enables selection of individuals from within a population on the basis of an assessment or estimation of their merit or appropriateness for a particular end-use. The method may involve the application of a combination of a group of techniques or processes to the selection of individuals, e.g. animals, cells, embryos, gametes, or plants, and the subsequent use of the individuals, e.g. animals, cells, gametes or plants, thereby selected or bred as a result, on the basis of their value or merit or fitness for purpose for a particular end-use.
[0218] Such end-uses include breeding, in which case the assessment of merit is one of genetic merit, or allocation to a desired end-use, such as the production of a specific component of milk, in which case the assessment of merit is one of a phenotypic merit with or without an assessment of genetic merit. The output may be Advanced Phenotypic and Genotypic Value (APGV).
[0219] The method may incorporate one or more of the following sources of data or information for the individuals under study or evaluation within the population, in the form of information on the individuals which may be utilised by the methods of the invention to generate a set of explanatory variables and define a predictor function. The information may include, for example, one or more of:
a) pedigree of the individual, which may include data ranging from knowledge of the sire only through to a multi-generation pedigree, where a number of maternal and/or paternal ancestors are defined; this includes pedigrees defined by reference to the inheritance by offspring of marker variants from their parents;
b) indices of genetic merit for one or more traits of interest, such as an EBV for a trait for an individual, where the EBV may be derived using statistical analysis such as BLUP, and/or derived by evaluation of progeny/descendants of the individual;
c) data on genotypes or marker variants at markers within the genome for the individual, or markers for/of the individual;
d) data on genotypes or marker variants at markers within the genome for relatives of the individual, or markers for/of the individual;
e) indices of phenotype for the individual, for relatives of the individual and for the phenotypic variation of the population, for the trait or traits of interest;
f) indices of phenotype, including bio-markers, which may in themselves be predictive of other indices of phenotype for the individual, and for relatives of the individual, and/or of underlying genetic or phenotypic variation for individuals within the population;
g) indices of epigenetic modification or status for an individual;
h) other sources of data indicative of, or potentially indicative of, genetic differences between animals.
[0220] Examples of factors which enable the process to generate useful information in a timely and cost-effective manner include:
a) access to a system to define the genotypes at a large number of markers across the whole genome or within a defined part thereof for a population of individuals;
b) access to accurate genotypic and phenotypic data for a population of individuals; the quanta of data for the individuals within the population, and the population itself, must both be of sufficient size to provide robust estimates of the genotypes or marker variant-trait relationships;
c) ready access to a database or databases wherein the data referred to above are stored;
d) a set of computational methods for the statistical analysis of data for the generation of genetic information (such as BLUP, principal component analysis, or genetic algorithms) and for the derivation of the genotypes or marker variant-trait relationships;
e) access to scientific literature and/or public databases of genomic information which enable the identification of genes which are potential candidates as contributors to variation in the trait of interest.
[0221] The above lists are not exhaustive, and no preference for particular types of information or process factors should be implied from their inclusion or placement within these lists. For example, the methods disclosed herein do not require the pedigree information for the individual to enable the prediction of merit of that individual.
[0222] Amplification of nucleic acids in the analysis of genetic markers
[0223] Nucleic acids used as a template for amplification may be isolated from cells, tissues or other samples according to standard methodologies. For example these may find particular use in the detection of repeat length polymorphisms, such as microsatellite markers.
Amplification analysis may be performed on whole cell or tissue homogenates or biological fluid samples without substantial purification of the template nucleic acid.
[0224] Pairs of primers designed to selectively hybridize to nucleic acids are contacted with the template nucleic acid under conditions that permit selective hybridization.
Depending upon the desired application, high stringency hybridization conditions may be selected so as to allow hybridization only to sequences that are completely complementary to the primers. Alternatively hybridization may occur at reduced stringency to allow for amplification of nucleic acids containing one or more mismatches with the primer sequences.
Once hybridized, the template-primer complex is contacted with one or more enzymes that facilitate template-dependent nucleic acid synthesis. Multiple rounds of amplification, also referred to as "cycles", are conducted until a sufficient amount of amplification product is produced.
[0225] The amplified product may be detected or quantified by visual means; alternatively, the detection may involve indirect identification of the product via chemiluminescence, radioactive scintigraphy of incorporated radiolabel or fluorescent label, or even via a system using electrical and/or thermal impulse signals. Typically, scoring of repeat length polymorphisms is performed on the basis of the size of the resulting amplification product.
[0226] A number of template-dependent processes may be used to amplify the oligonucleotide sequences present in a given template sample. One of the best known amplification methods is the polymerase chain reaction (PCR), which is described in detail in U.S. Pat. Nos. 4,683,195, 4,683,202 and 4,800,159, each of which is incorporated herein by reference in its entirety.
[0227] Detection of genetic markers for use in the prediction of genetic merit
[0228] Non-limiting examples of methods for identifying the presence or absence of a polymorphism include detection of single nucleotide polymorphisms (SNPs), haplotypes, microsatellites (simple tandem repeat STR, simple sequence repeat SSR), restriction fragment length polymorphisms (RFLP), amplified fragment length polymorphisms (AFLP), insertion-deletion polymorphism (INDEL), random amplified polymorphic DNA (RAPD), ligase chain reaction, insertion/deletions, single-strand conformation polymorphisms (SSCP) and direct sequencing of the gene. These techniques are well known in the art; see for example Sambrook, Fritsch and Maniatis, "Molecular Cloning: A Laboratory Manual", 2nd ed., Cold Spring Harbor Laboratory Press (2001).
[0229] In particular, techniques employing PCR detection are advantageous in that detection is more rapid, less labour-intensive and requires smaller sample sizes. Once an assay format has been selected, selections may be unambiguously made on the basis of genotypes assayed at any time after a nucleic acid sample can be collected from an individual, such as an infant animal, or even earlier in the case of testing of embryos in vitro, or testing of foetal offspring. Any source of DNA may be analyzed for scoring of genotype. For example, the DNA may be nuclear or mitochondrial DNA, or any other form of DNA.
[0230] The nucleic acids to be screened may be isolated from any convenient tissue, such as blood, milk, tissue, hair follicles or semen of the animal. Single cells from early-stage embryos may also be used. Peripheral blood cells are conveniently used as the source of DNA from young or adult animals. A sufficient number of cells is obtained to provide a sufficient amount of DNA for analysis, although only a minimal sample size will be needed where scoring is by amplification of nucleic acids. The DNA can be isolated from the cell sample by standard nucleic acid isolation techniques known to those skilled in the art.
[0231] Bio-Markers
[0232] In addition to genetic markers, bio-markers can also be used. The bio-marker may comprise a component which may be an RNA sequence, a peptide, including a hormone such as insulin-like growth factor-1, a steroid such as progesterone, a metabolite such as glucose, urea or an amino acid, or an immune-mediator molecule such as γ-interferon. Such molecules have potential as diagnostic aids and/or as advanced phenotypes. For example they may be used as indirect selection criteria for variation in complex traits; in many cases the bio-markers can be used in combination to define the Advanced Phenotypic Value (APV).
[0233] Bio-markers offer potential as diagnostics and/or predictors of performance, health or production traits in animals such as dairy cattle. Generally such bio-markers are measured or detected in samples such as blood or milk, including somatic cells, or from other easily accessible tissues or sources, including urine, tissue biopsies, placenta post-birth, etc.
[0234] Genetic Marker Screening Platform
[0235] A number of genetic marker screening platforms are now commercially available, and can be used to obtain the genetic marker data required for the process of the present methods. In many instances, these can take the form of genetic marker testing arrays (microarrays), which allow the simultaneous testing of many thousands of genetic markers. For example, these arrays can test genetic markers in numbers of greater than 1,000, greater than 1,500, greater than 2,500, greater than 5,000, greater than 10,000, greater than 15,000, greater than 20,000, greater than 25,000, greater than 30,000, greater than 35,000, greater than 40,000, greater than 45,000, greater than 50,000, greater than 100,000, greater than 250,000, greater than 500,000, greater than 1,000,000, greater than 5,000,000, greater than 10,000,000 or greater than 15,000,000. The nucleotide occurrence of at least 2 SNPs can be determined. At least 2 SNPs can form a haplotype, wherein the method identifies a haplotype allele which is associated with the trait. The method can include identifying a diploid pair of haplotype alleles for one or more haplotypes.
[0236] Examples of such commercially available products for bovine genomes are those marketed by Affymetrix Inc (http://www.affymetrix.com) or Illumina (http://www.illumina.com). The Affymetrix Inc product was the first 10k bovine SNP array to be commercially released. Illumina and Affymetrix also have larger SNP panels available for humans.
[0237] The 10k SNP array has been developed from the public domain bovine sequencing consortium (http://www.affymetrix.com/products/arrays/specific/bovine.affx) using largely intronic SNPs discovered by the 6x whole genome shotgun sequencing project across 6 breeds, and 1000 SNPs, all coding SNPs, derived from the Interactive Bovine in silico SNP database Expressed Sequence Tag (IBISS EST) comparison/alignment (Livestock Industries: www.livestockgenomics.csiro.au). Only SNPs with a high probability of being genuine (i.e. not sequencing artefacts) have been submitted on the 10k SNP array. The SNPs are being developed by massive multiplex padlock probe streamlining, by which 10,000 SNP genotypes can be performed in a single reaction and visualized on an Affymetrix universal genotyping array. The core elements for this system have been proven in other mammalian systems, and are available as routine services or commercially-available testing kits. Similar products for human genotyping are available, for example from Affymetrix, Illumina and Sequenom.
[0238] Statistical Analysis
[0239] Statistical and computing strategies have been developed to integrate information on individual animals and their relatives to produce estimated breeding values that are not biased by non-random use of sires in different regions, seasons, herds and years. The Australian Breeding Value (ABV) is a representative product from such an evaluation system for dairy cattle. Other databases in Australia include BREEDPLAN (beef), OVIS (sheep), PIGBLUP (swine) and TREEPLAN (forest trees).
[0240] The developments in genetic technology described above now allow large numbers of SNP genotypes to be generated for a single organism. For animal breeding, these SNPs can be used to predict the genetic merit of animals at an early stage so that a group of superior animals can be identified for further testing or breeding. The large number of SNPs that can be evaluated means that the predictor functions are contained in a high dimensional space with large empty spaces between them. This is referred to as the "Curse of Dimensionality" (Bellman, 1961), a phenomenon which can be overcome either by adding more animals to the experiment or by reducing the dimension of the predictor space. In many cases it may not be practicable to increase the number of animals because the required increase is of order 3n4 where n 5 is the number of SNPs, which for GWS can typically be in the tens of thousands. Thus the present methods relate to a reduction in the dimension of the predictor space. Dimension reduction is usually used to reduce the dimensions of the variables to be predicted. The present method discloses the application of a number of statistical methods, such as PCA, PLS and SVM among others, to the explanatory variables, but it will be appreciated that dimension reduction is not restricted to these particular methods alone.
[0241] Principal Component Analysis
[0242] A widely-used method of dimension reduction is Principal Component Analysis (PCA), which finds linear combinations of the data such that the variance is maximised. Principal component analysis (PCA) is a statistical protocol for extracting the main relations in data of high dimensionality. A common way of finding the Principal Components of a data set is by calculating the eigenvectors of the data correlation matrix. These vectors give the directions in which the data cloud is stretched most. The projections of the data on the eigenvectors are the Principal Components. The corresponding eigenvalues give an indication of the amount of information the respective Principal Components represent. Principal Components corresponding to large eigenvalues represent much information in the data set, and thus tell us much about the relations between the data points. Principal component analysis is described in Jolliffe, Principal Component Analysis, Springer Verlag, 1986, ISBN 0-387-96269-7. This method has been widely exploited for the analysis of very large volumes of data.
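The eigenvector route to the first Principal Component can be illustrated with a short sketch. It is an illustration only: the data values and function names are hypothetical, and power iteration is used here simply as a convenient way to obtain the leading eigenvector of the correlation matrix without external libraries.

```python
import math

def correlation_matrix(rows):
    """rows: list of observations, each a list of p variables.
    Returns the p x p correlation matrix and the standardised data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    corr = [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]
    return corr, z

def leading_eigenpair(mat, iters=500):
    """Power iteration for the largest eigenvalue and its unit eigenvector."""
    p = len(mat)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        mv = [sum(mat[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in mv))
        v = [x / norm for x in mv]
    lam = sum(v[a] * sum(mat[a][b] * v[b] for b in range(p)) for a in range(p))
    return lam, v

# Hypothetical data: the first two variables are strongly correlated.
data = [[2.0, 1.9, 0.5], [1.0, 1.2, 0.1], [3.0, 3.1, 0.4],
        [4.0, 3.8, 0.2], [5.0, 5.2, 0.3]]
corr, z = correlation_matrix(data)
eigenvalue, direction = leading_eigenpair(corr)
# Projections of the standardised data on the eigenvector: the first PC.
scores = [sum(zr[j] * direction[j] for j in range(len(direction))) for zr in z]
# Share of total variance carried by the first PC (trace of corr equals p).
share = eigenvalue / len(corr)
```

Because the two correlated variables are nearly redundant, the first component carries roughly two thirds of the total variance of the three standardised variables.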
[0243] In the process described herein, a SNP array, such as the Affymetrix SNP array, with SNP markers known to be located at strategic positions in the genome, either from prior QTL information and/or genome gaps, is used as a basis for genome-wide genotyping.
[0244] For the construction of an index relating any of the SNP markers to molecular breeding values (MBVs), several information reduction procedures were used. The primary method is a genetic algorithm described further herein. An alternative information reduction method based on principal component analysis (PCA) is also described. Both methods rely on analysis of a training data set, in which data on explanatory variables (e.g. SNP genotypes) and traits (e.g. EBVs) is available for each animal.
[0245] The training dataset comprises a set of genotyped animals with multiple genome-wide markers and some performance measure, such as EBV or trait phenotype. The information reduction algorithms (GA and PCA) search for the optimal relationship of subsets of markers which maximises the prediction of the EBV in the training population. Once established via this "training set", predictions can be made with respect to untested individuals, for which no EBV or trait measurement is available, but which have been genotyped either for all markers or for the appropriate subset of markers identified from the training set. In so doing, predictions for the EBV of an individual can be made with a very high degree of accuracy, which may be up to 0.9 or even greater. The accuracy depends on the nature of the marker and its degree of heritability. Accuracy is very high for simulated data, whereas experimental or field data are more complex, and tend to be less accurate. Regression coefficients for traits related to fitness tend to be of low heritability.
[0246] Partial Least Squares Analysis
[0247] Another widely used statistical methodology, Partial Least Squares (PLS), is a highly efficient statistical regression technique that is well suited for the analysis of whole genome scan data. This method searches for a set of components (also called factors or latent variables) that explain as much as possible of the covariance between predictor and response.
[0248] PLS analysis methods are superior to alternatives such as principal components regression, which extracts factors to explain as much predictor sample variation as possible without reference to the response variables. PLS has the advantage that it balances the two objectives, seeking factors that explain both response and predictor variation.
[0249] The number of latent components to extract using PLS analysis depends on the data. Basing the model on more extracted factors improves the model fit to the observed data, but extracting too many components can cause over-fitting, that is, tailoring the model too much to the current data, to the detriment of future predictions. Procedures to choose the number of latent components are cross validation or bootstrapping.
[0250] Described hereunder is a cross-validation method to determine the number of latent components to be used in the regression.
[0251] In order to estimate the number of latent components, observations from the data were removed in a stepwise procedure, computing a prediction model based on the remaining samples and finally testing the calculated model by comparing the estimated value with the true value for the excluded observations. This process is then repeated by excluding a new selection of observations, until all observations have been excluded once. In the following discussion, the complete data set (learning set, L) consists of N objects. The learning set was partitioned in k segments (k = 10) of length l. If k·l > N, the k·l − N last segments contained only l − 1 objects. The N − l objects form the construction data which is used to derive the predictive model using PLS, which then in turn was used to predict the removed l objects (the validation data).
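The segment sizes described above can be checked with a small sketch. The function name `cv_segments` is hypothetical, and the random shuffling of objects before partitioning is an assumption; the source only fixes the segment lengths.

```python
import math
import random

def cv_segments(n_objects, k=10, seed=0):
    """Partition object indices 0..N-1 into k segments of length l,
    where the last k*l - N segments hold only l - 1 objects."""
    length = math.ceil(n_objects / k)       # segment length l
    n_short = k * length - n_objects        # segments with l - 1 objects
    sizes = [length] * (k - n_short) + [length - 1] * n_short
    order = list(range(n_objects))
    random.Random(seed).shuffle(order)      # assumed: random allocation
    segments, start = [], 0
    for size in sizes:
        segments.append(order[start:start + size])
        start += size
    return segments

# Example: N = 103 objects, k = 10 gives l = 11 and 7 short segments.
segments = cv_segments(103, k=10)
segment_lengths = [len(s) for s in segments]
```

Each object appears in exactly one segment, so every observation is excluded from the construction data exactly once.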
[0252] The Mean Squared Error of Prediction (MSEP) was used as the objective function in model complexity selection. The k-fold cross-validation estimate is

$$\mathrm{MSEP}_{CV}(\theta) = \frac{1}{N} \sum_{i=1}^{k} \left\| y_i - X_i \hat{B}_{N-l,\theta} \right\|^2$$

[0253] where $y_i$ and $X_i$ are the validation data of the $i$th segment, $\theta$ is the number of latent components used in the estimate and $\hat{B}_{N-l,\theta}$ is an estimate of the regression coefficient using $\theta$ latent components based on the construction data $y_{N-l}$ and $X_{N-l}$. The value of $\theta$ which minimizes the mean error rate then determines the number of latent components in the final model as described above.
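The cross-validated MSEP can be sketched in code as below. The source does not state which PLS algorithm was used, so this sketch assumes a single-response PLS fitted by successive deflation; all function names, the simulated genotypes and the interleaved segments are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_fit(X, y, n_comp):
    """Single-response PLS by deflation (assumed variant). X: list of rows."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(n_comp):
        # Weight vector: direction of maximal covariance with the response.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:            # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(E[i], w) for i in range(n)]            # latent scores
        tt = dot(t, t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = dot(f, t) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    x_mean, y_mean, comps = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat

def msep_cv(X, y, segments, max_comp):
    """MSEP for 1..max_comp latent components; each segment is held out once."""
    n = len(y)
    msep = []
    for theta in range(1, max_comp + 1):
        sse = 0.0
        for seg in segments:
            held = set(seg)
            train = [i for i in range(n) if i not in held]
            model = pls1_fit([X[i] for i in train], [y[i] for i in train], theta)
            sse += sum((y[i] - pls1_predict(model, X[i])) ** 2 for i in seg)
        msep.append(sse / n)
    return msep

# Simulated example: 40 individuals, 4 SNPs coded 0/1/2, noisy linear trait.
rng = random.Random(1)
X = [[rng.choice([0, 1, 2]) for _ in range(4)] for _ in range(40)]
y = [2.0 * r[0] - 1.0 * r[1] + rng.gauss(0, 0.3) for r in X]
segments = [list(range(s, 40, 10)) for s in range(10)]   # k = 10 segments
msep = msep_cv(X, y, segments, max_comp=4)
best_theta = 1 + min(range(len(msep)), key=msep.__getitem__)
```

The value `best_theta` is the number of latent components with the smallest cross-validated MSEP, which would then be used to fit the final model on the complete learning set.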
[0254] In the processes described herein, a SNP array, such as for example the Affymetrix SNP array, with SNP markers known to be located at strategic positions in the genome, either from prior QTL information and/or genome gaps, is used as a basis for GWS and genotyping.
[0255] For the construction of a matrix of coefficients capable of relating any marker variants to variation in the trait information of the training population, several information reduction procedures were used. The primary one is a genetic algorithm (GA) described further herein. An alternative information reduction method is also described based on partial least squares analysis (PLS). Both methods rely on analysis of a training data set in which animals have data on explanatory variables (e.g. SNP genotypes) and traits (e.g. EBVs).
[0256] The training dataset of the present method comprises a set of genotyped animals with multiple genome-wide markers and some performance measure such as EBV or trait phenotype. The information reduction algorithms search for the optimal relationship of subsets of markers which maximises the prediction of the EBV in the training population.
Once established via this "training set", forward predictions can be made with respect to untested individuals for which no EBV or trait measurement is available, but which have been genotyped either for all markers or for the appropriate subset of markers identified from the training set.
[0257] Principal Component Analysis
[0258] Principal Component Analysis (PCA) is a multivariate analysis technique in which the aim is to reduce the dimension of a dataset comprised of many correlated variables, while still accounting for a large proportion of the variance. Given a vector $X$ of random variables, the first Principal Component (PC) is the linear function $w_1^T X$ such that $\mathrm{var}(w_1^T X)$ is maximised and $w_1^T w_1 = 1$. The $j$th PC is the linear function $w_j^T X$ which is orthogonal to all other PCs and which maximises $\mathrm{var}(w_j^T X)$. The problem of finding PCs is equivalent to finding the eigenvalues, $\lambda$, and eigenvectors, $w$, of the covariance matrix of $X$, $\Sigma$.
[0259] PCA can be used to identify redundancy or correlation among a set of measurements or variables for the purpose of data reduction. This powerful exploratory tool provides insightful graphical summaries with ability to include additional information. PCA can also be used to summarize large sets of data; identify structure and/or trends in the data; identify redundancy and correlation in the data; and produce insightful graphical displays of the results.
[0260] Described herein is a method of predicting genotypic merit using PCA regression methods applied to SNP data from the entire genome. A cross-validation method is used to select the optimal number of principal components (PCs) to use in the regression, and methods to decide which PCs to include in the model are utilized to improve the model. The methods have been applied to simulated and real data for evaluation.
[0261] Algorithm for Principal Component Analysis
[0262] The individuals of interest can be partitioned into those with known BVs (K) and those to have their BVs estimated (U). The animals in the set K form the training set from which to estimate parameters which are to be used to predict the BVs of the animals in the set U. The SNPs which do not show any variation are removed from the study. The remaining SNPs are arranged into a matrix $X$ where $x_{ij}$ is the number of copies of one allele (0, 1 or 2) in the $i$th SNP position for the $j$th individual. PCA is performed for (i) all individuals $j \in K \cup U$ and (ii) only animals in the training set $j \in K$
[0263] separately, to examine the effectiveness of the method when the SNP values for the training set are known, and when the SNP values of the training set are not available, but the rotation matrix is known.
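The allele-count coding and the removal of SNPs without variation might look as follows. This is a sketch under stated assumptions: genotype calls are taken to be two-letter strings, and the SNP names, calls and function names are hypothetical.

```python
def code_snp(calls, counted_allele):
    """Number of copies (0, 1 or 2) of counted_allele in each genotype call."""
    return [call.count(counted_allele) for call in calls]

def build_matrix(snp_calls, counted):
    """Return (kept SNP names, X) with X[i][j] = allele count at SNP i
    for individual j, after dropping SNPs that show no variation."""
    names, X = [], []
    for name, calls in snp_calls.items():
        row = code_snp(calls, counted[name])
        if len(set(row)) > 1:       # keep only SNPs with variation
            names.append(name)
            X.append(row)
    return names, X

# Hypothetical calls for four individuals at three SNPs.
snp_calls = {
    "snp1": ["AA", "AG", "GG", "AG"],
    "snp2": ["CC", "CC", "CC", "CC"],   # monomorphic, removed
    "snp3": ["TT", "CT", "CT", "CC"],
}
counted = {"snp1": "A", "snp2": "C", "snp3": "T"}
names, X = build_matrix(snp_calls, counted)
```

Which allele is counted at each SNP is arbitrary, since swapping it only changes the sign of the centred values and not the principal components found.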
[0264] The vector of SNP means, $\mu$, is computed, saved and subtracted from $X$. Principal component analysis is performed on the matrix $X$ of $n_s$ SNPs for $n_a$ individuals via the Expectation Maximisation (EM) algorithm as described by Roweis (1998), which has an advantage in high dimensional data because it does not require computation of the sample covariance matrix. The algorithm to find the first $n_{pc}$ principal components is:

for i = 1, ..., n_pc do
    Choose a vector w = (w_1, ..., w_ns)^T so that w^T w = 1
    loop
        (E step) Compute y = w^T X
        (M step) Compute w_new = X y^T (y y^T)^(-1)
        Scale w_new such that (w_new)^T (w_new) = 1
    end loop
    Subtract the projection of each point onto the principal component from X to obtain X = X - w w^T X
end for

[0265] The $i$th principal component is given by $pc_i = w_i^T X$ and all principal components $(pc_1, \ldots, pc_{n_{pc}})$ are now ordered such that $pc_1$ accounts for the most variation in $X$ and $pc_{n_{pc}}$ accounts for the least variation. The principal components and rotation matrix $W = (w_1, w_2, \ldots, w_{n_{pc}})$ are stored. A linear model of the following form is fitted to the principal components:

$$T_j = \beta_0 + \beta_1\, pc_{j,1} + \beta_2\, pc_{j,2} + \cdots + \beta_{n_{pc}}\, pc_{j,n_{pc}} + \varepsilon_j \qquad (1)$$

[0266] where $T_j$, $j \in K$, is the measurement of a particular trait or BV of individual $j \in K$, $pc_{j,i}$ is the $i$th principal component for the $j$th individual and $\beta_i$ are the regression coefficients. This is referred to as Principal Component Regression (PCR).
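The EM iteration above can be sketched in a few lines. This is an illustrative reading of the algorithm rather than the implementation used: data rows are individuals (the transpose of X as defined in the text), the start vector is random, a fixed number of iterations replaces a convergence test, and the function name and toy data are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def em_pca(rows, n_pc, n_iter=200, seed=0):
    """rows[j] = centred SNP vector of individual j.
    Returns (W, scores): W[i] is the unit loading vector of PC i and
    scores[i][j] is the value of PC i for individual j.
    n_pc should not exceed the rank of the data."""
    rng = random.Random(seed)
    resid = [list(r) for r in rows]
    p = len(rows[0])
    W, scores = [], []
    for _ in range(n_pc):
        w = [rng.gauss(0, 1) for _ in range(p)]
        norm = math.sqrt(dot(w, w))
        w = [v / norm for v in w]
        for _ in range(n_iter):
            y = [dot(r, w) for r in resid]                  # E step
            yy = dot(y, y)
            w = [sum(r[i] * yj for r, yj in zip(resid, y)) / yy
                 for i in range(p)]                         # M step
            norm = math.sqrt(dot(w, w))
            w = [v / norm for v in w]                       # rescale to unit length
        y = [dot(r, w) for r in resid]
        # Remove the projection onto this component before the next one.
        resid = [[ri - yj * wi for ri, wi in zip(r, w)]
                 for r, yj in zip(resid, y)]
        W.append(w)
        scores.append(y)
    return W, scores

# Toy centred data whose main axes are the coordinate axes.
rows = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0],
        [0.0, -1.0, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, -0.5]]
W, scores = em_pca(rows, n_pc=2)
```

Only products of the data with a single vector are needed at each step, which is why the sample covariance matrix never has to be formed.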
[0267] To predict the genotypic value of the desired individuals, the estimated regression coefficients from Equation 1 are used:

$$\hat{T}_j = \hat{\beta}_0 + \hat{\beta}_1\, pc_{j,1} + \hat{\beta}_2\, pc_{j,2} + \cdots + \hat{\beta}_{n_{pc}}\, pc_{j,n_{pc}} \qquad (2)$$

[0268] To examine the case where the SNP values of the training set are unavailable, but the rotation matrix is available, PCA is performed on the set K. It is anticipated that the use of animals in the set U may add noise to the PCs to be used in the PCR. In order to compare the accuracy of the PCR when PCA is performed on animals in the set $K \cup U$ to when PCA is performed on animals in the set K, PCA is performed on the set K. The regression coefficients are estimated as before (Equation 1). The individuals whose breeding values are to be predicted are arranged into a matrix $\tilde{Z}$ where $\tilde{z}_{ij}$ is the number of alleles of one type in the $i$th SNP position for the $j$th individual, as before. The vector of mean SNP values from the training set, $\mu$, is subtracted from each row of $\tilde{Z}$ to form the matrix $Z$. The principal components are computed for these individuals by the equation:

$$(pc_1, \ldots, pc_{n_{pc}}) = Z\,W \qquad (3)$$

[0269] These PCs are used to predict the genotypic merit through Equation 2.
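Equations 1 to 3 can be tied together in a small worked sketch. It assumes that the PC scores of the training set have zero mean and are mutually orthogonal (as they are for PCA on centred data), in which case each regression coefficient can be estimated separately; the function names and the tiny data set are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(raw_rows, snp_means, W):
    """Equation 3: centre each individual by the training SNP means and
    rotate. raw_rows[j] = allele counts of individual j; W[i] = loadings of PC i."""
    centred = [[v - m for v, m in zip(row, snp_means)] for row in raw_rows]
    return [[dot(row, w) for w in W] for row in centred]

def fit_pcr(scores, trait):
    """Equation 1 under the orthogonality assumption stated above.
    scores[i] = values of PC i for the training individuals."""
    b0 = sum(trait) / len(trait)
    betas = [dot(pc, trait) / dot(pc, pc) for pc in scores]
    return b0, betas

def predict(b0, betas, pcs_row):
    """Equation 2 for one individual."""
    return b0 + dot(betas, pcs_row)

# Training set K: four individuals, two SNPs, with known breeding values.
train_raw = [[2, 1], [0, 1], [1, 2], [1, 0]]
mu = [1.0, 1.0]                       # SNP means of the training set
W = [[1.0, 0.0], [0.0, 1.0]]          # rotation matrix stored from PCA on K
bv = [5.0, 1.0, 4.0, 2.0]

train_pcs = project(train_raw, mu, W)
scores = [list(col) for col in zip(*train_pcs)]
b0, betas = fit_pcr(scores, bv)

# Set U: one new individual, centred with the saved training means.
new_pcs = project([[2, 2]], mu, W)
predicted_bv = predict(b0, betas, new_pcs[0])
```

Only the saved means, the rotation matrix and the regression coefficients are needed to score a new individual, which is the situation described for the case where the training SNP values themselves are unavailable.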
[0270] Supervised Principal Component Analysis
[0271] Many SNPs may have no effect on genetic merit. The inclusion of such SNPs may add noise to procedures used to predict BVs. Supervised Principal Components Analysis (SPCA) is a method whereby a univariate regression is performed to measure the univariate effect of each gene on the BV. Only SNPs whose t-statistic on the regression coefficient exceeds a threshold, θ, are taken and PCA is performed on this subset of SNPs. This method is used for θ = 2 (corresponding p-value ≈ 0.05) and θ = 3 (p-value ≈ 0.003). The case of θ = 0 is equivalent to PCA.
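The screening step of SPCA can be sketched as a univariate regression per SNP followed by a threshold on the t-statistic. The function names and data are hypothetical, and the sketch assumes that monomorphic SNPs have already been removed and that no SNP fits the BV perfectly (either case would cause a division by zero here).

```python
import math

def slope_t_statistic(x, y):
    """t-statistic of the slope in the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)     # standard error of the slope
    return slope / se

def screen_snps(X, bv, threshold):
    """X[i] = allele counts of SNP i across individuals.
    Returns indices of SNPs with |t| above the threshold."""
    return [i for i, snp in enumerate(X)
            if abs(slope_t_statistic(snp, bv)) > threshold]

# Hypothetical data: SNP 0 tracks the breeding value, SNP 1 does not.
bv = [1.1, 2.0, 2.9, 2.1, 1.0, 3.2]
X = [
    [0, 1, 2, 1, 0, 2],
    [1, 0, 1, 2, 1, 1],
]
selected = screen_snps(X, bv, threshold=2.0)
```

PCA would then be run on the rows of `X` listed in `selected` only; a threshold of 0 keeps every SNP with a non-zero estimated effect.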
[0272] Choosing the Number of Principal Components
[0273] Classically, methods utilising the eigenvalues corresponding to the rows of the rotation matrix have been used in order to choose the number of principal components to keep. This includes methods such as keeping principal components with eigenvalue greater than unity, the Scree plot, Horn's procedure, regression methods, Bartlett's test and the broken-stick test (see, for example Johnson and Wichern (198) and Sharma (1996)). However, we have found that such methods greatly underestimate the number of principal components needed to accurately predict genotypic merit, since not all of the important information in the SNP data is necessarily captured in the leading principal components. This is because the quantitative trait loci do not necessarily occur in areas of the chromosome where there is a large amount of variability and may be captured in PCs that account for a relatively small proportion of the overall variance.
[0274] Described hereunder is a cross-validation method to determine the number of principal components to be used in the regression. In order to estimate the number of principal components required, the breeding values of n_uk = 150 individuals are randomly dropped from the sample and saved. These individuals form the group of unknowns, U, and the remaining individuals form the group of knowns, K. Principal component regression is performed, and the regression coefficients are estimated, with varying numbers of PCs being used in the regression. The genotypic values of the n_uk individuals in U are estimated, and the correlation with their saved breeding values is examined. This process is repeated.
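The repeated drop-and-predict procedure can be sketched generically. The callable `predict(K, U, n_pc)` stands in for the principal component regression and is hypothetical, as are the function names, the toy breeding values and the toy predictor used to exercise the loop; averaging the correlation over repeats is also an assumption about how the repeated runs are combined.

```python
import math
import random

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def choose_n_pcs(bv, predict, candidates, n_unknown, repeats=5, seed=0):
    """Pick the number of PCs whose predictions for the dropped individuals
    correlate best, on average, with their saved breeding values."""
    rng = random.Random(seed)
    total = {c: 0.0 for c in candidates}
    for _ in range(repeats):
        idx = list(range(len(bv)))
        rng.shuffle(idx)
        U, K = sorted(idx[:n_unknown]), sorted(idx[n_unknown:])
        saved = [bv[j] for j in U]              # saved, then treated as unknown
        for c in candidates:
            total[c] += pearson(predict(K, U, c), saved)
    return max(candidates, key=lambda c: total[c] / repeats)

# Toy setting: 20 individuals; the stand-in predictor is exact with 3 PCs
# and increasingly distorted for other numbers of PCs.
bv = [0.5 * j + (j % 3) for j in range(20)]

def toy_predict(K, U, n_pc):
    return [bv[j] + 5.0 * abs(n_pc - 3) * (((7 * j) % 5) - 2) for j in U]

best_n_pc = choose_n_pcs(bv, toy_predict, [1, 2, 3, 4, 5], n_unknown=6)
```

In practice the stand-in predictor would be replaced by PCA on the chosen individuals followed by the regression of Equation 1 with the given number of PCs.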
[0275] Selection of Principal Components
[0276] Although the PCs are ordered from the PC which accounts for the most variation to the PC which accounts for the least variation, this does not necessarily imply that the first PC contains the most relevant information for predicting genetic value. Thus, the association of some of the PCs with the response variables, which accounts for a significant part of the variation of the original data, may be spurious and therefore make the linear model unsound for prediction.
[0277] Three methods are used to select the PCs. In the first method, PCs are ranked according to the proportion of variance accounted for by each PC. Secondly, the correlations are computed between each PC and the response variable. The PCs are ordered according to their absolute correlation with the response variable, so that the first PC fitted in the model is the most highly correlated with the response variable. Forward stepwise regression may also be used to build the model. Under forward stepwise regression, the k-th PC added is the PC which adds the most information, given that the previous (k - 1) PCs have already been fitted.
[0278] The third method of ordering the PCs is a combination of the first two methods. The PCs which are most highly correlated with the BV may account for a very small proportion of the variation in the SNPs, making the PCR less robust. Similarly, the PCs which account for a large proportion of variance in the SNPs may not influence BV at all. The PCs are ranked according to
$\lambda_i \, \rho(\mathrm{pc}_i; \mathrm{BV})$
[0279] where $\lambda_i$ is the i-th eigenvalue and $\rho(\mathrm{pc}_i; \mathrm{BV})$ is the correlation between the i-th PC and the BV.
[0280] A fourth possible approach, not set out here in detail, would be to use the GA described below to select the best subset of principal components for use. The principal components would form the explanatory variable inputs to the GA, for example instead of SNP genotypes.
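The first three orderings of the PCs described in this section can be illustrated with a short sketch. It assumes NumPy, more individuals than SNPs (so that no component has zero variance), and uses the absolute correlation in the combined criterion. The function name `rank_pcs` and the use of the absolute value are assumptions made for illustration.

```python
import numpy as np

def rank_pcs(X, bv):
    """Return three orderings of the principal components of X.

    Assumes X has full column rank after centring, so every PC has
    non-zero variance.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                              # PC scores, one column per PC
    eigvals = s ** 2 / (X.shape[0] - 1)         # eigenvalue of each PC
    yc = bv - bv.mean()
    # correlation between each PC and the breeding value
    corr = scores.T @ yc / (s * np.linalg.norm(yc))
    by_variance = np.argsort(-eigvals)                  # method 1
    by_correlation = np.argsort(-np.abs(corr))          # method 2
    by_combined = np.argsort(-eigvals * np.abs(corr))   # method 3
    return by_variance, by_correlation, by_combined
```

The three index arrays can then be used to decide the order in which PCs enter the principal component regression.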
[0281] Genetic Algorithm Process
[0282] We have developed a program for finding the molecular breeding value (MBV) or quantitative trait loci (QTL) using a genetic algorithm when there are very large numbers of explanatory variables (SNPs, genotypes, haplotypes) and relatively few observations.
[0283] A simple linear model was fitted. This contained an overall mean, a fixed (predetermined and parameterised) number of explanatory (genetic) effects and a residual. If the available data were less reliable, the inclusion of a polygenic effect would require the use of Restricted (or Residual) Maximum Likelihood (ReML). SNP effects were calculated by regression, and MBVs were calculated for all individuals as the sum of the effects for each individual. These MBVs can later be compared with the EBVs of individuals, such as young bulls, once their test results are analysed.
[0284] The model employed is a hierarchical model based on the Gauss-Markov theorem, including random effects, and is of the general form:
$y = \mu + \sum f(g) + e$
[0285] where the observations (y) are the sum of the general mean ($\mu$), the sum of the genotype effects (the molecular breeding value m) for the individual, and a residual (e). In matrix form this is expressed (where bold type represents a matrix) as
$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$
[0286] The normal equations are $\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{y}$, which may be solved by direct inversion if $\boldsymbol{\beta}$ is short enough, viz. $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$, or by iterative means otherwise.
[0287] The errors are calculated from the general equation: $\mathbf{e} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$.
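As a hedged illustration of the model above, the following sketch solves the normal equations by a direct method and forms MBVs as the sum of the fitted genotype effects. It assumes NumPy, a design matrix whose first column is the overall mean, and a full-rank X'X. The function names are hypothetical.

```python
import numpy as np

def fit_effects(X, y):
    """Solve the normal equations X'X b = X'y and return (b, residuals).

    Assumes column 0 of X is a column of ones (the overall mean) and
    that X'X is non-singular.
    """
    XtX = X.T @ X
    Xty = X.T @ y
    beta = np.linalg.solve(XtX, Xty)     # direct solution, suitable for short beta
    residuals = y - X @ beta             # e = y - X b
    return beta, residuals

def molecular_breeding_values(X, beta):
    # MBV = sum of the genotype effects, leaving out the overall mean
    return X[:, 1:] @ beta[1:]
```

For very long effect vectors an iterative solver would replace `np.linalg.solve`, as the text notes.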
[0288] A genetic algorithm is used to find the optimum model. All models found will contribute to weighted averages of the SNP effects and MBVs.
[0289] Evaluation of Genetic Algorithm
[0290] The ratio of the sum of squares of the model to the sum of squares of the best model is the same as the ratio of the likelihoods, so weights can be calculated as
$w = \dfrac{\mathbf{e}^{*T}\mathbf{e}^{*}}{\mathbf{e}^{T}\mathbf{e}}$
[0291] where $\mathbf{e}^{*}$ is the vector of residuals from the best model. The weights, and the products of the weights with the effects and MBVs (and possibly the sums of squares), are summed. When a new best model is found, the weights and the sums of variables (explanatory or MBVs) are reduced in value by 1/w (multiplication) and $\mathbf{e}^{*}$ is replaced by $\mathbf{e}$.
[0292] The end results are the weighted averages of the $\beta$ effects for all explanatory variables, and the weighted MBVs. Different numbers of explanatory variables are fitted and in different ways. With SNPs it is possible to fit the genotypes or simply the number (0, 1 or 2) of one allele (as a covariate). When more complex explanatory variables, such as haplotypes, are fitted they must be fitted as cross-classified variables.
[0293] The analysis program is written in such a way that other models for evaluation can be easily substituted for the initial one. This may even include other random effects, such as a polygenic breeding value.
[0294] Using the Genetic Algorithm to Find an Optimal Model
[0295] In order to describe the GA in the terms commonly used by computer scientists working with GAs while avoiding confusion with the terms used by geneticists, it is necessary to define these terms at the outset. Thus a genetic algorithm chromosome (GAC) defines a model.
[0296] Each GAC derived for the genetic algorithm contains the explanatory variables in a model. This consists of the section of real chromosome, comprising either the loci or the haplotype. With some models, such as haplotype, there may be a variable number of categories per chromosomal segment; some could have 2, 3, 4 or more. Ideally, segments at low frequency may be amalgamated into a single group.
[0297] Prior to running the GA, $\mathbf{X}^T\mathbf{X}$ and $\mathbf{X}^T\mathbf{y}$ are created for all effects, allowing subsets to be retrieved during the GA rather than being re-calculated.
[0298] An initial population of GACs is generated by random selection of explanatory variables. All members of this population of GACs are evaluated as subsequently described.
[0299] In each round of the GA two parent GACs are chosen at random from the population. These are "mated" together to form an offspring GAC, drawing explanatory variables from each parent GAC and ensuring that the same explanatory variables do not appear twice. If they do, then others can be chosen randomly from the complete set, or from those in the two parents which were not chosen. If, after evaluation, the offspring GAC outperforms a parent, that parent is replaced. The GA performance criterion is currently $\mathbf{e}^T\mathbf{e}$, but is not restricted to this; for example, if a subset of individuals only to be predicted is included, the sum of their squared prediction errors could be used.
[0300] An example of use of the GA to evaluate MBVs comprises the steps of:
A. Parameter definition
  1. Total number of potential explanatory variables
  2. Number of explanatory variables in the models
  3. Number of observations
  4. Number of individuals (includes individuals without observations)
  5. Number of models in the GA
B. Memory allocation and initialisation
  1. Declare variables
  2. Zero variables
  3. Read data
  4. Build complete X'X matrix (half stored)
  5. Build complete X'y
C. Populate the initial set of models
  1. Randomly choose explanatory variables
  2. Evaluate (see above)
    a. Compute MBVs and residuals
    b. Compute weights
    c. Accumulate weighted sums of MBVs and effects
D. Search with the GA until improvement ceases
  1. Breed (see above)
  2. Evaluate (as per step C.2.)
  3. Replace parents
E. Reportage
  1. Report best solution
  2. Report weighted averages (and standard errors) of the MBVs and effects
F. End
[0301] The algorithm may be repeated a number of times with different numbers of explanatory variables.
[0302] Evaluation of GAC
[0303] Each GAC is evaluated by first loading the addresses of represented effects into a vector. The vector is then used to extract the subset of elements of $\mathbf{X}^T\mathbf{X}$ and $\mathbf{X}^T\mathbf{y}$ from storage. Solutions for $\boldsymbol{\beta}$ can be obtained by direct inversion of $\mathbf{X}^T\mathbf{X}$ if the number of effects is sufficiently small, or by iterative means otherwise. Weighted effects and MBVs (m) are accumulated, and $\mathbf{e}^T\mathbf{e}$ is calculated.
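The evaluation of a single GAC from the pre-computed X'X and X'y can be sketched as below. This is an illustrative sketch only: it assumes NumPy, that column 0 of X is the overall mean (always fitted), and that a GAC is represented as an array of column indices. The `mate` helper is a simplified stand-in for the breeding step, drawing without replacement from the union of the parents' variables. All function names are hypothetical.

```python
import numpy as np

def evaluate_gac(gac, XtX, Xty, yty):
    """Fit the model encoded by a GAC using stored X'X and X'y.

    gac : indices (>= 1) of the explanatory variables in the model;
          index 0 (the overall mean) is always added.
    Returns the effect solutions and the residual sum of squares e'e.
    """
    idx = np.concatenate(([0], np.asarray(gac)))
    A = XtX[np.ix_(idx, idx)]            # subset of X'X for this model
    b = Xty[idx]                         # subset of X'y for this model
    beta = np.linalg.solve(A, b)
    ete = yty - beta @ b                 # e'e = y'y - b'X'y at the solution
    return beta, ete

def mate(parent_a, parent_b, rng):
    """Form an offspring GAC with no repeated explanatory variables."""
    pool = np.union1d(parent_a, parent_b)        # unique variables of both parents
    child = rng.choice(pool, size=len(parent_a), replace=False)
    return np.sort(child)
```

Because only sub-blocks of the stored matrices are read, the raw genotype data need not be touched again during the search, which is the point of building X'X and X'y once up front.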
[0304] Partial Least Squares Analysis
[0305] Described hereunder is a process for predicting genotypic merit using PLS methods applied to SNP data from the entire genome. A cross-validation method is used for internal validation of data, to determine a model's predictive capacity and to determine the optimal model complexity. The methods have been applied to real data for evaluation.
[0306] The PLS prediction method aims to predict q continuous response variables $Y_1, \ldots, Y_q$ using p continuous explanatory variables $X_1, \ldots, X_p$. The available data sample consisting of n observations is denoted as $(\dot{x}_i, \dot{y}_i)_{i=1,\ldots,n}$, where $\dot{x}_i \in \mathbb{R}^p$ and $\dot{y}_i \in \mathbb{R}^q$ denote the i-th observation of the predictor and response variables, respectively. The dots denote the uncentered basic data. Their removal indicates the subtraction of the sample average, i.e.:
$x_i = \dot{x}_i - \frac{1}{n}\sum_{j=1}^{n}\dot{x}_j, \qquad y_i = \dot{y}_i - \frac{1}{n}\sum_{j=1}^{n}\dot{y}_j$
[0307] The $x_i = (x_{i1}, \ldots, x_{ip})^T$ are collected in the $n \times p$ matrix X. Similarly, Y is the $n \times q$ matrix containing the $y_i = (y_{i1}, \ldots, y_{iq})^T$:
$X = (x_1, \ldots, x_n)^T, \qquad Y = (y_1, \ldots, y_n)^T$
[0308] PLS is based on the latent basic component decomposition:
$X = TP^T + E$ (1)
$Y = TQ^T + F$ (2)
[0309] where $T \in \mathbb{R}^{n \times c}$ is a matrix giving the latent components for the n observations, $P \in \mathbb{R}^{p \times c}$ and $Q \in \mathbb{R}^{q \times c}$ are matrices of coefficients, and $E \in \mathbb{R}^{n \times p}$ and $F \in \mathbb{R}^{n \times q}$ are matrices of random errors.
[0310] PLS constructs a matrix of latent components T as a linear transformation of X:
$T = XW$ (3)
[0311] where $W \in \mathbb{R}^{p \times c}$ is a matrix of weights. The columns of W and T are denoted as $w_i = (w_{1i}, \ldots, w_{pi})^T$ and $t_i = (t_{1i}, \ldots, t_{ni})^T$, respectively, for $i = 1, \ldots, c$. For a fixed matrix W, the random variables obtained by forming the corresponding linear transformations of $X_1, \ldots, X_p$ are denoted as $T_1, \ldots, T_c$:
$T_1 = w_{11}X_1 + \ldots + w_{p1}X_p, \quad \ldots, \quad T_c = w_{1c}X_1 + \ldots + w_{pc}X_p$
[0312] The latent components are then used for prediction in place of the original variables: once T is constructed, $Q^T$ is obtained as the least squares solution of Equation (2):
$Q^T = (T^TT)^{-1}T^TY$
[0313] Finally, the matrix B of regression coefficients for the model $Y = XB + F$ is given as:
$B = WQ^T = W(T^TT)^{-1}T^TY$
[0314] For a new raw observation $\dot{x}_0$, the prediction $\hat{\dot{y}}_0$ of the response is given by
$\hat{\dot{y}}_0 = \frac{1}{n}\sum_{i=1}^{n}\dot{y}_i + B^T\left(\dot{x}_0 - \frac{1}{n}\sum_{i=1}^{n}\dot{x}_i\right)$
[0315] In PLS, dimension reduction and regression are performed simultaneously, i.e. PLS methods output the matrix of regression coefficients B as well as the matrices W, T, P and Q. In the PLS literature, the columns of T are often denoted as 'latent variables' or 'scores'. P and Q are denoted as 'X-loadings' and 'Y-loadings', respectively. Latent variables and scores can be used for diagnostic purposes and for visualization.
Algorithm for Partial Least Squares Analysis
[0316] The individuals of interest may be partitioned into those with estimated BVs (L) and those to have their BVs estimated (K). The animals in the set L form the training set from which parameters are estimated that are to be used to predict the BVs of the animals in the set K. The SNPs that do not show any variation are removed from the study. The remaining SNPs are arranged into a matrix X, where $x_{ij}$ is the number of copies of one allele (0, 1 or 2) in the i-th SNP position for the j-th individual. PLS is performed for (i) all individuals $j \in L \cup K$ and (ii) only animals in the training set $j \in L$ separately, to examine the effectiveness of the method when the SNP values for the training set are known and when the SNP values of the training set are not available, but the rotation matrix is known.
[0317] PLS analysis was performed using a kernel PLS algorithm (see Dayal, B.S. and MacGregor, J.F.: Improved PLS Algorithms, Journal of Chemometrics, vol. 11, 73-85 (1997)). This method is particularly efficient when the number of SNP markers is much larger than the number of responses, as it does not require the calculation of the sample covariance matrix of X. The algorithm has the following form:
1. Compute weights of the sample covariance matrix X.
2. Compute score weights.
3. Compute the loading vectors p and q.
4. Update the covariance matrix.
5. Store w, p, q and r in W, P, Q and R.
6. Repeat steps 2 to 5 for computation of each latent vector.
7. When done computing latent vectors, the regression coefficients are given by $B_{PLS} = RQ^T$.
[0318] More rigorously, the steps of the algorithm are described as follows:
[0319] For each $a = 1, \ldots, A$, where m is the number of response variables and A is the number of PLS components to be computed:
1. If $m = 1$, $w_a = X^TY$; else compute $q_a$, the dominant eigenvector of $(Y^TXX^TY)$, and set $w_a = (X^TY)\,q_a$. Normalise: $w_a = w_a / \lVert w_a \rVert$.
2. $r_1 = w_1$; for $a > 1$, $r_a = w_a - (p_1^Tw_a)\,r_1 - (p_2^Tw_a)\,r_2 - \ldots - (p_{a-1}^Tw_a)\,r_{a-1}$.
3. $t_a = Xr_a$; $p_a^T = t_a^TX / (t_a^Tt_a)$; $q_a^T = r_a^T(X^TY)_a / (t_a^Tt_a)$.
4. $(X^TY)_{a+1} = (X^TY)_a - p_aq_a^T\,(t_a^Tt_a)$.
5. $W = [w_1\ w_2 \ldots w_A]$, $P = [p_1\ p_2 \ldots p_A]$, $Q = [q_1\ q_2 \ldots q_A]$, $R = [r_1\ r_2 \ldots r_A]$.
6. Go to step 1 for the next latent vector computation.
7. Retrieve the regression coefficients $B_{PLS} = RQ^T$.
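The steps above can be sketched in NumPy as follows. This is a minimal, illustrative rendering of the kernel algorithm as listed, assuming X and Y have already been centred and that the requested number of components does not exceed the rank of X. The function names `kernel_pls` and `pls_predict` are hypothetical.

```python
import numpy as np

def kernel_pls(X, Y, n_components):
    """Kernel PLS on centred X (n x p) and centred Y (n x m).

    Returns B = R Q' together with W, P, Q and R.
    """
    p, m = X.shape[1], Y.shape[1]
    XtY = X.T @ Y
    W = np.zeros((p, n_components))
    P = np.zeros((p, n_components))
    Q = np.zeros((m, n_components))
    R = np.zeros((p, n_components))
    for a in range(n_components):
        if m == 1:
            w = XtY[:, 0].copy()
        else:
            # dominant eigenvector of Y'X X'Y (eigh sorts eigenvalues ascending)
            _, vecs = np.linalg.eigh(XtY.T @ XtY)
            w = XtY @ vecs[:, -1]
        w /= np.linalg.norm(w)
        r = w.copy()
        for j in range(a):                       # r_a = w_a - sum_j (p_j'w_a) r_j
            r -= (P[:, j] @ w) * R[:, j]
        t = X @ r                                # score vector
        tt = t @ t
        p_a = X.T @ t / tt                       # X-loading
        q_a = XtY.T @ r / tt                     # Y-loading
        XtY = XtY - tt * np.outer(p_a, q_a)      # deflate X'Y only
        W[:, a], P[:, a], Q[:, a], R[:, a] = w, p_a, q_a, r
    return R @ Q.T, W, P, Q, R

def pls_predict(B, x_mean, y_mean, x_new):
    # prediction for raw (uncentred) observations
    return y_mean + (x_new - x_mean) @ B
```

When all components are extracted and X has full column rank, the coefficients coincide with the ordinary least squares solution, which gives a convenient sanity check for an implementation.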
[0320] Model Validation Procedure
[0321] The critical issue in developing a "good model" is generalization. How well will the model make predictions for cases that are not in the training set? A model that is too complex may fit the noise, not just the signal, leading to overfitting.
[0322] An overfitted model may well describe the relationship between SNPs and EBVs of the sires used to develop the model, but may subsequently fail to provide valid predictions (molecular breeding values, MBV) in new bulls. As will be shown in the following examples, the derived PLS models show adequate fit of the data and provide valid predictions of MBV in new bulls.
[0323] Internal validation of data using cross-validation is performed to determine a model's predictive capacity and to determine the optimal model complexity (number of latent components). The number of latent components is estimated by cross-validation techniques, which is the process of removing observations from the data in a stepwise procedure, computing a prediction model based on the remaining samples and finally testing the calculated model by comparing the estimated value with the true value for the excluded observation. This process is then repeated by excluding a new selection of observations, until all observations have been excluded once.
[0324] In the following discussion, the complete data set (the learning set) consists of N objects. The learning set was partitioned into k segments (k = 10) of length l. If N is not an exact multiple of k, the last segments contained only l-1 objects. The N-l remaining objects form the construction data, which is used to derive the predictive model using PLS, which in turn is used to predict the removed l objects (the validation data). The mean squared error of prediction (MSEP, see the Equation above) is used as the objective function of the k-fold cross-validation estimate.
[0325] To further validate the models a different approach was applied, in which the indices of the response variable were randomly permuted so that responses do not agree with those of the SNP data. High predictive scores for randomized models indicate that the model suffers from overfitting and that fewer predictors must be used.
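A hedged sketch of the k-fold cross-validation estimate of MSEP and of the permutation check follows. It assumes NumPy and accepts any `fit`/`predict` pair. An ordinary least squares model is used here purely as a stand-in for the PLS model, and all function names are hypothetical.

```python
import numpy as np

def kfold_msep(X, y, fit, predict, k=10, rng=None):
    """k-fold cross-validated mean squared error of prediction."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(y))
    # segments differ in length by at most one when k does not divide N
    folds = np.array_split(order, k)
    sq_err = 0.0
    for held_out in folds:
        train = np.setdiff1d(order, held_out)        # construction data
        model = fit(X[train], y[train])
        sq_err += np.sum((y[held_out] - predict(model, X[held_out])) ** 2)
    return sq_err / len(y)

# Stand-in model (an assumption for illustration, not the PLS model itself)
def ols_fit(X, y):
    Xd = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xd, y, rcond=None)[0]

def ols_predict(beta, X):
    return beta[0] + X @ beta[1:]

def permuted_msep(X, y, fit, predict, k=10, rng=None):
    """MSEP after shuffling the responses, for the randomisation check."""
    rng = np.random.default_rng() if rng is None else rng
    return kfold_msep(X, rng.permutation(y), fit, predict, k, rng)
```

A model whose MSEP on permuted responses is about as low as on the real responses would be suspected of overfitting, as the text describes.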
[0326] Feature selection
[0327] The goal of feature selection is to identify a reduced set of non-redundant SNPs that are useful in predicting breeding values. The SNP marker set is pruned by eliminating insignificant SNP (as will be described with reference to the methods described below, in particular with reference to the VIP method). Removal of uninformative SNP decreases the noise and complexity and therefore can improve the prediction performance of the model. An issue which is tightly connected with the prediction of breeding values is gene detection, the identification of SNP whose genotypes are associated with the considered outcome. Furthermore, a reduced SNP set allows cost-effective genotyping of animals and allows the application of statistical methods (ordinary regression etc.) which cannot handle the full SNP set.
[0328] Five methods were used for feature selection. In the first, the loading vector of the first latent component of a single-response PLS model, $w_1$, is used, where $w_1$ is the weight of the first latent component $t_1$ in the transformation matrix of Equation (3) above. This method, however, only provides limited information.
[0329] A second selection approach is based on several latent components of the PLS model and uses the weight vectors $w_1, \ldots, w_c$, and has the advantage that it is capable of capturing information on a single SNP from all PLS components included in the PLS analysis. Thus it can discover non-linear patterns which the previous measure would fail to detect. The variable influence of SNP k for the a-th PLS component is defined as a function of $w_{ka}$. VIP (variable importance in projection) is the accumulated sum over all PLS dimensions of the variable influence:
$\mathrm{VIP}_k = \sqrt{\sum_{a=1}^{A} w_{ka}^2 \,(SSY_{a-1} - SSY_a)\,\dfrac{K}{SSY_0 - SSY_A}}$
[0330] where $(SSY_{a-1} - SSY_a)$ is the sum of squares explained by PLS dimension a and K is the number of terms in the model. The sum of squares of all VIPs is equal to the number of terms in the model and therefore the average VIP would be equal to 1. SNP with large VIP, larger than 1, are the most relevant for explaining Y. The VIP values reflect the importance of terms in the model both with respect to Y, i.e. its correlation to all the responses, and with respect to X.
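The VIP measure can be computed from the PLS weights, scores and Y-loadings, as in the short sketch below. It assumes NumPy, orthogonal score vectors (so that the sum of squares explained by component a is the product of the squared norms of the a-th score vector and the a-th Y-loading), and weight columns scaled to unit length. The function name `vip_scores` is hypothetical.

```python
import numpy as np

def vip_scores(W, T, Q):
    """Variable importance in projection for each explanatory variable.

    W : (p, A) weights, T : (n, A) scores, Q : (m, A) Y-loadings.
    """
    p = W.shape[0]
    # sum of squares of Y explained by each component (orthogonal scores assumed)
    ss = np.sum(T ** 2, axis=0) * np.sum(Q ** 2, axis=0)
    w_unit = W / np.linalg.norm(W, axis=0)       # unit-length weight columns
    return np.sqrt(p * (w_unit ** 2 @ ss) / ss.sum())
```

With unit-length weight columns the squared VIPs sum to the number of variables, so a cut-off of 1 corresponds to "above average" importance.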
[0331] The third approach is based on finding a threshold value of $w_1$, and only SNP with values over the derived threshold are used for modelling. A new X-matrix is created by column-wise permutation of the elements in X. For example, this may be repeated n times, which may be 10 times or more. The new randomised X-matrix will then consist of n times the number of variables in the original X-matrix (for example, with 10715 initial SNPs and 10 iterations, the new randomized X-matrix will have 107150 variables). Using this new permuted X-matrix a new PLS model is then calculated. The SNP are then ranked according to their $w_1$ values. For a given rate of false positives (e.g. 1% false positives) the cutoff point will be at the 1071st (107150 × 0.01) largest $w_1$ value, for $w_1$ the weight of the first latent component.
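The permutation threshold can be illustrated as follows. This sketch assumes NumPy and a single response, for which the first PLS weight vector is proportional to X'y. Unnormalised absolute covariances are used so that permuted and original values are on the same scale, which is a simplifying assumption rather than the procedure of the specification. The function name and arguments are hypothetical.

```python
import numpy as np

def permutation_threshold(X, y, n_iter=10, fpr=0.01, rng=None):
    """Derive a cut-off for first-component weights from permuted SNP columns."""
    rng = np.random.default_rng() if rng is None else rng
    yc = y - y.mean()
    null = []
    for _ in range(n_iter):
        # permute every SNP column independently to break any association with y
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null.append(np.abs((Xp - Xp.mean(axis=0)).T @ yc))
    null = np.sort(np.concatenate(null))[::-1]       # largest first
    cut = max(int(fpr * null.size), 1)               # e.g. 0.01 * 107150 -> 1071
    threshold = null[cut - 1]
    w1 = np.abs((X - X.mean(axis=0)).T @ yc)         # observed (unnormalised) weights
    return threshold, np.flatnonzero(w1 > threshold)
```

The returned indices are the SNPs whose observed weight exceeds the value reached by only the chosen fraction of permuted variables.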
[0332] After ranking the SNP according to one of the three methods above, the final predictive model is built in a series of selection steps. At the start of the selection process, a PLS analysis is performed including only the highest ranked marker. In subsequent steps, SNP are added to the model according to their rank. A marker is retained in the final list of selected SNP if its inclusion in the model resulted in a decrease in the cross-validated prediction error.
[00115] The fourth method of feature selection is a multivariate variable selection strategy utilising a genetic algorithm (GA) search procedure (similar to that described above) coupled to the unsupervised learning algorithm of the PLS methods described above.
[0333] Genetic algorithms are variable search procedures that are based on the principle of evolution by natural selection. In the GA terminology variables are defined as genes, whereas a subset of n variables that is assessed for its ability to fit a statistical model is called a chromosome. The procedure works by evolving sets of variables (GA chromosomes) that fit certain criteria from an initial random population via cycles of differential replication, recombination and mutation of the fittest chromosomes.
[0334] The GA algorithm for the present feature selection method may be implemented as follows:
1. Start with a randomly generated population of n chromosomes. The chromosomes have fixed length (e.g. 100 SNP markers).
2. Calculate the fitness f of each chromosome x in the population, $f(x) = f(R^2)$.
3. Repeat the following steps until n offspring have been created:
  a. Select a pair of parent chromosomes from the current population, the probability of selection being an increasing function of fitness. Selection is done "with replacement," meaning that the same chromosome can be selected more than once to become a parent.
  b. With probability pc (the "crossover probability" or "crossover rate"), cross over the pair at a randomly chosen point (chosen with uniform probability) to form two offspring. If no crossover takes place, form two offspring that are exact copies of their respective parents.
  c. Mutate the two offspring at each locus with probability pm (the mutation probability or mutation rate), and place the resulting chromosomes in the new population. If n is odd, one new population member can be discarded at random.
4. Replace the current population with the new population.
5. Repeat from step 2.
[0335] The chromosome size is fixed by an initial parameter and the GA procedure provides a large collection of chromosomes. Although these are all good solutions of the problem, it is not clear which one should be chosen for developing a final model. The fixed chromosome size implies that some of the SNP selected in the chromosome may not be contributing to the prediction accuracy of the corresponding model. For this reason there is a need to develop a single model that is, to some extent, representative of the population.
[0336] A simple strategy to follow is to use the frequency of SNP in the population of chromosomes as the criterion for inclusion in a forward selection strategy. The model of choice will be the one with the highest prediction accuracy and the lowest number of SNP. However, alternative models with similar accuracy but a larger number of SNP can also be developed. This strategy ensures that the most represented SNP in the population of chromosomes are included in a single summary model.
[0337] A fifth method for variable selection is based on uncertainty measurements (standard errors and confidence intervals) of the PLS regression coefficients. The method is based on the so-called "jack-knife" resampling (Efron, B. and Tibshirani, R.J. (1993)), comparing perturbed model parameter estimates from cross-validation with estimates from the full model. The formula of the jack-knife estimation of the standard error for $\hat{\beta}_p$ is as follows:
$\widehat{se}_{jack}(\hat{\beta}_p) = \left[\dfrac{n-1}{n}\sum_{i=1}^{n}\left(\hat{\beta}_{p(-i)} - \bar{\beta}_p\right)^2\right]^{1/2}$
[0338] where $\hat{\beta}_{p(-i)}$ is the PLS regression coefficient, the i-th observation having been removed from the data set before the determination of the PLS model, and $\bar{\beta}_p$ is the average of the n values $\hat{\beta}_{p(-i)}$.
[0339] The limits of an approximate $(1 - \alpha)$ confidence interval for $\beta_p$ are defined as:
$\hat{\beta}_p \pm t_{n-1,\,\alpha/2}\;\widehat{se}_{jack}(\hat{\beta}_p)$
[0340] where $t_{n-1,\,\alpha/2}$ is the Student $(\alpha/2)$th percentile. For a chosen $\alpha$, all of the variables whose PLS regression coefficients have jack-knife confidence intervals that contain zero are eliminated at the same time.
[0341] Variable selection based on the jack-knife as it is described above for the PLS regression coefficients can be applied in the same way to VIP.
[0342] The jack-knife technique is also useful for detecting outliers. Uncertainty measurements (standard errors and confidence intervals) can be computed for scores, loadings and predicted Y-values of a PLS model.
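A minimal sketch of jack-knife standard errors and interval-based elimination is given below. It assumes NumPy, takes the critical t-value as an argument rather than computing it, and uses an ordinary least squares fit only as a stand-in for the PLS fit. The function names are hypothetical.

```python
import numpy as np

def jackknife_se(X, y, fit):
    """Leave-one-out jack-knife standard error of each regression coefficient.

    fit(X, y) must return a 1-D array of coefficients.
    """
    n = len(y)
    loo = np.array([fit(np.delete(X, i, axis=0), np.delete(y, i))
                    for i in range(n)])              # coefficients without obs i
    mean = loo.mean(axis=0)
    return np.sqrt((n - 1) / n * np.sum((loo - mean) ** 2, axis=0))

def keep_by_interval(coef, se, t_crit):
    """Indices of variables whose interval coef +/- t_crit * se excludes zero."""
    return np.flatnonzero(np.abs(coef) > t_crit * se)

# Stand-in model (an assumption for illustration, not the PLS model itself)
def ols_coef(X, y):
    return np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)[0]
```

In use, `coef` would be the full-model estimate, and variables not returned by `keep_by_interval` would be eliminated together.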
[0343] Validation of feature selection
[0344] The main goal of the feature selection methods described above is to select a subset of the original SNP such that the resulting model can perform well on unseen future data points.
The commonly used validation strategy for the feature selection consists of: Step 1) Selection of features by using all the data points.
Step 2) The obtained model with the selected features is validated under a validation scheme (cross-validation, bootstrapping, etc.).
[0345] In the examples below of the present case, the cross-validated prediction error is calculated within the feature-selection process. Therefore, the estimated error is optimistically biased, due to testing on samples already considered in the feature selection process.
[0346] To correct for this selection bias, cross-validation or the bootstrap validation is used external to the gene-selection process. This requires that samples in the test set must not be used in the training set.
[0347] In general the sample will be relatively small, and one would like to make full use of all available samples in SNP selection and training of the prediction rule.
[0348] The use of different training subsets results in different lists of SNP; however, many or most will overlap. The most frequent SNP are selected to form the final list of selected SNP.
[0349] The procedure outline is as follows:
1. Divide the data into M parts of equal size.
2. For each M-1 part DO:
  2.1. Define a series of ranked SNP d0, d1, ..., dk using one of the selection approaches described above.
  2.2. At step i perform a forward selection starting with the current di SNP.
  2.3. Estimate the prediction error using the remaining m subset; retain the SNP if it improves the prediction error.
  2.4. Set i = i+1, repeat from step 2.2.
3. Calculate error rate at each d0, ..., dk level.
4. Select the top SNP with the highest frequency.
[0350] Figure 1E shows a schematic outline of an arrangement of a validation technique for feature (e.g. SNP) selection and assessment. The data is first split into M parts of equal size. The M-1 sets 110 form the training set (TRm) and the remaining subset 120 is used as testing set (TSm). For a given training set TRm 130, a SNP ranking method produces a list of ranked SNP (RSm) 140. Models Mmi 150 are developed for increasing SNP subsets. The Mmi models 150 are evaluated on the TSm test data, computing the prediction error Em 160. The average error Et 170 is obtained as
$E_t = \dfrac{1}{M}\sum_{m=1}^{M} E_m$
By then selecting the most frequent SNP, an optimal feature set (180 of Figure 1E) is derived.
[0351] Handling of Missing Data
[0352] Missing data is a common feature in large genomic data sets. Dealing with missing genotypes can follow different strategies. Eliminating SNP markers with incomplete observations will result in considerable information loss if many SNP have missing genotypes for various animals.
[0353] For example, the percentage of missing SNP genotypes was 0.8% for 16,565,390 data points (1546 bulls x 10715 SNP). Despite this very low rate, after eliminating SNP markers with one or more missing genotypes only 68 SNP remained. In order to be able to apply dimension reduction methods to the complete SNP data we used an imputation approach, i.e. replacing each missing genotype with a predicted value. We applied imputation with the NIPALS (nonlinear iterative partial least squares) algorithm. The aim of the NIPALS algorithm is to perform principal component analysis in the presence of missing data.
[0354] A demonstration of the performance of dimension reduction by means of PLS in combination with missing SNP genotype prediction using NIPALS is shown in Figure 1F. Missing values of SNP genotypes were randomly generated in the range of 5% up to 85% and subsequently predicted from the 1st and 2nd principal component and factor using the NIPALS algorithm. The analysis was replicated 5 times and is shown in each of the lines of Figure 1F. For each replicate 200 animals were randomly selected as the test data group of animals for which breeding value was predicted based on SNP, the molecular breeding value (MBV). Animals in the test data sets did not overlap between replicates. Analyses were performed for the trait APR. The results show that even in the case of a large proportion of missing marker genotypes most of the SNPs can be reconstructed with a minimal loss of information. For example, increasing the proportion of missing genotypes from 5% to results in a slight decrease of the average correlation between MBV and known breeding value (EBV) from 0.80 to 0.78.
[0355] Application to Individual Breeding Programme
[0356] The MBV estimation procedure is applicable to all traits commonly recorded by, for example, the dairy industry, including individual phenotype traits such as either bull or cow fertility and semen quality etc. For example, the MBV estimation technique could be used for, but is not restricted to, phenotype traits such as APR, ASI, Protein kg, Protein Percent, Milk yield, Fat kg, Fat Percent, Overall Type, Mammary System, Stature, Udder Texture, Bone Quality, Angularity, Muzzle Width, Body Depth, Chest Width, Pin Set, Pin Sign, Foot Angle, Set Sign, Rear Leg View, Udder Depth, Fore Attachment, Rear Attachment Height, Rear Attachment Width, Centre Ligament, Teat Placement, Teat Length, Loin Strength, Milking Speed, Temperament, Like-ability, Survival, Calving Ease, Somatic Cell Count, Cow Fertility, Gestation Length, or a combination thereof.
[0357] The system described herein may be readily adapted for prediction of the ABV of an animal external to the local population of animals, such as an animal that has been imported into Australia from overseas, and the likely impact the imported animal will have on the breeding within the local population. At present, external animals such as imported bulls in relation to the dairy industry are usually re-ranked when used in Australia due to genotype by environment interaction (GxE); however, the addition of the environmental factors creates a large degree of uncertainty with respect to the local population. It is anticipated that the methods described herein significantly reduce the degree of uncertainty for animals which have been progeny tested overseas, which has a large impact on the generation interval and associated costs.
[0358] The methods described above will now be further described in greater detail by reference to the following specific examples, which should not be construed as in any way limiting the scope of the arrangements of the methods.
EXAMPLES
[0359] Development of high-density, large-scale single nucleotide polymorphism (SNP) genotyping platforms has opened the possibility of GWS in any species. The following examples illustrate the techniques described above when applied to a base set of dairy cattle comprising 1546 Australian progeny-tested dairy bulls which were tested for 15,036 SNP markers, leading to the following GWS platform for use in dairy cattle.
[0360] SNP discovery
[0361] The platform is built on a commercial SNP genotyping platform (Parallele-Affymetrix) incorporating 10,410 public domain SNP markers and around 4,626 proprietary
[0369] A remarkable feature of model selection and cross validation methods has been the accurate prediction of true breeding value (TBV) via EBV. Accuracies of prediction within the range of 0.7-0.85 have been obtained in the absence of pedigree and QTL/gene information.
[0370] Typically only a fraction of the available SNP are used to predict MBV for all major traits used in dairy cattle selection. Realization of GWS may therefore well represent the first true promise of DNA based technologies for livestock improvement.
[0371] Utility and Application of GWS
[0372] Deriving MBV from a population in which future predictions have to be made offers immediate use in young sire and elite dam selection. Features of GWS can be readily incorporated with advanced reproductive technologies, leading to greatly increased rates of genetic gain and potentially significant cost reduction as breeding programmes move from progeny testing in sire selection to progeny validation. Use of MBV allows for screening of suitable germplasm from global sources, and may possibly extend to incorporate gene-by-environment (GxE) and gene-by-gene (GxG) interactions and an NRM based on shared genome content in genetic evaluation. Molecular keys (coefficients) for GWS can be readily updated as new sires enter the industry.
[0373] Additional applications
[0374] In addition to GWS, the SNP information can be used in, among other applications, the assessment of genome wide and population diversity, mate selection, management of inbreeding, study of inherited disorders, pedigree validation, assembly of the bovine Hapmap, and high-density integrated maps.
[0375] Example 1: Demonstration of the Genetic Algorithm
[0376] Data from two sources were analysed separately. Genotypic data were taken from either the Affymetrix 15380 SNP chip or an independent genotyping of 1282 SNPs using the Illumina platform. The Affymetrix data corresponded to 1545 bulls with EBVs in the 2006 ADHIS genetic evaluations. The Illumina data corresponded to a subset of 412 of the 1545 bulls. In relation to this, reference is made to International Patent Application No. PCT/US2006/041745 dated 25 October 2006, corresponding to Australian Provisional Patent Application Nos. 2005905899 and 2005905960, the entire disclosures of each of which are incorporated herein by reference.
[0377] The SNP markers are derived from a comprehensive bank of 1545 DNA samples from all available sires which have ABVs based on progeny tests. Location knowledge was determined to choose 5000 additional markers in regions of most interest. All 1545 bulls were genotyped with the 15,000 SNP marker panel.
[0378] This provides the ability to link the discovery phase to the application phase in a single step, and to make predictions of genetic merit in young prospective bulls to be used in the Australian national dairy herd under Australian conditions. Some of the semen samples are from bulls born more than 50 years ago; thus deep pedigree structures, which are essential for certain powerful statistical analyses, can be structured. Of the collection of 1650 DNA samples available, some are from the sire or grandsire of a bull which has been thoroughly progeny-tested by well-accepted methods.
[0379] Editing of the Affymetrix SNP genotypes was performed to remove SNP with no genotyping data present; more than 100 unknown genotypes; a minor allele frequency of less than 0.1; and a degree of synonymy greater than 0.95.
[0380] After these edits were sequentially applied, 7420 SNP remained. The same edits were applied to the Illumina data set to leave 550 SNP. These edits may not always be applied in the future, or may be revised as necessary in accordance with requirements.
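The editing steps of paragraphs [0379] and [0380] can be illustrated with a minimal sketch. It assumes genotypes are coded as counts of one allele (0, 1 or 2) with `None` for an unknown call; the function names and the dictionary layout are hypothetical, and the synonymy edit is omitted because the text does not define how synonymy is measured.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency of one SNP, or None if no call exists.

    genotypes: list of allele counts (0, 1 or 2), with None for unknown calls.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def keep_snp(genotypes, max_unknown=100, min_maf=0.1):
    """Apply the edits in sequence: no data, too many unknowns, low MAF."""
    unknown = sum(1 for g in genotypes if g is None)
    if unknown == len(genotypes):   # no genotyping data present
        return False
    if unknown > max_unknown:       # more than 100 unknown genotypes
        return False
    return minor_allele_frequency(genotypes) >= min_maf


def edit_snps(panel, max_unknown=100, min_maf=0.1):
    """panel: dict mapping SNP name -> list of genotypes, one per animal."""
    return {name: g for name, g in panel.items()
            if keep_snp(g, max_unknown, min_maf)}
```

The thresholds are passed as parameters because the text notes that the edits may be revised in accordance with requirements.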
[0381] The Affymetrix data were analysed using the GA set to model 500 SNP simultaneously. The observations on the 1545 bulls used were the EBV for protein yield (kilograms of protein). The resulting estimates of MBV explained 97% of the variation in the BLUP EBVs of the 1545 bulls. Figure 2 is a plot of MBV v EBV for this analysis. This analysis was repeated with the GA fitting 10, 25, 50, 100, 200, 300 or 500 SNPs simultaneously. Figure 3 shows the correlation between the MBV and EBV for the 1545 bulls included in the analyses.
[0382] Due to the limited size of the Illumina dataset, the GA was set to model 100 SNP simultaneously. Estimated breeding values for each of 38 traits and indices which showed variation for the 412 bulls were analysed. The correlations between the weighted estimates of the MBV produced and the BLUP EBV ranged from 0.83 to [...], as shown in Table 1.
Table 1: Correlations between MBV and EBV of 412 bulls for each of 38 indexes and traits analysed using the Illumina genotype data and ADHIS EBV. The GA was set to find the best 100 SNP model.
Trait | r | Trait | r
[0383] [0384] Example 1: Effectiveness of prediction
Editing of the Affymetrix SNP genotypes was performed to remove SNP with a minor allele frequency of less than 0.1; and a degree of synonymy greater than 0.95.
[0385] After these edits were sequentially applied, 7865 SNP remained. These edits may not always be applied in the future.
[0386] The 1545 genotyped bulls were matched with a set of ADHIS evaluation results from August 2001 to give 1516 bulls with either an EBV for protein kg or a sire-maternal grandsire prediction of their 2001 EBV for protein kg. Of the 1516 bulls, 163 were born in the years 2000 or 2001, and hence would not have any progeny daughter records included in the August 2001 evaluation.
[0387] Ten random subsets of 75 bulls were selected from the 163 bull cohort and the GA run 10 times, with each of these subsets being excluded from the regression analyses but their MBV being predicted using the outcomes. Thus 1441 bulls were used in the estimation of the predictors, and 75 bulls were predicted. The GA was set to locate the best 200 SNP model.
The mean correlation between MBV and EBV for the 10 groups of 75 animals was 0.74, with a range of 0.69 to 0.78, which is lower than the 0.9+ correlations between MBV and EBV for individuals in the training set.
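The hold-out scheme of paragraph [0387] can be sketched as follows. This is only an illustration of drawing repeated random subsets and scoring predictions by correlation; the GA itself is not reproduced, and the helper names (`pearson`, `random_holdouts`, `summarise`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def random_holdouts(cohort, size, n_repeats, seed=0):
    """Draw n_repeats random subsets of the cohort, each excluded from fitting.

    Subsets are drawn independently, so they may overlap between repeats.
    """
    rng = random.Random(seed)
    cohort = list(cohort)
    return [rng.sample(cohort, size) for _ in range(n_repeats)]


def summarise(correlations):
    """Mean, minimum and maximum of the per-subset correlations."""
    return (sum(correlations) / len(correlations),
            min(correlations), max(correlations))


# Ten subsets of 75 bulls from a cohort of 163 (identifiers are placeholders).
holdouts = random_holdouts(range(163), size=75, n_repeats=10)
```

For each subset, the predictor would be estimated on the remaining bulls and `pearson` applied to the predicted MBV and the EBV of the 75 held-out bulls.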
[0388] Figure 4 displays the cumulative proportion of the variance accounted for by the PCs when PCA and SPCA are used. If all 1546 of the PCs are taken when PCA is used, clearly all of the variance of the original data is contained (line 10 of Figure 4). The first 200 and 500 PCs account for 50% and 75% of the variation respectively when all of the SNPs are used in the reduction. The SPCA methods do not account for 100% of the total variation when all PCs are included, because not all of the original 15380 SNPs have a t-value greater than the threshold θ. When θ = 2 (line 12 of Figure 4), 42.69% of the SNPs are taken, and these SNPs account for 35.54% of the total variation; when θ = 3 (line 14 of Figure 4), 22.39% of the SNPs are taken, which account for 18.11% of the variation in the unedited data.
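The cumulative proportion of variance plotted in Figure 4 can be computed along the following lines. This is a minimal sketch assuming NumPy and a genotype matrix with one row per animal and one column per SNP; the function names are hypothetical and the random matrix merely stands in for real genotypes.

```python
import numpy as np


def cumulative_variance_explained(genotypes):
    """Cumulative share of total variance captured by successive PCs."""
    centred = genotypes - genotypes.mean(axis=0)
    # Singular values come back in descending order; their squares are
    # proportional to the variance along each principal component.
    singular_values = np.linalg.svd(centred, compute_uv=False)
    variances = singular_values ** 2
    return np.cumsum(variances) / variances.sum()


def n_components_for(cumulative, target):
    """Smallest number of PCs whose cumulative share reaches the target."""
    index = int(np.searchsorted(cumulative, target))
    return min(index + 1, len(cumulative))


rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 3, size=(30, 10)).astype(float)
curve = cumulative_variance_explained(toy_genotypes)
pcs_for_half = n_components_for(curve, 0.5)
```

For SPCA, the same function would be applied to the subset of SNP columns that pass the t-value threshold.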
[0389] Pairwise plots of the BVs of the animals and the first 3 PCs reveal some interesting structure in the data, as displayed in Figure 5. The plots above the diagonal are obtained when PCA is used, and plots below the diagonal are from SPCA with θ = 2. Figure 5 distinguishes between animals born before 1995 and those born in 1995 or later. This year was chosen because it divides the animals into two approximately equal groups. In the majority of plots above the diagonal in Figure 5, the year of birth of each animal influences the distribution of points. It can be seen that animals born before 1995 tend to have lower breeding values than those born in 1995 or afterwards.
[0390] When PCA is used to reduce the data, older animals tend to have a lower score for PC1 than newer animals, indicating that PC1 is in the opposite direction to selection pressure. There are two distinct clusters in the plot of PC1 against PC2, where age defines the cluster to which animals belong. A number of outliers can also be identified from the pairwise plots which arise from PCA.
[0391] When SPCA is used to reduce the data, more outliers can be identified, and less variation is evident in the first four PCs. Animals of similar age are not grouped together when the PCs are plotted against each other, and these plots are more elliptical in shape than their counterparts obtained when PCA is used.
[0392] Example 2: Principal Component Analysis Simulation
[0393] Organisms having two copies of one chromosome of length 20 million base pairs were simulated. A total of 1,000 SNPs were placed on the chromosome, with their base pair positions sampled from the integers between 1 and 20 million without replacement. Some of these SNPs were simulated to have an additive effect, and these effects were sampled from a N(0,1) distribution (i.e. a Normal distribution with mean 0 and variance 1). In order to simulate the effect of Linkage Disequilibrium, a small number of chromosomes, nc, was created in order to generate the base population. The number of founder chromosomes used was (i) nc = 20 and (ii) nc = 200. The probability of a less common allele at the $i$th site, $p_i$, was sampled from a uniform distribution between 0 and [...], so that the matrix of haplotype values for the original chromosomes is given by:
$$B_{ij} = \begin{cases} 0 & \text{with probability } 1 - p_i \\ 1 & \text{with probability } p_i \end{cases}$$
[0394] The top 30% of the rows of the matrix B were paired up to form males and the remaining 70% paired up to form females. Random mating was performed to produce 500 individuals. The distance between cross-overs in the breeding process was sampled from a Poisson distribution with parameter 1 million, so that each chromosome is 20 Morgans long. No mutation was simulated.
[0395] Figure 6 is a schematic diagram of the propagation from one generation to the next. The population structure was designed to be a simplified representation of the breeding structure in place in the dairy industry in Australia.
The initial population of 500 animals (generation 1) was split into 40 males (20 of Figure 6) and 460 females (22 of Figure 6), and random breeding was simulated to form 395 new animals (24 and 26 in Figure 6). Ten of these animals (24) were male and 385 (26) were female. Thirty males and 75 of the females from the previous generation (28 and 30 respectively) were added to the current population of 10 males and 385 females to form the next generation (not shown), again giving 40 males and 460 females. This process was repeated for 10 generations, and the last three generations were stored.
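The founder step of paragraph [0393] can be sketched as below. Because the upper bound of the uniform distribution is not legible in the text, `max_freq=0.5` is purely an assumption, as is the 0/1 coding of haplotype values; the function names are hypothetical, and mating and cross-overs are not modelled here.

```python
import random


def sample_snp_positions(n_snp=1000, chrom_length=20_000_000, seed=0):
    """Sample distinct base pair positions between 1 and chrom_length."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, chrom_length + 1), n_snp))


def sample_founder_haplotypes(n_chrom, n_snp, max_freq=0.5, seed=0):
    """Return (p, B): allele probabilities and founder haplotype rows.

    p[i] is the probability of the less common allele at site i (the upper
    bound max_freq is an assumption). B[j][i] is 1 with probability p[i]
    and 0 otherwise, for founder chromosome j.
    """
    rng = random.Random(seed)
    p = [rng.uniform(0.0, max_freq) for _ in range(n_snp)]
    B = [[1 if rng.random() < p[i] else 0 for i in range(n_snp)]
         for _ in range(n_chrom)]
    return p, B


positions = sample_snp_positions()
p, B = sample_founder_haplotypes(n_chrom=20, n_snp=1000)
```

Using only 20 founder chromosomes means every later chromosome is a mosaic of a few founder segments, which is how the small nc induces linkage disequilibrium.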
[0396] The phenotypic value for each animal was calculated as:
$$T = \sum_{i=1}^{1000} q_i a_i + \varepsilon$$
[0397] where $q_i$ is the number of less frequent alleles (0, 1 or 2) at SNP position $i$, $a_i$ is the allelic substitution effect of the $i$th polymorphic allele and $\varepsilon$ is sampled from a $N(0, \sigma_e^2)$ distribution. The allelic substitution effect is sampled from a Gamma distribution with shape parameter 0.59 and scale parameter 7.1, with an equal probability of this effect being positive or negative. The predefined heritability ($h^2$) and the additive genetic variance ($\sigma_a^2$) determine $\sigma_e^2$ via the equation $h^2 = \sigma_a^2 / (\sigma_a^2 + \sigma_e^2)$.
[0398] Example 2 Simulation Results
[0399] Figure 7 examines the predictive performance of principal component regression for the simulated SNP data when $h^2$ of the trait is varied as well as the number of SNPs with an additive effect, nsa. Figures 7(a) to 7(f) are respectively the correlation between estimated breeding value and simulated breeding value when: (a) 10 SNPs have an additive effect and 20 chromosomes are in the initial population; (b) 100 SNPs have an additive effect and 20 chromosomes are in the initial population; (c) 1000 SNPs have an additive effect and 20 chromosomes are in the initial population; (d) 10 SNPs have an additive effect and 200 chromosomes are in the initial population; (e) 100 SNPs have an additive effect and 200 chromosomes are in the initial population; and (f) 1000 SNPs have an additive effect and 200 chromosomes are in the initial population.
[0400] The simulated heritabilities are 0.4 and 0.7 and each line is the mean of 50 samples. The PCs are added according to the proportion of the total variation accounted for. It can be seen that the optimal number of PCs to use is about 30 for all nine combinations of $h^2$ and nsa when nc = 20 (Figures 7(a) to 7(c)), with correlations of greater than r = 0.9 for all combinations and greater than approximately r = 0.98 for heritability values of $h^2 \geq 0.4$.
[0401] Beyond this optimal number of PCs, spurious PCs are fitted and the correlation between the estimated and true values decreases rapidly, before this descent becomes more gentle at about 50 PCs. As expected, the heritability of the trait influences the performance of the PCR, with higher $h^2$ values allowing better prediction of genotypic merit when the optimum numbers of PCs are fitted. The influence of the number of SNPs with an effect is more subtle. For low $h^2$, nsa has little effect on the performance of PCR. However, for $h^2 = 0.7$ and $h^2 = 0.4$, increasing the number of SNPs with an additive effect from 100 to 1000 improves the performance of PCR when more than 50 PCs are fitted.
[0402] When nc = 200 (Figures 7(d) to 7(f)), the number of SNPs with an additive effect, nsa, has very little influence on the performance of the PCR. The $h^2$ has a larger effect when nc = 200 than when nc = 20, with higher $h^2$ yielding better predictive performance. More PCs are required in the regression when nc = 200, with around 125 PCs needed for a $h^2$ of 0.7 for optimum predictive performance.
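The principal component regression evaluated in Example 2 can be sketched as follows, together with the phenotype model of paragraphs [0396] and [0397]. This is an illustration only, assuming NumPy: genotypes here are independent random draws rather than the output of the breeding simulation, the residual variance is derived from the standard definition of heritability, and all function names are hypothetical.

```python
import numpy as np


def simulate_phenotypes(q, effects, h2, rng):
    """Return (true genetic values, phenotypes) for allele-count matrix q."""
    genetic = q @ effects
    var_a = genetic.var()
    var_e = var_a * (1.0 - h2) / h2   # from h2 = var_a / (var_a + var_e)
    noise = rng.normal(0.0, np.sqrt(var_e), size=genetic.shape)
    return genetic, genetic + noise


def pcr_fit_predict(q_train, y_train, q_test, n_pcs):
    """Regress y on the first n_pcs principal components, then predict."""
    mean = q_train.mean(axis=0)
    # Rows of vt are principal axes, ordered by variance accounted for.
    _, _, vt = np.linalg.svd(q_train - mean, full_matrices=False)
    loadings = vt[:n_pcs].T
    scores = (q_train - mean) @ loadings
    design = np.column_stack([np.ones(len(y_train)), scores])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return coef[0] + ((q_test - mean) @ loadings) @ coef[1:]


rng = np.random.default_rng(0)
q = rng.integers(0, 3, size=(300, 200)).astype(float)
effects = np.zeros(200)
causal = rng.choice(200, size=10, replace=False)   # nsa = 10
effects[causal] = (rng.gamma(shape=0.59, scale=7.1, size=10)
                   * rng.choice([-1.0, 1.0], size=10))
genetic, phenotype = simulate_phenotypes(q, effects, h2=0.4, rng=rng)
predicted = pcr_fit_predict(q[:250], phenotype[:250], q[250:], n_pcs=30)
accuracy = np.corrcoef(predicted, genetic[250:])[0, 1]
```

Repeating the last two lines for a range of `n_pcs` values would trace out a curve of the kind shown in Figure 7, although the numbers from this toy setup are not comparable with those reported.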
[0403] Example 3: Principal Component Analysis SNP Data
[0404] SNP data comprising 15380 SNPs taken from 1546 male animals born between 1955 and 2001, which come from a large recorded pedigree, were used, so that breeding values were supplied for each animal along with the reliability of each estimate. Of the 23,777,480 SNP values, 7.10% are missing values. All of these missing values were replaced with 1s, so that all of the SNP values are consistent with Mendelian principles for the entirely male data set. If SNP data from female animals was desired to be included in the data set, any missing values could be sampled from the set of possible values given the parental genotypes. There are only males in this population, so any genotype is feasible for the sire or its offspring; if the dams' genotypes had been known, then the missing values would have been sampled from the possible set given the parents' genotypes. It will be appreciated that if the animal is the progeny of two similar homozygotes it must have the same genotype as its parents.
[0405] Example 3 SNP Results
[0406] Figure 8 shows the mean correlation between the predicted and measured genotypic merit when the cross-validation method described above is repeated 40 times (i.e. each line is the mean of 40 samples), with the PCs being added according to the proportion of variance accounted for in the unrotated data. PCs were added according to the size of the corresponding eigenvalue, the correlation with the BVs, and a combination of the two methods. Figures 8(a) to 8(f) respectively refer to the cases when (a) PCA is performed on all animals (K ∪ U) and all SNPs, (b) PCA is performed only on animals with known BVs and all SNPs, (c) PCA is performed on all animals (K ∪ U) and SNPs with θ > 2, (d) PCA is performed only on animals with known BVs and SNPs with θ > 2, (e) PCA is performed on all animals (K ∪ U) and SNPs with θ > 3, and (f) PCA is performed only on animals with known BVs and SNPs with θ > 3.
[0407] When all SNPs are used in all animals (Figure 8(a)), the mean correlation reaches a maximum of 0.65 when 300 to 500 PCs are fitted according to their eigenvalues, and gradually reduces as more PCs are fitted. Before this maximum is reached the curve is not monotonically increasing, with the inclusion of some PCs in the regression reducing the predictive performance of the model. When PCs are added according to the correlation with the known BVs a maximum of 0.57 is obtained, and when PCs are added according to the combination of the two, the maximum is 0.63.
[0408] There is a slight improvement in predictive performance when SPCA is used on all individuals (Figures 8(c) and 8(e)). This improvement is greatest for θ > 3, where a maximum mean correlation of 0.67 is obtained for the methods adding PCs to the regression according to the eigenvalue and according to the combination of the two. When the correlation between the PCs and BVs is used to determine the order in which PCs are added, the maximum is reached after relatively few PCs, but then falls away quickly.
[0409] The best predictive model for these data is when PCA is performed on individuals with known breeding values (Figure 8(b)). A maximum mean correlation of 0.69 is obtained for all three methods of adding PCs to the regression when more than 600 PCs are added. When SPCA is used only on the individuals with known BVs, the estimates are further from the known BVs.
[0410] Example 4: Comparison of MBV and EBV as predictors of true BV
[0411] The ability of MBVs and BLUP EBVs to predict true BV was compared using a simple simulated example. The PCA was used to predict the MBV of the individuals in a simulated population where the true BVs were known for comparison. The data consisted of 1,000 SNPs, evenly spaced across the genome, with effects sampled from N(0, 1), and some regions were more favoured than others to give assumed differential gene locations across the genome. A heritability of 0.30 was used in both the simulation and BLUP analyses. A pedigree with approximately 1500 individuals was created.
[0412] Figures 9 and 10 show the significant improvement of the MBV from the PCA for predicting the true breeding value of the individuals in the simple example compared with the commonly-used BLUP techniques over two generations.
[0413] Figure 9A is a plot of the BLUP EBV for the simple example against the true BV as simulated, resulting in a correlation of r = 0.63. In comparison, Figure 9B is a plot of the MBV for the simple example against the true BV as simulated, showing a significant improvement in the correlation to a value of r = 0.98.
[0414] Figure 10A is a plot of the BLUP EBV for the next generation of the simple example against the true BV as simulated. In this generation the correlation using the BLUP methods has deteriorated to only r = 0.49. In comparison, Figure 10B is a plot of the MBV of the next generation for the simple example against the true BV as simulated. In this case, the correlation is r = 0.96, which is a reduction of only about 2%.
[0415] The calculation of MBVs therefore provides a clear advantage over currently-used methods for prediction of BVs in a population across generations, at least for simple modes of inheritance.
[0416] Example 5: Partial Least Squares Analysis
[0417] Table 2 shows the results of PLS analysis for 38 indexes and traits of 1546 bulls using 10715 SNP. The proportion of the variance accounted for is shown for the PLS model of optimal complexity. The optimal complexity (number of latent components) was derived by 10-fold cross validation. A relatively small number of latent components is required to account for a large proportion of the EBV variance (69% [...]). Less than 10% of the SNP variance is explained by the model, indicating a large proportion of redundant information in the marker data. The correlation between MBV and EBV is computed as the square root of the proportion of the explained EBV variance and lies between 0.82 and 0.97.
Table 2: Fit of PLS model for 38 indexes and traits of 1546 bulls using 10715 SNP
Trait | Number of latent components | EBV variance accounted for (%) | Marker variance accounted for (%)
APR | 6 | 91.64 | 7.06
ASI | 6 | 90.95 | 7.13
Protein kg | 7 | 94.07 | 7.60
Protein % | 8 | 93.20 | 8.56
Milk | 7 | 91.70 | 7.69
Fat kg | 5 | 81.86 | 6.34
Fat % | 8 | 92.05 | 8.66
Overall Type | 4 | 78.67 | 5.59
Mammary System | 4 | 80.74 | 5.68
Stature | 4 | 71.77 | 5.92
Udder Texture | 4 | 79.24 | 5.97
Bone Quality | 4 | 73.09 | 5.93
Angularity | 4 | 69.54 | 5.76
Muzzle Width | 5 | 79.86 | 6.70
Body Depth | 6 | 85.83 | 7.19
[0418] Example 6: PLS Model Validation
[0419] Table 3 shows the results of the validation of the PLS model for the Cow Fertility trait. The PLS model had 20 latent components and was first derived for the trait Cow Fertility using 1546 bulls and 10715 SNP (original data). The model fit was assessed by the coefficient of determination (R²). A prediction model (validation set) was computed based on cross-validation. To test if high R² values for the original data are caused by overfitting (i.e. using a large number of SNP), the EBV of the original data were randomly assigned to animals (permuted data). This step was repeated 20 times. It can be seen from Table 3 that even for randomized data the PLS method fits the observations well, particularly if an increasing number of components is fitted in the model. However, these models show no predictive power. The high R² values in the prediction set of the original data demonstrate that the PLS method does not suffer from overfitting.
[0420] This is reinforced by the results shown in Figure 11, which show an example of the effect of prediction bias in SNP selection. The potential for inducing a bias in the SNP selection process can be shown for the trait APR. An external validation set of 200 bulls was randomly selected and excluded from the PLS analysis. The error curve 201 labelled "Internal" was estimated by cross-validation of models trained on subsets of increasing size, after the feature ranking was performed on all available data. The line 203 labelled "Test Data" shows the true prediction error when these internally validated models were used to predict MBV in the unseen test data. The reuse of information leads to optimistically biased estimates of the prediction error, suggesting that a small number of SNP can provide an accurate prediction of MBV. Using an external validation (i.e. line 205 of Figure 11) for performance assessment yields unbiased estimates of the prediction error.
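The permutation check of Example 6 can be illustrated with a short sketch. It assumes NumPy, uses ordinary least squares as a simple stand-in for the PLS model (the PLS fit itself is not reproduced), and takes R² to be the squared correlation between observed and fitted values; these choices and the function names are assumptions made for illustration.

```python
import numpy as np


def squared_correlation(y, y_hat):
    """Squared Pearson correlation between observed and fitted values."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)


def least_squares_fit_predict(x_learn, y_learn, x_new):
    """Fit an intercept plus linear terms on the learning set, then predict."""
    design = np.column_stack([np.ones(len(x_learn)), x_learn])
    coef, *_ = np.linalg.lstsq(design, y_learn, rcond=None)
    return coef[0] + x_new @ coef[1:]


def permutation_check(x, y, n_learn, n_permutations, rng,
                      fit_predict=least_squares_fit_predict):
    """Randomly reassign y to animals and record (learning, validation) R2."""
    results = []
    x_learn, x_valid = x[:n_learn], x[n_learn:]
    for _ in range(n_permutations):
        y_perm = rng.permutation(y)
        y_learn, y_valid = y_perm[:n_learn], y_perm[n_learn:]
        learn = squared_correlation(
            y_learn, fit_predict(x_learn, y_learn, x_learn))
        valid = squared_correlation(
            y_valid, fit_predict(x_learn, y_learn, x_valid))
        results.append((learn, valid))
    return results


rng = np.random.default_rng(1)
x = rng.integers(0, 3, size=(60, 40)).astype(float)   # toy marker matrix
y = rng.normal(size=60)                               # toy breeding values
results = permutation_check(x, y, n_learn=45, n_permutations=20, rng=rng)
```

A flexible model can fit permuted values closely in the learning set while predicting nothing in the validation set, which is the pattern the permuted columns of Table 3 report.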
Table 3: Validation of PLS model for Cow Fertility
Number of latent components | R² in original data, learning set | R² in original data, validation set | R² in permuted data, learning set | R² in permuted data, validation set
1 | .51 | .58 | .20 | .005
2 | .67 | .65 | .36 | .008
3 | .76 | .67 | .50 | .007
4 | .84 | .70 | .62 | .007
5 | .89 | .70 | .70 | .006
6 | .92 | .68 | .77 | .006
7 | .94 | .67 | .82 | .006
8 | .96 | .67 | .86 | .007
9 | .97 | .66 | .89 | .007
[0421] Example 7: SNP Weight Distribution
[0422] Figures 12A and 12B show the VIP (variable importance in projection) distribution for the traits ASI and Overall Type, respectively. SNP with an average contribution to the model have a VIP value equal to 1. High values reflect the importance of the SNP in the PLS model both with respect to their correlation to the EBV and with respect to the SNP data. For both traits more than half of the SNP are of less than average importance. For the trait ASI less than 40 SNP have a VIP > 2, compared with more than 400 for the trait Overall Type. Ranking SNP according to their VIP value allows identification of SNP that are useful in predicting breeding values.
[0423] Example 8: SNP Selection Process
[0424] Figures 13A and 13B show examples of the results from the SNP selection process for the traits Protein percentage (Figure 13A) and Overall Type (Figure 13B). First a PLS analysis including all SNP (N = 10715) was fitted. The number of SNP, the EBV variance explained and the prediction error of the model were set to equal 100% and compared to four different approaches of SNP selection. The first selection approach (JK (CI95)) is based on the jack-knife method, and all variables whose PLS regression coefficients have jack-knife confidence intervals (at the 95% level) that contain zero are eliminated at the same time. The set of SNP derived by JK (CI95) was used for a second SNP selection method in which individual SNP were selected by forward selection (JK sel). In the third model (VIP 1.3) only SNP with a VIP > 1.3 were included in the PLS model. The fourth selection method was forward selection of SNP based on their VIP value (VIP sel). The SNP selection models were validated by 5-fold cross-validation. The results show that SNP selection methods are able to derive models with a predictive performance that is very similar to the model utilizing all SNP.
[0425] Example 9: Comparison between PLS and Support Vector Machine Analysis
[0426] Figures 14A to 14D examine the predictive performance of the two supervised learning methods, partial least squares (PLS) and support vector machines (SVM) using a radial basis function kernel. Five replicates were analysed for the four traits APR, Milk yield, Protein yield and Overall Type (Figures 14A to 14D respectively).
[0427] In each replicate 200 animals were randomly selected to form a test data set, which was not included in training the models. The test sets were chosen so that they do not overlap between replicates. PLS and SVM performed equally well in predicting molecular breeding value (MBV). For example, for the five replicates of APR the correlation between MBV and EBV was in the range of 0.78 to 0.83 for both methods.
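One simple way to obtain test sets that do not overlap between replicates, as described in paragraph [0427], is to shuffle the animals once and cut consecutive blocks. The sketch below assumes integer animal identifiers and hypothetical function names; it does not fit the PLS or SVM models themselves.

```python
import random


def disjoint_test_sets(animal_ids, test_size, n_replicates, seed=0):
    """Return n_replicates test sets with no animal shared between them."""
    ids = list(animal_ids)
    if test_size * n_replicates > len(ids):
        raise ValueError("not enough animals for non-overlapping test sets")
    random.Random(seed).shuffle(ids)
    return [ids[k * test_size:(k + 1) * test_size]
            for k in range(n_replicates)]


def training_set(animal_ids, test_set):
    """All animals not held out in the given test set."""
    held_out = set(test_set)
    return [a for a in animal_ids if a not in held_out]


# Five replicates of 200 test animals drawn from 1546 bulls.
test_sets = disjoint_test_sets(range(1546), test_size=200, n_replicates=5)
train_first = training_set(range(1546), test_sets[0])
```

Each replicate then trains on the remaining animals, so every model is evaluated on animals it has not seen.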
[0428] Example 10: Australian Profit Ranking (APR)
[0429] The Australian Profit Ranking (APR) is an index which uses ABVs to estimate a ranking that identifies those bulls that produce the most profitable daughters. ADHIS will continue to produce ABVs for all individual traits and the Australian Selection Index (ASI). This provides producers with the option to select on ASI or other combinations of traits.
[0430] The Australian Profit Ranking (APR) = Selection Index (ASI) + Milking Speed (MS) + Temperament (TEMP) + Survival (SURV) + Somatic Cell Count (SCC) + Live Weight (LWT) + Fertility (FERT), wherein each component is calculated as per the following:
ASI = (3.8 x Protein ABV) + (0.9 x Fat ABV) - (0.048 x Milk ABV)
Milking Speed (MS) = 1.2 x (Milking Speed ABV)
Temperament (TEMP) = 2.0 x (Temperament ABV)
Survival (SURV) = 3.9 x (Survival ABV)
SCC = -0.34 x (Somatic Cell Count ABV)
LWT = -0.26 x (Liveweight ABV)
FERT = 3.0 x (Daughter Fertility ABV)
[0431] Example 11: Production traits
[0432] Protein Yield (kg)
[0433] Protein content of milk is assessed in automated machines (Bentley Instruments www. Bentleigh instrunents.com; Foss Instruments www.Foss.dk). Protein content of milk is assessed by infrared scanning of milk specific for N-H amine bond absorption.
[0434] Protein %
[0435] Protein % is calculated by dividing protein yield by milk volume in litres (L) and multiplying by 100.
[0436] Milk Volume (Litres)
[0437] A volumetric sample from an on-farm meter is weighed, and milk volume is calculated on the basis of the weight and average density of milk.
[0438] Fat Yield (kg)
[0439] Fat yield is assessed in automated machines (Bentley Instruments; Foss Instruments). Fat yield of milk is assessed by infrared scanning of milk specific for C=O and C-H groups.
[0440] Fat % (w/v)
[0441] Fat % is calculated by dividing fat yield by milk volume in litres and multiplying by 100.
[0442] Example 12: Individual type traits
[0443] These traits include stature, udder texture, bone quality, angularity, muzzle width, body depth, chest width, pin set, pin width, foot angle, rear leg view, udder depth, fore attachment, rear attachment height, rear attachment width, centre ligament, teat placement, teat length and loin strength.
Stature
[0444] Stature is measured from the top of the spine in between the hips to the ground. The measurement is precise. The trait is measured on a linear scale of 1-9, and each point increase is 3 cm within the range listed below:
1 = Short (1.30 metres)
5 = Intermediate (1.42 metres)
9 = Tall (1.54 metres)
[0445] Udder Texture
[0446] This is a measure of the glandular milk-producing tissue in the udder, emphasized by its collapsibility when milked, vein network and softness. Fibrous and fatty tissue in the udder restricts a dairy cow's ability to produce large quantities of milk. A prominent and distinctive vein network on the side of the udder is a reliable indicator of desirable texture. The trait is measured on a linear scale of 1-9, wherein:
1 = Fleshy
9 = Soft
[0447] Bone Quality
[0448] Bone quality is believed to be a reliable indicator of milking ability in a dairy cow. A flat bone is "dense", and is more desirable in dairy cattle compared with round or coarse bones, which are associated with beef rather than dairy production. The trait is measured on a linear scale of 1-9, wherein:
1 = Coarse bone
9 = Flat bone
[0449] Angularity
[0450] Angularity is defined as the angle and openness of the ribs, combined with the flatness of bone in two year old heifers. Angle and open rib account for 80% of the weighting and bone quality accounts for 20%. The trait is scored on a scale of 1-9, wherein:
1-3: Non-angular: lacks angularity, close ribs, coarse bone
4-6: Intermediate angle with open rib
7-9: Very angular, open ribbed, flat bone
[0451] Muzzle Width
[0452] Muzzle width and openness of nostrils is a highly desirable trait in a country such as Australia, where cattle frequently walk vast distances to access feed in extremely warm conditions. The trait is scored on a scale of 1-9, wherein:
1 = Narrow muzzle
9 = Wide muzzle
[0453] Body Depth
[0454] This is the distance between the top of the spine and the bottom of the barrel at the last rib, the deepest point. The trait is scored on a scale of 1-9, wherein:
1-3: Shallow
4-6: Intermediate
7-9: Deep
[0455] Chest Width
[0456] Chest width is measured from the inside surface between the front two legs. This trait is measured on a linear scale from 1-9, where each point is equal to 2 cm based on the range listed below:
1 = Narrow (13 cm)
5 = Intermediate
9 = Wide (29 cm)
[0457] Pin Set
[0458] This trait is measured as the angle of the rump structure from hooks (hips) to pins on a linear scale of 1-9:
1 High pins (4 cm)
2 (2 cm)
3 Level (0 cm)
4 Slight slope (-2 cm)
5 Intermediate (-4 cm)
6 (-6 cm)
7 (-8 cm)
8 (-10 cm)
9 Extreme slope (-12 cm)
[0459] Pin Width
[0460] This trait is calculated as the distance between the most posterior points of the pin bones, where 1 = 10 cm and 9 = 26 cm, and every point between is calculated in intermediate 2 cm lengths.
1-3: Narrow
4-6: Intermediate
7-9: Wide
[0461] Foot Angle
[0462] This trait is calculated as the angle at the front of the rear hoof, measured from the floor to the hairline at the right hoof. This trait is measured on a linear scale from 1-9, where:
1-3: Very low angle
4-6: Intermediate angle
7-9: Wide angle
where 1 = 15 degrees, 5 = 45 degrees and 9 = 65 degrees.
[0463] Rear Leg View
[0464] This trait is the direction of the feet when the animal is viewed from the rear:
1 Extreme toe out
5 Intermediate toe out
9 Parallel feet
[0465] Udder Depth
[0466] This trait is calculated as the distance from the lowest part of the udder floor to the hock, where:
1 Below hock
2 Level with hock
5 Intermediate
9 Shallow
[0467] Fore Udder Attachment
[0468] This trait is calculated as the strength of the attachment of the fore udder to the abdominal wall. This is not a true linear trait.
1-3: Weak and loose
4-6: Intermediate acceptable
7-9: Extremely strong and tight
[0469] Rear (Udder) attachment height
[0470] This trait is calculated as the distance between the bottom of the vulva and the milk secreting organ in relation to the height of the animal. A score of 4 represents the mid point of 29 cm, and each point is worth 2 cm:
1 Very low 23 cm
2 25 cm
3 27 cm
4 Intermediate 29 cm
5 31 cm
6 33 cm
7 35 cm
8 37 cm
9 High 39 cm
[0471] Rear (Udder) attachment width
[0472] This trait is calculated wherein the reference point for measurement is the top of the milk secreting organ to each pin, measured on a linear scale of 1 to 9, where 1 is extremely narrow and 9 is extremely wide.
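The centimetre-based scales above map a measurement to a score using a fixed step per point. The following is a minimal sketch of that mapping for rear attachment height (score 4 at 29 cm, 2 cm per point). The function names, the rounding to the nearest point and the clamping to the range 1-9 are illustrative assumptions and are not stated in the specification.

```python
import math


def linear_score(measure_cm, anchor_cm, anchor_score, cm_per_point):
    """Map a measurement in centimetres to a 1-9 linear score.

    anchor_cm is the measurement that corresponds to anchor_score, and
    cm_per_point is the change in centimetres per scale point.
    Rounding to the nearest point and clamping to 1-9 are assumptions.
    """
    raw = anchor_score + (measure_cm - anchor_cm) / cm_per_point
    nearest = math.floor(raw + 0.5)
    return max(1, min(9, nearest))


def rear_attachment_height_score(height_cm):
    """Rear (udder) attachment height: score 4 = 29 cm, 2 cm per point."""
    return linear_score(height_cm, anchor_cm=29.0, anchor_score=4, cm_per_point=2.0)
```

The same helper can be reused for the other centimetre-based traits by changing the anchor and step, for example `linear_score(142, 130, 1, 3)` for stature.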
[0473] Central Ligament
[0474] This trait is calculated as the depth of the cleft measured at the base of the rear udder.
1 Convex to flat floor (1 cm)
3 (0 cm)
4 Slight definition
Higher scores: Deep definition
[0475] Teat Placement
[0476] This trait is calculated as the position of the front teat from the centre of the quarter.
1-3: Outside of quarter
4-6: Middle of quarter
7-9: Inside of quarter
[0477] Teat Length
[0478] This trait is calculated as the length of the front teat, where each point is 1 cm and the scale ranges from 1 to 9.
1-3: Short
4-6: Intermediate
7-9: Long
[0479] Example 13: Live Weight
[0480] Live Weight is reported as a deviation in kilograms of live weight from the base set at zero. Live Weight is based on ABVs measured by breed societies. The predictors and their relative contributions are:
Live Weight = (0.5 x Stature ABV) + (0.25 x Chest Width) + (0.25 x Body Depth)
[0481] Example 14: Workability
[0482] Workability is reported as a combination of the following traits: milking speed, temperament and likeability.
[0483] Each of these traits is scored on a scale from A to E by the dairy farmer, where A is very desirable and E is very undesirable. Satisfactory daughters are those expected to receive scores of C, B or A from the farmer. The metric is expressed as a percentage:
(number of offspring expected to be satisfactory (A, B, C) / all offspring ranked) x 100
[0484] Example 15: Somatic Cell Count
[0485] Somatic cell count breeding value is expressed as the increase or decrease in cell count compared to the average or BASE (the average count is scored as a zero percentage deviation). Thus a bull with a lower SCC ABV has daughters with lower somatic cell count, which is an indicator of increased mastitis resistance, and a bull with a higher SCC ABV has daughters with higher somatic cell count, which is an indicator of mastitis susceptibility.
[0486] Somatic cell count can be assessed by laser-based flow cytometry, which is a common method for distinguishing between different cell populations and/or counting cell numbers. Briefly, a milk sample is taken and mixed with a fluorescent dye, which disperses the globules and stains DNA in somatic cells. An aliquot of the stained suspension is injected into a laminar stream of carrier fluid. Somatic cells are separated by the stream of carrier fluid and exposed to a laser beam. As the cells pass through the excitation source the stained cell nuclei fluoresce, the signal is multiplied and the cell number is calculated. Indicative SCC levels are as follows:
Over 200,000: mastitis
<200,000: maximum desired number of somatic cells/ml milk
<100,000: number of somatic cells/ml milk where the cow is considered to have minimal to no mastitis
[ICAR]
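The indicative SCC levels listed above can be expressed as a simple banding rule. The sketch below is illustrative only: the function name and the returned labels are hypothetical, and the treatment of a count of exactly 200,000 cells/ml (here placed in the mastitis band) is an assumption, since the listed levels do not state which band that boundary belongs to.

```python
def classify_scc(cells_per_ml):
    """Band a somatic cell count (cells per ml of milk) using the indicative levels.

    Assumption: a count of exactly 200,000 is placed in the mastitis band.
    """
    if cells_per_ml < 100_000:
        return "minimal to no mastitis"
    if cells_per_ml < 200_000:
        return "within maximum desired count"
    return "indicative of mastitis"
```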
[0487] Example 17: Fertility
[0488] Daughter fertility is a measurement of the difference between bulls for the percentage of their daughters pregnant by 6 weeks after mating start date. In year-round herds this is equivalent to the percentage of their daughters pregnant by 100 days after calving. Data is derived from the following records:
Calving dates, used to determine calving interval and stage of pregnancy
Mating data, used to determine days to first service
[0489] Example 18: Survival
[0490] The survival index is reported as the percentage of daughters that survive from one year to the next compared to the average/BASE (set at zero). The Survival Index is based on actual daughter survival and a combination of predictors of survival. The predictors and their relative contributions are:
Survival Predictors = (0.5 x Likeability) + (1.8 x Overall Type) + (3.0 x Udder Depth) + (2.2 x Pin Set)
[0491] Example 19: Calving Ease
[0492] The calving ease is expressed as the percentage of 'normal' calvings expected when joined to mature cows in the average Australian herd. The calving ease for a bull is based on farmer assessment of the difficulty experienced with the birth of the progeny of the bull, relative to births in the same herd in the same season.
[0493] Example 20: Mammary System
[0494] Mammary System ABV is calculated using the formula below, based on linear traits that have been differentially weighted. The differential weighting of each of the linear traits is based on regression analysis and the contribution of these traits to the variance observed in the system overall.
Mammary System = (Udder Texture x 0.161) + (Fore Attachment x 0.4753) + (Rear Attachment Height x 0.454) + (Rear Attachment Width x 0.448) + (Centre Ligament x 0.355) + (Teat Placement x 0.269)
[0495] Example 21: Overall Type
[0496] Overall type is a categorisation of an individual assigned by a person skilled in the art on the basis of an assessment of individually assessed "type" traits.
[0497] Example 22: Selection Index
[0498] Selection Index is expressed as the net financial profit (in $) per cow per year. It includes a consideration of protein, fat and milk volume traits. The formulation is based on the milk payment system whereby farmers are paid by the amounts of protein and fat in milk, with a charge on milk volume:
ASI = (3.8 x Protein Yield ABV) + (0.9 x Fat Yield ABV) - (0.048 x Milk Volume ABV)
[0499] Example 23: Lactation Traits
[0500] Lactation traits can also be used in predicting the genetic merit of an animal.
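The selection index and the other composite ABVs in Examples 13, 18 and 20 are all weighted sums of component values. The sketch below shows one way to compute such an index, using the ASI coefficients given above. It assumes that protein and fat contribute positively and that milk volume is subtracted, consistent with the stated charge on milk volume. The function and dictionary names are hypothetical.

```python
def weighted_index(values, weights):
    """Return the sum of weight * value over the traits named in weights."""
    return sum(weight * values[trait] for trait, weight in weights.items())


# ASI coefficients from the text; the negative sign on milk volume is an
# assumption based on the statement that milk volume carries a charge.
ASI_WEIGHTS = {"protein_yield": 3.8, "fat_yield": 0.9, "milk_volume": -0.048}


def asi(protein_yield_abv, fat_yield_abv, milk_volume_abv):
    """Selection index (profit per cow per year) from three ABVs."""
    abvs = {
        "protein_yield": protein_yield_abv,
        "fat_yield": fat_yield_abv,
        "milk_volume": milk_volume_abv,
    }
    return weighted_index(abvs, ASI_WEIGHTS)
```

The same `weighted_index` helper could be applied to the Live Weight, Survival and Mammary System formulas by supplying their respective coefficients.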
[0501] A lactation curve is the graph of milk production against time. Each cow in a herd has its own individual curve relating to its lactation potential and other external influences such as the environment and nutrition. Characteristics of the curve include measurements such as the persistency of lactation, total milk produced over the lactation, and the time of peak production.
[0502] Wood proposed the following function to model the lactation curve:
$W(t) = a\,t^{b}\,e^{-ct}$
where W(t) is the theoretical or expected milk yield at time t; and a, b, and c are parameters which determine the shape of the curve (Wood et al. 1967). The parameters of the Wood function have been reparameterised to obtain estimates for total volume, peak volume and time to reach the peak.
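The derived quantities mentioned above follow directly from the Wood function. Setting the derivative of W(t) = a t^b e^(-ct) to zero gives a time of peak of t = b/c, and the peak volume is W(b/c). The sketch below illustrates this, with total volume approximated by numerical integration. The function names, the 300-day horizon, the trapezoidal rule and the step size are illustrative assumptions and are not the reparameterisation used in the specification. The sketch assumes b >= 0 and c > 0 when locating the peak.

```python
import math


def wood_yield(t, a, b, c):
    """Expected milk yield at time t under the Wood function a * t**b * exp(-c*t)."""
    return a * t ** b * math.exp(-c * t)


def time_of_peak(b, c):
    """Time at which the Wood curve peaks (from dW/dt = 0); requires c > 0."""
    return b / c


def peak_yield(a, b, c):
    """Yield at the time of peak production."""
    return wood_yield(time_of_peak(b, c), a, b, c)


def cumulative_yield(a, b, c, days=300.0, step=0.1):
    """Approximate total yield from day 0 to `days` by the trapezoidal rule."""
    n_steps = int(round(days / step))
    total = 0.0
    previous = wood_yield(0.0, a, b, c)
    for k in range(1, n_steps + 1):
        current = wood_yield(k * step, a, b, c)
        total += 0.5 * (previous + current) * step
        previous = current
    return total
```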
[0503] Negative energy balance in early lactation is often associated with reduced fertility.
This is usually a result of the cow producing at her peak at the time of insemination. A cow with a low peak and consistent production should be able to avoid these problems and maintain fertility. These cows can now be identified with the assistance of the estimates from the model.
[0504] Another application of the model is prediction of lactation potential from the first few records, which would allow farmers to manage their herds appropriately in terms of feeding and reproduction (an example list of common lactation traits and corresponding variables of importance for each trait is provided in Table 4).
Table 4: List of Lactation Traits
Columns: Variable Names, Category, No. Entries include the Wood model parameters (Log A, B) and yield traits such as cumulative yield to 300 days.
Original Parameter: No. 1-3; Derived Parameter: No. 4-14
[0505] Example 24: Application to other animals and species
[0506] Whole genome-wide marker information is available for humans, many other species of mammals, several non-mammalian vertebrate species, some fish, and many plants.
As a first step, whole genome marker information can be generated using one of several genotyping systems which are commercially available (for example from Illumina, San Diego, California). Accordingly, using the methods described above, SNP information is associated with the trait, thereby inferring the trait. The SNPs can comprise all marker data, or a limited set of markers may be inferred. Where the trait is a health condition, the outcome may be inferring the risk that an individual will pass on the condition to its offspring. The methods disclosed herein also enable persons skilled in the art to develop a set of diagnostic SNPs and genetic profiling tools for assessing the likelihood that an individual will have a specific characteristic. This includes:
the risk that an individual will develop a disease or condition, such as diabetes, heart disease etc;
the risk that an individual will develop an adverse reaction to a specific pharmaceutical agent;
predictions regarding productivity, eg for livestock animals; and
predictions regarding athletic performance, eg for human athletes and sportspeople or for racing animals.
[0507] A whole-genome association study can be undertaken in a number of ways, depending on the number of animals and the number of traits under study. The population structure can be of several types. The situation in the case of animals with high reproductive rate differs considerably from that with large animals, which generally have a low reproductive rate. Differences also exist between individual animals within a species. For example, in chickens an exemplary strategy may comprise producing 1000 progeny from sires, mated to 2000 dams, with half-sib groups of 50 progeny per sire. In this case highly accurate breeding values can be computed from the progeny means. Other designs are possible, depending upon the use to which the results will be put.
[0508] For example, Zebaneh and Mackay (2003) computed breeding values for the trait fasting triglyceride level using data studied at the Genetic Analysis Workshop 13. Their method was similar to other methods which used adjusted phenotypes of various forms.
[0509] Therefore the methods of the invention can be applied to this type of analysis, and are not limited to breeding value information, but are applicable to trait information of any kind.
[0510] Many analyses of human genomic information to identify markers for disease susceptibility have been performed. For example, markers for multiple sclerosis and for endometriosis have been identified. The methods of the invention may be applied to this type of analysis.
[0511] The population structure can be of several types. The situation in the case of animals with high reproductive rate differs considerably from that with large animals, which generally have a low reproductive rate. Differences also exist between individual animals within a species. In chickens an exemplary strategy may comprise producing 1000 progeny from 10 sires, mated to 2000 dams, with half-sib groups of 50 progeny per sire. In this case highly accurate breeding values can be computed from the progeny means. Other designs are possible, depending upon the use to which the results will be put.
[0512] A whole-genome association study can be undertaken in a number of ways, depending on the number of animals and the number of traits under study. The simplest analysis is least-squares regression on every marker. However, a serious problem with this approach is overestimation of the SNP effects. Therefore several methods which analyse several linked markers or haplotypes have been developed. These methods use either linkage or linkage disequilibrium information, or a combination of the two (Meuwissen et al, 2002), which requires prior information about the location of and the distances between SNPs. In contrast to prior art methods, a powerful feature of the invention is that the phenotypic merit of individuals can be assessed without the need for comprehensive and annotated genome information in a species, which may not be available at the time of analysis.
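For illustration, the "simplest analysis" referred to above, least-squares regression on every marker, can be sketched as follows. This is the single-marker baseline, not the method of the invention. The coding of genotypes as 0, 1 or 2 copies of an allele, the skipping of monomorphic markers and the function names are assumptions made for the example.

```python
def single_marker_regression(genotypes, phenotypes):
    """Least-squares fit of phenotype on one SNP.

    genotypes: allele counts (assumed coded 0/1/2), one per individual.
    Returns (slope, intercept), or None for a marker with no variation.
    """
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        return None  # monomorphic marker: the effect cannot be estimated
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_g


def scan_markers(genotype_matrix, phenotypes):
    """Estimate one effect per SNP; genotype_matrix[i][j] is individual i at SNP j."""
    n_snps = len(genotype_matrix[0])
    effects = []
    for j in range(n_snps):
        column = [row[j] for row in genotype_matrix]
        fit = single_marker_regression(column, phenotypes)
        effects.append(None if fit is None else fit[0])
    return effects
```

Because each marker is fitted separately and the largest estimates are then selected, the selected effects tend to be overestimated, which is the problem noted in the text.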
[0513] It will be apparent to the person skilled in the art that while the invention has been described in some detail for the purposes of clarity and understanding, various modifications and alterations to the embodiments and methods described herein may be made without departing from the scope of the inventive concept disclosed in this specification.
[0514] Example 25: Application to Mouse Data
[0515] The following example shows the application of the methods described above to genotype and phenotype data in mice. The data used in the present example were sourced from http://gscan.well.ox.ac.uk and include phenotypic and genotypic measures for 2296 mice from 4 generations. A total of 12112 SNPs are genotyped for each mouse, but some are missing genotypic scores. The heterogeneous stock mice are a result of 50 generations of breeding between 8 inbred families. The first generation of phenotyped mice in these data are defined as mice with unknown parents. The generation number of mice in subsequent generations is defined as the maximum generation of the parents plus 1. Table 5 displays the total mice in the pedigree (n), mice with more than 11112 recorded SNPs (n_geno), and the number of full sib families in each generation (n_fam).
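The generation rule stated above (mice with unknown parents are generation 1; otherwise the maximum generation of the parents plus 1) can be sketched as a recursive pass over the pedigree. The data layout (a dictionary from animal identifier to a sire and dam pair, with `None` for an unknown parent), the handling of an animal with only one known parent, and the function name are assumptions made for the example. The sketch assumes the pedigree contains no cycles.

```python
def assign_generations(pedigree):
    """Assign a generation number to every animal in the pedigree.

    pedigree maps animal id -> (sire id, dam id); None marks an unknown parent.
    Animals with no known parent are generation 1; otherwise the generation is
    the maximum generation of the known parents plus 1.
    """
    generation = {}

    def generation_of(animal):
        if animal in generation:
            return generation[animal]
        sire, dam = pedigree.get(animal, (None, None))
        known = [p for p in (sire, dam) if p is not None]
        if known:
            value = 1 + max(generation_of(p) for p in known)
        else:
            value = 1
        generation[animal] = value
        return value

    for animal in pedigree:
        generation_of(animal)
    return generation
```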
Table 5: Number of mice per generation
Generation    n       n_geno    n_fam
1             258     155       -
2             1019    1016      113
3             558     558       36
4             461     461       33
All           2296    2190      182
[0516] The families in Table 5 are defined to be full sib families and each family may be comprised of more than one parity. The distribution of the number of parities per family is displayed in Figure 16.
[0517] Same sex litter mates were housed together in cages. Only a small number of cages contained more than one litter, as displayed in Table 6. This experimental design makes the environmental cage effects and the genetic effects almost completely confounded. This is illustrated by the small effective population size for each trait, defined as II Y^ -iiIs 7ij}
[0518] where n_ij is the number of mice in family i, cage j and n_j is the number of mice in the j-th cage. Similarly, sex effects cannot be separated from cage effects.
Table 6: Number of individuals, families and cages with phenotypic records for selected traits.
Column groups: All records (n_ind, n_fam, n_cage, n_e); Families in >2 cages (n_ind, n_fam, n_cage); Cages with more than one family (n_ind, n_fam, n_cage).
CD8%          1869 166 450 41.8    1367 76 328    57 23 14
CD4/CD8       164 166 44 41.4      1363 76 327    56 23 14
CD4/CD3       1868 166 450 41.8    1366 76 328    57 23 14
B220%         1858 164 440 41.8    1329 72 15     57 23 14
CD3%          1869 166 450 41.8    1366 76 328    57 23 14
CD4%          1867 166 450 41.8    136 76 328     7 2 14
ACbD49145 17 525 62.8 1560 97 420 73 30 1
Calcium       1945 176 521 52. 1558 9 1 4 3 30 i
Glucose       1905 176 527 44.4    1521 97 422    69 30 1
Guo seif 1832 176 50 ,2 47 1 1414 92 388 75 31 19
Protein       1945 176 518 56.1    1558 98 415    79 32
Growth        2474 180 500 65.7    1997 101411~2
Hematocrit    1888 160 458 3.      1458 79 350    42 19 12
RBC           1885 160 458 2.      1456 79 350    41 19 1
[0519] Variance Components
[0520] Valdar et al. (2006) give the heritabilities and the variance due to environmental effects for a variety of traits for all animals with phenotypic records. Some of these heritabilities are recalculated here for mice with both genotypic and phenotypic information and are displayed in Table 7. The model used is as in Valdar et al. (2006):
[0521] Let $y_{ij}$ be the phenotype of the i-th animal in cage j, $\mu$ be the grand mean, $d_j$ be the random effect of cage j, $a_i$ be the animal's additive genetic random effect, $x_{ic}$ be its value for covariate c, $\beta_c$ be the effect associated with fixed effect covariate c, C be the set of fixed effect covariates and $e_{ij}$ the random effect of uncorrelated noise. Then
$y_{ij} = \mu + \sum_{c \in C} \beta_c x_{ic} + d_j + a_i + e_{ij}$ (4)
[0522] where $e_{ij} \sim N(0, \sigma_e^2)$, $d_j \sim N(0, \sigma_c^2)$, $a \sim N(0, \sigma_a^2 A)$ and A is the genetic relationship matrix.
Normalizing transformations are applied to the phenotypes, using the transformations described in Valdar et al. (2006) for each trait. The set of fixed effects is comprised of age, cage density, litter, weight (continuous), month, sex, experimenter and year (categorical).
Table 7: Variance components and their approximate standard errors
Phenotype                     n      sigma_p^2       sigma_a^2       sigma_c^2       h^2           c^2
CD8                           1521   21.55 (1.42)    19.25 (2.79)    0.38 (145)      0.89 (0.08)   0.09 (0.02)
CD4/CD8 (x10-2)               1516   2.23 (0.15)     1.90 (0.29)     0.26 (0.05)     0.83 (0.08)   0.10 (0.02)
CD4/CD3 (105)                 1520   7.49 (0.48)     5.95 (0.94)     0.84 (0.15)     0.79 (0.08)   0.11 (0.02)
B220%                         1522   82.90 (4.84)    48.97 (9.28)    19.11 (2.50)    0.59 (0.09)   0.23 (0.03)
CD3 (x108)                    1521   1.13 (0.06)     0.53 (0.11)     0.27 (0.35)     0.47 (0.08)   0.27 (0.03)
CD4                           1520   48.64 (2.47)    20.09 (4.43)    12.13 (1.61)    0.41 (0.08)   0.25 (0.03)
Albumin (g/liter)             1744   6.39 (0.26)     0.92 (0.36)     1.20 (0.21)     0.14 (0.05)   0.19 (0.03)
Calcium (mmol x10-2)          1751   2.72 (0.12)     0.37 (0.18)     0.81 (0.11)     0.14 (0.06)   0.30 (0.04)
Glucose                       1705   2022 (92)       444 (146)       554 (77)        0.22 (0.07)   0.27 (0.03)
Protein (x05)                 1640   1.48 (0.06)     0.19 (0.09)     0.34 (0.06)     0.13 (0.06)   0.23 (0.03)
Urea (10-2)                   1743   3.06 (0.14)     0.87 (0.22)     0.64 (0.10)     0.28 (0.07)   0.21 (0.03)
Start Weight (x10-1)          1928   2.29 (0.07)     1.69 (0.05)     0.60 (0.02)     0.73 (0.09)   0.26 (0.03)
End Weight (x10-2)            1884   1.43 (0.07)     0.87 (0.14)     0.25 (0.03)     0.61 (0.07)   0.17 (0.02)
Growth Slope (x10-3)          1920   2.72 (0.12)     0.92 (0.21)     0.91 (0.09)     0.34 (0.07)   0.33 (0.03)
Hematocrit (08)               1593   2.11 (0.08)     0.22 (0.10)     0.44 (0.07)     0.10 (0.05)   0.21 (0.03)
Red blood cell count (x104)   1590   2.38 (0.09)     0.32 (0.12)     0.48 (0.07)     0.13 (0.05)   0.20 (0.03)
[0523] Table 7 shows the variance components and their approximate standard errors, wherein n is the number of individuals with a record for the trait, sigma_p^2 is the phenotypic variance, sigma_a^2 is the additive genetic variance, sigma_c^2 is the environmental variance due to the random cage effect and h^2 is the heritability. All of the heritability and c^2 values in Table 7 are not significantly different to those displayed in Valdar et al. (2006), with the exception of Calcium, which they report to be 0.49 and 0.31 respectively.
[0524] It should be noted, however, that due to the confounding between cage and genetic effects and consequently the low effective population number, the maximum likelihood estimates of the variance parameters in Table 7 are unreliable. This is supported by the log-likelihood plots displayed in Figure 17, which show the log-likelihood contours for CD8, CD4, growth and protein (LHS) and corresponding heritability plots (RHS). Dotted contours represent the 10% and 5% thresholds from the LRT. These plots show the contours as the additive genetic and cage variances change. The inner dotted contour 1701 on each plot is a 10% significance region for the variance parameters (the outer dotted contours represent a 5% significance region for the variance parameters). This significance threshold is obtained by applying the likelihood ratio test (LRT) to the maximum log-likelihood value for each trait. That is, for a point with log-likelihood $\ln(L_1)$, the ratio LR is defined as:
$LR = 2\,[\ln(L_{max}) - \ln(L_1)]$
[0525] which approximately follows a $\chi^2$ distribution.
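As an illustration of how such contour thresholds can be obtained, the sketch below computes the likelihood ratio statistic in its usual form, twice the difference between the maximum log-likelihood and the log-likelihood at a point, and compares it with a chi-square critical value. The degrees of freedom are not stated in the text; two are assumed here because two variance parameters (additive genetic and cage) vary across the contour plots. For two degrees of freedom the chi-square upper-tail probability is exp(-x/2), so the critical value has the closed form -2 ln(alpha). The function names are hypothetical.

```python
import math


def lr_statistic(loglik_max, loglik_point):
    """Likelihood ratio statistic: twice the drop in log-likelihood from the maximum."""
    return 2.0 * (loglik_max - loglik_point)


def chi2_critical_2df(alpha):
    """Upper-tail critical value of a chi-square distribution with 2 degrees of freedom.

    Assumption: 2 degrees of freedom (two variance parameters on the contour plot).
    """
    return -2.0 * math.log(alpha)


def inside_region(loglik_max, loglik_point, alpha):
    """True if the point lies inside the contour for significance level alpha."""
    return lr_statistic(loglik_max, loglik_point) <= chi2_critical_2df(alpha)
```

Under these assumptions the 5% threshold is larger than the 10% threshold, so the 5% contour encloses the 10% contour.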
[0526] The log-likelihood plot for CD8 is particularly flat and the confidence region for the variance parameters is particularly large. Any heritability between 0.75 and 1 is feasible for CD8. Similarly for CD4, growth and protein, there is a large range of heritabilities that these data support.
[0527] Genome Wide Selection
Description of phenotypes for GWS
[0528] Five variations of phenotype were created:
Raw: Raw phenotypes are predicted from genotypes only.
Cage: Phenotypes are adjusted for fixed effects including cage, i.e. $y_{cage} = y_{raw} - \sum_{c \in D} \beta_c x_c$, where D is the set of fixed effects including cage.
Adjusted: Phenotypes are adjusted for fixed effects excluding cage, i.e. $y_{adj} = y_{raw} - \sum_{c \in C} \beta_c x_c$, where C is the set of fixed effects excluding cage.
Adjusted_cf: Phenotypes are adjusted for the cage.family interaction, i.e. $y_{cf} = y_{adj} - f(\text{cage.family})_i$.
EBV: EBVs from the animal model described in Equation (4). The reliability of these EBVs is displayed in Figure 18. Most of the animals with unreliable EBVs have missing phenotypic information, so that the EBV is calculated from the animal's relations.
[0529] Partial least squares (PLS) was applied to all of these phenotypes with the genotypic information acting as the predictor functions. In addition, PLS was applied to the raw data with both the SNPs and fixed effects excluding cage (sex, age, month, etc.) as explanatory variables (raw 2).
[0530] Forward prediction
[0531] The data are divided into a training set comprised of all animals in the first 3 generations and a test set comprised of all animals in the last generation. PLS was applied to the training set and the resultant parameters are used to predict phenotypes for the test set. The correlation between the predicted phenotype and actual phenotype is displayed in Table 8.
Table 8: Forward prediction-PLS.
Trait      Raw      Raw 2    Cage     Adjusted   Adjusted_cf   EBVs
CD8        0.421    0.423    0.272    0.3766     0.265         0.434
CD4        0.282    0.281    0.167    0.300      0.161         0.286
Growth     0.206    0.208    0.023    0.197      0.088         0.520
Protein    0.112    0.181    -0.002   0.166      -0.001        0.574
[0532] The accuracy of prediction is highest for the EBV phenotype for CD8, growth and protein. The adjusted phenotype yields the most accurate result for CD4. This would suggest that adding the pedigree information is advantageous. There is a large decline in accuracy when cage effects are corrected for as a fixed effect, with the accuracy of prediction for the 'adjusted' phenotype significantly higher than both the 'cage' and 'adjusted_cf' phenotypes.
This is further evidence that cage effects and genetic effects are confounded.
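The forward prediction procedure described above (fit on the earlier generations, predict the last generation, report the correlation) can be sketched as follows. For brevity the sketch fits a single-component PLS1 model, that is, only the first latent component of the NIPALS procedure. The number of components used in the analysis reported here is not stated, so this is an illustrative simplification and not the model that produced Table 8. All function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def fit_pls1(X, y):
    """Fit a single-component PLS1 model (first NIPALS component only).

    Assumes at least one predictor covaries with y in the training data.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: covariance of each predictor with y, scaled to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores on the latent component, then regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def predict_pls1(model, X):
    """Predict phenotypes for new genotype rows from a fitted model."""
    x_mean, y_mean, w, q = model
    return [
        y_mean + q * sum((row[j] - x_mean[j]) * w[j] for j in range(len(w)))
        for row in X
    ]


def forward_prediction(X, y, generation):
    """Train on all but the last generation, test on the last generation."""
    last = max(generation)
    train = [i for i, g in enumerate(generation) if g < last]
    test = [i for i, g in enumerate(generation) if g == last]
    model = fit_pls1([X[i] for i in train], [y[i] for i in train])
    predicted = predict_pls1(model, [X[i] for i in test])
    return pearson(predicted, [y[i] for i in test])
```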
[0533] Fitting fixed effects in the PLS model does little to improve the prediction accuracy for the raw data for CD8, CD4 and growth. This is probably caused by some SNPs being confounded with the fixed effects in the training set due to random sampling. However, there is a large improvement in accuracy for protein.
[0534] Mirror test set prediction
[0535] The data are randomly divided into a test set of 300 mice, and the remaining mice form the training set. As before, PLS is applied to the training set and the resultant parameters are used to predict phenotypes for the test set. This process is repeated 50 times for each trait and phenotype. The mean correlation and the standard deviation between the predicted phenotype and actual phenotype for the 50 replications are displayed in Table 9.
Table 9: Mirror prediction-PLS. Mean and SD of 50 replicates.
Trait      Raw              Raw 2            Cage              Adjusted         Adjusted_cf       EBVs
CD8        0.689 (0.030)    0.690 (0.030)    0.236 (0.053)     0.688 (0.031)    0.235 (0.053)     0.723 (0.028)
CD4        0.452 (0.043)    0.453 (0.043)    0.099 (0.044)     0.444 (0.045)    0.098 (0.042)     0.738 (0.026)
Growth     0.078 (0.049)    0.148 (0.041)    0.040 (0.050)     0.114 (0.055)    0.045 (0.050)     0.152 (0.060)
Protein    0.158 (0.048)    0.273 (0.046)    -0.077 (0.047)    0.173 (0.048)    -0.071 (0.057)    0.737 (0.027)
[0536] The accuracies for mirror set prediction are generally higher than accuracies for forward prediction. In the mirror prediction case, animals in the same cage can be used in the training and test sets, so that the confounding of environmental and genetic effects has less influence. In the forward prediction set, fitting cage as a fixed effect has a large negative effect on accuracy due to the experimental design.
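The mirror procedure (random test set of 300 mice, remainder as training set, repeated 50 times, mean and SD of the correlations) can be sketched as a resampling loop around any fitting routine. In the sketch below the model is passed in as a pair of callables, `fit` and `predict`, which are hypothetical placeholders for the PLS fitting and prediction steps. The seed parameter and the use of the sample standard deviation are assumptions made for the example.

```python
import math
import random
import statistics


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def mirror_prediction(X, y, fit, predict, test_size=300, replicates=50, seed=0):
    """Repeat random train/test splits; return mean and SD of test correlations.

    fit(X_train, y_train) returns a model; predict(model, X_test) returns
    predicted phenotypes. Both are supplied by the caller.
    """
    rng = random.Random(seed)
    indices = list(range(len(y)))
    correlations = []
    for _ in range(replicates):
        rng.shuffle(indices)
        test, train = indices[:test_size], indices[test_size:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        predicted = predict(model, [X[i] for i in test])
        correlations.append(pearson(predicted, [y[i] for i in test]))
    return statistics.mean(correlations), statistics.stdev(correlations)
```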
[0537] The 'EBVs' phenotype has the best accuracy of prediction when PLS is applied for all 4 traits, with CD8, CD4 and protein having accuracies around 0.73. However the accuracy for growth is significantly lower (0.152).
[0538] Example 25: Application to Human Data
[0539] The applicability of the whole genome analysis approach using partial least squares (PLS) and support vector machines (SVM) was tested on two human data sets with the aim of identifying genetic predictors associated with increased or decreased risk for developing a particular disease (Parkinson's disease and amyotrophic lateral sclerosis, ALS). A description of the data is given below (Table 10). All DNA samples and raw genotype data are publicly available. The authors of both studies analysed the data by testing each SNP individually and both studies were unable to detect common genetic variants that exert a significant effect.
Table 10: Description of Parkinson's disease and ALS data sets
Disease                Cases   Control   SNP       Reference
Parkinson's disease    270     271       389 879   Fung et al., Lancet Neurol 2006; 5: 911-16
ALS                    276     271       503875    Schymick et al., Lancet Neurol 2007; 6: 322-28
[0540] SVM and PLS gave very similar results and we only report details of the PLS here.
Briefly, a PLS analysis was performed in the following steps:
1. Imputation of missing genotypes using the NIPALS algorithm
2. Splitting the data into a validation set and a test set. The test set included 10 randomly selected cases and 10 randomly selected controls.
3. SNP selection by 10-fold external cross-validation using a 95% jackknife confidence interval
[0541] The results are reported in the form of the classification error and the number of selected SNPs (Table 11). In a random data set we would expect a classification error at chance level. The final prediction model built with PLS results in smaller classification errors for both diseases; however, the error is magnitudes too large for the model to have any utility as a disease diagnostic. Overall, the analyses confirm the findings of the original studies, that neither for Parkinson's disease nor for ALS can common genetic variants of larger effects be identified. The authors of the studies discuss several reasons for the lack of associations between markers and disease risk (limited power because of sample size and age-matched and sex-matched controls, sporadic ALS may consist of a diverse group of clinically indistinguishable genetic disorders, etc.).
Table 11: Results of partial least squares analysis (PLS) for Parkinson's disease and ALS
Disease                SNP      Classification error
Parkinson's disease    11 854   0.25
ALS                    14891    0.33
[0542] To increase the statistical power of the study would require whole-genome scans of additional patients and controls. However, it may be cost-effective to do follow-up genotyping of only the 3% of SNP markers identified by the whole-genome PLS analysis.
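The "3%" figure above can be checked against the SNP counts in Tables 10 and 11, and the classification error is simply the proportion of test individuals assigned to the wrong class. The sketch below shows both calculations. The function names are hypothetical, and the coding of cases and controls as 1 and 0 in the usage example is an assumption.

```python
def classification_error(predicted, actual):
    """Proportion of individuals whose predicted class differs from the actual class."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def selected_fraction(n_selected, n_total):
    """Fraction of genotyped SNPs retained by the selection step."""
    return n_selected / n_total


# SNP counts taken from Tables 10 and 11.
parkinsons_fraction = selected_fraction(11854, 389879)
als_fraction = selected_fraction(14891, 503875)
```

Both fractions round to 0.03, in agreement with the statement that about 3% of the SNP markers were identified by the PLS analysis.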
[0543] It will be appreciated that the methods and systems described above at least substantially provide a significantly improved genome based selection process.
[0544] The systems and processes described herein, and/or shown in the drawings, are presented by way of example only and are not limiting as to the scope of the described methods. Unless otherwise specifically stated, individual aspects and components of the processes may be modified, or may be substituted by equivalents or by as yet unknown substitutes such as may be developed in the future or such as may be found to be acceptable substitutes in the future. The processes may also be modified for a variety of applications while remaining within the scope and spirit of the claimed invention, since the range of potential applications is great, and since it is intended that the present processes be adaptable to many such variations.
0545 Example 26 Genetic Algorithm on Beef Data set 0546] The present example demonstrates a phenotype predictor using SNP identification of phenotype based on MBV as biomarker and highlights three applications of the above methods: a) GA-R used to predict top 50SNP in gene based association for complex polygenic trait expressed as age of onset of puberty/reproductive fitness in beef cattle.
874124_7 COMS ID No: ARCS-159284 Received by IP Australia: Time 18:47 Date 2007-08-31 31, AUG. 2007 18:46 SPRUSON FERGUSON 92615486 NO. 1558 P. 39/88 -125b) Demonstration utility of phenotype predictor using GA-R predictor for Sprediction of age of onset of puberty/reproductive fitness with a correlation Of animals phenotype in heifers which could therefore be measured at birth to be predictive of animals subsequent lifetime performance.
c) The use of MBV in bull and cow selection to improve age of onset of puberty/reproductive fitness in heifers-an example of a sex limitd trait for genetic improvement when measured by markers and MBV predictors.
[0547] The GA-R module was used to find important SNP responsible for variation in the trait 'Age at First Corpus Luteum' in 578 Brahman heifers. 9775 SNPs were genotyped, and 5363 were used in the analysis after QC of the data.
[0548] As the GA is not guaranteed to find a global optimum, five analyses were undertaken to identify SNP that were important in all models. The top 50 such SNP were identified and, together with results from single SNP analyses and other methods, have been used as the basis for gene identification.
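The step of keeping only SNP that are important in all five analyses can be sketched as a simple intersection followed by a ranking. This is an illustration under stated assumptions, not the GA-R module itself: each run is represented as a hypothetical dictionary mapping an SNP identifier to the weight the run assigned to it, and ranking by mean absolute weight is an assumed criterion, since the text does not say how the top 50 were ordered.

```python
def consensus_snps(runs, top_n=50):
    """Return up to top_n SNP that appear in every run.

    runs  : list of dicts, one per GA analysis, mapping SNP id -> weight
            (hypothetical representation of each run's selected SNP).
    top_n : number of consensus SNP to keep (50 in the example above).
    """
    # Keep only SNP selected by every run.
    shared = set(runs[0])
    for run in runs[1:]:
        shared &= set(run)

    # Assumed ranking: largest mean absolute weight first, ties broken by id.
    def mean_abs_weight(snp):
        return sum(abs(run[snp]) for run in runs) / len(runs)

    ranked = sorted(shared, key=lambda snp: (-mean_abs_weight(snp), snp))
    return ranked[:top_n]


# Toy usage with three runs instead of five.
example_runs = [
    {"snp1": 0.9, "snp2": 0.1, "snp3": 0.5},
    {"snp1": 0.8, "snp3": -0.7},
    {"snp1": 0.7, "snp3": 0.6, "snp4": 1.0},
]
print(consensus_snps(example_runs, top_n=50))
```

Requiring presence in every run is a conservative filter: an SNP picked up by a single run that converged to a local optimum is discarded.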
[0549] The phenotypes for this trait were direct observations on the heifers. After adjustment for systematic non-genetic effects they had a phenotypic standard deviation of 115.2 days. The correlation between MBVs and phenotypes from the five analyses ranged between 0.72 and 0.76, corresponding to a standard deviation of the MBVs ranging from 82 to 85 days and a heritability of approximately
REFERENCES
[0550] References cited herein are listed on the following pages, and are incorporated herein by this reference:
Gianola, D., R.L. Fernando and A. Stella, 2006: Genomic-assisted prediction of genetic value with semiparametric procedures. Genetics 173: 1761-1776
Bernardo, R. and J. Yu, 2007: Prospects for Genomewide Selection for Quantitative Traits in Maize. Crop Sci 47: 1082-1090
Bellman, R. (1961). Adaptive control processes: a guided tour. Princeton, NJ. Princeton University Press.
Genetic Analysis Workshop 13: Analysis of Longitudinal Family Data for Complex Diseases and Related Risk Factors. L. Almasy, C.I. Amos, J.E. Bailey-Wilson, R.M. Cantor, C.E. Jaquish, M. Martinez, Neuman, J.M. Olson, L.J. Palmer, S.S. Rich, M.A. Spence and J.W. MacCluer. BMC Genetics 2003, 4(Suppl 1):S1
Efron, B. and Tibshirani, R.J. (1993). An introduction to the bootstrap. Monographs on statistics and applied probability 57. Chapman and Hall, NY.
Horne, B.D. and Camp, N.J. (2004). Principal component analysis for selection of optimal SNP-sets that capture intragenic genetic variation. Genetic Epidemiology, 26:11-21.
Johnson, R.A. and Wichern, D., editors (1988). Applied multivariate statistical analysis. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.
Lin, Z. and Altman, R.B. (2004). Finding haplotype tagging SNPs by use of principal components analysis. American Journal of Human Genetics, 75:850-861.
Meuwissen, T.H.E., A. Karlsen, S. Lien, I. Olsaker, and M.E. Goddard (2002). Fine Mapping of a Quantitative Trait Locus for Twinning Rate Using Combined Linkage and Linkage Disequilibrium Mapping. Genetics 161: 373-379
Meuwissen, T.H.E., B.J. Hayes, and M.E. Goddard (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819-1829
Roweis, S. (1998). EM algorithms for PCA and SPCA. In NIPS '97: Proceedings of the 1997 conference on Advances in neural information processing systems 10, pages 626-632, Cambridge, MA, USA. MIT Press.
Schaeffer, L.R. (2006). Strategy for applying genome-wide selection in dairy cattle. J. Anim. Breed. Genet. 123: 218-223
Sharma, S. (1996). Applied multivariate techniques. John Wiley & Sons, Inc., New York, NY, USA.
Valdar, W., Solberg, L.C., Gauguier, D., Cookson, W.O., Rawlins, J.N.P., Mott, R., and Flint, J. (2006). Genetic and environmental effects on complex traits in mice. Genetics, 174:959-984
Zabaneh, D. and I.J. Mackay: Genome-wide linkage scan on estimated breeding values for a quantitative trait. BMC Genetics 2003, 4(Suppl 1):S61
Zenger et al. (2007): K.R. Zenger, M.S. Khatkar, B. Tier, M. Hobbs, J.A.L. Cavanagh, J. Solkner, R.J. Hawken, W. Barris, H.W. Raadsma. QC analyses of SNP array data: experiences from a large population of dairy sires with 23.8 million data points. Association for the Advancement of Animal Breeding and Genetics (AAABG)
Conference paper, 17th Annual Conference, 23 September 2007
Table 12: Listing of Available SNP/Marker Data Sets
(*National Centre for Biotechnology Information, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894: PubMed Unique Identifier, or Web address)
Species | Condition, trait or data type | Publication or access point | Unique Identifier or Web address*
HUMAN
Human | Adverse drug reaction (example of a [...]) | Bresalier et al., N Engl J Med. 2005 Mar 17;352(11):1092-102. Epub 2005 Feb | 15713943
Human | Alcoholism | [illegible], BMC Genet. 2005 Dec | [illegible]
Human | Alcoholism | Namn et al., BMC Genet. 2005 Dec 30;6 Suppl | 1845167
Human | Alzheimer's | Australian Imaging Biomarkers and Lifestyle (AIBL) Flagship Study of Ageing; Edith Cowan University, 184 Hampton Rd, Nedlands, Western Australia | http://www.aibl.nnf.com.au/page/home
Human | Alzheimer's | Coon et al., J Clin Psychiatry. 2007 Apr;68(4):613-8 | 17474819
Human | Alzheimer's | Grupe et al., Hum Mol Genet. 2007 Apr 15;16(8):865-73 | 17317784
Human | ALS (Amyotrophic lateral sclerosis) | Schymick et al., Lancet Neurol. 2007 Apr;6(4):322-8 | 17362836
Human | ALS (Amyotrophic lateral sclerosis) | Dunckley et al., N Engl J Med. 2007 Aug 23;357(8):775-88 | 17671248
Human | Ankylosing spondylitis | The Wellcome Trust Case Control Consortium (WTCCC), The Wellcome Trust, 215 Euston Road, London NW1 2BE; Fax: 020 7611 7388; http://www.wtccc.org.uk | www.wtccc.org.uk/info/overview.shtml
Human | Autoimmune thyroid disease | The Wellcome Trust Case Control Consortium (WTCCC), The Wellcome Trust, 215 Euston Road, London NW1 2BE; Fax: 020 7611 7388; http://www.wtccc.org.uk | www.wtccc.org.uk/info/overview.shtml
Human | Benign recurrent vertigo | Lee et al., Hum Mol Genet. 2006 Jan 15;15(2):251-8 | 16330481
Human | Bipolar Disorder | Center for Human Genetic Research, MGH Simches Research Center, 185 Cambridge Street, Room CPZN 5.821A, Boston, MA 02114 | http://www.massgeneral.org/chgr/research_genes.htm
Human | Bipolar I | Marcheco-Teruel et al., Am J Med Genet B Neuropsychiatr Genet. 2006 Dec 6;141(8):833-43 | 16917938
Human | Bipolar Disorder | The Wellcome Trust Case Control Consortium (WTCCC), The Wellcome Trust, 215 Euston Road, London NW1 2BE; Fax: 020 7611 7388; http://www.wtccc.org.uk | www.wtccc.org.uk/info/overview.shtml
Human | Bipolar | Baum et al., Mol Psychiatry. 2007 May | 17486107
Human | BMI | Lyon et al., PLoS Genet. 2007 Apr 27;3(4):e61 | 17465681
Human | Cancer, esophageal | Hu et al., Cancer Res. 2005 Apr 1;65(7):2542-6 | 15805246
Human | Cancer, breast | The Wellcome Trust Case Control Consortium (WTCCC), The Wellcome Trust, 215 Euston Road, London NW1 2BE; Fax: 020 7611 7388; http://www.wtccc.org.uk | www.wtccc.org.uk/info/overview.shtml
Human | Cancer, breast | National Cancer Institute, Cancer Genetic Markers of Susceptibility (CGEMS), 6116 Executive Boulevard, Room 3036A, Bethesda, MD 20892-8322; www.cancer.gov | http://cgems.cancer.gov/data/
Human | Cancer, breast | Hunter et al., Nat Genet. 2007 Jul;39(7):870-4 | 17529973
Human | Cancer | Easton et al., Nature. 2007 Jun 28;447(7148):1087- | 17529967
Human | Cancer, colorectal | Kemp et al., Hum Mol Genet. 2006 Oct 1;15(19):2903-10 | 16023799
Human | Cancer, colorectal | Tomlinson et al., Nat Genet. 2007 Aug;39(8):984-988 | 17618284
Human | Cancer, CLL | Sellick et al., Am J Hum Genet. 2005 Sep;77(3):420-9 | 16080117
Human | Cancer, CLL | Sellick et al., Blood. 2007 Aug 8 | 17687107
Human | Cancer, lung | Spinola et al., Cancer Lett. 2007 Jun 28;251(2):311-6 | 7235
Human | Cancer | Gudmundsson et al., Nat Genet. 2007 May;39(5):631-7 | 17401366
Human | Cancer, prostate | Yeager et al., Nat Genet. 2007 May;39(5):645-9 | 1740136
Human | Cancer, prostate (CGEMS) | National Cancer Institute, Cancer Genetic Markers of Susceptibility (CGEMS), 6116 Executive Boulevard, Room 3036A, Bethesda, MD 20892-8322; www.cancer.gov | http://cgems.cancer.gov/data/
Human | Celiac | van Heel et al., Nat Genet. 2007 Jul;39(7):827-9 | 17658408
Human | Chiari type I malformation | Boyles et al., Am J Med Genet A. 2006 Dec 15;140(24):2776-85 | 17103432
Human | Coronary Heart Disease | The Wellcome Trust Case Control Consortium (WTCCC), The Wellcome Trust, 215 Euston Road, London NW1 2BE; Fax: 020 7611 7388; http://www.wtccc.org.uk | www.wtccc.org.uk/info/overview.shtml
Human | Crohn's disease | Libioulle et al., PLoS Genet. 2007 Apr 20;3(4):e58 | 17447842
Human | Crohn's disease | Hampe et al., Nat Genet. 2007 Feb;39(2):207-11. Epub 2006 Dec 31 | 17200669
Human | Crohn's disease | Rioux et al., Nat Genet. 2007 May;39(5):596-604 | 17435756
Human | Crohn's Disease | The Wellcome Trust Case Control Consortium (WTCCC), The Wellcome Trust, 215 Euston Road, London NW1 2BE; Fax: 020 7611 7388; http://www.wtccc.org.uk | www.wtccc.org.uk/info/overview.shtml
Human | Cleft lip/Cleft [...] | Riley et al., Am J Med Genet A. 2007 Apr 15;143(8):846-62 | 17366557
Human | Diabetes type 1 | The Wellcome Trust Case Control Consortium (WTCCC), The Wellcome Trust, 215 Euston Road, London NW1 2BE; Fax: 020 7611 7388; http://www.wtccc.org.uk | www.wtccc.org.uk/info/overview.shtml
Human | Diabetes type 1 | Smyth et al., Nat Genet. 2006 Jun;38(6):617-9 | 16699517
Human | Diabetes type 2 | Diabetes Genetics Initiative of Broad Institute of Harvard and MIT et al., Science. 2007 Jun 1;316(5829):1331-6 | 17463246
Human | Diabetes type 2 | Sladek et al., Nature. 2007 Feb 22;445(7130):881- | 17293876
Human | Diabetes type 2 | The Wellcome Trust Case Control Consortium (WTCCC), The Wellcome Trust, 215 Euston Road, London NW1 2BE; Fax: 020 7611 7388; http://www.wtccc.org.uk | www.wtccc.org.uk/info/overview.shtml
Human | Diabetes type 2 | Zeggini et al., Science. 2007 Jun 1;316(5829) | 17634[...]
Human | Diabetes type 2 | Scott et al., Science. 2007 Jun 1;316(5829):1341-5 | [illegible]
Human | Diabetes type 2 Complications, Nephropathy | Maeda, Diabetes Res Clin Pract. 2004 Dec;66 Suppl 1:S45-7 | 16563979
Human | Diabetes type 2 Complications, Nephropathy | Maeda et al., Kidney Int Suppl. 2007 Aug;(108):S43-6 | 17653210
Human | Diabetes type 2 Complications | Tanaka et al., Diabetes. 2003 Nov;52(11):2848-53 | 14578306
Human | Diabetes Complications, Retinopathy | Looker et al., Diabetes. 2007 Apr;56(4):1160-6 | 17973[...]
Human | Framingham Heart | Herbert et al., Nat Genet. 2007 Feb;39(2):135-6 | 172152019
Human | Gallstone Disease | Buch et al., Nat Genet. 2007 Aug;39(8) | 17632509
Human | Hypertension | Belle et al., Hypertension. 2007 Mar;49(3) | [illegible]
Human | Hypertension | The Wellcome Trust Case Control Consortium (WTCCC), The Wellcome Trust, 215 Euston Road, London NW1 2BE; Fax: 020 7611 7388; http://www.wtccc.org.uk | www.wtccc.org.uk/info/overview.shtml
Human | Stroke | Matarin et al., Lancet Neurol. 2007 May;6(5):414-20 | 17434096
Human | Mental Retardation | Hoyer et al., J Med Genet. 2007 Jun 29 | 17601928
Human | Multiple sclerosis | Sawcer et al., Am J Hum Genet. 2005 Sep;77(3):454-67 | 16080120
Human | Multiple Sclerosis | The Wellcome Trust Case Control Consortium (WTCCC), The Wellcome Trust, 215 Euston Road, London NW1 2BE; Fax: 020 7611 7388; http://www.wtccc.org.uk | www.wtccc.org.uk/info/overview.shtml
Human | Myocardial Infarction | Ozaki and Tanaka, Cell Mol Life Sci. 2005 Aug;62(16):1804-13 | 15990958
Human | Nicotine dependence | Bierut et al., Hum Mol Genet. 2007 Jan 1 | 17168188
Human | Nicotine dependence | Uhl et al., BMC Genet. 2007 Apr 3;8:10 | 17407593
Human | Obesity-related traits | Scuteri et al., PLoS Genet. 2007 Jul 20;3(7):e115 | [illegible]
Human | [illegible] | NGFN Project Management, Projektträger im DLR, [...] zu Köln, Zülpicher Str. 47, 50674 Köln | http://www.science.ngfn.de/6_178.htm
Human | Obesity (Lyon) | Lyon et al., PLoS Genet. 2007 Apr 27;3(4):e61 | Duplication
Human | Olfactory sense (Identification; Intensity; pleasantness) | Knaapila et al., Eur J Hum Genet. 2007 May;15(5):596-602 | 17342154
Human | Osteoarthritis | Abel et al., Autoimmun Rev. 2006 Apr;5(4) | [illegible]
Human | Parkinson's Disease | Fung et al., Lancet Neurol. 2006;5:911-916 | [illegible]
Human | Rheumatoid Arthritis (Wellcome Trust) | The Wellcome Trust Case Control Consortium (WTCCC), The Wellcome Trust, 215 Euston Road, London NW1 2BE; Fax: 020 7611 7388; http://www.wtccc.org.uk | www.wtccc.org.uk/info/overview.shtml
Human | Rheumatoid Arthritis | Amos et al., Genes Immun. 2006 Jun;7(4):277-86 | [illegible]
Human | Rheumatoid Arthritis | John et al., Am J Hum Genet. 2004 Jul;75(1):54-64 | 15154113
Human | Rheumatoid Arthritis | Tamiya et al., Hum Mol Genet. 2006 Aug | 16000323
Human | Sarcoidosis | Institute of Human Genetics, University of Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany | http://www.science.ngfn.de/[...]
Human | Situs Defect (Gutierrez) | Gutierrez-Roelens et al., Eur J Hum Genet. 2006 | 16639409
Human | Tuberculosis | The Wellcome Trust Case Control Consortium (WTCCC), The Wellcome Trust, 215 Euston Road, London NW1 2BE; Fax: 020 7611 7388; http://www.wtccc.org.uk | www.wtccc.org.uk/info/overview.shtml
Human | Malaria | The Wellcome Trust Case Control Consortium (WTCCC), The Wellcome Trust, 215 Euston Road, London NW1 2BE; Fax: 020 7611 7388; http://www.wtccc.org.uk | www.wtccc.org.uk/info/overview.shtml
BOVINE
Bovine | Example of markers | National Animal Genome Research Program, Cattle Genome; Texas A&M University | [illegible]
Bovine | Example of traits | Australian Dairy Herd Improvement Scheme, Australian Dairy Farmers Limited, Level 6, 84 William Street, Melbourne VIC 3000 | [illegible]
Beef | Example of traits | BREEDPLAN at University of New England, Armidale, NSW 2351, Australia | [illegible]
MOUSE
Mouse | For access to markers and traits | Wellcome Trust Centre for Human Genetics, The Genetic Architecture of Complex Traits in Heterogeneous Stock Mice, Roosevelt Drive, Oxford OX3 7BN, United Kingdom | http://gscan.well.ox.ac.uk/#phenotypes
DOG
Dog | Example of markers | Dog Genome, Broad Institute, 7 Cambridge Center, Cambridge, MA 02142 USA | http://www.broad.mit.edu/mammals/dog/
Dog | For access to [...] | Agrafioti and Stumpf, Nucleic Acids Res. 2007 Jan;35(Database issue):D71-5 | 17202172
Dog | For markers and traits | Leegwater et al., J Hered. 2007 Aug 27 | 17548862
Dog | Example of markers | Lindblad-Toh et al., Nature. 2005 Dec 8;438(7069):803-19 | 16341006
Dog | For access to markers and traits | Lindblad-Toh, K.A., W101: Trait Mapping Using A Canine SNP Array; A Model For Equine Genetics. Plant & Animal Genomes XV Conference, January 13-17, 2007, Town & Country Convention Center, San Diego, CA | http://www.intl-pag.org/15/abstracts/PAG15_W17_101.html
HORSE
Horse | Example of markers | Agrafioti and Stumpf, Nucleic Acids Res. 2007 Jan;35(Database issue):D71-5 | Duplication 17202172
Horse | Example of markers | Horse Genome Project; Cornell University College of Veterinary Medicine, Ithaca, New York 14853-6401 | http://web.vet.cornell.edu/public/[...]
Horse | Example of markers | Horse Genome, MIT Broad Institute, 7 Cambridge Center, Cambridge, MA 02142 USA | http://www.broad.mit.edu/mammals/horse/
Horse | Example of markers | National Animal Genome Research Program, Horse Genome; University of Kentucky | http://www.uky.edu/Ag/Horsemap/
Horse | Example of a trait | Dranchak PK, J Am Vet Med Assoc. 2005 Sep 1 | 16178398
Horse | Example of a trait | Perryman LE, Torbeck RL, J Am Vet Med Assoc. 1980 Jun 1;176(11) | 7429919
Horse | Example of traits | Mark Reid's Ozeform, supported by Read Interactive | http://www.ozeform.com/
Horse | Example of a trait | New Zealand Thoroughbred Breeders' Association, Gate S, Derby Enclosure, Ellerslie Racecourse, Morrin Street, Ellerslie, Auckland | http://www.nzthoroughbred.co.nz/Contact-Us.aspx
Horse | Example of traits | Expertform.com, 259A Keilor Rd, Essendon 3040 Vic | http://www.expertform.com/
Horse | Example of traits | Timeform, 25 Timeform House, Northgate, Halifax HX1 1XF | http://www.timeform.com/
SHEEP
Sheep | Example of markers | International Sheep Genomics Consortium, CSIRO Livestock Industries, Queensland Bioscience Precinct, 306 Carmody Road, St Lucia QLD 4067, Australia | http://www.sheephapmap.org/[...]snpchip.htm
Sheep | Example of markers | National Animal Genome Research Program, Sheep Genome; Utah State University | http://www.animalgenome.org/sheep/
Sheep | Example of a trait | Raadsma et al., Rev Sci Tech. 1998 Apr;17(1):315-[...]. Review. | 9638820
Sheep | Examples of traits | Sheep Genetics Australia at University of New England | http://www.sheepgenetics.org.au/
PIG
Pig | Example of markers | National Animal Genome Research Program, Pig Genome; Iowa State University | http://www.animalgenome.org/pig/
Pig | Example of markers | Panitz et al., Bioinformatics. 2007 Jul 1;23(13):i387-91 | [illegible]
Pig | Example of [...] | Chen et al., Int J Biol Sci. 2007 Feb 10;3(3):15[...] | 17384734
Pig | Example of a trait | Schneider et al., Anim Reprod Sci. 1998 Feb | 9615181
Pig | Example of traits | PIGBLUP at University of New England, Armidale, NSW 2351, Australia | http://agbu.une.edu.au/pigs/pigblup/index[...]
CHICKEN
Chicken | Example of markers | National Animal Genome Research Program, Chicken Genome; Michigan State University | http://poultry.mph.msu.edu/
Chicken | Example of a trait | Ye, X. et al., Poult Sci. 2006 Sep;85(9):1555-69 | 16977841
AQUACULTURE
Aquaculture | [...] | Z. J. Liu and J. F. Cordes, Aquaculture, Volume 238, Issues 1-4, 1 September 2004, Pages 1-37 | [illegible]
OYSTERS
Oysters | [...] | Evans et al., 2004. Aquaculture 230: 8[...] | [illegible]
Oysters | [...] | Quilang et al., BMC Genomics. 2007 Jun 8;8:157 | 17559679
Oysters | [...] | NAGRP Aquaculture Genome Projects, College of Marine Studies, University of Delaware, 700 Pilottown Road, Lewes, DE 19958 | http://www.animalgenome.org/aquaculture/
SALMON
Salmon | Example of markers | Salmon Genome Project, Department of Informatics and Computational Biology Unit, Bergen Centre for Computational Science, University of Bergen, HIB, N5020 Bergen, Norway | [illegible]
Salmon | Example of markers | Anderson et al., Genetics. 2006 Apr;172(4):2561-82. Epub 2005 Dec | 16387880
874124?' COMS ID No: ARCS-159289 Received by IP Australia: Time 20:11 Date 2007-08-31 31. AUG, 2007 19:57 SPRUSON FERGUSON 92615486 NO. 1553 P. 48/88 134amn Example of Hes 8Ja atirdaly eter 4 or;Co(l126 En Col 166852W83 lmae markers and 20ter Ma~ tr lie 10flRa trutEample of i Mohdal HK 7 o! Gten 1at G'lenom a ics. 2007 17308931 tra T u ;2 II I I 47-16.' E: ub 2007 Feb 17 oladCl mark an Wialog AqadGeoI' eeri aoa~y 2 ~~eknh Keaney~ile, ~es Vigina 543 Phne 04-24-2a2On traits 84ox2129r A 1I trak~p it Exampl of Km :t el.. E Ge bt 2007 Aug 51 srley Example Of NaG 1-Ion a lTueorAp Genat. 200 Au 22re712 taraesoitce isr eia nvest fSuhCaoia -m.
[...] Marine Library, 3 Fort Johnson Road
WHEAT
Wheat | Example of markers | International wheat genome sequencing project, c/o Eversole Associates, 5207 Wyoming Road, Bethesda, MD 20816 USA | http://www.wheatgenome.org/contact.html
Wheat | Example of markers | Wheat SNP database, University of California, Davis, Dept. of Plant Sciences, One Shields Avenue, Davis, CA 95616 | http://wheat.pw.usda.gov/SNP/new/index.shtml
Wheat | Example of a trait | Kuchel H. et al., Theor Appl Genet. 2007 Aug 23 | 17713765
Wheat | Example of a trait | Marza F. et al., Theoretical and Applied Genetics, Volume 19, Number 2, 1 February 2007, 163-177 | http://www.springerlink.com/content/[...]
RICE
Rice | Example of markers | Zhang et al., DNA Res. 2007 Feb 28;14(1):37-45 | 17452422
Rice | Example of markers | Feltus et al., Genome Res. 2004 Sep;14(9):1812-9 | 15342564
Rice | Example of markers | [...], Plant Physiol. 2004 Jul;135(3):1196-205 | 15260053
Rice | Example of markers | Liu, C. et al., Yi Chuan. 2006 Jun;28(6):737-44 | 16818440
Rice | Example of a trait | Cho et al., Mol Cells. 2007 Feb 28;23(1):72-9 | 17464214
Rice | Example of a trait | Lin X. et al., Theor Appl Genet. 2005 Dec;112(1):85-96. Epub 2005 Sep 28 | 16189659
PINE
Pine | Example of markers | TreeGenes, a forest tree genome database, University of California, Davis, Dept. of Plant Sciences, One Shields Avenue, Davis, CA 95616 | http://dendrome.ucdavis.edu/treegenes/
Pine | Example of markers | The Pine Genome Initiative, The Institute of Forest Biotechnology, 920 Main Campus Drive, Suite 101, Raleigh, NC 27606 | http://pinegenomeinitiative.org/[...]
Pine | Example of a trait | Brown et al., Genetics. 2003 Aug;164(4):1537-46 | 12930758
Pine | Example of a trait | Southern Tree Breeding Association, 2 Eleanor Street, PO Box 1811, Mount Gambier, SA 5290, Australia | www.stba.com.au
Claims (3)
CLAIMS
1. A method for the prediction of the merit of at least one individual in a population, the method comprising the steps of: in the population, where information of individuals are known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; utilising the explanatory variables to generate a predictor function with respect to merit; and utilising the predictor function to predict the merit of the individual.
2. A method as claimed in claim 1 for a prediction of a merit of at least one individual, the method comprising the steps of: in a first population, where genotype and phenotype information of individuals in the first population are known, using dimension reduction on the genotype and phenotype information to determine the complexity of the genotype and phenotype information to minimise prediction error for at least one marker in the first population and thereby generate a set of explanatory variables with respect to the at least one marker; utilising the explanatory variables to the first population to generate a predictor function with respect to merit; generating a genotype for the at least one marker in at least one individual of interest from a second population; and utilising the predictor function to the genotype of the at least one individual of interest to determine the genetic merit of the individual of interest with respect to the at least one marker.
3. A method for the prediction of the merit of at least one individual in a population, the method comprising the steps of: in the population, where information of individuals are known, using a genetic algorithm process on the information to generate a set of explanatory variables for all the information, the explanatory variables comprising weighted averages for components of the information; and utilising the explanatory variables to generate a predictor function with respect to merit; utilising the predictor function to predict the merit of the individual.
4. A method as claimed in either claim 1 or claim 3 wherein step comprises utilising the explanatory variables to generate a plurality of predictor functions for the individuals of the population.
5. A method as claimed in any one of claims 1 to 4 wherein the information comprises information for at least one marker.
6. A method as claimed in claim 5 wherein the information comprises information for a plurality of markers.
7. A method as claimed in either claim 1 or claim 3 wherein for a plurality of individuals of interest from the population where information is unknown, generating genotype for at least one individual of interest from population.
8. A method according to any one of claims 1 to 7, further comprising the steps of: determining additional information on the explanatory variables for the at least one individual; combining the additional information for the at least one individual with the information on the explanatory variables for the individuals of the population; and (iii) repeating steps and for at least one further individual to predict the merit of the further individual.
9. A method according to claim 8 wherein step comprises determining additional information on the explanatory variables on a plurality of individuals.
10. A method according to any one of the preceding claims, wherein the utilisation of the predictor function is performed on the basis of a desired outcome.
11. A method according to claim 4 wherein the genotype information comprises genetic markers or bio-markers or epigenetic markers.
12. A method according to any one of the preceding claims, wherein the merit is a genetic merit selected from the group of a molecular breeding value, a quantitative trait locus, or a quantitative trait nucleotide.
13. A method of predicting trait performance for at least one individual in a population, the method comprising the steps of: in the population, where information of individuals are known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and utilising the explanatory variables to generate a predictor function with respect to merit; utilising the predictor function to predict the trait performance for the individual.
14. A method as claimed in claim 13 further comprising the steps of: for an individual of interest from the population where information is unknown, generating genotype for at least one individual of interest from population; and applying the predictor function to the genotype of the at least one individual of interest to predict the trait performance for the individual.
15. A method as claimed in claim 13 or claim 14 wherein the information is selected from the group of genotype, phenotype or genotype and phenotype information on individuals in the population.
16. A method as claimed in claim 13 wherein the trait is a quantitative trait.
17. A method for selecting at least one individual in a population, the method comprising the steps of: in the population, where information of individuals are known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and utilising the explanatory variables to generate a predictor function; utilising the predictor function to select an individual.
18. A method as claimed in claim 17 further comprising the steps of: for an individual of interest from the population where information is unknown, generating genotype for at least one individual of interest from population; and applying the predictor function to the genotype of the at least one individual of interest to select an individual.
19. A method as claimed in claim 17 or claim 18 wherein the information is selected from the group of genotype, phenotype or genotype and phenotype information on individuals in the population.
20. A method of diagnosing a condition in at least one individual of interest in a population, the method comprising the steps of: in the population, where information of individuals are known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and utilising the explanatory variables to generate a predictor function; utilising the predictor function to diagnose a condition in the individual.
21. A method as claimed in claim 20 further comprising the steps of: for an individual of interest from the population where information is unknown, generating genotype for at least one individual of interest from population; and applying the predictor function to the genotype of the at least one individual of interest to diagnose a condition in the individual of interest.
22. A method as claimed in claim 20 or claim 21 wherein the information is selected from the group of genotype, phenotype or genotype and phenotype information on individuals in the population.
23. A method of prediction of a susceptibility to an outcome of at least one individual of interest in a population, the method comprising the steps of: in the population, where information of individuals are known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and utilising the explanatory variables to generate a predictor function; utilising the predictor function to predict the susceptibility of the individual to an outcome.
24. A method as claimed in claim 23 further comprising the steps of: for an individual of interest from the population where information is unknown, generating genotype for at least one individual of interest from population; and applying the predictor function to the genotype of the at least one individual of interest to predict the susceptibility of the individual to an outcome.
25. A method as claimed in claim 23 or claim 24 wherein the information is selected from the group of genotype, phenotype or genotype and phenotype information on individuals in the population.
26. A method as claimed in claim 23 wherein the outcome is the susceptibility of the individual of interest to a disease.
27. A method as claimed in claim 23 wherein the outcome is the susceptibility of the individual of interest to a response to a stimulus.
28. A method as claimed in claim 27 wherein the stimulus is selected from the group of a medicament, toxin, or an environmental condition.
29. A method as claimed in claim 28 wherein the environmental condition comprises water shortage, feed shortage, stress, sunlight, or other environmental condition.
30. A method of breeding at least one individual in a population, the method comprising the steps of: in the population, where information of individuals are known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and utilising the explanatory variables to generate a predictor function with respect to merit of the individual; utilising the predictor function to predict the merit of the individual; and breeding from the individual of interest on the basis of the merit of the individual.
31. A method according to claim 30, further comprising the steps of: determining information for the descendants of the at least one individual; correlating the information for the descendants of the at least one individual to the predictor function; and selecting descendants of said individual on the basis of the relationship between the information for the descendants and the predictor function.
32. A method as claimed in claim 30 or 31 wherein the information is selected from the group of genotype, phenotype or genotype and phenotype information on individuals in the population.
33. A system for the prediction of merit of an individual in a population, the system comprising: in the population, where information of individuals are known, means for using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and means for utilising the explanatory variables to generate a predictor function with respect to merit; means for utilising the predictor function to predict the merit of the individual.
34. A system for predicting trait performance of at least one individual in a population, the system comprising: a) in the population, where information of individuals are known, means for using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and means for utilising the explanatory variables to generate a predictor function; and means for utilising the predictor function to predict performance of said trait for the individual of interest.
35. A system as claimed in claim 34 wherein the trait is a quantitative trait.
36. A system for selecting at least one individual in a population, the system comprising: a) in the population, where information of individuals are known, means for using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and means for utilising the explanatory variables to generate a predictor function; and means for utilising the predictor function to select the individual.
37. A system for diagnosing a condition in at least one individual of interest in a population, the system comprising: in the population, where information of individuals are known, means for using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and means for utilising the explanatory variables to generate a predictor function; means for utilising the predictor function to diagnose a condition in the individual.
38. A system for prediction of a susceptibility to an outcome of at least one individual of interest in a population, the system comprising: in the population, where information of individuals are known, means for using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and means for utilising the explanatory variables to generate a predictor function; means for utilising the predictor function to predict the susceptibility of the at least one individual of interest to an outcome.
39. A system for breeding at least one individual in a population, the system comprising: in the population, where information of individuals are known, means for using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and means for utilising the explanatory variables to generate a predictor function with respect to merit of the individual; means for utilising the predictor function to predict the merit of the individual; and means for breeding from the individual of interest on the basis of the merit of the individual.
40. A system as claimed in claim 39, further comprising the steps of: (f) means for determining information for the descendants of the at least one individual; (g) means for correlating the information for the descendants of the at least one individual to the predictor function; and (h) means for selecting descendants of said individual on the basis of the relationship between the information for the descendants and the predictor function.
41.
A method or system according to any one of the preceding claims, wherein the information comprises genetic information consisting essentially of marker genotypes. 42. A method or system according to claim 41 wherein the genetic markers are 1 distributed substantially across the genome. 43. A method or system according to claim 41, wherein the number of genetic markers genotyped is greater than 1000, greater than 1500, greater than 2500, greater than 5000, greater than 10000, greater than 15000, greater than 20000, greater than 25000, greater than 30000, greater than 35000, greater than 40000, greater than 45000, greater than 50000, 2o greater than 100000, greater than 250000, greater than 500000, or greater than 1000000, greater than 5000000, greater than 10000000 or greater than 15000000. 44. A method or system according to claim 41, wherein the genetic markers are selected from the group consisting of single nucleotide polymorphism (SNP), tag SNP, microsatellite (simple tandem repeat STR, simple sequence repeat SSR), restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), insertion- deletion polymorphism (INDEL), random amplified polymorphic DNA (RAPD), ligase chain reaction, insertion/deletions and direct sequencing of the gene or a simple sequence conformation polymorphisms (SSCP). A method or system according to claim 44 wherein the genetic marker is a SNP. 874124_7 COMS ID No: ARCS-159289 Received by IP Australia: Time 20:11 Date 2007-08-31 31.AUG.2007 20:02 SPRUSON FERGUSON 92615486 NO. 1558 P. 58/88 -144- 0 0 Ol 46. 
A method or system according to any one of the preceding claims, wherein the Sinformation comprises at least one of the pedigree of the individual; an estimated breeding Svalue of the individual; data on genetic markers across the genome for the individual or for relatives of the individual at least one index of phenotype for the individual or for relatives S of the individual; at least one marker predictive ofphenotype for the individual or for Srelatives of the individual; and at least one index of epigenetic modification or status for the individual, or a combination thereof. S47, A method according to claim 13 or claim 14 or a system according to claim 34, 0 wherein the individual is a dairy cow or bull, and wherein the quantitative trait is selected Sto from the group consisting of APR, ASI, protein kg, protein percent, milk yield, fat kg, fat percent, overall type, mammary system, stature, udder texture, bone quality, angularity, muzzle width, body depth, chest width, pin set, pin sign, foot angle, set sign, rear leg view, udder depth, fore attachment, rear attachment height, rear attachment width, centre ligament, teat placement, teat length, loin strength, milking speed, temperament, like-ability, survival, calving ease, somatic cell count, cow fertility, and gestation length, or a combination of one or more of these traits, 48. A method or system according to any one of the preceding claims, wherein the dimension reduction is selected from the a technique i the group consisting of principal component analysis (PCA), a genetic algorithm, a neural network, partial least squares (PLS), inverse least squares, kernel PCA, LLE, Hessian LLE, Laplacian Eigenmaps, LTSA, isomap, maximum variance unfolding, Bolzman machines, projection pursuit, a hidden Markov model support vector machines, kernel regression, discriminant analysis and classification, k- nearest-neighbour analysis, fuzzy neural networks, Bayesian networks, or cluster analysis. 49. 
A method or system according to claim 48, wherein the dimension reduction technique is principal component analysis. A method or system according to claim 48, wherein the dimension reduction technique is supervised principal component analysis. 51. A method or system according to any one of claims 48 to 50 wherein the number of principal components is between about 10 and about 874124_7 COMS ID No: ARCS-159289 Received by IP Australia: Time 20:11 Date 2007-08-31 31. AUG. 2001 20:02 SPRUSON &FERGUSON 92615486 NO. 1558 P. 59/88 52. A method or system according to any one of claims 48 to 50 wherein the number of ;Z principal components is about 53. A method or system according to claim 48 wherein the dimension reduction en technique is partial least squares analysis, s 54. A method or system according to claim S3wherein the number of latent components is between about 4 and about A method or system ac-cording to claim 48 or 54 wherein the number of latent components is about 6. 056. A method or system according to claim 48 wherein the dimension reduction o o technique is support vector machine analysis. 57. A mnethod or system according to any one of the preceding claims wherein the information does not include the pedigree of the individual. 58. 
A breeders product comprising at least one gamete with a high prediction of merit for at least one marker, the breeders product selected by a method for the prfediction of the merit of at least one individual, the method comprising the steps of:, in a first population, where genotype and phenotype information of individuals in the first population are known, using dimension reduction on the genotype and phenotype information to determine the complexity of the genotype and phenotype information to minimise prediction error for at least one marker in the first population and thereby generate a set of explanatory variables with respect to the at least one marker; applying the explanatory variables to the first population to generate a predictor function; generating genotype for the at least one marker in at least one individual of interest from a second population; applying the predictor function to the genotype of the at least one individual of interest to determine the genetic merit of the individual of interest with respect to the at least one marker. 59. A computer system comprising a computer processor and memory, the memory comprising software code stored therein for execution by the computer processor of a method 874124? COMS ID No: ARCS-i159289 Received by IP Australia: Time (i-tm) 20:11 Date 2007-08-31 31.AUG. 2001 20:03 SPRUSON FERGUSON 92615486 NO. 1558 P. 60/88 146- on for the prediction of the merit of at least one individual in a population, the method ;Z comprising the steps Of: in a database comprising information about the population, where information of individuals are known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; utilising the explanatory variables to generate a predictor function with respect to merit; and o utilising the predictor fuction to predict the merit of the individual. 60. 
A computer readable medium, having a program recorded thereon, where the program is configured to make a computer execute a procedure for the prediction of the merit of at least one individual in a population, the software product comprising: in a database comprising information about the population, where information of individuals are known, code for using dimension reduction ona the information to project is the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; code for utilising the explanatory variables to generate a predictor function with respect to merit; and code for utilising the predictor function to predict the merit of the individual. 61. An information database product comprising information for individuals of a population, the information database for use with a method for the selection of at least one individual in the population, the method comprising the steps of: in the population, where information of individuals are known, using dimension reduction on the information to project the information to a low dimensional space whilst retaining the complexity of the information to generate a set of explanatory variables; and utilising the explanatory variables to generate a predictor function with respect to merit; utilising the predictor function to predict the merit of the individual. 874124'7 C-OMS ID No: ARCS-i159289 Received by IP Australia: Time 20:11 Date 2007-08-31 31.AUG. 2001 20:03 SPRUSON FERGUSON 92615486 NO. 1558 P. 61/88 -147- on62. An information database product for use with a breeding program, the database ;Z comprising information for individuals of a population and a prediction of the merit of the individuals in the population. 63. 
An information database product comprising information for individuals of a s population according to claim 62 wherein a prediction of a maerit of the individuals in the population is provided by a dimension reduction method on the genotype and phenotype information of individuals in the population comprising the steps of. using a dimension reduction method, determining the complexity of genotype and phenotype infon-nation of individuals in the population to minimise prediction error and oto thereby generate a set of explanatory variables; applying the explanatory variables to the first population to generate a predictor function; genieratinig genotype for the at least one marker in at least one individual of interest from a second population; is applying the predictor function to the genotype of the individuals of the second population thereby to determine the genetic merit of individuals in the second population individuals with respect to tbe at least one mariar. 64, An information database product according to claim 62 or claim 63 wherein individuals of interest from the second population are selected for use in a breeding program based upon the prediction of merit for the at least one marker. A system or method as claimed in any of the preceding claims wherein the predictor function is a predictor function with having minimal prediction error. 874124_7 COMS 10 No: ARCS-159289 Received by IP Australia: Time 20:11 Date 2007-08-31
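The claims above recite a general pipeline: reduce the dimension of the known information, use the resulting explanatory variables to build a predictor function, then apply that function to individuals of interest. The following is a minimal illustrative sketch of one possible instance, principal component regression on marker genotypes. It is not taken from the patent. The function names `fit_pca_predictor` and `predict_merit`, the 0/1/2 genotype coding, the simulated data, the choice of 10 components, and the use of ordinary least squares as the predictor function are all assumptions made for illustration.

```python
import numpy as np


def fit_pca_predictor(genotypes, phenotypes, n_components):
    """Fit a principal component regression (hypothetical helper).

    PCA scores of the centred genotype matrix serve as the explanatory
    variables; an ordinary least-squares fit on those scores serves as
    the predictor function (an assumption, not the patent's method).
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    marker_means = X.mean(axis=0)
    Xc = X - marker_means
    # Rows of Vt are the principal axes of the centred genotype matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T          # shape: (markers, components)
    scores = Xc @ loadings                  # low-dimensional explanatory variables
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return marker_means, loadings, coef


def predict_merit(model, genotypes):
    """Apply the fitted predictor to genotypes of individuals of interest."""
    marker_means, loadings, coef = model
    scores = (np.asarray(genotypes, dtype=float) - marker_means) @ loadings
    return coef[0] + scores @ coef[1:]


# Simulated example data (assumed, for illustration only).
rng = np.random.default_rng(0)
n_ref, n_new, n_markers = 200, 20, 500
G = rng.integers(0, 3, size=(n_ref + n_new, n_markers))   # genotypes coded 0/1/2
effects = rng.normal(0.0, 0.1, n_markers)                 # simulated marker effects
y = G @ effects + rng.normal(0.0, 0.5, n_ref + n_new)     # simulated phenotypes

# Reference individuals have known genotype and phenotype;
# the remaining individuals have genotype only.
model = fit_pca_predictor(G[:n_ref], y[:n_ref], n_components=10)
pred = predict_merit(model, G[n_ref:])
print("correlation with simulated phenotype:", np.corrcoef(pred, y[n_ref:])[0, 1])
```

In this sketch the number of components is fixed by hand. In practice it would more plausibly be chosen to minimise prediction error on held-out individuals, for example by cross-validation, which is the role the claims assign to minimising prediction error.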
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2007214360A AU2007214360A1 (en) | 2006-09-01 | 2007-08-31 | Whole genome based genetic evaluation and selection process |
Applications Claiming Priority (9)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US84189806P | 2006-09-01 | 2006-09-01 | |
| US60/841898 | 2006-09-01 | | |
| AU2007901355 | 2007-03-15 | | |
| AU2007901355A AU2007901355A0 (en) | 2007-03-15 | | Genome based genetic evaluation and selection process |
| US91917807P | 2007-03-20 | 2007-03-20 | |
| US60/919178 | 2007-03-20 | | |
| AU2007901501A AU2007901501A0 (en) | 2007-03-20 | | Genome-based genetic evaluation and selection process |
| AU2007901501 | 2007-03-20 | | |
| AU2007214360A AU2007214360A1 (en) | 2006-09-01 | 2007-08-31 | Whole genome based genetic evaluation and selection process |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| AU2007214360A1 (en) | 2008-03-20 |
Family
ID=39246928
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2007214360A Abandoned AU2007214360A1 (en) | 2006-09-01 | 2007-08-31 | Whole genome based genetic evaluation and selection process |
Country Status (1)
| Country | Link |
|---|---|
| AU (1) | AU2007214360A1 (en) |
Cited By (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11860902B2 (en) * | 2012-12-19 | 2024-01-02 | International Business Machines Corporation | Indexing of large scale patient set |
| US20190317951A1 (en) * | 2012-12-19 | 2019-10-17 | International Business Machines Corporation | Indexing of large scale patient set |
| CN106604229B (en) * | 2016-12-27 | 2020-02-18 | 东南大学 | An indoor localization method based on manifold learning and improved support vector machine |
| CN106604229A (en) * | 2016-12-27 | 2017-04-26 | 东南大学 | Indoor positioning method based on manifold learning and improved support vector machine |
| CN112640849A (en) * | 2021-01-13 | 2021-04-13 | 东辽县树安村万力养鸡专业合作社 | Anti-season egg laying method for northeast local chickens |
| CN112640849B (en) * | 2021-01-13 | 2022-07-08 | 东辽县树安村万力养鸡专业合作社 | Anti-season egg laying method for northeast local chickens |
| CN114863991A (en) * | 2022-06-21 | 2022-08-05 | 沈阳农业大学 | Method for improving whole genome prediction precision based on two-step prediction model establishment |
| CN114863991B (en) * | 2022-06-21 | 2024-10-29 | 沈阳农业大学 | Method for improving whole genome prediction precision based on two-step prediction model establishment |
| CN115689389A (en) * | 2022-11-21 | 2023-02-03 | 黑龙江省水利科学研究院 | Method and device for river and lake health assessment in cold regions based on whale algorithm and projection tracking |
| CN115689389B (en) * | 2022-11-21 | 2023-07-14 | 黑龙江省水利科学研究院 | Method and device for river and lake health assessment in cold regions based on whale algorithm and projection tracking |
| CN116052889B (en) * | 2023-03-31 | 2023-07-04 | 四川无限智达科技有限公司 | A sFLC prediction system based on blood routine index detection |
| CN116052889A (en) * | 2023-03-31 | 2023-05-02 | 四川无限智达科技有限公司 | A sFLC prediction system based on blood routine index detection |
| CN116863998A (en) * | 2023-06-21 | 2023-10-10 | 扬州大学 | Genetic algorithm-based whole genome prediction method and application thereof |
| CN116863998B (en) * | 2023-06-21 | 2024-04-05 | 扬州大学 | Genetic algorithm-based whole genome prediction method and application thereof |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US20080163824A1 (en) | Whole genome based genetic evaluation and selection process | |
| Van Eenennaam et al. | Applied animal genomics: results from the field | |
| Hayes et al. | Genome-wide association and genomic selection in animal breeding | |
| Weigel et al. | A 100-Year Review: Methods and impact of genetic selection in dairy cattle—From daughter–dam comparisons to deep learning algorithms | |
| Lopes et al. | A genome-wide association study reveals dominance effects on number of teats in pigs | |
| Spelman et al. | Use of molecular technologies for the advancement of animal breeding: genomic selection in dairy cattle populations in Australia, Ireland and New Zealand | |
| AU2007214360A1 (en) | Whole genome based genetic evaluation and selection process | |
| Ibáñez-Escriche et al. | Promises, pitfalls and challenges of genomic selection in breeding programs | |
| Nguyen et al. | Multivariate genomic prediction for commercial traits of economic importance in Banana shrimp Fenneropenaeus merguiensis | |
| Gorssen et al. | Breeding for resilience in finishing pigs can decrease tail biting, lameness and mortality | |
| Samaraweera et al. | Genetic parameters for milk yield in imported Jersey and Jersey-Friesian cows using daily milk records in Sri Lanka | |
| Das et al. | Genomic selection: a molecular tool for genetic improvement in livestock | |
| Blasco | Animal breeding methods and sustainability | |
| Khatkar | Genomic selection in aquaculture breeding programs | |
| Vaishnav et al. | Breeding management in commercial pig farms | |
| Massender et al. | Sustainable Genetic Improvement in Dairy Goats | |
| Elzo et al. | Genetic parameters and predictions for direct and maternal growth traits in a multibreed Angus–Brahman cattle population using genomic–polygenic and polygenic models | |
| Lee et al. | Genomic evaluations of sheep in New Zealand | |
| KR20230032434A (en) | Method for predicting carcass traits of Hanwoo population using genomic breeding value based on the reference population of 30 months steers and use thereof | |
| Brito et al. | Genomics and phenomics: Who will be the dairy cows of the future? | |
| Schlicht et al. | Genetic analysis of production traits in turbot (Scophthalmus maximus) using random regression models based on molecular relatedness | |
| Peters et al. | Defining breeding goals and breeding strategies for chicken production systems in Africa | |
| Muir | Incorporating molecular information in breeding programmes: applications and limitations. | |
| JP2011520198A (en) | Methods for generating genetic predictors employing DNA markers and quantitative trait data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | MK4 | Application lapsed section 142(2)(d) - no continuation fee paid for the application | |