TWI326431B - Method and system of analyzing gene sequence - Google Patents
Method and system of analyzing gene sequence
- Publication number
- TWI326431B
- Authority
- TW
- Taiwan
- Prior art keywords
- sequence
- analysis
- feature vector
- analysis object
- segment
- Prior art date
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
IX. Description of the Invention:

[Technical Field of the Invention]
The present invention relates to gene sequence analysis methods, and in particular to a method that analyzes the features of a gene sequence in order to locate the binding regions on that sequence that are regulated by gene regulatory factors.

[Prior Art]
Interactions between gene regulatory factors and genes control important physiological processes such as development and the response to environmental change. Finding the fragments of a gene on which regulatory factors act is therefore a critical research topic, both biologically and medically.

Traditionally, the fragments of a gene acted on by regulatory factors, that is, the binding sites of regulatory factors on the gene, have been sought with biological methods such as DNA footprinting. Footprinting uses chemical agents to cut the gene sequence under study into fragments and then analyzes each fragment to determine whether it is a regulatory region. This method is time-consuming and laborious.

In recent years, computer information technology has also been used alongside traditional biological methods to assist gene sequence analysis. One example is analysis based on the position weight matrix. The position weight matrix method analyzes regulatory regions that have already been discovered, counts how many times each nucleotide occurs at each position to obtain a weight for each position, and assembles these weights into a matrix. The method assumes that all regulatory regions have a fixed length and that each position contributes independently to the binding affinity of the regulatory region.

These assumptions, however, are not correct given that positions in a gene sequence influence one another. The characteristic sequences that bind regulatory factors are usually small fragments that occur in the upstream sequence of a gene and may lie in any part of it, so their lengths may also differ.

A fast gene sequence analysis method that is consistent with biological principles is therefore needed.

[Summary of the Invention]
An object of the present invention is to provide a method that, by analyzing the features of a gene sequence, finds the binding regions on that sequence that are regulated by gene regulatory factors.

To achieve this object, the present invention provides a gene sequence analysis system for finding regulatory factor binding sites in an analysis target fragment. A sub-fragment generator receives the analysis target fragment and cuts it to produce a plurality of analysis target sub-fragments. A feature vector generator produces corresponding analysis target feature vectors from the analysis target sub-fragments. A clusterer clusters the analysis target feature vectors to produce a corresponding analysis target feature vector distribution state map, which represents the vector groups into which the feature vectors are divided and the representation level of each vector group in the analysis target fragment. A mapper maps the distribution state map against a preset background sequence feature vector and marks as over-represented vector groups those vector groups whose representation level differs from the background sequence feature vector by more than a predetermined value. A comparison sorter scores and ranks the over-represented vector groups and, on that basis, selects the vector groups that contain regulatory factor binding regions.

The present invention further provides a gene sequence analysis method for finding regulatory factor binding sites in an analysis target fragment. The analysis target fragment is received and cut to produce a plurality of analysis target sub-fragments. Corresponding analysis target feature vectors are generated from the sub-fragments. The analysis target feature vectors are clustered to produce a corresponding analysis target feature vector distribution state map, which represents the vector groups into which the feature vectors are divided and the representation level of each vector group in the analysis target fragment.
The analysis target feature vector distribution state map is mapped against a preset background sequence feature vector, and the vector groups whose representation level differs from the background sequence feature vector by more than a predetermined value are marked as over-represented vector groups. The over-represented vector groups are scored and ranked, and the vector groups that contain regulatory factor binding regions are selected on that basis.

To make the above and other objects, features, and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

[Embodiments]
To make the objects, features, and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to Figs. 1 to 4.

This specification provides different embodiments to illustrate the technical features of different implementations of the invention. The arrangement of the elements in the embodiments is for illustration and is not intended to limit the invention. Reference numerals are partly repeated across the embodiments to simplify the description, and the repetition does not imply any relationship between different embodiments.

Fig. 1 is a schematic diagram of a gene sequence analysis system according to an embodiment of the present invention. The gene sequence analysis system 10 finds regulatory factor binding sites in an analysis target fragment. It includes an input interface 11, a background sequence generator 12, a sub-fragment generator 13, a feature vector generator 14, a clusterer 15, a mapper 16, a comparison sorter 17, and an output interface 18.

The input interface 11 receives analysis target fragment data 110 corresponding to the analysis target fragment. The analysis target fragment data 110 may be gene sequence data, provided by a user, for the gene sequence fragment to be analyzed.

The background sequence generator 12 generates background sequence fragment data 120 according to a mathematical model. The mathematical model may be a preset model or a model input by the user. Here, the preset mathematical model generates background sequence fragment data 120 with the same number and length as the analysis target fragment data 110. Usually only a small fragment of a gene's upstream sequence is a regulatory factor binding region, and the rest can be treated as noise in the analysis. The background sequence data randomly generated by the mathematical model is therefore used as a baseline against which sequence fragments that stand out can be identified. In other words, the background sequence fragment data 120 is used to remove noise so that the over-represented portions of the analysis target fragment data 110 can be found.
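The role of the background sequence generator 12 can be illustrated with a minimal Python sketch. The source does not specify the mathematical model in legible form, so the model used here is an assumption: each nucleotide is drawn independently with the base frequencies observed in the input. The function name `generate_background` and the sample sequences are hypothetical.

```python
import random
from collections import Counter

def generate_background(sequences, seed=None):
    """Return random sequences matching the input in number and length.

    Assumption (not from the patent): a zeroth-order model in which each
    nucleotide is drawn independently with the base frequencies observed
    across all input sequences.
    """
    rng = random.Random(seed)
    counts = Counter("".join(sequences))
    alphabet = sorted(counts)
    weights = [counts[base] for base in alphabet]
    return [
        "".join(rng.choices(alphabet, weights=weights, k=len(seq)))
        for seq in sequences
    ]

# Hypothetical analysis target fragments.
targets = ["AAATTCGGCA", "TTGACATATAAT"]
background = generate_background(targets, seed=0)
```

A user-supplied model, which the description also allows, could replace the sampling line without changing the rest of the pipeline.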
The sub-fragment generator 13 cuts the analysis target fragment data 110 to produce a plurality of analysis target sub-fragment data 131, and cuts the background sequence fragment data 120 to produce a plurality of background sequence sub-fragment data 133. The sub-fragment generator 13 may perform this cutting with a sliding window whose size is specified by the user.

The feature vector generator 14 generates corresponding analysis target feature vectors 141 from the analysis target sub-fragment data 131, and processes the background sequence sub-fragment data 133 to generate corresponding background sequence feature vectors 143. The feature vector generator 14 computes the term frequency of k-mers and defines the analysis target feature vectors 141 and the background sequence feature vectors 143 from those term frequencies.

The clusterer 15 clusters the analysis target feature vectors 141 to produce a corresponding analysis target feature vector distribution state map 151, which represents the vector groups into which the analysis target feature vectors 141 are divided and the representation level of each vector group in the analysis target fragment.
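The cutting performed by the sub-fragment generator 13 and the k-mer term-frequency counting performed by the feature vector generator 14 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented implementation: the function names, the fixed ordering of k-mers in the vector, and the sample sequence are all hypothetical.

```python
from collections import Counter
from itertools import product

def sliding_windows(sequence, width, step):
    """Cut `sequence` into sub-fragments of `width`, advancing by `step`."""
    return [
        sequence[start:start + width]
        for start in range(0, len(sequence) - width + 1, step)
    ]

def kmer_term_frequency(fragment, k):
    """Count every overlapping k-mer in `fragment`."""
    return Counter(fragment[i:i + k] for i in range(len(fragment) - k + 1))

def feature_vector(fragment, k, alphabet="ACGT"):
    """Term frequencies listed in a fixed order over all possible k-mers.

    Assumption: k-mers are ordered lexicographically over `alphabet`;
    the source does not say how vector components are ordered.
    """
    tf = kmer_term_frequency(fragment, k)
    return [tf["".join(p)] for p in product(alphabet, repeat=k)]

# Hypothetical input: window width 7, sliding distance 3.
sub_fragments = sliding_windows("AAATTCGGCATTAC", width=7, step=3)
vectors = [feature_vector(s, k=2) for s in sub_fragments]
```

With k = 2 each vector has 16 components, one per possible dinucleotide, so sub-fragments of any length become vectors of the same dimension and can be clustered together.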
The clusterer 15 clusters the analysis target feature vectors using Self-Organizing Map clustering, a neural network method. The analysis target feature vector distribution state map 151 may be represented in two dimensions.

The mapper 16 maps the analysis target feature vector distribution state map 151 against the background sequence feature vectors 143, marks as over-represented vector groups those vector groups in the map 151 whose representation level differs from the background sequence feature vectors 143 by more than a predetermined value, and passes the data of the over-represented vector groups 161 to the comparison sorter 17.
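The clustering and mapping described for the clusterer 15 and the mapper 16 can be illustrated with a small pure-Python self-organizing map. This is a sketch under several assumptions that do not come from the source: the linear decay of learning rate and neighborhood radius, the use of per-node counts as the representation level, and a one-sided comparison (target count minus background count) against the predetermined value. All function names and the sample data are hypothetical.

```python
import math
import random

def best_node(x, weights):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda n: sum((a - b) ** 2 for a, b in zip(x, weights[n])))

def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rows x cols self-organizing map; returns node weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in grid]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            decay = 1.0 - step / total_steps   # assumed linear decay schedule
            lr, sigma = lr0 * decay, max(sigma0 * decay, 1e-3)
            bmu = best_node(x, weights)
            for node, w in enumerate(weights):
                d2 = ((grid[node][0] - grid[bmu][0]) ** 2
                      + (grid[node][1] - grid[bmu][1]) ** 2)
                h = lr * math.exp(-d2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += h * (x[i] - w[i])
            step += 1
    return weights

def group_counts(vectors, weights):
    """Assumed representation level: number of vectors landing on each node."""
    counts = [0] * len(weights)
    for v in vectors:
        counts[best_node(v, weights)] += 1
    return counts

def overrepresented(target, background, weights, threshold):
    """Nodes whose target count exceeds the background count by > threshold."""
    t = group_counts(target, weights)
    b = group_counts(background, weights)
    return [n for n in range(len(weights)) if t[n] - b[n] > threshold]

# Hypothetical feature vectors: one pattern is frequent only in the target.
target = [[4, 0, 0]] * 12 + [[0, 4, 0]] * 4
background = [[0, 4, 0]] * 4
weights = train_som(target, rows=2, cols=2)
flagged = overrepresented(target, background, weights, threshold=2)
```

The map is trained on the target vectors only, and the background vectors are then placed on the same map, which mirrors the order of operations in the description: cluster first, then compare against the background.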
The comparison sorter 17 scores and ranks the over-represented vector groups 161 and, on that basis, selects the vector groups 171 that contain regulatory factor binding regions. The comparison sorter 17 computes the relative representation level of the over-represented vector groups 161 with respect to the background sequence feature vectors 143, computes the relative representation level of the over-represented vector groups with respect to the analysis target feature vectors, and scores and ranks the over-represented vector groups 161 according to these two results.

Fig. 2 is a flowchart of a gene sequence analysis method according to an embodiment of the present invention.

In step S201, analysis target fragment data 110 corresponding to the analysis target fragment is received. The analysis target fragment data 110 may be gene sequence data, provided by a user, for the gene sequence fragment to be analyzed.

In step S209, background sequence fragment data 120 is generated according to a mathematical model. The mathematical model may be a preset model or a model input by the user. Here, the preset mathematical model generates background sequence fragment data 120 with the same number and length as the analysis target fragment data.
Usually only a small fragment of a gene's upstream sequence is a regulatory factor binding region, and the rest can be treated as noise in the analysis. The background sequence data is therefore used as a baseline against which sequence fragments that stand out can be identified. In other words, the background sequence fragment data 120 is used to remove noise so that the over-represented portions of the analysis target fragment data 110 can be found.

In step S211, the analysis target fragment data 110 is cut to produce a plurality of analysis target sub-fragment data 131.

In step S212, the background sequence fragment data 120 is cut to produce a plurality of background sequence sub-fragment data 133.

The cutting may be performed with a sliding window whose size is specified by the user.

Fig. 3 is a schematic diagram of the sliding-window cutting scheme, with a window of width L and a sliding distance of d. The first cut yields a sub-fragment spanning point A to point C, where the distance from A to C is L. The window then slides downstream by the distance d and yields the next sub-fragment, spanning point B to point D, which also has length L and is offset from the first by d. Sliding downstream by d once more yields sub-fragment 33, which again has length L and ends at point E.

In step S231, corresponding analysis target feature vectors 141 are generated from the analysis target sub-fragment data 131.

In step S232, the background sequence sub-fragment data 133 is processed to generate corresponding background sequence feature vectors 143.

These feature vectors may be produced by computing the term frequency of k-mers and defining the analysis target feature vectors 141 and the background sequence feature vectors 143 from those term frequencies.

Fig. 4 illustrates how the k-mer term frequencies are computed, using the fragment "AAATTCG" as an example. When k is 1 (1-mer), the term 'A' occurs 3 times, the term 'T' occurs 2 times, and the terms 'C' and 'G' occur once each. Therefore, the corresponding term frequencies are 3 for 'A', 2 for 'T', and 1 each for 'C' and 'G'. When k is 2 (2-mer), the term 'AA' occurs 2 times and the terms 'AT', 'TT', 'TC', and 'CG' occur once each. The corresponding term frequencies are therefore 2 for 'AA' and 1 each for 'AT', 'TT', 'TC', and 'CG'; the term frequency of every other 2-mer is 0. When k is 3 (3-mer), the terms 'AAA', 'AAT', 'ATT', 'TTC', and 'TCG' occur once each. The corresponding term frequencies are therefore 1 each for these five terms, and the term frequency of every other 3-mer is 0.

In step S24, the analysis target feature vectors 141 are clustered to produce a corresponding analysis target feature vector distribution state map 151, which represents the vector groups into which the analysis target feature vectors 141 are divided and the representation level of each vector group in the analysis target fragment. This step may be implemented with Self-Organizing Map clustering, a neural network method, and the distribution state map 151 may be represented in two dimensions.

In step S25, the analysis target feature vector distribution state map 151 is mapped against the background sequence feature vectors 143, and the vector groups in the map 151 whose representation level differs from the background sequence feature vectors 143 by more than a predetermined value are marked as over-represented vector groups.

In step S26, the over-represented vector groups 161 are scored and ranked. This may be done by computing the relative representation level of the over-represented vector groups 161 with respect to the background sequence feature vectors 143, computing the relative representation level of the over-represented vector groups with respect to the analysis target feature vectors, and scoring and ranking the over-represented vector groups 161 according to these results.

In step S27, the vector groups 171 that contain regulatory factor binding regions are selected according to the result of the scoring and ranking.

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection is therefore defined by the appended claims.
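As an illustration of the scoring and ranking in steps S26 and S27, the following Python sketch ranks over-represented groups by two relative measures. The description names the two comparisons but gives no formula, so everything specific here is an assumption: the ratio against the background count (with 1 added to avoid division by zero), the share of all target vectors, and the use of their product as the score. The function name `rank_groups` and the sample counts are hypothetical.

```python
def rank_groups(flagged, target_counts, background_counts):
    """Rank over-represented groups, highest score first.

    Assumed score (not from the patent): the product of
      - the group's target count relative to its background count, and
      - the group's share of all target vectors.
    """
    total = sum(target_counts)
    scored = []
    for g in flagged:
        vs_background = target_counts[g] / (background_counts[g] + 1)
        vs_target = target_counts[g] / total
        scored.append((vs_background * vs_target, g))
    return [g for _, g in sorted(scored, reverse=True)]

# Hypothetical per-group counts for four vector groups.
target_counts = [12, 4, 0, 6]
background_counts = [0, 4, 2, 1]
ranking = rank_groups([3, 0], target_counts, background_counts)
```

The groups at the head of `ranking` would be the candidates reported as containing regulatory factor binding regions; how many to keep is left open by the description.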
[Brief Description of the Drawings]
Fig. 1 is a schematic diagram of a gene sequence analysis system according to an embodiment of the present invention.
Fig. 2 is a flowchart of a gene sequence analysis method according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of the sliding-window cutting scheme.
Fig. 4 is a schematic diagram of a feature vector generation method according to an embodiment of the present invention.

[Description of Main Reference Numerals]
gene sequence analysis system 10; input interface 11; background sequence generator 12; sub-fragment generator 13; feature vector generator 14; clusterer 15; mapper 16; comparison sorter 17; output interface 18.
Client's Docket No. 0950067tw; TT's Docket No: 0912-0912-A50871TWf (filing version)/alicewu
Claims (1)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW96115338A TWI326431B (en) | 2007-04-30 | 2007-04-30 | Method and system of analyzing gene sequence |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW96115338A TWI326431B (en) | 2007-04-30 | 2007-04-30 | Method and system of analyzing gene sequence |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW200842735A TW200842735A (en) | 2008-11-01 |
| TWI326431B TWI326431B (en) | 2010-06-21 |
Family
ID=44822099
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW96115338A TWI326431B (en) | 2007-04-30 | 2007-04-30 | Method and system of analyzing gene sequence |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI326431B (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI420007B (en) * | 2011-03-04 | 2013-12-21 | Hsueh Ting Chu | System and method of assembling dna reads |
| TWI582631B (en) * | 2015-11-20 | 2017-05-11 | 財團法人資訊工業策進會 | Dna sequence analyzing system for analyzing bacterial species and method thereof |
-
2007
- 2007-04-30 TW TW96115338A patent/TWI326431B/en not_active IP Right Cessation
Also Published As
| Publication number | Publication date |
|---|---|
| TW200842735A (en) | 2008-11-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Roshan et al. | Probalign: multiple sequence alignment using partition function posterior probabilities | |
| Shahmuradov et al. | bTSSfinder: a novel tool for the prediction of promoters in cyanobacteria and Escherichia coli | |
| Medvedev et al. | Error correction of high-throughput sequencing datasets with non-uniform coverage | |
| AU2014340461B2 (en) | Systems and methods for using paired-end data in directed acyclic structure | |
| Degroeve et al. | SpliceMachine: predicting splice sites from high-dimensional local context representations | |
| Katoh et al. | Recent developments in the MAFFT multiple sequence alignment program | |
| Ramírez-Sánchez et al. | Plant proteins are smaller because they are encoded by fewer exons than animal proteins | |
| US9779205B2 (en) | Systems and methods for rational selection of context sequences and sequence templates | |
| Delport et al. | Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology | |
| CN104584022B (en) | A kind of system and method generating biomarker signature | |
| US20090076735A1 (en) | Method, system and software arrangement for comparative analysis and phylogeny with whole-genome optical maps | |
| AU2014340461A1 (en) | Systems and methods for using paired-end data in directed acyclic structure | |
| WO2017120128A1 (en) | Systems and methods for adaptive local alignment for graph genomes | |
| Morgado et al. | Computational tools for plant small RNA detection and categorization | |
| Ma et al. | DNA sequence classification via an expectation maximization algorithm and neural networks: a case study | |
| Oh et al. | Landscape of gene transposition–duplication within the Brassicaceae family | |
| Ritchie et al. | Mireval: a web tool for simple microRNA prediction in genome sequences | |
| Bruneau et al. | A clustering package for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Model | |
| Oğul et al. | A discriminative method for remote homology detection based on n-peptide compositions with reduced amino acid alphabets | |
| TWI326431B (en) | Method and system of analyzing gene sequence | |
| Opperdoes | Phylogenetic analysis using | |
| Middleton et al. | NoFold: RNA structure clustering without folding or alignment | |
| KR20200102182A (en) | Method and apparatus of the Classification of Species using Sequencing Clustering | |
| De Clercq et al. | Deep learning for classification of DNA functional sequences | |
| US20210158896A1 (en) | Information processing system, mutation detection system, storage medium, and information processing method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| MM4A | Annulment or lapse of patent due to non-payment of fees |