TWI396103B - Method for data clustering - Google Patents
Method for data clustering Download PDFInfo
- Publication number
- TWI396103B TWI396103B TW98110848A TW98110848A TWI396103B TW I396103 B TWI396103 B TW I396103B TW 98110848 A TW98110848 A TW 98110848A TW 98110848 A TW98110848 A TW 98110848A TW I396103 B TWI396103 B TW I396103B
- Authority
- TW
- Taiwan
- Prior art keywords
- point
- points
- data
- seed
- expansion
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 45
- 238000012216 screening Methods 0.000 claims description 11
- 238000010586 diagram Methods 0.000 description 4
- 239000003550 marker Substances 0.000 description 3
- 230000005484 gravity Effects 0.000 description 2
- 230000001788 irregular Effects 0.000 description 2
- 238000005070 sampling Methods 0.000 description 1
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
本發明係關於一種資料分群方法,特別是關於一種基於密度式演算法之資料分群方法。The present invention relates to a data grouping method, and more particularly to a data grouping method based on a density algorithm.
習知密度式資料分群方法主要係以資料點密度做為分群之依據。例如在給定的參數ε(掃描半徑)及MinPts(最少包含點)下,若某一區域之資料點密度滿足設定之條件,即以該區域進行擴張搜尋,並逐步合併其他同樣密度滿足設定條件之區域,進而得到最終分群結果,代表性演算法包括有DBSCAN、IDBSCAN或KIDBSCAN等。該習知分群方法雖可有效的偵測不規則之圖形及濾除雜訊點,惟其分群所需時間亦相對增加。The conventional density data grouping method mainly uses data point density as the basis for grouping. For example, given a parameter ε (scanning radius) and MinPts (minimum inclusion point), if the data point density of a certain area satisfies the set condition, the expansion search is performed in the area, and the other same density is gradually merged to satisfy the set condition. The region, and finally the final clustering results, representative algorithms include DBSCAN, IDBSCAN or KIDBSCAN. Although the conventional grouping method can effectively detect irregular patterns and filter out noise points, the time required for grouping is relatively increased.
以下針對幾種較具代表性的習用密度式資料分群技術進行說明:The following is a description of several representative custom density data grouping techniques:
1、DBSCAN資料分群方法:步驟一係由一資料集中之數個資料點預先隨機選擇其中一資料點做為初始種子點;步驟二係判斷目前初始種子點的半徑ε範圍內是否有超過半徑MinPts的資料點,若達到門檻值則將目前範圍內的資料點歸類到同一群集內並作為種子點,並從範圍內的其它種子點一一進行擴張;步驟三係持續前述步驟二,直到該資料集中所有資料點都被歸類完畢為止。該習知DBSCAN資料分群方法是以較為合乎邏輯的密度判斷方式來進行分群,故可用以濾除雜訊及適用於不規則圖樣的資料點等;但因為必須對每個資料點進行繁複的密度判斷,故造成分群時間較為冗長。1. DBSCAN data grouping method: Step 1 selects one of the data points in advance as one of the initial seed points by a plurality of data points in the data set; and the second step determines whether the current radius of the initial seed point exceeds the radius MinPts The data points, if the threshold is reached, classify the data points in the current range into the same cluster and serve as seed points, and expand from other seed points in the range; Step 3 continues the foregoing step two until the All data points in the data set are classified. The conventional DBSCAN data grouping method is grouped in a more logical density judgment manner, so it can be used to filter out noise and data points suitable for irregular patterns, etc., but because of the complicated density of each data point. Judging, it is more cumbersome to group.
2、IDBSCAN資料分群方法:此法係由B.Borah等學者於2004年所提出之密度式資料分群技術,其主要針對前述習知DBSCAN資料分群方法係循序判斷資料點進行擴散而耗時的行為進行改良,而採用經由減少查詢次數而提升分群速度的策略。該習知IDBSCAN資料分群方法係於擴張種子點半徑ε之掃描範圍邊界上等距設置8個標記邊界點,該擴張種子點半徑ε之掃描範圍內的資料點僅選取最靠近該8個標記邊界點之資料點作為種子點,如此減少種子點之數量,便可減少重複的擴張動作,以克服DBSCAN資料分群方法中種子點數量過多而造成速度緩慢之缺點,惟所能減少的分群時間仍相當有限。2. IDBSCAN data grouping method: This method is a density data grouping technique proposed by B. Borah and other scholars in 2004. It mainly focuses on the above-mentioned conventional DBSCAN data grouping method, which is based on the sequential judgment of data points for time-consuming behavior. Improvements were made to adopt a strategy of increasing the speed of grouping by reducing the number of queries. The conventional IDBSCAN data grouping method is to set eight mark boundary points equidistantly on the scan range boundary of the expanded seed point radius ε, and the data points in the scan range of the expanded seed point radius ε are only selected closest to the eight mark boundaries. The data point of the point is used as the seed point. By reducing the number of seed points, the repeated expansion action can be reduced to overcome the shortcomings of the excessive number of seed points in the DBSCAN data grouping method, but the reduced grouping time is still quite limited. .
一般而言,該IDBSCAN資料分群方法雖可將一擴張種子點掃描範圍內之種子點數量減少至不大於8個,然而,由於離該擴張種子點較近的資料點覆蓋面積較大,若將該些資料點納入作為種子點,則會增加搜尋的時間成本。再且,即使該擴張種子點掃描範圍內之種子點數量係不大於8個,然而相鄰之種子點其掃描範圍重疊比例較高,造成其重複擴張之比例亦相當高,進而增加時間成本。In general, the IDBSCAN data grouping method can reduce the number of seed points in an expanded seed point scanning range to no more than 8. However, since the data points close to the expanded seed point cover a large area, if The inclusion of these data points as seed points will increase the time cost of the search. Moreover, even if the number of seed points in the scanning range of the expanded seed is not more than 8, the adjacent seed points have a higher overlapping ratio of scanning ranges, and the proportion of repeated expansion is also relatively high, thereby increasing the time cost.
3、KIDBSCAN資料分群方法:KIDBSCAN資料分群方法係針對在IDBSCAN資料分群方法尋找標記邊界點時,若依序從資料集中所讀進來的資料點是位於低密度的區域時,將會造成多餘種子點搜尋的動作,而造成擴張不佳的效果。為了減少取樣次數,該KIDBSCAN資料分群方法提出加入菁英點來進行擴張的概念。其所要設定的三個參數為:菁英點(K)、半徑(ε)與最少包含點(MinPts)。而執行步驟說明如下:(1)先利用K-Means演算法找到資料集中的K個重心點,再尋找距離這些重心點最近的K個資料點將其定義為該菁英點;(2)將K個菁英點移到資料集的最前端;(3)執行IDBSCAN資料分群方法。如此便可加快資料分群之速度。基於上述原因,有必要進一步改良上述習知資料分群方法。3, KIDBSCAN data grouping method: KIDBSCAN data grouping method is for the IDBSCAN data grouping method to find the marking boundary point, if the data points read in order from the data set are located in the low density area, it will cause redundant seed points Search for movements that cause poor expansion. In order to reduce the number of sampling times, the KIDBSCAN data grouping method proposes the concept of adding elite points for expansion. The three parameters to be set are: elite point (K), radius (ε) and minimum inclusion point (MinPts). The execution steps are as follows: (1) Firstly, use K-Means algorithm to find the K center of gravity points in the data set, and then find the K data points closest to these center of gravity points to define it as the elite point; (2) K The elite points are moved to the forefront of the data set; (3) the IDBSCAN data grouping method is implemented. This speeds up data grouping. For the above reasons, it is necessary to further improve the above-described conventional data grouping method.
本發明係提供一種資料分群方法,其主要目的是減少基於密度式演算法之資料分群方法中擴張種子點之數目。The present invention provides a data grouping method whose main purpose is to reduce the number of expanded seed points in a data grouping method based on a density algorithm.
本發明之次要目的是篩選去除較靠近擴張核心點之資料點。A secondary objective of the present invention is to screen for data points that are closer to the expanded core point.
為達到前述發明目的,本發明所運用之技術手段及藉由該技術手段所能達到之功效包含有:一種資料分群方法,係包含係包含一參數設定步驟設定一掃描半徑參數及一最少包含點參數,並由數筆資料點中讀取其中一筆資料點作為一擴張核心點,且位於該擴張核心點之掃描範圍內的資料點係為鄰近資料點;一擴張詢問步驟係搜尋位於該擴張核心點之掃描範圍內的資料點,並定義為鄰近資料點;一雜訊判斷步驟係判斷該鄰近資料點之數目是否小於該最少包含點參數;一邊界點標記步驟係於該擴張核心點之掃描範圍外設定數個標記邊界點,並分別選取位於各該標記邊界點之掃描範圍內,且最靠近各該標記邊界點之鄰近資料點作為擴張種子點;一篩選步驟係判斷是否任連續數個標記邊界點皆分別有對應之擴張種子點,若判斷為「是」,則刪除該連續數個標記邊界點所對應之其中一擴張種子點,並將剩餘之擴張種子點加入一種子列表中,再進行一種子選擇步驟;若判斷為「否」,則將該擴張種子點加入該種子列表中,再進行該種子選擇步驟;該種子選擇步驟係以該種子列表中之所有擴張種子點分別作為該擴張核心點,並一一進行該擴張詢問步驟;及一第一判斷步驟係判斷是否所有資料點皆已被分群或者被定義為雜訊,若判斷為「否」,則讀取另一筆資料點作為該擴張核心點並進行該擴張詢問步驟;若判斷為「是」則終止。藉此,本發明可於具有高分群準確度之前提下,有效降低搜尋之運算成本。In order to achieve the foregoing object, the technical means and the functions achievable by the technical method include: a data grouping method, wherein the system includes a parameter setting step of setting a scan radius parameter and a minimum inclusion point. a parameter, and one of the data points is read as a diverging core point, and the data points located in the scanning range of the expansion core point are adjacent data points; an expansion query step is searching for the expansion core a data point within the scanning range of the point and defined as a neighboring data point; a noise judging step is to determine whether the number of the neighboring data points is less than the minimum containing point parameter; a boundary point marking step is to scan the expanded core point Setting a plurality of mark boundary points outside the range, and respectively selecting the scanning data points located in the boundary points of the mark points, and the neighboring data points closest to each mark boundary point as the expansion seed points; a screening step is to determine whether there are consecutive numbers The marker boundary points respectively have corresponding expansion seed points. If the determination is "Yes", the consecutive number of labels are deleted. One of the expansion points corresponding to the boundary point, and adding the remaining expanded seed points to a sub-list, and then performing a sub-selection step; if the determination is No, the expanded seed point is added to the seed list. And performing the seed selection step; the seed selection step is performed by using all the expanded seed points in the seed list as the expansion core points, and performing the expansion inquiry step one by one; and a first determining step determining whether all the data points are determined All have been grouped or defined as noise. If the judgment is "No", another data point is read as the expansion core point and the expansion inquiry step is performed; if it is judged as "Yes", it is terminated. Thereby, the present invention can be provided before having high grouping accuracy, effectively reducing the computational cost of the search.
為讓本發明之上述及其他目的、特徵及優點能更明顯易懂,下文特舉本發明之較佳實施例,並配合所附圖式,作詳細說明如下:請參照第1及2圖所示,本發明較佳實施例之資料分群方法係藉由一電腦系統連接至少一資料庫作為執行架構,該資料庫中係存有數筆資料點11,本發明之資料分群方法係進行一參數設定步驟S1,藉由於該電腦系統設定一掃描半徑R(Eps)參數及一最少包含點(Minpts)參數,並由該資料庫中讀取其中一筆資料點11作為一擴張核心點12。為方便後續說明,於此將〝掃描範圍〞一詞定義為以該掃描半徑R參數作為半徑進行掃描所涵蓋之範圍。The above and other objects, features and advantages of the present invention will become more <RTIgt; The data grouping method according to the preferred embodiment of the present invention is a computing system in which at least one database is connected by a computer system. The database has a plurality of data points 11 stored therein. The data grouping method of the present invention performs a parameter setting. Step S1, by the computer system setting a scan radius R (Eps) parameter and a minimum inclusion point (Minpts) parameter, and reading one of the data points 11 as an expansion core point 12 from the database. For the convenience of the following description, the term "scanning range" is defined herein as the range covered by scanning with the scanning radius R parameter as a radius.
請參照第1及3圖所示,本發明較佳實施例之資料分群方法係進行一擴張詢問步驟S2,搜尋位於該擴張核心點12之掃描範圍內的資料點11,並定義為鄰近資料點13。更詳言之,本實施例係透過該電腦系統對該資料庫內之資料點11進行搜尋,並將位於該擴張核心點12之掃描範圍內的資料點11定義為鄰近資料點13。Referring to Figures 1 and 3, the data grouping method of the preferred embodiment of the present invention performs an expansion query step S2 to search for data points 11 located within the scan range of the expanded core point 12 and define them as neighboring data points. 13. In more detail, in this embodiment, the data point 11 in the database is searched through the computer system, and the data point 11 located in the scanning range of the expanded core point 12 is defined as the neighboring data point 13.
請參照第1及3圖所示,本發明較佳實施例之資料分群方法係進行一雜訊判斷步驟S3,判斷該鄰近資料點13之數目是否小於該最少包含點參數,若判斷為「否」,則該擴張核心點12及鄰近資料點13皆分為同一群集,並進行一邊界點標記步驟S4;若判斷為「是」,則該擴張核心點12及鄰近資料點13皆認定為雜訊,並進行一第一判斷步驟S7。Referring to Figures 1 and 3, the data grouping method of the preferred embodiment of the present invention performs a noise judging step S3 to determine whether the number of adjacent data points 13 is less than the minimum inclusion point parameter. The expansion core point 12 and the adjacent data point 13 are all divided into the same cluster, and a boundary point marking step S4 is performed; if the determination is YES, the expansion core point 12 and the adjacent data point 13 are all identified as miscellaneous. And perform a first determining step S7.
請參照第1及4圖所示,本發明較佳實施例之資料分群方法之邊界點標記步驟S4係於該擴張核心點12之掃描範圍外設定數個標記邊界點14,再分別選取位於各該標記邊界點14之掃描範圍內,且最靠近各該標記邊界點14之鄰近資料點13作為擴張種子點15,完成後進行一篩選步驟S5。舉例而言,本實施例係以該擴張核心點12為圓心,以大於該掃描半徑R之邊界半徑R’為半徑畫出圓形之邊界A,且較佳係於該邊界A上依順時針方向依序等距設置8個標記邊界點14a、14b、14c、14d、14e、14f、14g、14h’使得任一標記邊界點14皆係位於該擴張核心點12之掃描範圍外,該標記邊界點14與該擴張核心點12之距離係大於該掃描半徑參數R,且小於該掃描半徑參數R之兩倍。接著,再以該標記邊界點14a、14b、14c、14d、14e、14f、14g、14h進行掃描,舉例而言,先以該標記邊界點14a進行掃描,由該數個鄰近資料點13中,選取位於該標記邊界點14a之掃描範圍內,且最靠近該標記邊界點14a之鄰近資料點13作為該擴張種子點15,該標記邊界點14b、14c、14d、14e、14f、14g、14h亦以相同方式進行掃描,並分別選取對應之鄰近資料點13作為該擴張種子點15。如第4圖所示,該標記邊界點14a、14b、14c、14d、14f、14g、14h皆有相對應之擴張種子點15,分別為擴張種子點15a、15b、15c、15d、15f、15g、15h,僅該標記邊界點14e無對應之擴張種子點15。Referring to FIGS. 1 and 4, the boundary point marking step S4 of the data grouping method according to the preferred embodiment of the present invention sets a plurality of marking boundary points 14 outside the scanning range of the expanding core point 12, and then respectively selects each of the marking boundary points. The adjacent data points 13 closest to each of the mark boundary points 14 are within the scanning range of the mark boundary point 14 as the expanded seed point 15, and a screening step S5 is performed after completion. For example, in this embodiment, the extended core point 12 is centered, and the boundary A of the circle is drawn with a radius larger than the boundary radius R′ of the scanning radius R, and preferably is clockwise on the boundary A. The direction is equidistantly set with 8 mark boundary points 14a, 14b, 14c, 14d, 14e, 14f, 14g, 14h' such that any mark boundary point 14 is located outside the scan range of the expanded core point 12, the mark boundary The distance between the point 14 and the expanded core point 12 is greater than the scan radius parameter R and less than twice the scan radius parameter R. Then, scanning is performed by the mark boundary points 14a, 14b, 14c, 14d, 14e, 14f, 14g, 14h. For example, the mark boundary point 14a is first scanned, and among the plurality of adjacent data points 13, A neighboring data point 13 located within the scanning range of the marking boundary point 14a and closest to the marking boundary point 14a is selected as the expanded seed point 15, and the marking boundary points 14b, 14c, 14d, 14e, 14f, 14g, 14h are also The scanning is performed in the same manner, and the corresponding adjacent data points 13 are respectively selected as the expanded seed points 15. As shown in Fig. 4, the mark boundary points 14a, 14b, 14c, 14d, 14f, 14g, 14h have corresponding expanded seed points 15, which are expanded seed points 15a, 15b, 15c, 15d, 15f, 15g, respectively. At 15h, only the marker boundary point 14e has no corresponding expanded seed point 15.
請再參照第1及4圖所示,一般而言,靠近該擴張核心點12之資料點11的掃描範圍與該擴張核心點12之掃描範圍重複比例較高,又由於該擴張核心點12內之資料點11數量已大於該最小包含點之參數,因此靠近該擴張核心點12之資料點11為雜訊點之機會通常較低,若將較靠近該擴張核心點12之資料點11納入擴張種子點15,則將增加搜尋的時間成本。而本發明之邊界點標記步驟S4中,該標記邊界點14係位於該擴張核心點12之掃描範圍外,因此當該些標記邊界點14進行掃描時,可篩選去除掉較靠近該擴張核心點12之資料點11,可有效降低搜尋之時間成本。Referring again to FIGS. 1 and 4, in general, the scanning range of the data point 11 near the expanded core point 12 is higher than the scanning range of the expanded core point 12, and since the expanded core point 12 is The number of data points 11 is greater than the minimum inclusion point parameter, so the chance that the data point 11 near the expansion core point 12 is a noise point is generally lower, if the data point 11 closer to the expansion core point 12 is included in the expansion Seed point 15, which will increase the time cost of the search. In the boundary point marking step S4 of the present invention, the marking boundary point 14 is located outside the scanning range of the expanding core point 12, so when the marking boundary points 14 are scanned, the screening can be removed to be closer to the expanding core point. The data point 11 of 12 can effectively reduce the time cost of the search.
請參照第1、4及5圖所示,本發明較佳實施例之資料分群方法之篩選步驟S5係判斷該邊界A上是否任連續數個標記邊界點14皆分別有對應之擴張種子點15,若判斷為「是」,則刪除該連續數個標記邊界點14所對應之其中一擴張種子點15,並將剩餘之擴張種子點15加入一種子列表中,再進行一種子選擇步驟S6;若判斷為「否」,則將該擴張種子點15加入該種子列表中,再進行該種子選擇步驟S6。更詳言之,以第4圖為例,本實施例係選擇由該標記邊界點14a起,依順時針方向起算連續四個標記邊界點14a、14b、14c、14d皆有相對應之擴張種子點15a、15b、15c、15d,並刪除該擴張種子點15a、15b、15c、15d之其中一個。如第5圖所示,該四個擴張種子點15a、15b、15c、15d中,位於中間之擴張種子點15b之掃描範圍僅面積較小之區域151沒被該擴張種子點15a、15c之掃描範圍覆蓋到;而位於中間之另一擴張種子點15c之掃描範圍亦僅面積較小之區域152沒被該擴張種子點15b、15d之掃描範圍覆蓋到。因此,若刪除該擴張種子點15b、15c中之其中一個,則可以降低搜尋之時間成本,亦可保有相當高之分群準確度。本實施例係選擇將最靠近該連續四個擴張種子點15a、15b、15c、15d中第一個擴張種子點15a之擴張種子點15b刪除。Referring to Figures 1, 4 and 5, the screening step S5 of the data grouping method according to the preferred embodiment of the present invention determines whether any consecutive number of marking boundary points 14 on the boundary A have corresponding expanded seed points 15 respectively. If the determination is YES, delete one of the consecutive number of marker boundary points 14 corresponding to the expansion seed point 15, and add the remaining expansion seed point 15 to a sub-list, and then perform a sub-selection step S6; If the determination is "NO", the expanded seed point 15 is added to the seed list, and the seed selection step S6 is performed. More specifically, taking FIG. 4 as an example, the present embodiment selects from the mark boundary point 14a, and successively four consecutive mark boundary points 14a, 14b, 14c, and 14d have corresponding expansion seeds in a clockwise direction. Points 15a, 15b, 15c, 15d are deleted and one of the expanded seed points 15a, 15b, 15c, 15d is deleted. As shown in Fig. 5, among the four expanded seed points 15a, 15b, 15c, 15d, the scanning area of the expanded seed point 15b located in the middle is only scanned by the expanded seed points 15a, 15c. The range is covered; and the scanning range of the other expanded seed point 15c located in the middle is also only the area 152 having a smaller area is not covered by the scanning range of the expanded seed points 15b, 15d. Therefore, if one of the expanded seed points 15b, 15c is deleted, the time cost of the search can be reduced, and a relatively high grouping accuracy can be maintained. This embodiment selects to delete the expanded seed point 15b closest to the first expanded seed point 15a of the successive four expanded seed points 15a, 15b, 15c, 15d.
請參照第1、6及7圖所示,在刪除該擴張種子點15b後,仍然有連續四個標記邊界點14f、14g、14h、14a皆分別有對應之擴張種子點15f、15g、15h、15a,因此亦以相同方式,刪除位於中間之擴張種子點15g、15h之其中一個,於此,由於該位於中間之擴張種子點15g較靠近該連續四個擴張種子點15f、15g、15h、15a中第一個擴張種子點15f,本實施例係選擇刪除該擴張種子點15g,如此,便可如第7圖所示,該擴張種子點15可由原本的7個減少為5個,可降低擴張種子點15之數量。於確認該邊界A上並無任連續四個標記邊界點14皆有對應的擴張種子點15的情況後,便將剩餘之擴張種子點15a、15c、15d、15f、15h加入該種子列表中,再進行該種子選擇步驟S6。如此,便可於維持高分群準確度之前提下,有效降低擴張種子點15之數量,進而降低後續搜尋之時間成本。Referring to Figures 1, 6 and 7, after deleting the expanded seed point 15b, there are still four consecutive mark boundary points 14f, 14g, 14h, 14a respectively having corresponding expanded seed points 15f, 15g, 15h, 15a, therefore, in the same manner, one of the expanded seed points 15g, 15h located in the middle is deleted, since the intermediate expanded seed point 15g is closer to the consecutive four expanded seed points 15f, 15g, 15h, 15a In the first expansion seed point 15f, in this embodiment, the expansion seed point 15g is selected to be deleted. Thus, as shown in Fig. 7, the expanded seed point 15 can be reduced from the original 7 to 5, which can reduce the expansion. The number of seed points 15 . After confirming that there are no consecutive four mark boundary points 14 on the boundary A with corresponding expanded seed points 15, the remaining expanded seed points 15a, 15c, 15d, 15f, 15h are added to the seed list. This seed selection step S6 is performed again. In this way, it is possible to reduce the number of expanded seed points 15 before the high-segment accuracy is maintained, thereby reducing the time cost of subsequent searches.
請參照第1及7圖所示,本發明較佳實施例之資料分群方法之種子選擇步驟S6以該種子列表中之所有擴張種子點15分別作為該擴張核心點12,並一一進行該擴張詢問步驟S2,直至所有擴張種子點15皆已作為該擴張核心點12進行該擴張詢問步驟S2,則進行該第一判斷步驟S7。Referring to Figures 1 and 7, the seed selection step S6 of the data grouping method of the preferred embodiment of the present invention uses the expanded seed points 15 in the seed list as the expansion core points 12, respectively, and performs the expansion one by one. The first determining step S7 is performed by inquiring step S2 until all of the expanded seed points 15 have performed the expansion inquiring step S2 as the expanded core point 12.
請參照第1及2圖所示,本發明較佳實施例之資料分群方法之第一判斷步驟S7係判斷是否所有資料點11皆已被分群或者被定義為雜訊,若判斷為「否」,則進行一讀取步驟S8;若判斷為「是」則終止。更詳言之,若所有資料點11皆已被分群或者被定義為雜訊,則終止,便可獲得該數個資料點11之分群結果。Referring to FIGS. 1 and 2, the first determining step S7 of the data grouping method according to the preferred embodiment of the present invention determines whether all the data points 11 have been grouped or defined as noise, and if the determination is "No". Then, a reading step S8 is performed; if the determination is "YES", the processing is terminated. More specifically, if all the data points 11 have been grouped or defined as noise, then the grouping results of the plurality of data points 11 can be obtained.
請參照第1及2圖所示,本發明較佳實施例之資料分群方法之讀取步驟S8係由該資料集內讀取另一筆資料點11,並進行一第二判斷步驟S9。Referring to FIGS. 1 and 2, the reading step S8 of the data grouping method according to the preferred embodiment of the present invention reads another data point 11 from the data set and performs a second determining step S9.
請參照第1及2圖所示,本發明較佳實施例之資料分群方法之第二判斷步驟S9,係判斷該讀取步驟S8中所讀取之資料點11是否未被分群,且未被判斷為雜訊,若判斷為「是」,則以該筆資料點11作為該擴張核心點12進行該擴張詢問步驟S2;若判斷為「否」,則重新進行該讀取步驟S8。Referring to FIGS. 1 and 2, the second determining step S9 of the data grouping method according to the preferred embodiment of the present invention determines whether the data point 11 read in the reading step S8 is not grouped, and is not If it is judged as noise, if the determination is YES, the expansion request step S2 is performed using the data point 11 as the expansion core point 12; if the determination is "NO", the reading step S8 is re-executed.
如上所述,本發明之資料分群方法藉由於該擴張核心點12之掃描範圍外設置該數個標記邊界點14,可篩選去除太過靠近該擴張核心點12的鄰近資料點13,可減少後續搜尋的時間成本;再且,本發明透過將任連續數個標記邊界點14所對應之其中一個擴張種子點15刪除,以於維持高分群準確度之前提下,減少擴張種子點的數目,進而降低後續搜尋的時間成本。As described above, the data grouping method of the present invention can remove the adjacent data points 13 which are too close to the expansion core point 12 by setting the plurality of mark boundary points 14 outside the scanning range of the expansion core point 12, thereby reducing the subsequent The time cost of the search; furthermore, the present invention removes one of the expanded seed points 15 corresponding to any of the consecutive number of mark boundary points 14 to reduce the number of expanded seed points before maintaining high group accuracy. Reduce the time cost of subsequent searches.
雖然本發明已利用上述較佳實施例揭示,然其並非用以限定本發明,任何熟習此技藝者在不脫離本發明之精神和範圍之內,相對上述實施例進行各種更動與修改仍屬本發明所保護之技術範疇,因此本發明之保護範圍當視後附之申請專利範圍所界定者為準。While the invention has been described in connection with the preferred embodiments described above, it is not intended to limit the scope of the invention. The technical scope of the invention is protected, and therefore the scope of the invention is defined by the scope of the appended claims.
11...資料點11. . . Data point
12...擴張核心點12. . . Expansion core point
13...鄰近資料點13. . . Proximity data point
14...標記邊界點14. . . Mark boundary point
14a至14b...標記邊界點14a to 14b. . . Mark boundary point
15...擴張種子點15. . . Expanded seed point
15a至15d...擴張種子點15a to 15d. . . Expanded seed point
15f至15h...擴張種子點15f to 15h. . . Expanded seed point
A...邊界A. . . boundary
第1圖:本發明較佳實施例之資料分群方法的流程圖Figure 1 is a flow chart of a data grouping method in accordance with a preferred embodiment of the present invention
第2圖:本發明較佳實施例之參數設定步驟的示意圖Figure 2: Schematic diagram of the parameter setting steps of the preferred embodiment of the present invention
第3圖:本發明較佳實施例之擴張詢問步驟的示意圖Figure 3: Schematic diagram of the expansion query step of the preferred embodiment of the present invention
第4圖:本發明較佳實施例之邊界點標記步驟的示意圖Figure 4: Schematic diagram of the boundary point marking step of the preferred embodiment of the present invention
第5圖:本發明較佳實施例之篩選步驟的示意圖Figure 5: Schematic diagram of the screening step of the preferred embodiment of the present invention
第6圖:本發明較佳實施例之篩選步驟的示意圖Figure 6 is a schematic illustration of the screening steps of a preferred embodiment of the invention
第7圖:本發明較佳實施例完成該篩選步驟的示意圖Figure 7 is a schematic view of the preferred embodiment of the present invention to complete the screening step
12...擴張核心點12. . . Expansion core point
13...鄰近資料點13. . . Proximity data point
14...標記邊界點14. . . Mark boundary point
14a至14b...標記邊界點14a to 14b. . . Mark boundary point
15...擴張種子點15. . . Expanded seed point
15a至15d...擴張種子點15a to 15d. . . Expanded seed point
15f至15h...擴張種子點15f to 15h. . . Expanded seed point
Claims (8)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW98110848A TWI396103B (en) | 2009-04-01 | 2009-04-01 | Method for data clustering |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW98110848A TWI396103B (en) | 2009-04-01 | 2009-04-01 | Method for data clustering |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW201037534A TW201037534A (en) | 2010-10-16 |
| TWI396103B true TWI396103B (en) | 2013-05-11 |
Family
ID=44856710
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW98110848A TWI396103B (en) | 2009-04-01 | 2009-04-01 | Method for data clustering |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI396103B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI463341B (en) * | 2012-12-22 | 2014-12-01 | Univ Nat Pingtung Sci & Tech | Data clustering method based on density |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030144994A1 (en) * | 2001-10-12 | 2003-07-31 | Ji-Rong Wen | Clustering web queries |
| TW200741533A (en) * | 2006-04-26 | 2007-11-01 | Univ Nat Pingtung Sci & Tech | Method of fast intuitive data clustering |
| TW200743018A (en) * | 2006-05-12 | 2007-11-16 | Univ Nat Pingtung Sci & Tech | Data grouping processing method |
| US20080205776A1 (en) * | 2007-02-27 | 2008-08-28 | Gal Shafirstein | Image processing apparatus and method for histological analysis |
-
2009
- 2009-04-01 TW TW98110848A patent/TWI396103B/en not_active IP Right Cessation
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030144994A1 (en) * | 2001-10-12 | 2003-07-31 | Ji-Rong Wen | Clustering web queries |
| TW200741533A (en) * | 2006-04-26 | 2007-11-01 | Univ Nat Pingtung Sci & Tech | Method of fast intuitive data clustering |
| TW200743018A (en) * | 2006-05-12 | 2007-11-16 | Univ Nat Pingtung Sci & Tech | Data grouping processing method |
| US20080205776A1 (en) * | 2007-02-27 | 2008-08-28 | Gal Shafirstein | Image processing apparatus and method for histological analysis |
Non-Patent Citations (2)
| Title |
|---|
| 劉志偉,KIDBSCAN:一個有效率的資料分群演算法,屏東科技大學資訊管理系碩士論文,2006年。 * |
| 吳宗諭,一種以密度為基礎之改良型快速分群演算法,國立台灣科技大學自動化及控制研究所碩士論文,2007年。 * |
Also Published As
| Publication number | Publication date |
|---|---|
| TW201037534A (en) | 2010-10-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| TWI385544B (en) | Density-based data clustering method | |
| CN108920462B (en) | Map-based POI retrieval method and device | |
| CN102693266B (en) | Search for method, the navigation equipment and method of generation index structure of database | |
| CN112686902B (en) | A Two-Stage Computational Method for the Identification and Segmentation of Midbrain Gliomas in Magnetic Resonance Imaging | |
| CN110490893B (en) | Rapid Euclidean distance point cloud segmentation method | |
| CN109872396B (en) | Rapid cross-section contour generation method suitable for triangular mesh model | |
| WO2014109127A1 (en) | Index generating device and method, and search device and search method | |
| TWI391837B (en) | Data clustering method based on density | |
| TWI460680B (en) | Data clustering method based on density | |
| CN107515931B (en) | A Cluster-Based Duplicate Data Detection Method | |
| CN103714153A (en) | Density clustering method based on limited area data sampling | |
| CN103955514A (en) | Image feature indexing method based on Lucene inverted index | |
| CN104820840B (en) | The arest neighbors hyperspectral image classification method recombinated based on dictionary and wave band | |
| CN103679195A (en) | Method and system for classifying texture images on basis of local edge pattern | |
| JP4937395B2 (en) | Feature vector generation apparatus, feature vector generation method and program | |
| CN106815824A (en) | A kind of image neighbour's optimization method for improving extensive three-dimensional reconstruction efficiency | |
| CN105279524A (en) | High-dimensional data clustering method based on unweighted hypergraph segmentation | |
| CN103617163A (en) | Quick target association method based on clustering analysis | |
| CN103218808A (en) | Method for tracking binary image profile, and device thereof | |
| TWI396103B (en) | Method for data clustering | |
| CN110097636B (en) | Site selection planning method based on visual field analysis | |
| CN103164487A (en) | Clustering algorithm based on density and geometrical information | |
| Boumaiza et al. | Symbol recognition using a galois lattice of frequent graphical patterns | |
| JP6705764B2 (en) | Generation device, generation method, and generation program | |
| JP3938815B2 (en) | Node creation method, image search method, and recording medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| MM4A | Annulment or lapse of patent due to non-payment of fees |