TWI470545B - Apparatus, processor, system, method, instruction, and logic for performing range detection - Google Patents

Info

Publication number
TWI470545B
Authority
TW
Taiwan
Prior art keywords
range
vector
input
logic
complex
Prior art date
Application number
TW98136966A
Other languages
Chinese (zh)
Other versions
TW201030607A (en)
Inventor
Asaf Hargil
Evgeny Fiksman
Artiom Myaskouvskey
Doron Orenstein
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Publication of TW201030607A
Application granted
Publication of TWI470545B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/17 Function evaluation by approximation methods, e.g. inter- or extrapolation, smoothing, least mean square method
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Complex Calculations (AREA)
  • Executing Machine-Instructions (AREA)
  • Advance Control (AREA)
  • Length Measuring Devices With Unspecified Measuring Means (AREA)

Description

Apparatus, processor, system, method, instruction, and logic for performing range detection

Embodiments of the invention relate generally to the field of information processing and, more specifically, to the field of performing range detection in computing systems and microprocessors.

The performance of mathematical functions in computer hardware, such as microprocessors, may depend on the use of lookup tables (LUTs) stored in some location, such as cache or main memory. Single instruction multiple data (SIMD) instructions may perform multiple memory operations to access LUTs when executing mathematical functions in hardware. For example, a SIMD instruction that executes a function based on a number of input operands may access a LUT for each of the input operands in order to obtain a result output for the SIMD function. Because some processor architectures do not provide parallel access to a number of LUTs, but instead use the same memory access logic to access one or more LUTs, these LUT accesses may occur serially rather than in parallel, thereby limiting the performance of the SIMD function.

Mathematical functions may be evaluated in some algorithms using splines or other polynomial-based techniques. In some prior art examples, spline functions used to evaluate mathematical functions require multiple software operations to perform tasks such as range detection, coefficient matching, and polynomial calculation. Using splines to evaluate mathematical functions may therefore be computationally intensive and relatively low in performance, which limits the usefulness of spline calculations in computer programs.

[Summary of the Invention and Embodiments]

Embodiments of the invention may be used to improve mathematical computation performance in microprocessors and computers. In some embodiments, spline calculations may be used to perform various mathematical operations at a higher level of performance than some prior art spline calculations. In at least one embodiment, spline calculation performance may be improved by accelerating at least one of the most time-consuming and resource-consuming operations involved in performing spline calculations. In one embodiment, a range detection instruction and corresponding hardware logic are provided to accelerate the detection of ranges within a spline, which correspond to the various polynomials used in spline calculations.

FIG. 1 illustrates a microprocessor in which at least one embodiment of the invention may be used. In particular, FIG. 1 illustrates microprocessor 100 having one or more processor cores 105 and 110, each associated with a local cache memory 107 and 113, respectively. Also illustrated in FIG. 1 is a shared cache memory 115, which may store versions of at least some of the information stored in each of the local cache memories 107 and 113. In some embodiments, microprocessor 100 may also include other logic not shown in FIG. 1, such as an integrated memory controller, an integrated graphics controller, and other logic to perform other functions within a computer system, such as input/output control. In one embodiment, each microprocessor in a multi-processor system, or each processor core in a multi-core processor, may include or otherwise be associated with logic 119 to perform range detection in response to an instruction, in accordance with one embodiment.

FIG. 2 illustrates, for example, a front-side-bus (FSB) computer system in which one embodiment of the invention may be used. Any processor 201, 205, 210, or 215 may access information from any local level-one (L1) cache memory 220, 225, 230, 235, 240, 245, 250, 255 within or otherwise associated with one of the processor cores 223, 227, 233, 237, 243, 247, 253, 257. Furthermore, any processor 201, 205, 210, or 215 may access information from any one of the shared level-two (L2) cache memories 203, 207, 213, 217, or from system memory 260 via chipset 265. One or more of the processors in FIG. 2 may include or otherwise be associated with logic 219 to perform a range detection instruction in accordance with one embodiment.

In addition to the FSB computer system illustrated in FIG. 2, other system configurations may be used in conjunction with various embodiments of the invention, including point-to-point (P2P) interconnect systems and ring interconnect systems. The P2P system of FIG. 3, for example, may include several processors, of which only two, processors 370 and 380, are shown by way of example. Processors 370, 380 may each include a local memory controller hub (MCH) 372, 382 to connect with memory 32, 34. Processors 370, 380 may exchange data via a point-to-point (PtP) interface 350 using PtP interface circuits 378, 388. Processors 370, 380 may each exchange data with a chipset 390 via individual PtP interfaces 352, 354 using point-to-point interface circuits 376, 394, 386, 398. Chipset 390 may also exchange data with a high-performance graphics circuit 338 via a high-performance graphics interface 339. Embodiments of the invention may be located within any processor having any number of processing cores, or within each of the PtP bus agents of FIG. 3. In one embodiment, any processor core may include or otherwise be associated with a local cache memory (not shown). Furthermore, a shared cache memory (not shown) may be included in either processor or outside of both processors, yet connected with the processors via the P2P interconnect, such that the local cache information of either or both processors may be stored in the shared cache if a processor is placed into a low-power mode. One or more of the processors or cores in FIG. 3 may include or otherwise be associated with logic 319 to perform a range detection instruction in accordance with one embodiment.

Spline calculation may negate the need for lookup tables (LUTs) and the costly memory accesses associated with them. FIG. 4, for example, illustrates a first-order spline function. In FIG. 4, let X be an 8-element, 256-bit input vector, each element Xin of which is represented by 32 bits. For any given input Xin, the spline function Y yields an output element Yout, resulting in a vector W = Y(X). The elements of vector W may be evaluated using spline calculation operations that include range detection, coefficient matching, and polynomial calculation. At least one embodiment includes an instruction and logic to perform range detection in evaluating the spline function. In some embodiments, the element size of vector X may be 8 bits; in other embodiments, it may be 16 bits, 32 bits, 64 bits, 128 bits, and so on. Furthermore, in some embodiments, the elements of X may be integers, floating-point values, single- or double-precision floating-point values, etc.

In one embodiment, the range detection logic may include decode and execution logic to perform a range detection instruction having an instruction format and control fields to perform the expression "Range Vector (R) = Range_Detect (Input Vector (X), Range Limit Vector (RL))", where R is a range vector produced by the logic described in FIG. 5, X is the input vector, and RL is a vector containing the first Xin of each range of the spline function. For example, in one embodiment, the vector RL contains the first Xin of each range of FIG. 4 (0, 10, 30, 50, 70, 80, 255), in some order corresponding to the input vector X.

In one embodiment, range detection matches each input point provided in the input vector X to a specific range of the spline function illustrated in FIG. 4 and stores the result in a SIMD register. The following example shows an input vector X and the corresponding range detection vector for the spline described in FIG. 4. The example describes operation on 16-bit fixed-point input; however, the same technique is applicable to 8- and 32-bit fixed-point and floating-point values, as well as to the different data types used in current and future vector extensions.

Let X be the following input vector, where each element contains an Xin value along the x-axis of FIG. 4:

Based on the above input vector X and the spline depicted in FIG. 4, the range detection vector will contain the following:

In one embodiment, an instruction may be executed to produce the above range detection vector by operating on the input vector according to the spline of FIG. 4. In one embodiment, the instruction causes the input vector elements to be compared with each of the range limits (0, 10, 30, 50, 70, 80 in FIG. 4). In one embodiment, each range limit may be broadcast to a SIMD register and compared with the input vector X. In one embodiment in which each compare operation results in 0 or -1 to indicate the result of the comparison, subtracting and accumulating the comparison results yields the range of the spline in which each input point of the input vector X is contained. The logic to perform the compare operations is illustrated in FIG. 5, where x_i denotes an input point within the input vector X, t_i denotes a range limit of the spline of FIG. 4, and r_i denotes the resulting range, within range detection vector R, corresponding to input point x_i. In other embodiments, the compare operation may result in other values (e.g., 1 and 0), and the range detection vector R may be produced by comparing, adding or subtracting, and accumulating the comparison values.

FIG. 5a illustrates logic that may be used to generate a range detection vector R in response to performing a range detection instruction, in accordance with one embodiment. In one embodiment, logic 500a includes input vector X 501a, which is compared by comparison logic 505a with range limit vector 510a, each element of which contains the range limit of the spline range corresponding to the i-th element of the input vector X. In one embodiment, an element of input vector 501a is compared by comparison logic 505a with the corresponding element of range limit vector 510a. In one embodiment, the elements of zero vector 515a are added 517a to the negative of the result of comparing input vector 501a with range limit vector 510a, producing a 0 or -1 in each element of the comparison result. Input vector 501a is then compared with the corresponding elements of range limit vector 520a, and the negative of that result is added to the previous comparison result. This process continues for each element of range limit vector 510a, culminating in range detection vector 525a.
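
The compare-and-accumulate scheme described above can be sketched in Python as follows. This is only an illustrative scalar emulation under stated assumptions, not the pseudocode of the patent: the function names `simd_compare_ge` and `range_detect` are hypothetical, the input values are made up for illustration, and a Python list stands in for a SIMD register. The range limits are those given for FIG. 4.

```python
# Range limits from the FIG. 4 example in the text (start of each range).
RANGE_LIMITS = [0, 10, 30, 50, 70, 80]


def simd_compare_ge(xs, limit):
    """Emulate a lane-wise compare: -1 (all ones) where x >= limit, else 0."""
    return [-1 if x >= limit else 0 for x in xs]


def range_detect(xs, limits=RANGE_LIMITS):
    """Return the 1-based range number for each input element.

    Each limit is broadcast to every lane and compared with the input;
    the resulting 0/-1 masks are subtracted from an accumulator that
    starts as the zero vector, so every satisfied compare adds 1.
    """
    acc = [0] * len(xs)                       # the zero vector
    for limit in limits:
        mask = simd_compare_ge(xs, limit)     # 0 or -1 per lane
        acc = [a - m for a, m in zip(acc, mask)]
    return acc


# Hypothetical input values (not the example vector of the patent).
print(range_detect([2, 10, 29, 45, 77, 80, 200, 0]))
# prints [1, 2, 2, 3, 5, 6, 6, 1]
```

Point 77 lands in range 5, which agrees with the worked example later in the description. The loop performs one compare per range limit, which is the cost that the binary-search variant aims to reduce.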

In one embodiment, the logic of FIG. 5a may be used in conjunction with a program using at least one instruction set architecture, as illustrated by the following pseudocode:

Other techniques for determining the range detection vector R may be used in other embodiments, including logic to perform a binary search on the range limit vector elements. FIG. 5b illustrates a binary search tree that may be used to generate range detection vector R, according to one embodiment. In the binary search tree 500b of FIG. 5b, each element of input vector X 501b is compared with elements 510b of the range limit vector, starting at the middle vector element (T4, in the case of 8-element input and range limit vectors) and continuing into each half vector (T5-T8 and T3-T1). In one embodiment, the following pseudocode illustrates the function of the binary search tree of FIG. 5b, using instructions from an instruction set architecture.

In the above pseudocode, T represents the range limit vector, and I represents the i-th element of the input vector X and the range limit vector T.
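
As a rough illustration of the binary-search alternative, the sketch below searches the sorted range limits for each input element, starting at the middle limit and descending into one half at each step. It is an assumption-laden scalar sketch rather than the pseudocode referred to in the text: the name `range_detect_bsearch` is hypothetical, and a real SIMD implementation would process all lanes together with masks instead of looping over elements.

```python
def range_detect_bsearch(xs, limits):
    """Return the 1-based range number per element via binary search.

    `limits` must be sorted in ascending order. The result for x is the
    number of limits that are <= x (0 if x lies below the first limit).
    """
    result = []
    for x in xs:
        lo, hi = 0, len(limits)
        while lo < hi:
            mid = (lo + hi) // 2        # begin at the middle limit
            if x >= limits[mid]:
                lo = mid + 1            # continue in the upper half
            else:
                hi = mid                # continue in the lower half
        result.append(lo)
    return result


limits = [0, 10, 30, 50, 70, 80]        # range limits of the FIG. 4 example
print(range_detect_bsearch([2, 10, 29, 45, 77, 80, 200, 0], limits))
# prints [1, 2, 2, 3, 5, 6, 6, 1]
```

Each element needs about log2(n) comparisons for n limits, instead of the n comparisons of the linear compare-and-accumulate approach.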

In one embodiment, an instruction and corresponding logic are used to generate the range detection vector R. Once the range detection vector R has been determined, the other operations associated with evaluating the spline function for the mathematical operation in question may be performed, including the coefficient matching and polynomial calculation operations.

In one embodiment, each polynomial corresponding to a range of the spline in FIG. 4 has corresponding coefficients. Coefficient matching matches coefficient vector elements to the range detection vector elements produced in one embodiment of the invention. In the example illustrated in FIG. 4, there are six ranges, which may be described by the following polynomials:

Range 1: y = 2*x (0 <= X < 10)

Range 2: y = 0*x + 20 (10 <= X < 30)

Range 3: y = -2*x + 20 (30 <= X < 50)

Range 4: y = 0*x - 20 (50 <= X < 70)

Range 5: y = 2*x - 20 (70 <= X < 80)

Range 6: y = 0 (80 <= X < 255)

Coefficient matching is based on the result of the range detection stage. The number of resulting coefficient vectors equals the highest polynomial order plus one. Continuing the above example, the resulting coefficient vectors C1 and C2 for the input vector X described in FIG. 4 are illustrated below:

The order of all polynomials in the above example is one; therefore, the number of resulting coefficient vectors is two. In one embodiment, the C1 and C2 vectors are computed using a blend instruction, which stores the appropriate coefficients in the corresponding elements of the two coefficient vectors C1 and C2 based on the output of the range detection stage described in FIGS. 5a and 5b.
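
Coefficient matching can be pictured as a per-lane table lookup keyed by the range number. The sketch below is a minimal illustration, assuming 1-based range numbers; the names `C1_TABLE`, `C2_TABLE`, and `match_coefficients` are hypothetical, the range vector is made-up example data, and the list indexing merely stands in for the blend-style instruction mentioned in the text. The coefficient values are read off the six polynomials listed above.

```python
# Slope (C1) and constant term (C2) of the polynomial for ranges 1..6,
# taken from the polynomial list in the text (y = C1 * x + C2).
C1_TABLE = [2, 0, -2, 0, 2, 0]
C2_TABLE = [0, 20, 20, -20, -20, 0]


def match_coefficients(range_vector):
    """Select, per lane, the coefficients of the range that lane falls in."""
    c1 = [C1_TABLE[r - 1] for r in range_vector]   # range numbers are 1-based
    c2 = [C2_TABLE[r - 1] for r in range_vector]
    return c1, c2


# Hypothetical range detection vector (not the example data of the patent).
c1, c2 = match_coefficients([1, 2, 2, 3, 5, 6, 6, 1])
print(c1)  # prints [2, 0, 0, -2, 2, 0, 0, 2]
print(c2)  # prints [0, 20, 20, 20, -20, 0, 0, 0]
```

A first-order spline needs two such tables; a spline of order k would need k + 1, matching the rule that the number of coefficient vectors equals the highest polynomial order plus one.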

After computing the coefficients of the polynomials corresponding to the input vector X, the polynomial evaluation calculation may be performed for each input value in the input vector X. In one embodiment, the polynomial calculation may be divided into two main operations. The first operation includes finding the offset of each input value from the beginning of its range of the spline. In one embodiment, finding the offsets may be accomplished by matching the beginning of each range to each input point, using, for example, a blend instruction. The offset from the beginning of each range of the spline of FIG. 4 is then calculated by subtracting the beginning value of each range from the corresponding input vector element. For example, point 77 in the spline of FIG. 4 is assigned to range 5. Since range 5 begins at 70, the offset from the beginning of its assigned range is 7. The second operation includes calculating the output vector element for each input vector element. To calculate the final output vector, the offset from the beginning of the range is used as the input to the relevant polynomial. For example, the range 5 polynomial is described by the formula y = 2*x - 20. For input vector element 77, we obtain an offset of 7, so the final value for point 77 is y = 2*(offset) - 20 = 2*(7) - 20 = -6. After calculating the remaining polynomials corresponding to the input vector elements, the results may be stored in a result vector. The vector of beginning-of-range values B, the offset vector O, and the output vector Y are illustrated below:

According to one embodiment, the output vector Y is calculated by an expression; in this example, the output vector Y is calculated as "Y = O*C1 + C2".
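
The offset and evaluation steps can be put together in a short end-to-end sketch. It is a hedged illustration only: the name `evaluate_spline` and the table names are hypothetical, the input values are invented, the standard-library `bisect_right` stands in for the range detection instruction, and the code assumes every input is at least 0 (the first range limit). The vectors B, O, and Y follow the definitions in the text, with Y = O*C1 + C2.

```python
from bisect import bisect_right

RANGE_STARTS = [0, 10, 30, 50, 70, 80]   # beginning of ranges 1..6 (FIG. 4)
C1_TABLE = [2, 0, -2, 0, 2, 0]           # slope of each range polynomial
C2_TABLE = [0, 20, 20, -20, -20, 0]      # constant term of each polynomial


def evaluate_spline(xs):
    """Return (B, O, Y) for inputs xs, assuming every x >= 0."""
    # Range detection: 1-based range number per element (stand-in).
    r = [bisect_right(RANGE_STARTS, x) for x in xs]
    # B: beginning value of the range assigned to each element.
    b = [RANGE_STARTS[i - 1] for i in r]
    # O: offset of each element from the beginning of its range.
    o = [x - start for x, start in zip(xs, b)]
    # Coefficient matching, then Y = O*C1 + C2 lane by lane.
    c1 = [C1_TABLE[i - 1] for i in r]
    c2 = [C2_TABLE[i - 1] for i in r]
    y = [oi * a + c for oi, a, c in zip(o, c1, c2)]
    return b, o, y


# Hypothetical input values (not the example vector of the patent).
b, o, y = evaluate_spline([2, 10, 29, 45, 77, 80, 200, 0])
print(b)  # prints [0, 10, 10, 30, 70, 80, 80, 0]
print(o)  # prints [2, 0, 19, 15, 7, 0, 120, 0]
print(y)  # prints [4, 20, 20, -10, -6, 0, 0, 0]
```

The lane holding 77 reproduces the worked example: its range begins at 70, the offset is 7, and the output is 2*7 - 20 = -6.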

FIG. 6 illustrates a flow diagram of operations that may be used in conjunction with at least one embodiment of the invention. In one embodiment, at operation 601, a range detection vector is generated. In one embodiment, the range detection vector is generated for each input vector element according to a process such as the binary search and logic described herein. At operation 605, coefficient matching is performed to generate the coefficients of the polynomials corresponding to each range of the spline according to the input vector elements. At operation 610, the polynomial calculation is performed for each element in the input vector, and the results are stored in a result vector.

One or more aspects of at least one embodiment may be implemented by representative data stored on a machine-readable medium that represents various logic within the processor and that, when read by a machine, causes the machine to fabricate logic to perform the techniques described herein. Such representations, known as "IP cores", may be stored on a tangible, machine-readable medium ("tape") and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.

Thus, a method and apparatus for directing micro-architectural memory region accesses has been described. It is to be understood that the above description is intended to be illustrative and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reading and understanding the above description. The scope of the invention should therefore be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

5 ... range
32 ... memory
34 ... memory
77 ... point
100 ... microprocessor
105 ... processor core
107 ... local cache memory
110 ... processor core
113 ... local cache memory
115 ... shared cache memory
119 ... logic
201 ... processor
203 ... cache memory
205 ... processor
207 ... cache memory
210 ... processor
213 ... cache memory
215 ... processor
217 ... cache memory
219 ... logic
220 ... cache memory
223 ... processor core
225 ... cache memory
227 ... processor core
230 ... cache memory
233 ... processor core
235 ... cache memory
237 ... processor core
240 ... cache memory
243 ... processor core
245 ... cache memory
247 ... processor core
250 ... cache memory
253 ... processor core
255 ... cache memory
257 ... processor core
260 ... system memory
265 ... chipset
319 ... logic
338 ... graphics circuit
339 ... graphics interface
350 ... point-to-point interface
352 ... point-to-point interface
354 ... point-to-point interface
370 ... processor
372 ... memory controller hub
376 ... point-to-point interface circuit
378 ... point-to-point interface circuit
380 ... processor
382 ... memory controller hub
386 ... point-to-point interface circuit
388 ... point-to-point interface circuit
390 ... chipset
394 ... point-to-point interface circuit
398 ... point-to-point interface circuit
500a ... logic
500b ... binary search tree
501a ... input vector
501b ... input vector
505a ... comparison logic
510a ... range limit vector
510b ... element
515a ... zero vector
520a ... range limit vector
525a ... range detection vector
601 ... operation
605 ... operation
610 ... operation
C1 ... coefficient vector
C2 ... coefficient vector
I ... element
R ... range detection vector
T ... range limit vector
T1 ... half vector
T2 ... half vector
T3 ... half vector
T4 ... middle vector element
T5 ... half vector
T6 ... half vector
T7 ... half vector
T8 ... half vector
X ... input vector

Embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements and in which:

FIG. 1 illustrates a block diagram of a microprocessor in which at least one embodiment of the invention may be used;

FIG. 2 illustrates a block diagram of a shared-bus computer system in which at least one embodiment of the invention may be used;

FIG. 3 illustrates a block diagram of a point-to-point interconnect computer system in which at least one embodiment of the invention may be used;

FIG. 4 illustrates a spline divided into regions, according to one embodiment.

FIG. 5 is a diagram of logic that may be used to accelerate region detection within a spline in response to a region detection instruction, according to one embodiment.

FIG. 6 is a flow diagram of operations that may be used to perform at least one embodiment of the invention.

Claims (23)

1. An apparatus for performing range detection, comprising: range detection logic, including: decode logic to decode a range detection instruction to perform single instruction multiple data (SIMD) range detection from an input vector X and a range limit vector T; and execution logic, responsive to the range detection instruction, to perform SIMD comparisons to match a range of a curve equation function to each input point element of the input vector X, to accumulate the SIMD comparison results to generate a corresponding range vector R containing a range value of the curve equation function that includes the corresponding input point of the input vector X, and to store the range vector R in a SIMD register.

2. The apparatus of claim 1, wherein the execution logic includes comparison logic to compare each element of the input vector with a corresponding range limit vector element.

3. The apparatus of claim 1, wherein the execution logic includes binary search logic to compare each element of the input vector with a corresponding range limit vector element.

4. The apparatus of claim 1, wherein the range detection logic includes a range vector storage to store the range vector including the range values.

5. The apparatus of claim 4, wherein the range detection logic includes an input vector storage to store the input vector including the input point elements.

6. The apparatus of claim 5, wherein the apparatus includes at least one coefficient vector storage to store a plurality of coefficient vector elements corresponding to the input vector elements.

7. The apparatus of claim 6, wherein the apparatus includes at least one offset vector storage to store a plurality of offset vector elements corresponding to the input vector elements.

8. The apparatus of claim 7, wherein the apparatus includes an output vector storage to store a plurality of output vector elements corresponding to the input vector elements.

9. A method for performing range detection, comprising: performing range detection to generate a plurality of range values corresponding to a plurality of input values of a curve equation function; performing a coefficient matching operation to generate a plurality of coefficients corresponding to a plurality of polynomials, the polynomials corresponding to the input values of the curve equation function; and performing a polynomial evaluation calculation to generate a plurality of output values corresponding to the plurality of input values; wherein performing the range detection includes: decoding a range detection instruction to perform single instruction multiple data (SIMD) range detection from an input vector X and a range limit vector T; and, responsive to the range detection instruction, performing SIMD comparisons to match a range of the curve equation function to each input point element of the input vector X, accumulating the SIMD comparison results to generate a corresponding range vector R containing a range value of the curve equation function that includes the corresponding input point of the input vector X, and storing the range vector R in a SIMD register as a result of the range detection.

10. The method of claim 9, wherein the range detection instruction causes range detection logic to generate the range vector including the plurality of range values.

11. The method of claim 10, wherein the range detection logic includes comparison logic to compare each of the plurality of input values with a corresponding range limit vector element.

12. The method of claim 10, wherein the logic includes binary search logic to compare each of the plurality of input values with a corresponding range limit vector element.

13. A system for performing range detection, comprising: a storage to store instructions for execution; and a processor to determine a range value of each curve equation polynomial corresponding to a plurality of input vector elements; wherein the processor includes: decode logic to decode the instructions, which include a range detection instruction, to perform single instruction multiple data (SIMD) range detection from an input vector X and a range limit vector T; and execution logic, responsive to the range detection instruction, to perform SIMD comparisons to match a range of the curve equation polynomial to each input vector element of the input vector X and to accumulate the SIMD comparison results to generate a corresponding range vector R containing a range value of the curve equation polynomial that includes the corresponding input vector element of the input vector X.

14. The system of claim 13, wherein the processor includes comparison logic to compare each element of the input vector with a corresponding range limit vector element.

15. The system of claim 13, wherein the processor includes binary search logic to compare each element of the input vector with a corresponding range limit vector element.

16. The system of claim 13, wherein the processor includes a range vector storage to store the range vector including the range values.

17. The system of claim 16, wherein the processor includes an input vector storage to store the input vector elements.

18. The system of claim 17, wherein the processor includes at least one coefficient vector storage to store a plurality of coefficient vector elements corresponding to the input vector elements.

19. The system of claim 18, wherein the processor includes at least one offset vector storage to store a plurality of offset vector elements corresponding to the input vector elements.

20. A processor, comprising: first logic to perform range detection to generate a plurality of range values corresponding to a plurality of input values of a curve equation function; second logic to perform a coefficient matching operation to generate a plurality of coefficients corresponding to a plurality of polynomials, the polynomials corresponding to the input values of the curve equation function; and third logic to perform a polynomial evaluation calculation to generate a plurality of output values corresponding to the plurality of input values; wherein the first logic includes: decode logic to decode instructions for execution, which include a range detection instruction, to perform single instruction multiple data (SIMD) range detection from an input vector X and a range limit vector T; and execution logic, responsive to the range detection instruction, to perform SIMD comparisons to match a range of the curve equation polynomial to each input vector element of the input vector X and to accumulate the SIMD comparison results to generate a corresponding range vector R containing a range value of the curve equation polynomial that includes the corresponding input vector element of the input vector X.

21. The processor of claim 20, wherein the range detection instruction causes the first logic to generate the range vector R including the plurality of range values and to store the range vector R in a SIMD register as a result of the range detection instruction.

22. The processor of claim 21, wherein the first logic includes comparison logic to compare each of the plurality of input values with a corresponding range limit vector element.

23. The processor of claim 21, wherein the first logic includes binary search logic to compare each of the plurality of input values with a corresponding range limit vector element.
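
As a rough illustration of the three steps recited in the method and processor claims (range detection, coefficient matching, polynomial evaluation), the following Python sketch emulates SIMD lanes with plain lists. It is not the claimed hardware or instruction. The function names, the `>=` comparison convention, the use of the range value as an index into per-range coefficient and offset tables, and the Horner-style evaluation are all assumptions made for illustration.

```python
from bisect import bisect_right


def range_detect(x, t):
    """Compare-and-accumulate range detection (illustrative only).

    x: input points, one per emulated SIMD lane.
    t: range limits in ascending order.
    Returns r, where r[i] is the number of limits t[j] with t[j] <= x[i].
    """
    r = [0] * len(x)
    for limit in t:
        # One lane-wise comparison of every input point against one limit.
        mask = [xi >= limit for xi in x]
        # Accumulate the comparison results into the range vector.
        r = [ri + int(m) for ri, m in zip(r, mask)]
    return r


def range_detect_bsearch(x, t):
    """Binary-search variant; yields the same range vector as range_detect."""
    return [bisect_right(t, xi) for xi in x]


def evaluate_piecewise(x, t, coeffs, offsets):
    """Range detection, then coefficient matching, then polynomial evaluation.

    coeffs[k]: polynomial coefficients for range k, highest degree first
               (hypothetical table layout).
    offsets[k]: value subtracted from x before evaluating in range k
                (hypothetical table layout).
    """
    r = range_detect(x, t)
    out = []
    for xi, ri in zip(x, r):
        c = coeffs[ri]            # coefficient matching by range value
        u = xi - offsets[ri]      # shift the input by the range's offset
        acc = 0.0
        for ck in c:              # Horner evaluation of the polynomial
            acc = acc * u + ck
        out.append(acc)
    return out
```

With limits `t = [0.0, 1.0, 2.0]`, the inputs `[-0.5, 0.0, 0.5, 1.5, 2.0, 3.0]` map to the range vector `[0, 1, 1, 2, 3, 3]` under either variant. The linear variant performs one comparison pass per limit, while the binary-search variant needs about log2 of the number of limits per element, which is one plausible reason for the claims to recite both.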
TW98136966A 2008-10-31 2009-10-30 Apparatus,processor,system,method,instruction,and logic for performing range detection TWI470545B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/290,565 US8386547B2 (en) 2008-10-31 2008-10-31 Instruction and logic for performing range detection

Publications (2)

Publication Number Publication Date
TW201030607A TW201030607A (en) 2010-08-16
TWI470545B TWI470545B (en) 2015-01-21

Family

ID=42063259

Family Applications (1)

Application Number Title Priority Date Filing Date
TW98136966A TWI470545B (en) 2008-10-31 2009-10-30 Apparatus,processor,system,method,instruction,and logic for performing range detection

Country Status (7)

Country Link
US (1) US8386547B2 (en)
JP (2) JP5518087B2 (en)
KR (1) KR101105474B1 (en)
CN (1) CN101907987B (en)
DE (1) DE102009051288A1 (en)
TW (1) TWI470545B (en)
WO (1) WO2010051298A2 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9454366B2 (en) 2012-03-15 2016-09-27 International Business Machines Corporation Copying character data having a termination character from one memory location to another
US9280347B2 (en) 2012-03-15 2016-03-08 International Business Machines Corporation Transforming non-contiguous instruction specifiers to contiguous instruction specifiers
US9459864B2 (en) 2012-03-15 2016-10-04 International Business Machines Corporation Vector string range compare
US9715383B2 (en) 2012-03-15 2017-07-25 International Business Machines Corporation Vector find element equal instruction
US9454367B2 (en) 2012-03-15 2016-09-27 International Business Machines Corporation Finding the length of a set of character data having a termination character
US9459868B2 (en) 2012-03-15 2016-10-04 International Business Machines Corporation Instruction to load data up to a dynamically determined memory boundary
US9588762B2 (en) 2012-03-15 2017-03-07 International Business Machines Corporation Vector find element not equal instruction
US9268566B2 (en) 2012-03-15 2016-02-23 International Business Machines Corporation Character data match determination by loading registers at most up to memory block boundary and comparing
US9459867B2 (en) 2012-03-15 2016-10-04 International Business Machines Corporation Instruction to load data up to a specified memory boundary indicated by the instruction
US9710266B2 (en) 2012-03-15 2017-07-18 International Business Machines Corporation Instruction to compute the distance to a specified memory boundary
US9495155B2 (en) * 2013-08-06 2016-11-15 Intel Corporation Methods, apparatus, instructions and logic to provide population count functionality for genome sequencing and alignment
US9513907B2 (en) * 2013-08-06 2016-12-06 Intel Corporation Methods, apparatus, instructions and logic to provide vector population count functionality
US20160124651A1 (en) * 2014-11-03 2016-05-05 Texas Instruments Incorporated Method for performing random read access to a block of data using parallel lut read instruction in vector processors
US20190250917A1 (en) * 2018-02-14 2019-08-15 Apple Inc. Range Mapping of Input Operands for Transcendental Functions

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030018687A1 (en) * 1999-04-29 2003-01-23 Stavros Kalafatis Method and system to perform a thread switching operation within a multithreaded processor based on detection of a flow marker within an instruction information
US20050044123A1 (en) * 2003-08-22 2005-02-24 Apple Computer, Inc., Computation of power functions using polynomial approximations
CN1754187A (en) * 2003-02-28 2006-03-29 索尼株式会社 Image processing device, method, and program
US20070074007A1 (en) * 2005-09-28 2007-03-29 Arc International (Uk) Limited Parameterizable clip instruction and method of performing a clip operation using the same

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4918618A (en) * 1988-04-11 1990-04-17 Analog Intelligence Corporation Discrete weight neural network
JP3303835B2 (en) * 1999-04-30 2002-07-22 日本電気株式会社 Apparatus and method for generating pitch pattern for rule synthesis of speech
JP3688533B2 (en) * 1999-11-12 2005-08-31 本田技研工業株式会社 Degradation state evaluation method of exhaust gas purification catalyst device
WO2003019356A1 (en) 2001-08-22 2003-03-06 Adelante Technologies B.V. Pipelined processor and instruction loop execution method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
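Master's thesis, Department of Mechanical Engineering, National Cheng Kung University; graduate student: 陳武勇; advisor: 何旭彬; "Using Graphics Processors for B-Spline Finite Element Analysis" (使用圖形處理器於B-Spline有限元素分析), June 2007 *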

Also Published As

Publication number Publication date
WO2010051298A2 (en) 2010-05-06
KR20100048928A (en) 2010-05-11
TW201030607A (en) 2010-08-16
KR101105474B1 (en) 2012-01-13
US8386547B2 (en) 2013-02-26
WO2010051298A3 (en) 2010-07-08
JP5518087B2 (en) 2014-06-11
US20100115014A1 (en) 2010-05-06
JP2014096174A (en) 2014-05-22
DE102009051288A1 (en) 2010-05-06
JP5883462B2 (en) 2016-03-15
CN101907987B (en) 2015-05-20
JP2012507796A (en) 2012-03-29
CN101907987A (en) 2010-12-08

Similar Documents

Publication Publication Date Title
TWI470545B (en) Apparatus,processor,system,method,instruction,and logic for performing range detection
CN109240746B (en) Apparatus and method for performing matrix multiplication
JP5647859B2 (en) Apparatus and method for performing multiply-accumulate operations
JP5573134B2 (en) Vector computer and instruction control method for vector computer
US20210264273A1 (en) Neural network processor
JP5731937B2 (en) Vector floating point argument reduction
EP3451153B1 (en) Apparatus and method for executing transcendental function operation of vectors
CN110321161B (en) Vector function fast lookup using SIMD instructions
US12166878B2 (en) System and method to improve efficiency in multiplication_ladder-based cryptographic operations
US5341320A (en) Method for rapidly processing floating-point operations which involve exceptions
WO2017185392A1 (en) Device and method for performing four fundamental operations of arithmetic of vectors
TWI493456B (en) Method, apparatus and system for execution of a vector calculation instruction
US11416261B2 (en) Group load register of a graph streaming processor
TWI587137B (en) Improved simd k-nearest-neighbors implementation
WO2024032027A1 (en) Method for reducing power consumption, and processor, electronic device and storage medium
WO2019023910A1 (en) Data processing method and device
CN110060195A (en) A kind of method and device of data processing
US10289386B2 (en) Iterative division with reduced latency
KR20140138053A (en) Fma-unit, in particular for use in a model calculation unit for pure hardware-based calculation of a function-model
US11080054B2 (en) Data processing apparatus and method for generating a status flag using predicate indicators
JP3310316B2 (en) Arithmetic unit
US20220326956A1 (en) Processor embedded with small instruction set
JP3773033B2 (en) Data arithmetic processing apparatus and data arithmetic processing program
US20110153702A1 (en) Multiplication of a vector by a product of elementary matrices
JP2002304288A (en) Data operation processing device and data operation processing program

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees