
WO1998014877A1 - Virtual addressing for DMA subsystem - Google Patents

Virtual addressing for DMA subsystem

Info

Publication number
WO1998014877A1
WO1998014877A1 PCT/US1997/016986 US9716986W WO9814877A1 WO 1998014877 A1 WO1998014877 A1 WO 1998014877A1 US 9716986 W US9716986 W US 9716986W WO 9814877 A1 WO9814877 A1 WO 9814877A1
Authority
WO
WIPO (PCT)
Prior art keywords
subsystem
memory
address
graphics
set forth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US1997/016986
Other languages
English (en)
Inventor
Timothy J. Mcdonald
Michael Larson
Tom Albers
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cirrus Logic Inc
Original Assignee
Cirrus Logic Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cirrus Logic Inc filed Critical Cirrus Logic Inc
Publication of WO1998014877A1
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1027Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1081Address translation for peripheral access to main memory, e.g. direct memory access [DMA]

Definitions

  • the present invention relates generally to computer addressing systems and more particularly to an improved signal processing method for providing direct memory access for computer subsystems.
  • the operating system generally utilizes the central processing unit (CPU) in a hardware-assisted virtual memory mode.
  • the hardware allows the software to treat memory as a large (larger than the available physical memory), virtualized (at least one level of indirection to the physical memory address) object. This is despite the fact that memory is allocated and de-allocated in 4K byte granularity, and that consecutive 4K blocks in virtual memory have no relation to each other in terms of physical address. If all applications and operating systems could always obtain assignment of all of the physically contiguous memory that the OS or application requested, there would be no need for virtual memory. However, this is not the case in modern applications, and substantially all computer systems require virtual memory and virtual memory management.
  • Virtual memory management software is not bound to keep the virtual translation scheme stationary over time. That is, virtual to physical address mapping can and does change over time as virtual addresses are swapped to disk and their corresponding physical memory locations are freed to be used by other virtual address regions.
  • the memory management is usually done entirely within the purview of the operating system, software applications, and peripherals, and driver software is not involved in, nor informed of, these virtual address changes. The only interaction is the ability to request that certain regions of memory be locked-in for possible future use. Operating system suppliers usually recommend that memory not stay locked or dedicated indefinitely, partially because such memory dedication would cede too much control to device drivers and also degrade system performance. The system degradation is especially problematic in systems with only minimum memory installed.
  • the PCI bus in a computer system deals only in physical memory addresses.
  • the PCI addresses correspond directly to the physical address decode in the PC core logic. Therefore, there is a need for translation from virtual memory, as used by the CPU and its software, to physical memory, as needed by the physically addressed devices and memory on the PCI bus.
  • a method and system are provided for implementing an information access and memory process by which memory page tables are assigned a stationary physical address in memory, and accessed directly at the assigned address by a graphics processor which effectively by-passes the system CPU and the CPU-related page table address translation iterations.
  • Figure 1 is a block diagram of a computer system including a graphics subsystem;
  • Figure 2 is a block diagram of the graphics device shown in Figure 1;
  • Figure 3 is a process diagram illustrating a typical transaction in obtaining real address data from a virtual address request;
  • Figure 4 is an illustration of the addressing method implemented in the present example;
  • Figure 5 is a block diagram of several components of the graphics processor device shown in Figure 1;
  • Figure 6 is a schematic diagram illustrating the address translation unit shown in Figure 5;
  • Figure 7 is a flowchart showing the steps of the method implemented in the present example; and
  • Figure 8 is a flowchart showing the internal flow of the cache fill process shown in Figure 7.
  • In Figure 1, an exemplary hardware configuration of a workstation which may be used in conjunction with the present invention is illustrated, and includes a central processing unit (CPU) 103, such as a conventional microprocessor, and a number of other units interconnected through a system bus 105 such as a so-called "PCI" bus.
  • the bus 105 may include an extension 121 for further connections to other workstations or networks, other peripherals and the like.
  • the workstation shown in Figure 1 includes system random access memory (RAM) 109, and a memory controller 107.
  • the system bus 105 is also typically connected through a user interface adapter 115 to a keyboard device 111 and a mouse or other pointing device 113.
  • the system bus 105 is shown connected to the graphics device 117.
  • the graphics device is representative of many subsystems which may be implemented to take advantage of the benefits available from an implementation of the present invention.
  • the exemplary graphics device 117 includes a graphics processor 201 which is arranged to process, transmit and receive information or data from a graphics memory unit 203.
  • the graphics memory 203 may include, for example, a frame buffer unit for storing frame display information which is accessed by the graphics processor 201 and sent to the display device 119.
  • the display device 119 is operable to provide a graphics display of the information stored in the frame buffer as processed by the operation of the graphics processor 201.
  • In Figure 3 there is shown an example of a typical system memory fetching operation after an address request is generated by a subsystem such as a graphics unit 301.
  • the graphics unit 301 is connected to a system memory controller 303 which, in turn, is connected to system memory 305.
  • System memory 305 in the illustration provides information back to the graphics unit 301.
  • the system memory controller 303 processes that request and, in conjunction with a system operating system, addresses a page table portion of system memory 305.
  • the contents of the physical addresses of the system memory 305 are moved throughout the system memory and are present at different locations at different times depending upon the applications being run on the computer.
  • the physical addresses and address content are kept track of by the operating system and maintained in a page table portion of the system memory 305.
  • the page table in system memory 305 is referred to in order to determine the address of the requested information at that particular time.
  • the system then decodes the address and releases the data DT1 to the system bus to be picked up by the subsystem or graphics unit 301 which requested the data.
  • the data is typically requested by reference to a base address and a corresponding section beginning with that base address is returned to satisfy the request.
  • the graphics unit 301 will generate another request R1A to the memory controller 303 to access the memory 305 and the page table to locate the next segment of requested information R1A and send the data DT1A back to the requesting unit 301. That process generally continues and the requesting unit 301 may generate additional requests R1B and, in response, receive information DT1B from memory 305 until all of the requested information has been received.
  • the operation of the memory controller 303 is required in all of the data fetches and the operation of the operating system limits the amount of data accessed at one time so that additional accesses are often required and the memory controller 303 is engaged at each request for data.
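The per-request round trip described above can be sketched as follows. This is an illustrative model only (the function and data-structure names are assumptions, not from the patent); it shows why each request costs two memory accesses, one page-table lookup plus one data read, all funneled through the memory controller:

```python
def conventional_fetch(requests, page_table, memory):
    """Model of the Figure 3 flow: every request R1, R1A, R1B goes through
    the memory controller, which consults the page table before each read.

    page_table -- maps virtual page number -> physical page number
    memory     -- maps physical address -> data
    Returns (data_list, total_memory_accesses).
    """
    results, accesses = [], 0
    for vaddr in requests:
        phys_page = page_table[vaddr >> 12]   # access 1: page-table lookup
        accesses += 1
        results.append(memory[(phys_page << 12) | (vaddr & 0xFFF)])
        accesses += 1                          # access 2: the data read itself
    return results, accesses
```

Under this model, a run of N requests always costs 2N controller-mediated accesses, which is the overhead the cached page-table copy described below is meant to remove.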
  • the present disclosure shows a method and apparatus through which such data accesses may be accomplished much faster and with less memory bandwidth usage thereby freeing-up the memory bandwidth to process other tasks.
  • a system memory 400 is shown and includes a page table 401 and, in the present example, three locked pages 405, 407 and 409 at different addresses in system memory 400.
  • the page table 401 from the main system memory 400 is copied and read into a locked copy location or dedicated base address 403 in system memory 400.
  • the locked page table copy location 403 is located in the main system memory 400, but the copied page table may also be read into a dedicated base address in another memory system or subsystem.
  • various page segments 405, 407 and 409 within the system memory are located through use of the copied page table 403 and accessed as sequential segments by the graphics processor engine 501 shown in Figure 5. That process allows single-access readout of a larger block of memory, including sequential segments, without requiring additional processing by the system memory controller 303.
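Because the copied page table 403 sits at a stationary, locked base address, a bus-mastering subsystem can compute the physical location of any translation entry with simple arithmetic and fetch it directly over the bus. A minimal sketch (the function name and the 32-bit entry size are assumptions for illustration, not taken from the patent):

```python
ENTRY_SIZE = 4  # assume each page-table entry is a 32-bit physical page pointer

def entry_address(table_base: int, virtual_page: int) -> int:
    """Physical address of the page-table entry for virtual_page, given the
    locked (stationary) base address of the page-table copy."""
    return table_base + virtual_page * ENTRY_SIZE
```

The point of locking the copy is exactly that this arithmetic stays valid over time, so the subsystem never needs the CPU or memory controller to re-resolve where the table lives.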
  • Figure 5 illustrates various components within the graphics device 117 shown in Figure 1.
  • a graphics processor engine 501 generates an address request for a virtual address which is sent to an address translation unit 503.
  • the address translation unit 503 translates the requested virtual address into a physical address which may be recognized by the PCI bus 105 and the system CPU 103.
  • the address request in a physical address format is sent to a PCI Interface unit 507 and applied to the PCI bus 105.
  • the address translation unit 503 is shown in detail in Figure 6.
  • there are three separate system or host memory apertures which are designated HXY0 (Host XY aperture "0"), HXY1 (Host XY aperture "1") and PF (Prefetch Unit) .
  • Each of the three memory apertures has an associated req_base, page table and cache designation.
  • Each aperture represents a different view of the system and contains various aspects of the subsystem.
  • the HXY0 aperture may contain the Host XY, color and "Z" dimension information.
  • the HXY1 aperture may contain linear transfer and host texture information.
  • the PF aperture may contain so called "display list" information.
  • the requested address contains various bit fields which convey information about the requested address:
  • bits 21-12 (RA[21:12]) represent a page number;
  • bits 21-14 (RA[21:14]) represent a page block;
  • bits 13-12 (RA[13:12]) represent a cache entry in the block; and
  • bits 11-2 (RA[11:2]) represent an offset in the page.
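The field breakdown above can be separated with shifts and masks. This is a hedged sketch (the function name is illustrative), assuming a word-aligned request address and the bit boundaries just listed:

```python
def decode_request_address(ra: int) -> dict:
    """Split a request address into the fields described above."""
    return {
        "page_number": (ra >> 12) & 0x3FF,  # bits 21-12: 4 KB page number
        "page_block":  (ra >> 14) & 0xFF,   # bits 21-14: 16 KB-aligned page block
        "cache_entry": (ra >> 12) & 0x3,    # bits 13-12: one of 4 cached pages
        "offset":      (ra >> 2)  & 0x3FF,  # bits 11-2: 32-bit-word offset in page
    }
```

Note that the page number and the (page block, cache entry) pair overlap: bits 21-12 split into an 8-bit block tag and a 2-bit entry index, which is what lets a single tag compare cover four pages at once.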
  • three aperture-related base address registers 601, 603 and 605 are connected to a base address multiplexor device 607.
  • three aperture-related request address registers 609, 611 and 613 are connected to a corresponding request address multiplexer 615.
  • an aperture select circuit 625 provides an aperture select signal which is applied through a common connection 624 to the control terminals of the base address multiplexor 607, the request address multiplexor 615 and the cache tag multiplexor 623.
  • the output of the multiplexor 607 is applied to a request base register 627 and the output of multiplexer 615 is applied to a request address register 629. Similarly, the output from the multiplexer 623 is applied to a cache tag register 631.
  • the register 631 is connected to a "B" input of a comparator circuit 633. Bits 21-14 from register 629 are applied to register 635 which is in turn applied to an "A" input to the comparator 633 and also to the 11-4 bit positions of another register 639.
  • Bit positions 31-12 of register 639 receive an input from bit positions 31-12 of the request base address register 627, and bit positions 3-2 of the register 639 receive an input from a Page Table Load State Machine (S.M.) as hereinafter explained in connection with Figure 8.
  • the output of register 639 is applied to one input of a select address multiplexor 637.
  • the output from the select address multiplexor 637 is applied to the current PCI or physical address register 643.
  • Bit positions 11-2 of register 629 are applied to the bit position 11-2 input of register 641, the output of which is applied to a second input of the multiplexor 637.
  • the output of multiplexor 637 is controlled by a control input from the output of the comparator 633.
  • Thirty-two bit PCI READBACK information in register 645 is applied to a series of three register files 647, 649 and 651. Each of the register files includes a set of four registers.
  • Each set of four registers comprises a cache for each of the three apertures PF, HXY0 and HXY1.
  • Outputs from the register files 647, 649 and 651 are applied to a select cache multiplexor 653, the output of which is controlled by the 13-12 bit contents of the request address register 629.
  • the output of the select cache multiplexor 653 is applied to register 655, the 31-12 bit output of which is applied to the 31-12 bit positions of register 641.
  • a REQ_ADDR [21:2] command from the graphics engine represents a request for data from the physical address specified in bit position 31 through bit position 12 of the command from the page table in system memory.
  • a REQ_BASE [31:12] command from the graphics unit represents a requested address from the 4 Megabyte aligned virtual addresses of system memory.
  • a cache_tag [21:14] represents a 16K aligned address tag of the current four pages in the cache registers of the graphics unit.
  • Cache_entry [x] [31:12] [y] commands represent the physical address of the designated page in cache, with the "x" value standing for the page number of the value in cache, and the "y" value standing for whether or not a page is present.
  • cache_entry [2] [31:12] [1] stands for the physical address specified in bit positions 31-12, with the "2" meaning that the third page (of pages "0", "1", "2" and "3") is designated, and the "1" in the end position standing for the fact that there is a page present.
  • REQ_ADDR [21:2] bits 21 to 14 stand for the page block, bits 13 and 12 stand for the cache entry, and bits 11-2 stand for the offset from the beginning of the page.
  • the graphics unit issues a REQ_ADDR [21:2] command to fetch data stored in memory for a graphics operation to be accomplished such as the filling-in of a pixel on the display in accordance with the appropriate color information as stored in system memory.
  • a request is made (REQ_ADDR [21:2]) for a host memory address access 701.
  • a check is made 703 to determine if the requested address is currently already stored in the graphics cache registers since, if that is true, the requested page address information would be available in the graphics cache registers and an access to the page table copy in system memory would not be required. If the page address results in a cache hit, then data may be accessed in host memory directly. If there is a graphics cache miss, then the four page table addresses must be loaded into the cache, and an access to the system memory is accomplished to load those page table addresses into the cache registers.
  • the cache registers 647, 649 and 651 eliminate the need to read the page table on each access, which would require two memory accesses for each REQ_ADDR given.
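A sketch of the hit test that makes this saving possible (identifiers are illustrative, not taken from the hardware description): the page block of the request is compared against the 16K-aligned cache tag, and on a match one of the four cached physical page pointers supplies the translation without touching system memory:

```python
def lookup(req_addr, cache_tag, cache_entries):
    """Return a translated physical address on a cache hit, or None to
    signal that the cache fill state machine must run.

    cache_tag     -- 16K-aligned tag (bits 21-14) of the four cached pages
    cache_entries -- four (physical_page, present) pairs
    """
    if (req_addr >> 14) & 0xFF != cache_tag:
        return None                      # tag mismatch: graphics cache miss
    phys_page, present = cache_entries[(req_addr >> 12) & 0x3]
    if not present:
        return None                      # entry not yet loaded
    return (phys_page << 12) | (req_addr & 0xFFF)   # page pointer + offset
```

This mirrors the comparator 633 (tag compare) and the select cache multiplexor 653 (entry select) in Figure 6, with the low 12 bits passing through untranslated.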
  • In Figure 8 there is shown the cache fill state machine flow chart.
  • the process seeks access to the PCI bus to access host memory. Until access is granted to the PCI bus, the method idles 801 and waits for such access.
  • the cache-fill process loads four physical page pointers from four consecutive addresses in the page table into the four cache entries. The process continues to load 805 data phase "0" which loads cache entry "0" and increments the PCI address to the next page table entry.
  • That incrementing step loads the "3" bit position and the "2" bit position of the register 639 in Figure 6 such that those two bit positions are sequentially loaded with "00", "01", "10" and "11" combinations, respectively, as the iteration "i" is sequenced from "0" through "3".
  • a check is then made 807 to determine whether the data is valid and if not the load step 805 continues until the data is determined to be valid.
  • each of the three graphics cache registers is loaded 809.
  • the process waits for a PCI idle state 813, and when a PCI idle state is detected, the process returns to the idle state 801 to await the next concurrence of a graphics cache miss and a PCI grant 803.
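The fill sequence above can be modeled in a few lines. This is an illustrative sketch under assumed names: starting from the page-table entry for the first page of the 16 KB block, four consecutive entries are read (data phases "0" through "3", the PCI address incrementing each time) into the four cache entries:

```python
def cache_fill(page_table, req_addr):
    """Load the four page-table entries covering the 16 KB block of req_addr.

    page_table -- list standing in for the locked page-table copy, indexed
                  by virtual page number, holding physical page pointers
    Returns (new_cache_tag, new_cache_entries).
    """
    block = (req_addr >> 14) & 0xFF      # bits 21-14: the page block tag
    first_page = block << 2              # four 4 KB pages per 16 KB block
    entries = [(page_table[first_page + i], True)  # data phases i = 0..3
               for i in range(4)]
    return block, entries
```

One miss thus prefetches translations for a full 16 KB of virtual address space, so the three subsequent pages of a sequential transfer hit in the local registers.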

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present invention concerns a method and a computer system implementation in which a graphics subsystem includes cache registers for storing groups of consecutive addresses for each of a plurality of host memory apertures. The memory page table of a host system is copied into a separate portion of memory. Consecutive virtual addresses of consecutive virtual memory segments are loaded into cache registers in the graphics subsystem, and graphics processing unit address requests are then compared against the groups stored in the local graphics cache registers. When a match exists, the requested address information is furnished locally, without requiring a host memory access and without incurring a transaction read delay. When there is no match, another group of consecutive addresses is copied into the graphics cache registers and the process is repeated.
PCT/US1997/016986 1996-09-30 1997-09-22 Virtual addressing for DMA subsystem Ceased WO1998014877A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US72039696A 1996-09-30 1996-09-30
US08/720,396 1996-09-30

Publications (1)

Publication Number Publication Date
WO1998014877A1 true WO1998014877A1 (fr) 1998-04-09

Family

ID=24893872

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/016986 Ceased WO1998014877A1 (fr) 1996-09-30 1997-09-22 Virtual addressing for DMA subsystem

Country Status (1)

Country Link
WO (1) WO1998014877A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006039057A3 (fr) * 2004-09-30 2006-06-29 Intel Corp Performance enhancement of address translation using translation tables covering large address spaces
CN1936869B (zh) * 2005-09-22 2010-06-16 International Business Machines Corporation Method and system for translating addresses

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4173783A (en) * 1975-06-30 1979-11-06 Honeywell Information Systems, Inc. Method of accessing paged memory by an input-output unit
US5369744A (en) * 1989-10-16 1994-11-29 Hitachi, Ltd. Address-translatable graphic processor, data processor and drawing method with employment of the same

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4173783A (en) * 1975-06-30 1979-11-06 Honeywell Information Systems, Inc. Method of accessing paged memory by an input-output unit
US5369744A (en) * 1989-10-16 1994-11-29 Hitachi, Ltd. Address-translatable graphic processor, data processor and drawing method with employment of the same

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"LINEAR-TO-PHYSICAL MEMORY MAPPING BY BUS MASTERS IN VIRTUAL MEMORY SYSTEMS", IBM TECHNICAL DISCLOSURE BULLETIN, vol. 34, no. 4A, 1 September 1991 (1991-09-01), pages 355 - 357, XP000210931 *
KJOS T J ET AL: "HARDWARE CACHE COHERENT INPUT/OUTPUT", HEWLETT-PACKARD JOURNAL, vol. 47, no. 1, 1 February 1996 (1996-02-01), pages 52 - 59, XP000559990 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006039057A3 (fr) * 2004-09-30 2006-06-29 Intel Corp Performance enhancement of address translation using translation tables covering large address spaces
GB2432443B (en) * 2004-09-30 2008-04-02 Intel Corp Performance enhancement of address translation using translation tables covering large address spaces
US8843727B2 (en) 2004-09-30 2014-09-23 Intel Corporation Performance enhancement of address translation using translation tables covering large address spaces
CN1936869B (zh) * 2005-09-22 2010-06-16 International Business Machines Corporation Method and system for translating addresses

Similar Documents

Publication Publication Date Title
US5740406A (en) Method and apparatus for providing fifo buffer input to an input/output device used in a computer system
US5956756A (en) Virtual address to physical address translation of pages with unknown and variable sizes
US5450564A (en) Method and apparatus for cache memory access with separate fetch and store queues
US5386524A (en) System for accessing information in a data processing system
US6449671B1 (en) Method and apparatus for busing data elements
US5638535A (en) Method and apparatus for providing flow control with lying for input/output operations in a computer system
US3761881A (en) Translation storage scheme for virtual memory system
US5864876A (en) DMA device with local page table
US5999198A (en) Graphics address remapping table entry feature flags for customizing the operation of memory pages associated with an accelerated graphics port device
US5765201A (en) Changing page size in storage media of computer system
US20080028181A1 (en) Dedicated mechanism for page mapping in a gpu
CN112631961B (zh) A memory management unit, address translation method and processor
US5805930A (en) System for FIFO informing the availability of stages to store commands which include data and virtual address sent directly from application programs
US5696990A (en) Method and apparatus for providing improved flow control for input/output operations in a computer system having a FIFO circuit and an overflow storage area
US5918050A (en) Apparatus accessed at a physical I/O address for address and data translation and for context switching of I/O devices in response to commands from application programs
US5924126A (en) Method and apparatus for providing address translations for input/output operations in a computer system
HK1043222A1 (en) Input/output (i/o) address translation in a bridge proximate to a local i/o bus
US5765022A (en) System for transferring data from a source device to a target device in which the address of data movement engine is determined
US6088046A (en) Host DMA through subsystem XY processing
JP3449487B2 (ja) Translation lookaside buffer mechanism
WO1998014878A1 (fr) Method for obtaining a contiguous memory buffer and building a page table
JPH07104816B2 (ja) Method of operating a computer system and memory management device in a computer system
US6167498A (en) Circuits systems and methods for managing data requests between memory subsystems operating in response to multiple address formats
US5293622A (en) Computer system with input/output cache
CN111026680B (zh) Data processing system, circuit and method

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 1998516622

Format of ref document f/p: F

122 Ep: pct application non-entry in european phase