JP2953343B2

JP2953343B2 - Cache control system for distributed memory multiprocessor

Info

Publication number: JP2953343B2
Application number: JP7128884A
Authority: JP
Inventors: 尚夫小柳
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1995-04-29
Filing date: 1995-04-29
Publication date: 1999-09-27
Anticipated expiration: 2014-09-27
Also published as: JPH08305633A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a cache control system for a distributed-memory multiprocessor, and more particularly to a cache control system for a distributed-memory multiprocessor that allows other memory access processing to be executed in overlap with data transfer processing between element processors.

[0002]

[Prior Art] Cache coherence control is one of the important problems in multiprocessor architecture. The approaches generally considered include the snoop cache method, the directory method, and software-based methods. Of these, the snoop cache method is used in many multiprocessor information processing apparatuses. The directory method has also been adopted in SCI (Scalable Coherent Interface) and elsewhere, and is expected to become widespread in the future. However, both methods have advantages and disadvantages: for some application programs they are problematic in terms of performance, or they are problematic in terms of product cost.

[0003] The software-based method has the drawback that a cache of limited capacity cannot be used efficiently. On the other hand, it requires little communication between element processors for cache coherence control, which contributes to performance, and it is also very advantageous in cost because no such communication path exists and no expensive storage element such as a directory is required. However, this software-based cache coherence control method has the drawback that it cannot reliably determine whether data present in the cache is valid.

[0004] Meanwhile, distributed-memory multiprocessors have attracted a great deal of attention in recent years: owing to their scalability and high peak performance, some have been commercialized, and they have the potential to achieve performance of 1 TFLOPS (teraflops). However, a major obstacle to achieving high performance in a distributed-memory multiprocessor is that communication performance between element processors does not improve.

[0005] Many of the information processing apparatuses available today are equipped with an extended storage device, and demand for large-capacity memory is expected to keep growing. As a result, data transfers to and from the extended storage device become more frequent, their effect on system performance tends to grow, and faster transfers are desired. In practice, however, cache coherence control in current multiprocessors has not been given sufficient consideration in this respect. The approaches generally adopted are, for example, to have software distinguish the area of the main storage device into which data read from the extended storage device is moved from the storage areas used for other processing, or to invalidate, during the data transfer, the cached data belonging to the data area written in the main storage device.

[0006]

[Problems to Be Solved by the Invention] In the conventional distributed-memory multiprocessor described above, the low data transfer speed between element processors is a major obstacle to improving performance. Although processor performance has improved remarkably in recent years, the pursuit of higher performance through multiprocessing is expected to continue. However, in contrast to the increase in CPU speed, the communication performance of the paths connecting the CPUs is orders of magnitude slower, so when a real program includes data transfers between element processors, effective performance does not improve no matter how fast the CPUs are. This has been a major factor hindering the spread of distributed-memory multiprocessors.

[0007]

[0008] In addition, with the conventional methods, storage areas must be managed by software when data is transferred from the extended storage device to the main storage device, and processing unrelated to the transfer cannot be executed in overlap with it while the transfer is in progress. This is a factor that lowers execution performance.

[0009] An object of the present invention is to eliminate the above drawbacks of the prior art and to provide a cache control system for a distributed-memory multiprocessor that creates more opportunities to hide the transfer processing between element processors, which is the largest factor hindering performance improvement, by executing other memory accesses in overlap with that transfer processing, and that avoids the needless cache invalidation that is a drawback of software-based cache control.

[0010]

[Means for Solving the Problems] To achieve the above object, the present invention provides a distributed-memory multiprocessor in which a plurality of element processors, each combining a CPU and a memory, are connected to one another and which has a function for transferring data between the element processors, wherein each element processor comprises: a cache memory consisting of a plurality of entries that store fixed-length block data, each entry having an attribute field in which attribute information indicating an attribute of the data is set; inter-element-processor data transfer control means that performs data transfer between the element processors and, when data is written to the memory of the element processor, sends the written address to the cache memory; cache control means that sets the attribute information in the attribute field of the cache memory entry that matches the write address from the inter-element-processor data transfer control means; and instruction execution means that executes access instructions to the memory during data transfer processing between the element processors and, after that data transfer processing, executes an invalidation instruction that invalidates the cache memory entries whose attribute field has the attribute information set.

[0011] In another aspect, the instruction execution means executes access instructions that can be executed during the data transfer between an instruction that starts the data transfer between the element processors and an instruction that waits for the data transfer to finish.

[0012] In yet another aspect, the instruction execution means postpones, until the data transfer finishes, execution of an access instruction that causes a cache miss during data transfer between the element processors.

[0013] In another preferred aspect, the cache control means comprises address comparison means that compares the write address from the inter-element-processor data transfer control means with the addresses in the cache memory, and setting means that sets the attribute information in the attribute field of the cache memory entry on the basis of an address match signal from the address comparison means.

[0014] Further, in a preferred aspect, the instruction execution means notifies the cache control means that data transfer between the element processors is in progress, and the cache control means comprises: address comparison means that compares the write address from the inter-element-processor data transfer control means with the addresses in the cache memory; setting means that sets the attribute information in the attribute field of the cache memory entry on the basis of an address match signal from the address comparison means and the transfer-in-progress notification from the instruction execution means; and means that, on the basis of an invalidation instruction from the instruction execution means, invalidates the cache memory entries whose attribute field has the attribute information set.

[0015]

[Operation] According to the present invention, data that is updated during data transfer processing between element processors and that is held in the cache memory is marked; the cache memory remains accessible to the processing executed during the transfer; and after the transfer only the marked cache data is invalidated. This prevents the performance degradation caused by data transfer between element processors and avoids needless cache invalidation. Specifically, processing whose execution has been decided before the inter-element-processor data transfer is placed between the inter-element-processor data transfer start instruction and the inter-element-processor data transfer end wait instruction, and that processing is executed while the transfer is in progress, which reduces the time the element processor spends idle with nothing to process.
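
The marking scheme described in this paragraph can be illustrated with a small software model. The following is a minimal sketch, not the hardware of the invention: the class `MarkedCache`, its method names, and the use of a dictionary keyed by block address are all assumptions made for illustration.

```python
class MarkedCache:
    """Toy model: each cached block carries a valid flag and a mark (attribute) flag."""

    def __init__(self):
        # block address -> {"data": ..., "valid": bool, "mark": bool}
        self.entries = {}

    def fill(self, addr, data):
        """Load a block into the cache as valid and unmarked."""
        self.entries[addr] = {"data": data, "valid": True, "mark": False}

    def snoop_transfer_write(self, addr):
        """Called for every local-memory address written by the transfer."""
        entry = self.entries.get(addr)
        if entry is not None and entry["valid"]:
            entry["mark"] = True  # stale after the transfer, but still readable now

    def read(self, addr):
        """Return the cached data on a hit, or None on a miss."""
        entry = self.entries.get(addr)
        if entry is not None and entry["valid"]:
            return entry["data"]
        return None

    def invalidate_marked(self):
        """Run after the transfer has ended: drop only the marked entries."""
        dropped = 0
        for entry in self.entries.values():
            if entry["valid"] and entry["mark"]:
                entry["valid"] = False
                entry["mark"] = False
                dropped += 1
        return dropped


cache = MarkedCache()
for addr in (0x100, 0x140, 0x180):
    cache.fill(addr, f"old@{addr:#x}")

cache.snoop_transfer_write(0x140)        # the transfer overwrites a cached block
cache.snoop_transfer_write(0x200)        # not cached, so there is nothing to mark

hit_during_transfer = cache.read(0x140)  # still usable while the transfer runs
dropped = cache.invalidate_marked()      # issued after the transfer end wait
```

In this model the two unmarked blocks survive the invalidation, whereas a scheme that flushed the whole cache after every transfer would have discarded all three.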

[0016]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a first embodiment of a distributed-memory multiprocessor to which the cache control method according to the present invention is applied.

[0017] FIG. 1 shows the configuration of one of the plurality of element processors that make up the distributed-memory multiprocessor. The other element processors have the same configuration.

[0018] The element processor 10 of this embodiment comprises an instruction control unit 11, a cache control unit 12, a memory access control unit 13, a local memory 14, and an inter-element-processor data transfer control unit 15. The inter-element-processor data transfer control unit 15 controls data transfer between its own local memory 14 and the local memory 20 of another element processor that is the transfer partner.

[0019] The feature of this embodiment is that data which is updated during data transfer processing between element processors and which is held in the cache memory is marked, the cache memory remains accessible to the processing executed during the transfer, and only the marked cache data is invalidated after the transfer. This prevents the performance degradation caused by data transfer between element processors and avoids needless cache invalidation.

[0020] Realizing this feature requires some tuning of the program, but the techniques required are well within the reach of conventional optimizing compiler technology. That is, to hide the performance cost of inter-element-processor data transfer behind other processing, the data transfer is started ahead of time; processing whose execution has been decided before the transfer is placed between the inter-element-processor data transfer start instruction and the inter-element-processor data transfer end wait instruction; and that processing is executed during the transfer time. This reduces the time the element processor spends idle with nothing to process.
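
The program structure described here (start the transfer early, run independent work, then wait) can be sketched in software. This is only an analogy under stated assumptions: `start_transfer` and `wait_transfer` are hypothetical stand-ins for the transfer start and transfer end wait instructions, and a background thread plays the role of the inter-element-processor data transfer control unit.

```python
import threading


def start_transfer(remote_memory, local_memory, base, length):
    """Hypothetical 'transfer start': copy remote words into local memory in the background."""
    def copy():
        for i in range(length):
            local_memory[base + i] = remote_memory[i]

    worker = threading.Thread(target=copy)
    worker.start()
    return worker


def wait_transfer(handle):
    """Hypothetical 'transfer end wait': block until the copy has finished."""
    handle.join()


remote = [10, 20, 30, 40]
local = [0] * 8               # words 4..7 receive the transferred data
independent = [1, 2, 3, 4]    # data used by the overlapped block; the transfer never touches it

handle = start_transfer(remote, local, base=4, length=4)  # issued ahead of time
partial = sum(x * x for x in independent)                 # work overlapped with the transfer
wait_transfer(handle)                                     # ordering point
total = partial + sum(local[4:8])                         # transferred data is safe to use now
```

The overlapped block only reads data that the transfer does not write, which is the property a compiler or programmer would have to establish before moving work between the two instructions.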

[0021] As long as no cache miss occurs in the processing block placed between the inter-element-processor data transfer start instruction and the inter-element-processor data transfer end wait instruction, the data transfer and the accesses made by that processing block can proceed in parallel. If a cache miss does occur in the processing block, the memory access control unit 13 holds the access of that instruction.

[0022] During inter-element-processor data transfer processing, a mark is set in the accompanying information of each cache memory entry that holds an address written in the local memory 14. Marked cache data may be used only by memory access instructions executed during the transfer, and must be invalidated once the transfer has finished. The inter-element-processor data transfer end wait instruction is an instruction that postpones processing whose correct operation is not guaranteed until the transfer has finished. This instruction guarantees the correctness of the processing that follows the block executable in parallel with the inter-element-processor data transfer.

[0023] In the embodiment of FIG. 1, the instruction control unit 11 decodes instructions and sends access requests to the cache control unit 12 via signal line 102. When data transfer between element processors is to be performed, the instruction control unit 11 notifies the memory access control unit 13 and the inter-element-processor data transfer control unit 15 of the start of the transfer via signal lines 103 and 105.

[0024] The instruction control unit 11 also has a mechanism that recognizes the period from the start to the end of an inter-element-processor data transfer on the basis of the notification sent from the inter-element-processor data transfer control unit 15 via signal line 112, and it reports that state to the cache control unit 12 via signal line 104.

[0025] The cache control unit 12 processes memory access instructions from the instruction control unit 11 and, if a cache miss occurs, issues a request to the memory access control unit 13 via signal line 110. During inter-element-processor data transfer processing, the inter-element-processor data transfer control unit 15 notifies the cache control unit 12, via signal line 115, of each address written in the local memory 14 of the same element processor 10; if that address is held in the cache memory, a mark is set in the corresponding entry. Furthermore, when a cache invalidation instruction is issued under the direction of the instruction control unit 11, only the marked entries are invalidated.

[0026] When the memory access control unit 13 is notified by the cache control unit 12, via signal line 110, of a cache data request arising from a cache miss, it accesses the requested data in the local memory 14 of the same element processor via signal line 111. If an inter-element-processor data transfer is in progress, it holds the cache data request until the transfer has finished. The start of the transfer is reported to the memory access control unit 13 via signal line 103, and its end via signal line 113.
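
The hold behaviour of the memory access control unit 13 can be modelled as a small queue. The sketch below is an illustration under assumptions: the class `MemoryAccessController` and its method names are hypothetical, and the start and end notifications (signal lines 103 and 113 in the text) are represented as ordinary method calls.

```python
from collections import deque


class MemoryAccessController:
    """Toy model of a controller that holds cache-miss fetches during a transfer."""

    def __init__(self, local_memory):
        self.local_memory = local_memory
        self.transfer_active = False
        self.held = deque()  # miss addresses waiting for the transfer to end

    def transfer_started(self):
        """Models the start notification."""
        self.transfer_active = True

    def request_miss(self, addr):
        """Handle a cache-miss fetch; return the data, or None if the request is held."""
        if self.transfer_active:
            self.held.append(addr)
            return None
        return self.local_memory[addr]

    def transfer_finished(self):
        """Models the end notification; services held requests in arrival order."""
        self.transfer_active = False
        serviced = []
        while self.held:
            addr = self.held.popleft()
            serviced.append((addr, self.local_memory[addr]))
        return serviced


memory = {0x10: "a", 0x20: "b"}
mac = MemoryAccessController(memory)

before = mac.request_miss(0x10)     # no transfer in progress: served immediately
mac.transfer_started()
during = mac.request_miss(0x20)     # held until the transfer ends
memory[0x20] = "b-new"              # the transfer writes this word
released = mac.transfer_finished()  # the held request now sees the new data
```

Holding the miss until the transfer ends means the refill reads the value the transfer wrote rather than the value it replaced.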

[0027] On receiving a start instruction sent from the instruction control unit 11 via signal line 105, the inter-element-processor data transfer control unit 15 carries out data transfer between element processors, that is, between its own local memory 14 and the local memory 20 of another element processor, via signal lines 120 and 121. When it writes data to the local memory 14 of the same element processor, it sends the address of the data being written to the cache control unit 12 via signal line 115. When the inter-element-processor data transfer processing has finished, the instruction control unit 11 and the memory access control unit 13 are notified of the end of the transfer via signal lines 112 and 113.

[0028] FIG. 2 is a block diagram showing an example configuration of the cache control unit 12, which is the distinguishing feature of this embodiment. In FIG. 2, the cache control unit 12 is configured to control access to a 4-way cache memory. FIG. 2 shows the configuration of only one of the four address arrays; the details of the other address arrays are omitted because they are identical.

[0029] In FIG. 2, 201 is a cache block address register that holds either the address sent with an access request from the instruction control unit 11 or the address 251 sent from the inter-element-processor data transfer control unit 15 via signal line 115, and 202 is a counter that increments the upper-address value held in the cache block address register 201 once per cycle.

[0030] 210 is an address array; 211 is an attribute field in which attribute information is set indicating whether the entry to which an address sent from the inter-element-processor data transfer control unit 15 belongs is present in the cache memory; and 212 is a valid bit indicating whether the data in the cache memory is valid.

[0031] 213 is an AND circuit that takes the logical AND of the data transfer instruction signal 260, which is sent when the instruction control unit 11 executes an inter-element-processor data transfer start instruction, and the output of inverter 215, which inverts the cache clear signal 261 sent when a cache invalidation instruction is executed. 214 is a NAND circuit that takes the NAND of the cache clear signal 261 and the value of the attribute field 211. The output of AND circuit 213, logical "1" or "0", is set in the attribute field 211 as the attribute information, and the output of NAND circuit 214, logical "1" or "0", is set in the valid bit 212. A value of "1" in the attribute field 211 indicates that the entry is a target for invalidation, and a value of "1" in the valid bit 212 indicates that the entry is valid.

[0032] 216 is an address comparator that compares the lower address 252 of the cache block address register 201 with the address held in the address array 210; 217 is an AND circuit that takes the logical AND of the output of the address comparator 216 and the valid bit 212; 218 is an AND circuit that takes the logical AND of the data transfer instruction signal 260 and the output of AND circuit 217; 219 is an AND circuit that takes the logical AND of the cache clear signal 261 and the value of the attribute field 211; and 220 is an OR circuit that takes the logical OR of the outputs of AND circuits 218 and 219. The output of OR circuit 220 determines the timing of the write enable WE of the address array 210.
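
The gate network around the address array can be written out as Boolean expressions. The function below is a sketch that follows the circuit numbering in the text; the function name, argument names, and the choice to model each signal as the integer 0 or 1 are assumptions made for illustration, and timing is ignored.

```python
def address_array_write(transfer_260, clear_261, match_216, valid_212, attr_211):
    """Return (write_enable, next_attr, next_valid) for one address-array entry.

    All arguments and results are 0 or 1.
    """
    inv_215 = 1 - clear_261                 # inverter 215
    and_213 = transfer_260 & inv_215        # value offered to attribute field 211
    nand_214 = 1 - (clear_261 & attr_211)   # value offered to valid bit 212
    and_217 = match_216 & valid_212         # hit on a valid entry
    and_218 = transfer_260 & and_217        # write caused by a transfer write address
    and_219 = clear_261 & attr_211          # write caused by clearing a marked entry
    or_220 = and_218 | and_219              # write enable WE
    return or_220, and_213, nand_214


# Transfer write that hits a valid entry: the entry is marked and stays valid.
mark_case = address_array_write(1, 0, 1, 1, 0)

# Cache clear on a marked entry: the valid bit and the mark are both reset.
clear_marked_case = address_array_write(0, 1, 0, 1, 1)

# Cache clear on an unmarked entry: write enable stays low, so nothing changes.
clear_unmarked_case = address_array_write(0, 1, 0, 1, 0)
```

Each result is a tuple of write enable, the value offered to the attribute field, and the value offered to the valid bit; the offered values only take effect when write enable is 1.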

[0033] The operation during inter-element-processor data transfer and on a cache invalidation instruction is now described with reference to FIG. 2. The operation during inter-element-processor data transfer is described first.

[0034] During inter-element-processor data transfer, the inter-element-processor data transfer control unit 15 sends the write address 251 of data written to the local memory 14 via signal line 115, and the instruction control unit 11 sends the transfer instruction signal 260 indicating when that address is valid. The address 251 is stored in the cache block address register 201. The transfer instruction signal 260 is input to AND circuit 213. Because the cache clear signal 261 is logical "0" during inter-element-processor data transfer, the output of AND circuit 213 becomes logical "1".

[0035] At this time, the address comparator 216 compares the lower address 252 of the cache block address register 201 with the address in the address array 210 to determine whether the address of the data transferred by the inter-element-processor data transfer processing is present in the cache memory.

[0036] If the address of the data transferred by the inter-element-processor data transfer processing is present in the cache memory, the address comparator 216 outputs "1", indicating an address match, to AND circuit 217. The other input of AND circuit 217 receives the valid bit 212 of the address array 210 entry selected by the upper address 253 of the cache block address register 201. Therefore, if the valid bit 212 is "1", indicating valid, AND circuit 217 outputs logical "1".

[0037] The output of AND circuit 218, which receives the transfer instruction signal 260 and the output of AND circuit 217, then becomes logical "1", so the write enable WE of the address array 210 is asserted. As a result, the output "1" of AND circuit 213 is set, as a mark indicating a cache invalidation target, in the attribute field 211 of the address array 210 entry selected by the upper address 253 of the cache block address register 201. The same processing is performed for every write address 251 sent from the inter-element-processor data transfer control unit 15. In this way, the mark "1" is set in the attribute field 211 of the address array 210 for all data that is written to the local memory during the inter-element-processor data transfer and is also cached.

[0038] Next, the operation on a cache invalidation instruction is described with reference to the flowchart of FIG. 3. After the instruction control unit 11 has executed the inter-element-processor data transfer end wait instruction, that is, after the inter-element-processor data transfer has finished, the instruction control unit 11 executes a cache invalidation instruction, which causes the cache clear signal 261 to be sent. At this time, the cache invalidation instruction sets the upper address of the cache block address register 201 to all "0"s.

[0039] When the cache invalidation instruction is executed, the valid bit 212 is first checked, in order from the first address of the address array 210, to determine whether it is "1" (valid) (step 301). If the valid bit 212 is not valid, the counter 202 increments the upper address of the cache block address register 201 once per cycle (step 305), and the processing from step 301 is repeated until every address of the address array 210 has been accessed (step 306).

[0040] If the valid bit 212 is valid in step 301, it is determined whether the attribute field 211 designates the entry as a cache invalidation target, that is, whether the mark "1" is set (step 302). If the entry is an invalidation target, the valid bit 212 of the address array 210 entry selected by the upper address 253 is reset (step 303). In FIG. 2 this determination is made by NAND circuit 214, which receives the cache clear signal 261 and the value of the attribute field 211. Because the cache clear signal 261 is "1" here, if the value of the attribute field 211 is "1", NAND circuit 214 outputs "0"; likewise, AND circuit 219, which also receives the cache clear signal 261 and the value of the attribute field 211, outputs "1", asserting the write enable WE of the address array 210. Consequently "0" is written to the valid bit 212 of the address array 210 entry selected by the upper address 253, and the valid bit 212 is reset.

[0041] At the same time, the AND circuit 213 receives "0" from the transfer command signal 260 and "0" as the inverted value of the cache clear signal 261, so it outputs "0". Consequently, "0" is set in the attribute field 211 of the address array 210 entry corresponding to the upper address 253 (step 304).
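The gate behavior described for the cache invalidation instruction can be checked with a small truth-table sketch. The Python function names below are hypothetical labels for the circuits 214, 219, and 213, and treating the NAND output as the value written to the valid bit is an assumption drawn from the description rather than from the figure itself.

```python
def nand_214(cache_clear, attribute):
    """NAND of cache clear signal 261 and the attribute field 211 value."""
    return 0 if (cache_clear and attribute) else 1


def and_219(cache_clear, attribute):
    """AND of the same two inputs; drives the write instruction WE."""
    return 1 if (cache_clear and attribute) else 0


def and_213(transfer, cache_clear):
    """AND of transfer command signal 260 and the inverted cache clear signal 261."""
    return 1 if (transfer and not cache_clear) else 0


# Cache invalidation instruction on a marked entry:
# cache clear = 1, attribute = 1, transfer command = 0.
new_valid = nand_214(1, 1)      # assumed to be the value written to valid bit 212
write_enable = and_219(1, 1)    # WE enters the write-enable state
new_attribute = and_213(0, 1)   # value set in attribute field 211
```

For an unmarked entry (`attribute` equal to 0), `and_219` returns 0, so in this model the write enable is not asserted and the entry stays valid.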

[0042] This operation is performed for all entries of the address array (step 306). As a result, the cache memory contents for the portions marked during the inter-element-processor data transfer are invalidated. Because the cached contents are invalidated for the data written to the local memory 14 during the data transfer, the correctness of the data is guaranteed.
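The flowchart walk of steps 301 to 306 can be summarized in a short sketch. This Python model is illustrative only: representing the address array as a list of dictionaries and the name `invalidate_marked` are assumptions, and the loop index stands in for the upper address held in the cache block address register 201.

```python
def invalidate_marked(entries):
    """Scan every entry; invalidate only valid entries marked during transfer."""
    invalidated = []
    index = 0                      # register 201 starts at all zeros
    while index < len(entries):    # step 306: stop after the last entry
        entry = entries[index]
        if entry["valid"] == 1:            # step 301: is the entry valid?
            if entry["attribute"] == 1:    # step 302: is it marked?
                entry["valid"] = 0         # step 303: reset valid bit 212
                entry["attribute"] = 0     # step 304: clear attribute field 211
                invalidated.append(index)
        index += 1                 # step 305: counter 202 increments the address
    return invalidated


cache = [
    {"valid": 1, "attribute": 1},  # cached and overwritten by the transfer
    {"valid": 1, "attribute": 0},  # cached, untouched by the transfer
    {"valid": 0, "attribute": 0},  # empty entry
]
cleared = invalidate_marked(cache)
```

Only the first entry is invalidated here; the valid but unmarked entry keeps its contents, which is the selective behavior the description relies on.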

[0043] FIG. 4 illustrates an example of a program for which the present invention is effective. In FIG. 4, an instruction that starts the inter-element-processor data transfer processing is executed first, which activates the inter-element-processor data transfer control unit 15.

[0044] Next comes a program processing block to be executed while the inter-element-processor data transfer processing is in progress. Even if this program processing block contains a load instruction that reads data from the local memory 14, as long as the load hits in the cache, the data transfer processing in progress at that time does not make the caching behavior incorrect. After that, the instruction that waits for the inter-element-processor data transfer processing and the cache invalidation instruction are executed.

[0045] The cache invalidation instruction performs the invalidation described above only on those cache memory entries for which the inter-element-processor data transfer processing wrote to the local memory 14 while the program processing block was executing. This guarantees correct processing for the program processing block that follows as well.
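The program sequence of FIG. 4 (start the transfer, keep computing on cached data, wait, then invalidate) can be illustrated with a toy model. Everything here is a sketch under assumptions: the `ElementProcessor` class, its method names, and the dictionary-based cache are hypothetical, and the sketch does not model the holding of cache misses during the transfer.

```python
class ElementProcessor:
    def __init__(self, memory):
        self.memory = dict(memory)  # local memory 14: address -> value
        self.cache = {}             # address -> [value, marked]

    def load(self, addr):
        if addr not in self.cache:  # miss: fill from local memory
            self.cache[addr] = [self.memory[addr], False]
        return self.cache[addr][0]

    def transfer_write(self, addr, value):
        """A write into local memory made by the inter-processor transfer."""
        self.memory[addr] = value
        if addr in self.cache:      # the cached copy is now stale: mark it
            self.cache[addr][1] = True

    def invalidate_marked(self):
        """Cache invalidation instruction: drop only the marked entries."""
        for addr in [a for a, (_, marked) in self.cache.items() if marked]:
            del self.cache[addr]


pe = ElementProcessor({0x10: 1, 0x20: 2})
pe.load(0x10)
pe.load(0x20)                 # both blocks are cached before the transfer
pe.transfer_write(0x10, 99)   # the transfer overwrites 0x10 in local memory
during = pe.load(0x10)        # hit during the transfer: pre-transfer value
pe.invalidate_marked()        # executed after the wait instruction
after = pe.load(0x10)         # miss, refilled from local memory
kept = 0x20 in pe.cache       # the unmarked entry survives
```

The load issued during the transfer still sees the pre-transfer value, while the load issued after the invalidation sees the transferred data, and the entry the transfer never touched stays cached.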

[0046] FIG. 5 illustrates the operation sequence of the program of FIG. 4. Execution proceeds smoothly through the inter-element-processor data transfer start instruction and the program processing block, but when the inter-element-processor data transfer wait instruction is reached, its execution is deferred until the inter-element-processor data transfer processing ends. This allows the data transfer start and transfer processing to overlap with the program processing that runs ahead, so the inter-element-processor data transfer processing, a major obstacle to performance in distributed-memory parallel processing, can be hidden and processing efficiency improves.

[0047] FIG. 6 shows the processing when the program processing unit executed in parallel with the inter-element-processor data transfer processing contains a memory access instruction that causes a cache miss. The program is the same as that of FIG. 4 except that it contains a data load instruction (LD instruction) -3 that causes a cache miss.

[0048] FIG. 7 illustrates the operation sequence of the program of FIG. 6. For the LD instruction -3 that caused the cache miss, the memory access control unit 13 holds the access request to the local memory 14 until the inter-element-processor data transfer processing ends, which avoids incorrect operation.
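The hold rule for a load that misses during the transfer can be expressed as a one-line timing model. This Python sketch is hypothetical: the function name `issue_cycle` and the cycle-count parameters are illustrative and do not appear in the patent.

```python
def issue_cycle(is_hit, request_cycle, transfer_end_cycle):
    """Cycle at which a load may proceed under the hold rule."""
    if is_hit:
        return request_cycle                       # served from the cache at once
    return max(request_cycle, transfer_end_cycle)  # miss: held until the transfer ends


hit_during_transfer = issue_cycle(True, 5, 20)
miss_during_transfer = issue_cycle(False, 5, 20)
miss_after_transfer = issue_cycle(False, 25, 20)
```

A hit proceeds immediately, a miss requested while the transfer is in progress waits for the transfer to end, and a miss requested afterwards is not delayed.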

[0049] FIG. 8 is a block diagram showing the configuration of a second embodiment of a distributed memory multiprocessor to which the cache control system according to the present invention is applied. The second embodiment is a distributed memory multiprocessor that transfers data between a main storage device 81 and an extended storage device 82. As a comparison with the first embodiment of FIG. 1 shows, the only difference is whether the transfer-source memory is the local memory 20 of another element processor or the extended storage device 82. The rest of the configuration and operation is identical to the first embodiment, so the same reference numerals as in FIG. 1 are used and a detailed description is omitted. The cache control unit 12 also has the same configuration as in FIG. 2. The present invention has been described above with reference to preferred embodiments, but it is not necessarily limited to those embodiments.

【００５０】[0050]

[Effects of the Invention] As described above, according to the cache control system for a distributed memory multiprocessor of the present invention, other memory access processing is executed overlapped with the inter-element-processor transfer processing, which is the largest factor hindering performance improvement. The data transfer processing between element processors can therefore be hidden, and the needless cache invalidation that is a drawback of conventional software-based cache control can be avoided. This improves the performance of the distributed memory multiprocessor.

[0051] In addition, after the data transfer between the element processors ends, only the cache memory entries in which the attribute information has been set are invalidated, so correct processing is also guaranteed for the processing that follows the data transfer.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a first embodiment of a distributed memory multiprocessor to which the cache control system according to the present invention is applied.

FIG. 2 is a block diagram showing a configuration example of the cache control unit, which is a feature of the present embodiment.

FIG. 3 is a flowchart illustrating the operation of the cache control unit when a cache invalidation instruction is executed.

FIG. 4 is a diagram illustrating an example of a program for which the present invention is effective.

FIG. 5 is a diagram illustrating the operation sequence of the program shown in FIG. 4.

FIG. 6 is a diagram illustrating an example of a program in which the program processing executed in parallel with the inter-element-processor data transfer processing contains a memory access instruction that causes a cache miss.

FIG. 7 is a diagram illustrating the operation sequence of the program shown in FIG. 6.

FIG. 8 is a block diagram showing the configuration of a second embodiment of a distributed memory multiprocessor to which the cache control system according to the present invention is applied.

[Explanation of symbols]

10 element processor
11 instruction control unit
12 cache control unit
13 memory access control unit
14, 20 local memory
15 inter-element-processor data transfer control unit
201 cache block address register
202 counter
210 address array
211 attribute field
212 valid bit
213, 217, 218, 219 AND circuit
214 NAND circuit
215 inverter
216 address comparator
220 OR circuit

Claims

(57) [Claims]

1. A cache control system for a distributed memory multiprocessor that has a plurality of element processors, each combining a CPU and a memory, and that has a data transfer function between the element processors, wherein each element processor comprises: a cache memory made up of a plurality of entries that store fixed-length block data, each entry having an attribute field in which attribute information indicating an attribute of the data is set; inter-element-processor data transfer control means for performing the data transfer between the element processors and, when data is written to the memory, sending the write address to the cache memory; cache control means for setting the attribute information in the attribute field of the entry of the cache memory that matches the write address from the inter-element-processor data transfer control means; and instruction execution means for executing an access instruction to the memory during the data transfer processing between the element processors and for executing an invalidation instruction that invalidates the entries of the cache memory in which the attribute information is set.

2. The cache control system for a distributed memory multiprocessor according to claim 1, wherein the instruction execution means executes an instruction that instructs activation of the data transfer between the element processors and an instruction that waits for completion of the data transfer, and executes an executable access instruction during the data transfer.

3. The cache control system for a distributed memory multiprocessor according to claim 1, wherein, during the data transfer between the element processors, the instruction execution means delays execution of an access instruction that has caused a cache miss until the data transfer ends.

4. The cache control system for a distributed memory multiprocessor according to claim 1, wherein the cache control means comprises: address comparison means for comparing the write address from the inter-element-processor data transfer control means with an address of the cache memory; and setting means for setting the attribute information in the attribute field of the entry of the cache memory based on an address match signal from the address comparison means.

5. The cache control system for a distributed memory multiprocessor according to claim 1, wherein the instruction execution means notifies the cache control means that data is being transferred between the element processors, and the cache control means comprises: address comparison means for comparing the write address from the inter-element-processor data transfer control means with an address of the cache memory; setting means for setting the attribute information in the attribute field of the entry of the cache memory based on an address match signal from the address comparison means and the notification from the instruction execution means that the data transfer is in progress; and means for invalidating the entries of the cache memory in which the attribute information is set in the attribute field, based on the invalidation instruction from the instruction execution means.