WO2015040711A1

WO2015040711A1 - Storage device, method for controlling data in storage device, and storage system

Info

Publication number: WO2015040711A1
Application number: PCT/JP2013/075233
Authority: WO
Inventors: 孝章佐々木; 英寿有川; 直光田代
Original assignee: Hitachi Ltd; Hitachi Information and Telecommunication Engineering Ltd
Current assignee: Hitachi Ltd; Hitachi Information and Telecommunication Engineering Ltd
Priority date: 2013-09-19
Filing date: 2013-09-19
Publication date: 2015-03-26
Anticipated expiration: 2016-03-19

Abstract

This storage device is provided with a recording device and a controller connected to the recording device and a computer. The controller receives from the computer a plurality of sub-blocks obtained by splitting a block of data, stores them in the recording device, and records storage information associating the block with the stored plurality of sub-blocks. The controller receives from the computer subject information indicating a sub-block within the block, identifies, on the basis of the storage information, the block to which the sub-block indicated by the subject information belongs, generates, on the basis of the storage information, duplicate information indicating the stored plurality of sub-blocks, and transmits it to the computer. The controller then receives from the computer updated sub-blocks, that is, those of the sub-blocks obtained by splitting the updated data of the block that are not indicated by the duplicate information.

Description

Storage device, data control method in storage device, and storage system

The present invention relates to controlling data to be stored in a storage device.

A computer system includes, for example, a storage apparatus and a computer, such as a host computer, connected to the storage apparatus via a communication network. The storage apparatus includes, as storage devices for storing data, for example, a plurality of HDDs (Hard Disk Drives).

To reduce costs, the computer system performs data reduction processing when storing data in the storage devices. Examples of data reduction processing include file compression (Compression) and deduplication (De-duplication).

File compression reduces data volume by consolidating data segments with identical content within a single file. Deduplication, by contrast, reduces data volume by consolidating data segments with identical content found not only within one file but also across files. Hereinafter, the unit of data on which deduplication operates is referred to as a "chunk".

Patent Documents 1 and 2 describe that, when content is stored from a host computer into a storage apparatus via a communication network, the host computer first determines whether each chunk is already stored in the storage apparatus and transmits only the chunks that are not yet stored.

Patent Document 1: US Pat. No. 5,990,810
Patent Document 2: International Publication No. WO2012/101674

Duplication determination may be performed either by the host computer or by the storage apparatus.

For example, Patent Document 1 describes a method in which the storage apparatus performs duplication determination, and the host computer obtains the determination result from the storage apparatus and stores only new chunks in the storage apparatus.

However, in the deduplication process of Patent Document 1, the host computer queries the storage apparatus for a duplication determination on every chunk. The host computer must therefore send the information needed for the determination to the storage apparatus and receive the result back. Compared with a method in which the host computer alone performs duplication determination, performance drops by the cost of these round trips between the host computer and the storage apparatus.

Patent Document 2 describes a method in which the host computer performs duplication determination and, based on the result, transmits to the storage apparatus only the chunks not yet stored there.

However, the deduplication process of Patent Document 2 requires the host computer to hold the information used for duplication determination for all data stored in the storage apparatus. As the amount of stored data grows, the size of this information grows with it and strains the disk capacity of the host computer. In addition, as the number of entries increases, the necessary information must be selected from among them, which can make deduplication inefficient.

The storage apparatus includes a storage device and a controller connected to the storage device and to a computer. The controller receives from the computer a plurality of sub-blocks obtained by dividing a block of data, stores them in the storage device, and stores storage information that associates the block with the stored sub-blocks. The controller receives from the computer target information indicating a sub-block in the block; based on the storage information, it identifies the block to which that sub-block belongs, generates duplicate information indicating the stored sub-blocks, and transmits the duplicate information to the computer. The controller then receives from the computer update sub-blocks, that is, those of the sub-blocks obtained by dividing the update data of the block that are not indicated in the duplicate information.

FIG. 1 shows a storage system according to an embodiment.
FIG. 2 shows an example software configuration according to the embodiment.
FIG. 3 shows an example of the chunk index table 1300.
FIG. 4 shows an example of the duplicate information list 1400.
FIG. 5 shows an example of the content management table 1500.
FIG. 6 shows an example of the chunk data set index table 1600.
FIG. 7 is a schematic diagram explaining the deduplication processing according to the embodiment.
FIG. 8 shows an example of a data structure in the storage system.
FIG. 9 shows an example of backup processing according to the embodiment.
FIG. 10 shows details of the information creation process.

An embodiment is described below.

In the following description, various kinds of information may be described using the expression "xxx table", but such information may be expressed in a data structure other than a table. To indicate that it does not depend on the data structure, an "xxx table" may also be called "xxx information".

In the following description, a "program" or "software" is sometimes used as the grammatical subject. Because a program or software performs its defined processing by being executed by a processor, using memory and communication ports, the description may equally take the processor as the subject.

The management computer may consist of one or more computers.

In the description of this embodiment, a logically coherent unit of backup data as recognized by a host device is referred to as "content". Content includes not only ordinary files but also files that aggregate ordinary files, such as archive files, backup files, and virtual volume files. A data segment within content that serves as the unit of deduplication is called a "chunk". The format in which several chunks are stored together in a storage device is called a "chunk data set".

Each chunk data set is created so that closely related chunks are grouped together. For example, a predetermined number of chunks or data capacity is set in advance for each chunk data set, and chunks generated from content are collected until the chunk data set is full; this produces chunk data sets that take data locality into account. In other words, data is stored in chunk data sets without regard to content boundaries, but once the chunk data set holding the first chunk of a given content is identified, the subsequent chunks are likely to be obtainable from the same chunk data set. However, while content stored in a single operation is likely to land in the same chunk data set, difference data from later updates is likely to be stored in a different chunk data set. That is, the more often content is updated, the more likely it is that the chunks making up that content are scattered across multiple chunk data sets.
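As an illustration of the "fill until full" grouping described above, the following is a minimal Python sketch, assuming a fixed chunk-count limit per chunk data set. The names `ChunkDataSet`, `pack_into_chunk_data_sets`, and the limit of 4 are hypothetical; the text allows either a chunk count or a data capacity as the limit.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Hypothetical limit: the text only says "a predetermined number of chunks or data capacity".
MAX_CHUNKS_PER_SET = 4


@dataclass
class ChunkDataSet:
    set_id: int
    chunks: List[bytes] = field(default_factory=list)


def pack_into_chunk_data_sets(chunk_stream: Iterable[bytes],
                              max_chunks: int = MAX_CHUNKS_PER_SET,
                              first_id: int = 0) -> List[ChunkDataSet]:
    """Fill chunk data sets in arrival order, opening a new set when the current one is full."""
    sets: List[ChunkDataSet] = []
    current = ChunkDataSet(first_id)
    for chunk in chunk_stream:
        if len(current.chunks) == max_chunks:
            # Current set is full: close it and continue in a fresh set.
            sets.append(current)
            current = ChunkDataSet(current.set_id + 1)
        current.chunks.append(chunk)
    if current.chunks:
        sets.append(current)
    return sets
```

Because chunks are appended in the order they are generated, consecutive chunks of one content tend to share a chunk data set, while chunks produced by a later update are appended to whichever set is open at that time, which is the scattering effect the paragraph describes.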

A chunk is generally several kilobytes or larger. Comparing chunks byte by byte from the beginning during duplication determination would therefore require considerable processing time and cost. The storage apparatus according to this embodiment instead uses chunk message digests so that duplication determination can be performed quickly and at low cost. A message digest is a technique that outputs a fixed-length digest for input data of arbitrary length. In this specification, the output of a message digest is called a fingerprint (hereinafter, FP) or a hash value. The FP is a hash value calculated using some hash function. It is preferable to use a hash function such as SHA-256, whose output is highly random and which is very likely to yield a unique hash value for each chunk. Instead of the FP, any information that identifies the chunk, such as the chunk itself or another identifier, may be used.
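The fingerprint idea above can be sketched with Python's standard `hashlib`, using SHA-256 as the text suggests. The function names are hypothetical; like the text, the sketch treats equal fingerprints as equal chunks.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Fixed-length FP (64 hex characters) for a chunk of any length, via SHA-256."""
    return hashlib.sha256(chunk).hexdigest()


def is_duplicate(chunk: bytes, known_fps: set) -> bool:
    """Duplication check that compares short digests instead of whole chunks."""
    return fingerprint(chunk) in known_fps


# Usage: a set of FPs stands in for the stored chunks.
stored = {fingerprint(b"chunk-a" * 1000)}
print(is_duplicate(b"chunk-a" * 1000, stored))  # True
print(is_duplicate(b"chunk-b", stored))         # False
```

Whatever the chunk size, the comparison cost is that of a 32-byte digest lookup rather than a scan of several kilobytes.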

FIG. 1 shows an example physical configuration of the storage system according to the embodiment.

The storage system consists of a host computer 10 and a storage apparatus 30. The host computer 10 and the storage apparatus 30 are communicably connected via a predetermined communication network (for example, a SAN (Storage Area Network) 2).

The host computer 10 has a communication interface, a storage device, and a processor connected to them. The processor is, for example, a CPU (Central Processing Unit) 11; the storage device is, for example, a memory 12; and the communication interface is, for example, an HBA (Host Bus Adapter) 13.

The memory 12 stores information and programs for managing the storage apparatus 30. The CPU 11 implements various functions by executing the programs stored in the memory 12.

The HBA 13 is an interface device for connecting the host computer 10 to each node 31. In this embodiment, the node 31 is a storage controller.

The storage apparatus 30 has a storage controller 31 and a storage unit 32. The storage unit 32 consists of a plurality of storage devices 321.

The storage controller 31 has communication interfaces, storage devices, and a processor connected to them. The communication interfaces are, for example, an FE-I/F (front-end interface) 314 and a BE-I/F (back-end interface) 315. The storage devices are, for example, a control memory 312 and a cache memory 313. The processor is, for example, a CPU 311.

The control memory 312 stores information and programs for controlling the storage unit 32. The CPU 311 implements various functions by executing the programs stored in the control memory 312. The cache memory 313 temporarily stores data to be written to the storage unit 32 and data read from the storage unit 32.

The FE-I/F 314 is an interface device for communicating with external devices on the front end, such as the host computer 10. The BE-I/F 315 is an interface device through which the storage controller 31 communicates with the storage devices in the storage unit 32.

The storage unit 32 has a plurality of HDDs (Hard Disk Drives) 321. Instead of the HDDs 321, another type of nonvolatile storage device may be used, for example an FM (Flash Memory) device such as an SSD (Solid State Drive). The storage unit 32 may contain storage devices of different types. An RG (RAID group) may also be formed from a plurality of storage devices of the same type; data is stored in the RG according to a predetermined RAID level. In this embodiment, the RG consists of a plurality of HDDs 321.

FIG. 2 shows an example software configuration according to the embodiment.

The host computer 10 stores primary deduplication processing software 121, a duplicate information list 1400A, and a content management table 1500A in the memory 12.

The primary deduplication processing software 121 executes data backup processing in accordance with a user instruction or a predetermined schedule. The duplicate information list 1400A is created by the storage apparatus 30 (specifically, by the secondary deduplication processing software 3121) and transmitted from the storage apparatus 30. The content management table 1500A is created by the host computer 10 (specifically, by the primary deduplication processing software 121) based on the duplicate information list 1400A. The duplicate information list 1400A and the content management table 1500A are described in detail later.

The storage apparatus 30 stores secondary deduplication processing software 3121, a file system management unit 3122, a chunk index table 1300, a duplicate information list 1400B, a content management table 1500B, and a chunk data set index table 1600 in the control memory 312.

The secondary deduplication processing software 3121 performs deduplication processing on data transmitted from the primary deduplication processing software 121 on the host computer 10 side. The file system management unit 3122 manages the data stored in the storage unit 32.

The chunk index table 1300, the duplicate information list 1400, and the chunk data set index table 1600 are created by the storage apparatus 30 (specifically, by the secondary deduplication processing software 3121). The duplicate information list 1400B is created by the storage apparatus 30 based on the chunk index table 1300 and the content management table 1500B.

The content management table 1500B is created by the host computer 10 (specifically, by the primary deduplication processing software 121) and transmitted from the host computer 10.

Note that the storage apparatus 30 stores, in the control memory 312, content management tables for all contents stored in the storage apparatus 30, whereas the host computer 10 temporarily stores only the content management table for the content to be backed up. To distinguish the two, the table stored in the memory 12 of the host computer 10 is called content management table 1500A, and the table stored in the control memory 312 of the storage apparatus 30 is called content management table 1500B.

The chunk index table 1300, the duplicate information list 1400, the content management table 1500 (A, B), and the chunk data set index table 1600 are described in detail later.

FIG. 3 shows an example of the chunk index table 1300.

The chunk index table 1300 is created by the storage apparatus 30 based on instructions from the host computer 10 (details are described later). The chunk index table 1300 is stored in the control memory 312 of the controller 31 of the storage apparatus 30.

The storage apparatus 30 uses the chunk index table 1300 to identify, for a chunk transmitted from the host computer 10, the content that contains that chunk.

The chunk index table 1300 has one entry per chunk. Each entry stores a hash value 1301, a content ID 1302, and an offset 1303 in association with each other.

In the entry for a given chunk, the hash value 1301 is the hash value of that chunk. The content ID 1302 identifies the content that contains the chunk. The offset 1303 indicates the position of the chunk within that content.
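A minimal Python sketch of the chunk index table 1300, assuming an in-memory dictionary keyed by hash value. The class and method names are hypothetical, and so is the choice to keep one entry per hash (a later registration of the same hash overwrites the earlier one); the text does not specify how repeated hash values are handled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ChunkIndexEntry:
    hash_value: str  # hash value 1301
    content_id: str  # content ID 1302
    offset: int      # offset 1303: position of the chunk within the content


class ChunkIndexTable:
    """Hypothetical in-memory form of chunk index table 1300."""

    def __init__(self) -> None:
        self._entries: Dict[str, ChunkIndexEntry] = {}

    def register(self, hash_value: str, content_id: str, offset: int) -> None:
        self._entries[hash_value] = ChunkIndexEntry(hash_value, content_id, offset)

    def lookup(self, hash_value: str) -> Optional[ChunkIndexEntry]:
        """Return the entry for this hash, or None if no stored content contains it."""
        return self._entries.get(hash_value)
```

A dictionary gives constant-time lookup per received chunk, which matches how the table is used: one hash in, one content ID out.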

FIG. 4 shows an example of the duplicate information list 1400.

Because duplicate information lists 1400A and 1400B have the same data structure, they are described here without distinction and are referred to collectively as the duplicate information list 1400. The duplicate information list 1400 is created by the storage apparatus 30 (details are described later) and is stored in the control memory 312 of the controller 31 of the storage apparatus 30 and in the memory 12 of the host computer 10.

The duplicate information list 1400 is created by the storage apparatus 30 so that the host computer 10 can perform part of the duplication determination, and is transmitted from the storage apparatus 30 to the host computer 10. A duplicate information list 1400 is created for each content. In the following description, content refers to an archive file containing a plurality of files.

The duplicate information list 1400 has one entry per chunk. Each entry has a hash value 1401, a chunk length 1402, and a chunk data set ID 1403.

In the entry for a given chunk, the hash value 1401 is the hash value of that chunk. The chunk length 1402 is the data length of the chunk. The chunk data set ID 1403 identifies the chunk data set in which the chunk is stored.
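The entry layout above can be sketched as a small Python record plus two lookups the host might perform on a received list. This is a hypothetical illustration; `stored_hashes` and `locate` are not named in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class DuplicateInfoEntry:
    hash_value: str         # hash value 1401
    chunk_length: int       # chunk length 1402
    chunk_data_set_id: int  # chunk data set ID 1403


def stored_hashes(dup_list: Iterable[DuplicateInfoEntry]) -> Set[str]:
    """Hash values the host can treat as already stored in the storage apparatus."""
    return {e.hash_value for e in dup_list}


def locate(dup_list: Iterable[DuplicateInfoEntry], hash_value: str) -> Optional[int]:
    """Chunk data set ID holding the chunk with this hash, or None if it is not listed."""
    for e in dup_list:
        if e.hash_value == hash_value:
            return e.chunk_data_set_id
    return None
```

Since the list covers a single content, it stays small, which is the point of sending it instead of the storage-wide index.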

FIG. 5 shows an example of the content management table 1500.

Because content management tables 1500A and 1500B have the same data structure, they are described here without distinction and are referred to collectively as the content management table 1500.

The content management table 1500 associates content with the chunks stored in each chunk data set. For example, when reading content in response to a read request from the host computer 10, the storage apparatus 30 uses the content management table 1500 to identify the chunk data sets that store the chunks of that content, and reads them.

The content management table 1500 is stored in the control memory 312 of the controller 31 of the storage apparatus 30 and in the memory 12 of the host computer 10.

The storage apparatus 30 creates a content management table 1500B for each content. The content management table 1500B for a given content has a file name that includes the content ID of that content, and has one entry per chunk in that content. Each entry stores a content ID 1501, a hash value 1502, a chunk data set ID 1503, a content offset 1504, and a chunk length 1505 in association with each other.

In the entry for a chunk in a given content, the content ID 1501 identifies that content. The hash value 1502 is the hash value of the chunk. The chunk data set ID 1503 identifies the chunk data set in which the chunk is stored. The content offset 1504 indicates the position of the chunk within the content. The chunk length 1505 is the data length of the chunk.
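The read path described for the content management table 1500 (identify the chunk data sets holding a content's chunks) can be sketched as follows. The record fields mirror items 1501 to 1505; the function name and the "first-use order" policy are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ContentEntry:
    content_id: str         # content ID 1501
    hash_value: str         # hash value 1502
    chunk_data_set_id: int  # chunk data set ID 1503
    content_offset: int     # content offset 1504
    chunk_length: int       # chunk length 1505


def chunk_data_sets_to_read(table: Iterable[ContentEntry]) -> List[int]:
    """Distinct chunk data sets needed to rebuild the content, in order of first use."""
    needed: List[int] = []
    for e in sorted(table, key=lambda e: e.content_offset):
        if e.chunk_data_set_id not in needed:
            needed.append(e.chunk_data_set_id)
    return needed
```

For the updated content F1 of the later example, if "a, b, c, e" sit in one chunk data set and the new chunk "d1" in another, two chunk data sets must be read, illustrating how updates scatter a content's chunks.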

FIG. 6 shows an example of the chunk data set index table 1600.

The storage apparatus 30 manages chunks per chunk data set, and uses the chunk data set index table 1600 for this purpose. The chunk data set index table 1600 is stored in the control memory 312 of the controller 31 of the storage apparatus 30.

The storage apparatus 30 creates a chunk data set index table 1600 for each chunk data set. The chunk data set index table 1600 for a given chunk data set has a file name that includes the chunk data set ID of that chunk data set, and has one entry per chunk in that chunk data set. Each entry has a hash value 1601, a chunk data set offset 1602, and a chunk length 1603.

In the entry for a given chunk in a given chunk data set, the hash value 1601 is the hash value of that chunk. The chunk data set offset 1602 indicates the position of the chunk within the chunk data set. The chunk length 1603 is the data length of the chunk.
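Combining the chunk data set index table 1600 with the content management table 1500 gives a simple read path. The sketch below is hypothetical: chunk data sets are modeled as `bytes` objects, the content table is reduced to `(hash value, chunk data set ID, content offset)` tuples, and decompression of chunks is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ChunkDataSetIndexEntry:
    hash_value: str  # hash value 1601
    offset: int      # chunk data set offset 1602
    length: int      # chunk length 1603


def read_chunk(chunk_data_set: bytes,
               index: Dict[str, ChunkDataSetIndexEntry],
               hash_value: str) -> bytes:
    """Slice one chunk out of a chunk data set using its index table."""
    entry = index[hash_value]
    return chunk_data_set[entry.offset:entry.offset + entry.length]


def rebuild_content(content_entries: Iterable[Tuple[str, int, int]],
                    data_sets: Dict[int, bytes],
                    indexes: Dict[int, Dict[str, ChunkDataSetIndexEntry]]) -> bytes:
    """Concatenate the content's chunks in content-offset order."""
    parts = []
    for hash_value, set_id, _offset in sorted(content_entries, key=lambda t: t[2]):
        parts.append(read_chunk(data_sets[set_id], indexes[set_id], hash_value))
    return b"".join(parts)
```

Here the content table tells which chunk data set to open, and the per-set index tells where inside that set each chunk lies.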

FIG. 7 is a schematic diagram giving an overview of the deduplication processing according to the embodiment.

The deduplication processing is outlined below in the order of processes S1 to S14. Assume that, before processes S1 to S14 are performed, the storage system is in the following state:
(1) The storage apparatus 30 stores the chunk group "a, b, c, d, e" generated from the pre-update content F1.
(2) The representative chunk of the chunk group "a, b, c, d, e" is "a".
(3) The chunk index table 1300 containing the representative chunk "a" is stored in the control memory 312 of the storage apparatus 30.
(4) The content management table 1500 for content F1 has been created by the host computer 10 and is stored in the control memory 312 of the storage apparatus 30.
(5) On the host computer 10 side, updated content of content F1 has been input, and the chunk group generated from the updated content is "a, b, c, d1, e".

<Process S1>
In the host computer 10, the content to be backed up is designated, for example, by the user. Here, the user designates content F1.

The host computer 10 divides the updated content F1 into a plurality of data segments. The host computer 10 also divides the updated content F1 into a plurality of units of data (unit data). For example, the host computer 10 identifies unit data by dividing the content from its beginning into pieces of a predetermined data size (for example, 64 MB). When the content contains a plurality of files, each file may serve as unit data. Predetermined-size boundaries and file boundaries may also be used together as unit-data boundaries. Furthermore, the data segment size need not be fixed and may vary according to data characteristics and the like.
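A minimal sketch of the fixed-size variant of this division, in Python. The 64 MB unit size comes from the text; the 8 KiB segment size and all function names are hypothetical. Variable-size segmentation and file-boundary splitting, which the text also permits, are not shown.

```python
from typing import List

UNIT_DATA_SIZE = 64 * 1024 * 1024  # 64 MB, the example unit-data size in the text
SEGMENT_SIZE = 8 * 1024            # hypothetical fixed data-segment size


def split_fixed(data: bytes, size: int) -> List[bytes]:
    """Split data into consecutive pieces of `size` bytes; the last piece may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [data[i:i + size] for i in range(0, len(data), size)]


def divide_content(content: bytes,
                   unit_size: int = UNIT_DATA_SIZE,
                   segment_size: int = SEGMENT_SIZE) -> List[List[bytes]]:
    """Return the content as a list of unit data, each given as its list of data segments."""
    return [split_fixed(unit, segment_size) for unit in split_fixed(content, unit_size)]


# Usage with tiny sizes: 25 bytes, units of 10 bytes, segments of 4 bytes.
units = divide_content(b"x" * 25, unit_size=10, segment_size=4)
print([[len(s) for s in u] for u in units])  # [[4, 4, 2], [4, 4, 2], [4, 1]]
```

Splitting units first and then segments keeps every segment inside one unit, so "the first segment after a unit boundary" is always well defined.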

The host computer 10 compresses each data segment to generate a chunk, and determines, for each unit data, the chunk corresponding to the data segment that represents that unit data (hereinafter, the representative chunk). The representative chunk may be, for example, the chunk corresponding to the first data segment following each unit-data boundary. In this example, the representative chunk is "a". The host computer 10 further calculates the hash value of every generated chunk; these are called first calculated hash values.
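The compress-then-hash step can be sketched in Python, assuming `zlib` as a stand-in compressor (the text does not name a compression algorithm) and SHA-256 for the hash. `make_chunks` is a hypothetical name; it takes the output shape of a unit-data/segment division and applies the "first segment after each boundary" rule for representative chunks.

```python
import hashlib
import zlib
from typing import List, Tuple


def make_chunks(units: List[List[bytes]]) -> Tuple[List[List[bytes]], List[bytes], List[str]]:
    """Compress segments into chunks, pick representative chunks, and hash every chunk.

    Returns (chunks grouped per unit data, representative chunks, first calculated hash values).
    """
    chunks_per_unit = [[zlib.compress(seg) for seg in unit] for unit in units]
    # Representative chunk: the chunk for the first segment after each unit-data boundary.
    representatives = [unit[0] for unit in chunks_per_unit if unit]
    # First calculated hash values: one per generated (compressed) chunk.
    hashes = [hashlib.sha256(c).hexdigest() for unit in chunks_per_unit for c in unit]
    return chunks_per_unit, representatives, hashes
```

Note that, following the text, the hash is taken over the compressed chunk, so host and storage must use the same compression settings for hash values to match.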

<Process S2>
At this stage, the duplicate information list 1400A is empty. The host computer 10 refers to the duplicate information list 1400A and recognizes that it is empty.

During processing, the duplicate information lists 1400 stored in the host computer 10 and in the storage apparatus 30 may differ in content. Therefore, in the following description, the duplicate information list stored only in the host computer 10 is referred to as duplicate information list 1400A, and the list as updated by the storage apparatus 30 is referred to as duplicate information list 1400B.

<Process S3>
When the duplicate information list 1400A is empty, the host computer 10 transmits the representative chunk to the storage apparatus 30.

<Process S4>
The storage apparatus 30 calculates the hash value of the chunk received from the host computer 10, which is called the second calculated hash value, and determines whether the second calculated hash value exists in the chunk index table 1300. If it does, the storage apparatus 30 refers to the chunk index table 1300 to identify the content ID 1302 corresponding to the hash value 1301 that matches the second calculated hash value. In this example, the identified content ID indicates content F1. That is, the received chunk “a” is the representative chunk of content F1 stored in the storage apparatus 30.

<Process S5>
The storage apparatus 30 identifies the content management table 1500B corresponding to the identified content ID and refers to it to identify all chunks included in the identified content.

The storage apparatus 30 creates a duplicate information list 1400B containing the hash values 1502 of the identified chunks and transmits it to the host computer 10. A duplicate information list 1400B is created for each content. In this example, the duplicate information list 1400B contains the hash values of “a, b, c, d, e”. Alternatively, the transmitted duplicate information list 1400B may contain only the hash values of “b, c, d, e” and omit the hash value of “a”, since the storage apparatus has already received that chunk.
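Processes S4 and S5 amount to two table lookups on the storage side. The following Python sketch models the chunk index table 1300 as a dictionary from hash value to (content ID, offset) and each content management table 1500B as a list of entries. SHA-256 and the names `ChunkEntry` and `build_duplicate_list` are assumptions made for illustration, not part of the source.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkEntry:
    digest: str             # hash value 1502
    chunk_data_set_id: int  # chunk data set ID 1503
    length: int             # chunk length 1505


def build_duplicate_list(
    chunk: bytes,
    chunk_index_table: dict[str, tuple[str, int]],  # 1300: hash -> (content ID, offset)
    content_tables: dict[str, list[ChunkEntry]],    # 1500B, keyed by content ID
    exclude_representative: bool = False,
) -> Optional[list[ChunkEntry]]:
    """S4-S5: map a received representative chunk to its content's chunk list."""
    digest = hashlib.sha256(chunk).hexdigest()  # second calculated hash value
    hit = chunk_index_table.get(digest)
    if hit is None:
        return None  # not a known representative chunk
    content_id, _offset = hit
    entries = content_tables[content_id]
    if exclude_representative:
        # Optional variant: omit the chunk the storage has just received.
        entries = [e for e in entries if e.digest != digest]
    return list(entries)
```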

<Processes S6 and S7>
The host computer 10 uses the received duplicate information list 1400B to determine whether each chunk of the updated content other than the representative chunk appears in the duplicate information list. Specifically, the host computer 10 determines whether the first calculated hash value of each such chunk exists in the duplicate information list 1400B. In the following description, the hash value of a chunk calculated by the host computer 10 is called the first calculated hash value, and the hash value of a chunk calculated by the storage apparatus 30 is called the second calculated hash value. The same function is used to calculate both.

Here, the host computer 10 determines that the hash values of “b” and “c” exist in the duplicate information list 1400B. Because chunks “b” and “c” are listed in the duplicate information list 1400B, they already exist in the storage apparatus 30 and are therefore not transmitted to it.
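The host-side decision in processes S6 to S8 and S13 reduces to a set-membership test on hash values. A minimal Python sketch, assuming chunks are given as (name, first calculated hash value) pairs with the representative chunk already excluded; `select_chunks_to_send` is a hypothetical helper, and strings such as `"h_b"` are placeholders for real hash values.

```python
def select_chunks_to_send(
    chunks: list[tuple[str, str]],  # (chunk name, first calculated hash value)
    duplicate_hashes: set[str],     # hash values from duplicate information list 1400B
) -> tuple[list[str], list[str]]:
    """Return (chunks to skip, chunks to transmit) for processes S6-S8 and S13."""
    skip, send = [], []
    for name, digest in chunks:
        if digest in duplicate_hashes:
            skip.append(name)  # already stored: not transmitted
        else:
            send.append(name)  # probably new: transmitted to the storage apparatus
    return skip, send


# Worked example from the text: the list covers a, b, c, d, e; "d" became "d1".
skip, send = select_chunks_to_send(
    [("b", "h_b"), ("c", "h_c"), ("d1", "h_d1"), ("e", "h_e")],
    {"h_a", "h_b", "h_c", "h_d", "h_e"},
)
print(skip, send)  # ['b', 'c', 'e'] ['d1']
```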

<Process S8>
The host computer 10 determines whether the hash values of the remaining chunks of the updated content exist in the duplicate information list 1400B. Here, the host computer 10 determines that the hash value of chunk “d1” does not exist in the duplicate information list 1400B.

<Processes S9 and S10>
Because its hash value does not exist in the duplicate information list 1400B, chunk “d1” is likely not yet stored in the storage apparatus 30, so the host computer 10 transmits it to the storage apparatus 30. The storage apparatus 30 calculates the hash value of “d1” as the second calculated hash value and determines whether it exists in the chunk index table 1300. Because “d1” is not a representative chunk, its second calculated hash value does not exist in the chunk index table 1300. The storage apparatus 30 therefore refers to the chunk data set index table 1600 and determines whether the second calculated hash value of “d1” exists there. If it does not, the storage apparatus 30 stores the second calculated hash value of “d1” in the chunk data set index table 1600.

<Process S11>
The storage apparatus 30 creates a new chunk data set in the storage unit 32 and stores chunk “d1” in that chunk data set.

<Process S12>
The storage apparatus 30 transmits to the host computer 10 the chunk data set ID of the chunk data set in which “d1” is stored. By receiving this chunk data set ID, the host computer 10 can create a content management table 1500A for the updated content that includes the received chunk data set ID.
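Processes S9 to S12 can be sketched in Python as follows. For brevity the sketch omits the chunk index table 1300 check (a non-representative chunk such as “d1” never matches it) and finds an existing chunk by scanning every chunk data set index table 1600 in turn, a lookup the source later speeds up with the chunk index 1200. `ChunkStore`, `store_new_chunk`, and the SHA-256 hash are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ChunkDataSet:
    set_id: int
    data: bytearray = field(default_factory=bytearray)
    # Chunk data set index table 1600: hash 1601 -> (offset 1602, chunk length 1603)
    index: dict[str, tuple[int, int]] = field(default_factory=dict)


class ChunkStore:
    def __init__(self) -> None:
        self.chunk_data_sets: dict[int, ChunkDataSet] = {}
        self._next_id = 1

    def _find(self, digest: str):
        # Linear scan over all index tables 1600 (simplification).
        for cds in self.chunk_data_sets.values():
            if digest in cds.index:
                return cds.set_id
        return None

    def store_new_chunk(self, chunk: bytes) -> int:
        """S10-S12: store the chunk if unseen and return its chunk data set ID."""
        digest = hashlib.sha256(chunk).hexdigest()  # second calculated hash value
        existing = self._find(digest)
        if existing is not None:
            return existing
        cds = ChunkDataSet(self._next_id)  # S11: new chunk data set
        self._next_id += 1
        cds.index[digest] = (len(cds.data), len(chunk))
        cds.data.extend(chunk)
        self.chunk_data_sets[cds.set_id] = cds
        return cds.set_id  # S12: ID sent back to the host computer
```

Creating one chunk data set per new chunk follows the “d1” example literally; the text does not say whether several new chunks may share one set.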

<Process S13>
The host computer 10 determines whether the hash values of the remaining chunks of the updated content exist in the duplicate information list 1400B. Here, the host computer 10 determines that “e” exists in the duplicate information list 1400B, so “e” is not transmitted to the storage apparatus 30.

<Process S14>
After performing the duplication determination on all chunks of the updated content, the host computer 10 generates the content management table 1500A of the updated content based on the information received from the storage apparatus 30 and transmits the generated content management table 1500A to the storage apparatus 30. In this example, the generated content management table 1500A indicates that the updated content consists of the chunk group “a, b, c, d1, e”.

The storage apparatus 30 receives the content management table 1500A of the updated content and replaces the content management table 1500B of the pre-update content, stored in the control memory 31, with the received content management table 1500A. Because the content management table 1500 is replaced with its post-update version, the next deduplication process can build the duplicate information list 1400 from the latest content data. As a result, more of the chunks that make up the next updated content on the host computer 10 match the chunks in the latest duplicate information list 1400 sent by the storage apparatus 30, which reduces the number of chunks the host computer 10 must transmit to the storage apparatus 30.
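A Python sketch of process S14 and the table replacement, assuming the host keeps the chunk data set IDs taken from the duplicate information list 1400B and those returned in S12 as two dictionaries keyed by hash value. `Entry`, `build_content_table`, and `replace_content_table` are hypothetical names, and short strings stand in for hash values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    digest: str             # hash value 1502
    chunk_data_set_id: int  # chunk data set ID 1503
    length: int             # chunk length 1505


def build_content_table(chunks, duplicate_ids, stored_ids):
    """S14 on the host: build content management table 1500A.

    chunks:        (hash value, length) pairs in content order
    duplicate_ids: hash value -> chunk data set ID, from duplicate list 1400B
    stored_ids:    hash value -> chunk data set ID, returned by the storage in S12
    """
    table = []
    for digest, length in chunks:
        set_id = duplicate_ids.get(digest, stored_ids.get(digest))
        if set_id is None:
            raise KeyError(f"no chunk data set ID known for {digest!r}")
        table.append(Entry(digest, set_id, length))
    return table


def replace_content_table(tables, content_id, new_table):
    """Storage side: overwrite the pre-update table 1500B with table 1500A."""
    tables[content_id] = list(new_table)
```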

The description above assumes that the representative chunk “a” has not been updated. However, the representative chunk “a” may itself be updated (for example, to “a1”). In that case, the storage apparatus 30 associates the updated representative chunk “a1” with a new content ID 1302, registers this association in the chunk index table 1300, and generates a new content management table 1500B corresponding to that content ID (see the dotted frame in FIG. 7).

Next, backup processing that uses the deduplication processing described above will be described.

FIG. 9 shows an example of the backup processing according to the embodiment.

The primary deduplication processing software 121 (hereinafter, software 121) of the host computer 10 divides the content to be backed up into data segments (S901).

The software 121 selects the divided data segments one by one and executes S902 to S909 for each selected data segment.

The software 121 generates a chunk by compressing the selected data segment and sets it as the target chunk (S902). The software 121 obtains the length of the target chunk and calculates its hash value as the first calculated hash value (S903).

Based on the length of the target chunk obtained in S903, its first calculated hash value, and similar information, the software 121 determines whether the first calculated hash value exists in the duplicate information list 1400A (S904).

When the first calculated hash value exists in the duplicate information list 1400A (S904: Yes), the software 121 obtains from the duplicate information list 1400A the chunk data set ID 1403 associated with the first calculated hash value (S905).

The software 121 stores the hash value of the target chunk, the chunk data set ID associated with that hash value, and the length of the target chunk in the hash value 1502, chunk data set ID 1503, and chunk length 1505 fields of the content management table 1500, respectively (S909).

When S902 to S909 have been completed for all data segments, the software 121 transmits the content management table 1500A to the storage apparatus 30 (S910). The storage apparatus 30 receives the content management table 1500A from the host computer 10 and replaces the content management table 1500B of the backed-up content in the control memory 312 with the received content management table 1500A (S911).

When the first calculated hash value does not exist in the duplicate information list 1400A (S904: No), the software 121 transmits the target chunk to the storage apparatus 30 (S906).

Based on the target chunk transmitted from the software 121 in S906, the storage apparatus 30 performs an information creation process that creates the duplicate information list 1400B and the chunk index table 1300 (S907). The details of this process are described with reference to FIG. 10.

The software 121 receives the chunk data set ID of the target chunk from the storage apparatus 30 (S908) and then proceeds to S909.
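The FIG. 9 loop on the host can be sketched in Python as below. The sketch assumes the duplicate information list 1400A has already been received (that is, the representative-chunk exchange of S3 to S5 has happened) and is held as a dictionary from hash value to (chunk data set ID, chunk length). zlib, SHA-256, fixed-size segments, and the `send_chunk` callback standing in for S906 to S908 are all assumptions.

```python
import hashlib
import zlib
from typing import Callable


def backup(
    content: bytes,
    segment_size: int,
    duplicate_list: dict[str, tuple[int, int]],  # 1400A: hash -> (chunk data set ID, length)
    send_chunk: Callable[[bytes], int],          # S906-S908: returns chunk data set ID
) -> list[tuple[str, int, int]]:
    """FIG. 9 loop; returns rows (hash, chunk data set ID, length) of table 1500A."""
    table = []
    for off in range(0, len(content), segment_size):           # S901
        chunk = zlib.compress(content[off:off + segment_size])  # S902
        digest = hashlib.sha256(chunk).hexdigest()              # S903
        length = len(chunk)
        hit = duplicate_list.get(digest)
        if hit is not None and hit[1] == length:                # S904: Yes
            set_id = hit[0]                                     # S905
        else:                                                   # S904: No
            set_id = send_chunk(chunk)                          # S906-S908
        table.append((digest, set_id, length))                  # S909
    return table  # S910: the caller transmits this as table 1500A
```

Checking the length as well as the hash mirrors S904, which bases its decision on both.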

With the backup processing described above, when content stored in the storage apparatus 30 is updated, the host computer 10 transmits to the storage apparatus 30 the representative chunk of the content to be backed up, the chunks determined not to be duplicates within that content, and the content management table 1500A of that content. The storage apparatus 30, in turn, transmits to the host computer 10 the duplicate information list 1400B indicating the chunks in that content and the chunk data set IDs of the chunks determined not to be duplicates. This reduces the storage capacity the host computer 10 needs for duplication determination and reduces the network load between the host computer 10 and the storage apparatus 30.

FIG. 10 shows the details of the information creation process.

The information creation process is performed by the secondary deduplication processing software 3121 (hereinafter, software 3121) stored in the control memory 31 of the storage apparatus 30.

The software 3121 calculates the hash value of the target chunk transmitted from the software 121 in S906 as the second calculated hash value and determines whether the second calculated hash value exists among the hash values 1301 of the chunk index table 1300 (S1001).

When the second calculated hash value exists in the chunk index table 1300 (S1001: Yes), the software 3121 obtains from the chunk index table 1300 the content ID 1302 and the offset 1303 corresponding to the second calculated hash value (S1002).

The software 3121 refers to the content management table 1500 corresponding to the content ID 1302 obtained in S1002 (S1003). From the content management table 1500B identified in S1003, the software 3121 obtains the hash value 1502, chunk data set ID 1503, and chunk length 1505 of each chunk that makes up the content identified by the content ID 1302.

The software 3121 creates a duplicate information list 1400B containing the information obtained in S1003 (S1005) and transmits the duplicate information list 1400B created in S1005 to the host computer 10 (S1006).

The software 3121 determines whether the target chunk is a representative chunk (S1010). When transmitting a chunk to the storage apparatus 30, the host computer 10 attaches information indicating whether the chunk is a representative chunk. When this information marks the chunk as representative, the software 3121 determines that the transmitted target chunk is a representative chunk.

When the target chunk is a representative chunk (S1010: Yes), the software 3121 stores the second calculated hash value, the content ID, and the offset of the target chunk in the chunk index table 1300. When the target chunk is not a representative chunk (S1010: No), the software 3121 ends the process of S907.

When the second calculated hash value does not exist in the chunk index table 1300 (S1001: No), the software 3121 determines whether the target chunk is a duplicate (S1007). The details of S1007 are described later.

When the target chunk is not a duplicate (S1007: No), the software 3121 creates a chunk data set index table 1600 for a new chunk data set and registers the target chunk in it, storing the hash value 1601, the chunk data set offset 1602, and the chunk length 1603 of the target chunk in the created chunk data set index table 1600 (S1008).

When the target chunk is a duplicate (S1007: Yes), the software 3121 refers to the content ID 1501 of the content management table 1500B and transmits to the host computer 10 the chunk data set ID of the chunk data set in which the target chunk is stored (S1009). The software 3121 then proceeds to S1010.
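The FIG. 10 branches can be condensed into one Python method. This is a hypothetical model, not the software 3121 itself: chunk contents are not stored, the duplicate check of S1007 is reduced to a single dictionary lookup (the detailed check is described later), S1009 looks up the ID by hash rather than through the content ID 1501, and the ID of a newly created chunk data set is assumed to be returned to the host, as S908 implies.

```python
import hashlib


class InfoCreationSketch:
    """Hypothetical model of the FIG. 10 information creation process."""

    def __init__(self) -> None:
        self.chunk_index_table = {}  # 1300: hash -> (content ID, offset), representatives only
        self.content_tables = {}     # 1500B: content ID -> list of (hash, set ID, length)
        self.chunk_index = {}        # hash -> chunk data set ID for every stored chunk
        self._next_set_id = 1

    def receive(self, chunk: bytes, is_representative: bool,
                content_id: str, offset: int):
        """Return ("duplicate_list", rows) or ("set_id", chunk data set ID)."""
        digest = hashlib.sha256(chunk).hexdigest()       # second calculated hash value
        hit = self.chunk_index_table.get(digest)         # S1001
        if hit is not None:
            known_content, _known_offset = hit           # S1002
            rows = list(self.content_tables.get(known_content, []))  # S1003-S1005
            reply = ("duplicate_list", rows)             # S1006
        else:
            set_id = self.chunk_index.get(digest)        # S1007 (simplified)
            if set_id is None:                           # S1007: No
                set_id = self._next_set_id               # S1008: new chunk data set
                self._next_set_id += 1
                self.chunk_index[digest] = set_id
            reply = ("set_id", set_id)                   # S1009, or ID of the new set
        if is_representative:                            # S1010: Yes
            self.chunk_index_table[digest] = (content_id, offset)
        return reply
```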

With the information creation process described above, the storage apparatus 30 can transmit to the host computer 10 information indicating the content and chunk data set that correspond to the target chunk received from the host computer 10.

Alternatively, instead of transmitting the duplicate information list 1400 from the storage apparatus 30 to the host computer 10, the storage apparatus 30 could transmit the chunk data set index table 1600 as it is.

However, when a chunk of content is updated, the storage apparatus 30 stores the updated chunk in a new chunk data set and creates a new chunk data set index table 1600; the chunk data set index table 1600 containing the pre-update chunk is not updated. As the content is updated repeatedly, the chunks that make up the latest content therefore become scattered across multiple chunk data sets, and even if a chunk data set index table 1600 is sent to the host computer 10, the probability that the host computer 10 can recognize a chunk as a duplicate decreases.

In contrast, the information creation process described above lets the storage apparatus 30 create a duplicate information list 1400 containing information on the chunks that make up the latest version of the content to which the chunk received from the host computer 10 belongs, and transmit it to the host computer 10. This increases the probability that the host computer 10 can recognize chunks as duplicates.
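A toy Python simulation illustrates this argument, assuming each update changes exactly one chunk and writes it to a brand-new chunk data set. After three single-chunk updates, the five chunks of the latest content are spread over four chunk data sets, and the original set's table 1600 lists only 40% of them, whereas a per-content duplicate information list would list all five. `simulate_updates` and `coverage` are hypothetical helpers.

```python
def simulate_updates(num_chunks: int, updated_positions: list[int]) -> list[int]:
    """Return, for each chunk position, the chunk data set ID holding its latest version.

    Version 0 stores every chunk in chunk data set 0; each update writes the
    changed chunk into a new chunk data set whose ID is the update number.
    """
    location = [0] * num_chunks
    for new_set_id, pos in enumerate(updated_positions, start=1):
        location[pos] = new_set_id
    return location


def coverage(location: list[int], set_id: int) -> float:
    """Fraction of the latest content's chunks listed in one table 1600."""
    return location.count(set_id) / len(location)


latest = simulate_updates(5, [3, 1, 4])
print(latest)               # [0, 2, 0, 1, 3]
print(coverage(latest, 0))  # 0.4
```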

Next, the details of S1007 are described.

FIG. 8 shows an example of the data structures in the storage system.

In this figure, elements with the same reference signs as in FIG. 2 are identical to those in FIG. 2, and their description is omitted. The chunk index 1200 stores the hash values of all chunks stored in the storage apparatus 30, each associated with the chunk data set ID of the chunk data set in which the chunk is stored.

The content 1700 has an arbitrary file name and holds the data of several chunks. The chunk data set 1800 has a file name that includes its chunk data set ID and holds the lengths and data of several chunks.

Using the chunk index 1200, the software 3121 identifies the chunk data set ID associated with the hash value of the target chunk. The software 3121 then refers to the chunk data set index table 1600 corresponding to the identified chunk data set ID and determines from the hash value and the length whether the target chunk is already stored in the storage apparatus 30. If both the second calculated hash value and the length of the target chunk match, the software 3121 determines that the chunk is a duplicate.
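The two-step check of S1007 can be sketched in Python, modelling the chunk index 1200 as a dictionary from hash value to chunk data set ID and each chunk data set index table 1600 as a dictionary from hash value to (offset, length). SHA-256 and the name `find_duplicate` are assumptions.

```python
import hashlib
from typing import Optional


def find_duplicate(
    chunk: bytes,
    chunk_index: dict[str, int],                              # 1200: hash -> chunk data set ID
    set_index_tables: dict[int, dict[str, tuple[int, int]]],  # 1600 per set: hash -> (offset, length)
) -> Optional[int]:
    """S1007: return the chunk data set ID if the chunk is a duplicate, else None."""
    digest = hashlib.sha256(chunk).hexdigest()  # second calculated hash value
    set_id = chunk_index.get(digest)
    if set_id is None:
        return None
    entry = set_index_tables.get(set_id, {}).get(digest)
    # Duplicate only if both the hash value and the chunk length match.
    if entry is not None and entry[1] == len(chunk):
        return set_id
    return None
```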

The host computer 10 may obtain, from the content to be backed up, the user ID of that content's user and send it to the storage apparatus 30 together with the representative chunk. In this case, the storage apparatus 30 associates the user ID with the content ID and treats two pieces of content associated with different user IDs as different content, even if their data are identical. This allows the storage apparatus 30 to manage the content of different users separately.
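The source does not specify how the user ID is combined with the content ID. One possible approach, sketched here in Python with hypothetical names, is to key the chunk index table 1300 by (user ID, hash value), so that identical representative chunks from different users resolve to different content.

```python
import hashlib
from typing import Optional


class PerUserChunkIndexTable:
    """Chunk index table 1300 scoped by user: (user ID, hash value) -> content ID."""

    def __init__(self) -> None:
        self._table: dict[tuple[str, str], str] = {}

    def register(self, user_id: str, representative: bytes, content_id: str) -> None:
        key = (user_id, hashlib.sha256(representative).hexdigest())
        self._table[key] = content_id

    def lookup(self, user_id: str, representative: bytes) -> Optional[str]:
        # Identical data registered by another user does not match.
        return self._table.get((user_id, hashlib.sha256(representative).hexdigest()))
```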

Although a chunk has been described as a compressed data segment, it may be the data segment itself. The above embodiment describes a storage system in which two enclosures, the host computer 10 and the storage apparatus 30, cooperate; however, the storage system may instead include the processor of the host computer 10 and the storage controller 31 in a single enclosure.

In the storage apparatus of the present invention, the computer corresponds to the host computer 10, a backup server, or the like. The storage device corresponds to the storage device 321 or the like. A block corresponds to content or the like. A sub-block corresponds to a chunk, a data segment, or the like. The storage information corresponds to the chunk index table 1300, the content management table 1500, and the like. The sub-block information corresponds to the chunk index table 1300 or the like. The block information corresponds to the content management table 1500B or the like. The duplicate information corresponds to the duplicate information list 1400 or the like. A storage position corresponds to a chunk data set or the like. The position information corresponds to the chunk data set index table 1600 or the like. The update information corresponds to the content management table 1500A or the like. The subset of the plurality of sub-blocks corresponds to the representative chunk or the like. The representative sub-block corresponds to the representative chunk or the like. The first controller corresponds to the storage controller 31 or the like. The second controller corresponds to the CPU 11 or the like.

Several embodiments have been described above, but they are merely examples, and the present invention can be applied to various other forms.

DESCRIPTION OF SYMBOLS 10 ... Host computer, 30 ... Storage apparatus, 1300 ... Chunk index table, 1400 ... Duplicate information list, 1500 ... Content management table, 1600 ... Chunk data set index table

Claims

A storage device;
A controller connected to the storage device and a computer,
The controller is
A plurality of sub-blocks obtained based on the division of the block of data are received from the computer and stored in the storage device;
Storing storage information associating the block with the plurality of stored sub-blocks;
Receiving target information indicating sub-blocks in the block from the computer;
Based on the storage information, identify a block to which the sub-block indicated in the target information belongs,
Based on the storage information, generate duplicate information indicating the plurality of stored sub-blocks and send it to the computer,
Receiving from the computer an update sub-block, which is a sub-block not indicated in the duplicate information among a plurality of sub-blocks obtained by dividing the update data of the block;
Storage device.

The storage device according to claim 1,
The storage information includes sub-block information indicating an association between some of the plurality of stored sub-blocks and the block, and block information indicating an association among the block, the plurality of stored sub-blocks, and the storage positions of the plurality of stored sub-blocks,
The controller is
Based on the sub-block information, identify a block to which the sub-block indicated in the target information belongs,
Based on the block information, the duplicate information indicating the stored sub-blocks is generated.
Storage device.

The storage device according to claim 2,
The controller is
Storing position information indicating each storage position of the plurality of stored sub-blocks;
Based on the position information, generating the duplicate information including the storage position of each of the stored sub-blocks,
In response to receiving the update sub-block, determining, based on the position information, whether the update sub-block is stored in the storage device;
If it is determined that the update sub-block is not stored in the storage device, storing the update sub-block in the storage device;
Storage device.

The storage device according to claim 3,
The controller is
Transmitting the storage location of the update sub-block in the storage device to the computer;
Receiving from the computer update information that is based on the duplicate information and the storage location of the update sub-block and that indicates the storage locations of all the sub-blocks included in the update data,
Updating the block information with the update information;
Storage device.

The storage apparatus according to claim 4, wherein
The target information is a representative sub-block selected by a predetermined rule from a plurality of sub-blocks included in the update data,
The sub-block information indicates an association between the representative sub-block and the block.
Storage device.

The storage device according to claim 5,
The controller calculates a message digest of each of the stored sub-blocks and includes it in the block information,
The duplicate information includes a message digest of each of the stored sub-blocks.
Storage device.

The storage apparatus according to claim 6, wherein
The plurality of sub-blocks are obtained by compressing a plurality of data segments obtained by dividing the block,
Storage device.

The storage apparatus according to claim 7, wherein
The block information indicates an association between the block and a user of the block;
The controller recognizes a plurality of blocks associated with different users as different blocks.
Storage device.

A storage device;
A first controller connected to the storage device;
A second controller connected to the first controller;
With
The second controller generates a plurality of sub-blocks based on the division of the block of data, and transmits the generated plurality of sub-blocks to the first controller;
The first controller is
Receiving the plurality of sub-blocks transmitted from the second controller, storing the received sub-blocks in the storage device;
Storing storage information associating the block with the plurality of stored sub-blocks;
The second controller transmits target information indicating a sub-block in the block to the first controller,
The first controller is
Receiving the target information transmitted from the second controller;
Based on the storage information, identify a block to which the sub-block indicated in the target information belongs,
Based on the storage information, generate duplicate information indicating the plurality of stored sub-blocks and send it to the second controller,
The second controller generates a plurality of sub-blocks based on the division of the update data of the block and transmits to the first controller an update sub-block, which is a sub-block, among the plurality of sub-blocks generated from the update data, that is not indicated in the duplicate information;
The first controller receives the update sub-block transmitted from the second controller;
Storage system.

A method for controlling data in a storage apparatus connected to a computer and managing storage devices,
The storage device
A plurality of sub-blocks obtained based on the division of the block of data are received from the computer and stored in the storage device;
Storing storage information associating the block with the plurality of stored sub-blocks;
Receiving target information indicating sub-blocks in the block from the computer;
Based on the storage information, identify a block to which the sub-block indicated in the target information belongs,
Based on the storage information, generate duplicate information indicating the plurality of stored sub-blocks and send it to the computer,
Receiving from the computer an update sub-block, which is a sub-block not indicated in the duplicate information among a plurality of sub-blocks obtained by dividing the update data of the block;
A method for controlling data in a storage apparatus.