ES2354429T3

ES2354429T3 - SYNCHRONIZED DATA TRANSFER SYSTEM.

Info

Publication number: ES2354429T3
Application number: ES04812687T
Authority: ES
Inventors: Steven W. Rose
Original assignee: INTERACTIVE CONTENT ENGINES LLC
Current assignee: INTERACTIVE CONTENT ENGINES LLC
Priority date: 2003-12-02
Filing date: 2004-12-02
Publication date: 2011-03-14
Anticipated expiration: 2024-12-02
Also published as: ATE487321T1; CN100410917C; CN1890658A; DE602004029925D1; HK1099817A1; IL175837A0; IL175837A

Abstract

A synchronized data transfer system (100) comprising: a plurality of processor nodes (103); a network core switch (101) coupled to said plurality of processor nodes to enable communication between said plurality of processor nodes; a plurality of storage devices (111) distributed across said plurality of processor nodes and storing a plurality of titles, each title being divided into a plurality of subfragments (113) that are distributed across said plurality of storage devices; a plurality of transfer processes (215), each executed on a corresponding one of the plurality of processor nodes and operative to send a message (MSG) to a synchronous switch manager process (219) for each subfragment to be transmitted from a local storage device of a source processor node (203) to a destination processor node, each message including a source node identifier (SRC) that identifies one of the plurality of processor nodes as said source processor node and a destination node identifier (DST) that identifies one of said plurality of processor nodes as said destination processor node; and said synchronous switch manager process, executed on at least one of said plurality of processor nodes (205), which periodically sends a transmit command (TX CMD) to said plurality of processor nodes to initiate each of a plurality of sequential transmission periods, which receives a plurality of messages (MSG) and, before each transmission period, selects from said plurality of messages so as to ensure that each processor node sends up to one subfragment and receives up to one subfragment during the following period, and which sends a plurality of transmission requests (TXR) corresponding to the selected messages; and wherein each transfer process that has sent at least one message and has received a transmission request (TXR) from said synchronous switch manager process identifying a corresponding subfragment sends said corresponding subfragment (SC) during the next transmission period initiated by a transmit command.
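The selection step in the abstract (each node sends at most one subfragment and receives at most one subfragment per transmission period) can be illustrated with a minimal sketch. The abstract does not state how messages are selected, so the first-come greedy policy, the function name `select_for_period`, and the tuple layout below are all assumptions made for illustration only.

```python
from collections import deque

def select_for_period(pending):
    """Pick messages so each node sends at most one and receives at most one subfragment.

    `pending` is a deque of (src, dst, subfragment_id) tuples in arrival order
    (a hypothetical stand-in for MSG with its SRC and DST identifiers).
    Selected messages are removed from `pending`; the rest wait for a later period.
    """
    busy_src, busy_dst = set(), set()
    selected, deferred = [], deque()
    while pending:
        msg = pending.popleft()
        src, dst, _ = msg
        if src in busy_src or dst in busy_dst:
            deferred.append(msg)   # conflict: keep it for a later period
        else:
            busy_src.add(src)
            busy_dst.add(dst)
            selected.append(msg)   # a TXR would be issued for this message
    pending.extend(deferred)       # preserve arrival order of deferred messages
    return selected
```

Calling the function once per period, before the transmit command is issued, leaves conflicting messages queued in their original order so that they are considered first in the next period.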

Description

REFERENCE TO RELATED PATENT APPLICATIONS

The present application claims the benefit of US provisional patent application No. 60/526,437, filed on 12/02/2003, and is a continuation-in-part of the pending US patent application entitled "Interactive Broadband Server System", serial No. 10/304,378, filed on 11/26/2002, which in turn claims the benefit of US provisional patent application No. 60/333,856, filed on 11/28/2001, all of which have a common inventor and are commonly assigned.

BACKGROUND OF THE INVENTION

TECHNICAL FIELD OF THE INVENTION

The present invention relates to interactive broadband server systems and, more particularly, to an interactive content engine that uses a synchronized data transfer system to facilitate the simultaneous delivery of multiple high-speed isochronous data streams.

DESCRIPTION OF THE RELATED ART

It is desirable to provide a solution for the storage and delivery of streaming media content. An initial goal for scalability is from 100 to 1,000,000 simultaneous individual streams of isochronous content at 4 megabits per second (Mbps) per stream, although different data rates are contemplated. The total available bandwidth is limited by the largest available backplane switch. The largest switches at present are in the terabit-per-second range, that is, approximately 200,000 simultaneous output streams. The number of output streams is generally inversely proportional to the bit rate per stream.
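The inverse relationship between stream count and per-stream bit rate can be checked with a short calculation. The sketch below takes 1 terabit per second as an illustrative, assumed backplane capacity (the text gives only an order of magnitude), and the function name `max_streams` is hypothetical.

```python
def max_streams(backplane_bps: int, stream_bps: int) -> int:
    """Upper bound on simultaneous streams for a given switch capacity.

    Ignores protocol overhead and internal traffic, so real figures are lower.
    """
    return backplane_bps // stream_bps

TERABIT = 10**12   # assumed illustrative capacity: 1 Tbps
MBPS = 10**6

print(max_streams(TERABIT, 4 * MBPS))   # 250000
print(max_streams(TERABIT, 8 * MBPS))   # 125000: doubling the bit rate halves the streams
```

The 250,000 upper bound at 4 Mbps is of the same order as the roughly 200,000 streams cited once overhead is allowed for.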

The simplest model of content storage is a single disk drive connected to a single processor that has a single network connector. Data is read from the disk, placed in memory and distributed in packets over a network to each user. Traditional data, such as Web pages or the like, can be delivered asynchronously. In other words, there are random amounts of data with random time delays. Low-volume, low-resolution video can be delivered from a Web server. Real-time media content, such as video and audio, requires isochronous transmission, that is, transmission with guaranteed delivery times. In this scenario, the bandwidth constraint is at the disk drive. The disk has arm motion and rotational latency to contend with. If the system can sustain only 6 simultaneous streams of continuous content from the drive to the processor at any given time, then the 7th user's request must wait for one of the prior 6 users to give up a content stream. The advantage of this design is its simplicity. The disadvantage is the disk, which, as the sole mechanical device in the design, can only access and transfer data so fast.

An improvement can be made by adding another disk drive or drives and interleaving the drive accesses. Also, duplicate content can be stored on each drive to gain redundancy and performance. This is a better solution, but several problems remain. Only so much content can be placed on the local drive or drives. The disk drives, CPU and memory are each single points of failure that could be catastrophic. The system can only be scaled to the number of drives that the disk controller can handle. Even with many units, there is a problem with the distribution of titles. In the real world, everyone wants to see the latest movies. As a rule of thumb, 80% of content requests are for just 20% of the titles. The entire bandwidth of a machine cannot be consumed by one title, since that would block access to less popular titles stored only on that machine. As a result, the "high demand" titles would have to be loaded on most or all of the machines. In short, a user who wanted to watch an old movie might have difficulty doing so, even though the movie is loaded on the system. With a large library, the ratio may be much greater than the 80/20 rule used in this example.

If the system were based on the standard Local Area Network (LAN) used in data processing, other inefficiencies would arise. Modern Ethernet-based TCP/IP systems are a marvel of guaranteed delivery, but they incur a time penalty caused by packet collisions, retransmissions of partially lost packets and the management needed to make it all work. There is no guarantee that a set of content streams will be available at the right time. Also, each user consumes a switch port and each content server consumes a switch port. Therefore, the number of switch ports has to be twice the number of servers, limiting the total online bandwidth.

The basic architecture shown and described in US 6,134,596 has two limiting characteristics.

First, the network switch is the focal point for the reconstruction of data segments provided by the data servers coupled to it. This configuration requires significant additional processor capacity in the network switch for reconstruction and limits total throughput. Second, the ports of the network switch must be shared between the data servers and the devices coupled to the network server to deliver content to clients.

US 5,367,520 discloses a means for reducing the internal complexity of an Asynchronous Transfer Mode (ATM) switch while increasing its reliability through modularity and redundant internal paths. It uses both input and output buffers and deals with a network that already operates asynchronously. This approach is not useful or relevant for a system according to the present invention, particularly for isochronous information. An autonomously buffered input switch cannot be reliably used to recombine data in the correct order for a given output when the data stream arrives in sequence on multiple inputs. Internal switching contention on individual paths can cause data from input 1 to appear at the designated output after data from input 2 has begun to appear. For isochronous video this is unacceptable, since packets must be delivered in the correct order to produce a coherent picture.

US 6,415,328 describes a system for retrieving data in a video server in which processing is centralized in a controller and scheduler, and the data is stored within a centralized storage medium. This system is specific to a video system in which each data stream has an identical maximum data consumption rate (data stream bandwidth).

Since the system known from US 6,415,328 requires redundancy for load balancing, there are likely situations in which the failure of one disk will result in failure to deliver a title to a user. In addition, the amount of redundancy required for a title depends on the popularity of the title, requiring extensive management. The system known from US 6,415,328 makes no provision for scaling the system beyond the bandwidth supported by the bus that connects the storage devices or the bus that connects the output buffers. The stream capacity of the known system is limited by the maximum bus capacities, and the known system achieves maximum bus utilization by storing redundant copies of the content.

The present invention is defined in independent system claim 1 and independent method claim 12.

BRIEF DESCRIPTION OF THE DRAWINGS

The advantages, features and benefits of the present invention will be better understood with regard to the following description and accompanying drawings, in which:

Figure 1 shows a simplified block diagram of a portion of an Interactive Content Engine (ICE) implemented in accordance with an exemplary embodiment of the present invention; and

Figure 2 is a logical block diagram of a portion of the ICE of Figure 1, showing a synchronized data transfer system implemented according to an embodiment of the present invention.

DETAILED DESCRIPTION

The following description is intended to enable one of ordinary skill in the art to make and use the present invention as provided within the context of a particular application and its requirements. Various modifications to the preferred embodiment will, however, be apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The architecture described herein accommodates individual components of varying capability, so that an installation is not limited to the point in time at which the initial system was purchased. The use of off-the-shelf components guarantees recent, well-proven technology, avoidance of sole sources of supply and the lowest cost per stream. Failures of individual components are tolerated. In many cases there is no noticeable change in behavior from the user's perspective. In other cases there is a brief "self-repair" cycle. In many cases multiple failures can be tolerated. Also, in many if not all cases, the system can recover without requiring immediate attention, making it ideal for unattended ("lights out") operation.

Content storage allocation and internal bandwidth are automatically managed by Least Recently Used (LRU) algorithms, which ensure that the content in the RAM cache and the hard disk array is appropriate to the current demand and that the bandwidth of the backplane switch is used as effectively as possible. Bandwidth within the system is rarely, if ever, oversubscribed, so it is not necessary to discard or delay the transmission of packets. The architecture provides the ability to take full advantage of the composite bandwidth of each component, so that guarantees can be met, and the network is private and under full control, so that even in a situation of unanticipated peak demand no data path is overloaded. Streams of any bit rate can be accommodated, but typical streams are expected to remain in the range of 1 to 20 Mbps. Asynchronous content is accommodated on an available-bandwidth basis. Bandwidth may be reserved for this purpose if required by the application. Files may be of any size with a minimum of storage inefficiency.
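As a minimal sketch of the Least Recently Used policy mentioned above, the following class evicts the entry that has gone longest without being accessed once a fixed capacity is exceeded. The class name `LRUCache`, its methods and the use of subfragment identifiers as keys are assumptions for illustration; the text does not describe how the ICE implements its LRU algorithms.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()   # oldest entry first, newest last

    def get(self, key):
        """Return the cached value, or None on a miss; a hit refreshes recency."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        """Insert or update an entry, evicting the oldest one if over capacity."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # drop the least recently used entry
```

With this policy, content that is requested often stays resident while content that is no longer in demand ages out without any manual management.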

Figure 1 is a simplified block diagram of a portion of an Interactive Content Engine (ICE) (100) implemented in accordance with an exemplary embodiment of the present invention. Portions not relevant to a full understanding of the present invention are not shown for the sake of clarity. The ICE (100) comprises an appropriate multiport Gigabit Ethernet (GbE) switch (101) as the backplane fabric, having multiple Ethernet ports coupled to a plurality of Storage Processor Nodes (SPNs) (103). Each SPN (103) is a simplified server that includes two Gigabit Ethernet ports, one or more processors (107), memory (109) (for example, random access memory (RAM)) and an appropriate number (for example, four to eight) of disk drives (111). A first Gb port (105) of each SPN (103) connects to a corresponding port of the switch (101) for full-duplex operation (simultaneous transmission and reception at each SPN/port connection) and is used to move data within the ICE (100). The other Gb port (not shown) delivers the content output to downstream users (not shown).

Each SPN (103) has high-speed access to its local disk drives and to the disk drives of the other four SPNs in each group of five SPNs. The switch (101) is a backplane for the ICE (100) rather than merely a communication device between the SPNs (103). Only five SPNs (103) are shown for purposes of illustration, it being understood that the ICE (100) typically comprises a larger number of servers. Each SPN (103) acts as storage, processor and transmitter of content. In the configuration shown, each SPN (103) is built from off-the-shelf components and is not a computer in the usual sense. Although standard operating systems are contemplated, such interrupt-driven operating systems may introduce unnecessary bottlenecks.

Each title (for example, a video, movie or other media content) is not stored in its entirety on any single disk drive (111). Instead, the data for each title is divided and stored among several disk drives within the ICE (100) to gain the speed benefits of interleaved access. The content of a single title is spread across multiple disk drives of multiple SPNs (103). Short "time frames" of the title content are gathered in round-robin fashion from each disk drive in each SPN (103). In this manner the physical load is spread, escaping the drive-count limits of SCSI and IDE, a form of fail-safe operation is gained, and a large number of titles are organized and managed.

In the specific configuration shown, each content title is divided into fragments of a fixed size (typically about 2 megabytes (MB) per fragment). Each fragment is stored on a different set of SPNs (103) in round-robin fashion. Each fragment is divided into four subfragments, and a fifth subfragment representing the parity is created. Each subfragment is stored on a disk drive of a different SPN (103). In the configuration shown and described, the subfragment size of approximately 512 kilobytes (KB) (where "K" is 1024) matches the nominal unit of data of each of the disk drives (111). The SPNs (103) are arranged in groups of five, and each group or set of SPNs stores one fragment of data of a title. As shown, the five SPNs (103) are labeled 1-4 and "Parity", and collectively store a fragment (113) as five separate subfragments (113a, 113b, 113c, 113d, and 113e) stored on SPNs 1, 2, 3, 4, and Parity, respectively. The subfragments (113a-113e) are shown stored in a distributed manner on a different drive for each different SPN (for example, SPN1/DRIVE1, SPN2/DRIVE2, SPN3/DRIVE3, etc.), but they may be stored in any other possible combination (for example, SPN1/DRIVE1, SPN2/DRIVE1, SPN3/DRIVE3, etc.). Subfragments 1-4 contain the data, and the Parity subfragment contains the parity information for the data subfragments. The size of each SPN set, while typically five, is arbitrary and could just as easily be any other number, for example 2 SPNs to 10 SPNs. Two SPNs would use 50% of their storage for redundancy, and ten would use 10%. Five is a compromise between storage efficiency and probability of failure.
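
The fragment layout described above can be sketched as follows. This is only an illustration: the text says a parity subfragment is created but does not name the parity function, so byte-wise XOR (as in common RAID schemes) is an assumption here, and the names `split_with_parity` and `redundancy_fraction` are hypothetical.

```python
from functools import reduce
from typing import List

DATA_SUBFRAGMENTS = 4  # four data subfragments per fragment, per the text


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(fragment: bytes) -> List[bytes]:
    """Split a fragment into four data subfragments plus one parity subfragment.

    XOR parity is an assumption; the source only says that parity is created.
    """
    if len(fragment) % DATA_SUBFRAGMENTS != 0:
        raise ValueError("fragment length must be a multiple of 4")
    size = len(fragment) // DATA_SUBFRAGMENTS
    data = [fragment[i * size:(i + 1) * size] for i in range(DATA_SUBFRAGMENTS)]
    parity = reduce(xor_bytes, data)
    return data + [parity]


def redundancy_fraction(set_size: int) -> float:
    """Fraction of a set's storage spent on parity when one member holds parity."""
    return 1 / set_size
```

With a set of five SPNs, `redundancy_fraction(5)` gives 0.2, which sits between the 50% and 10% figures quoted for sets of two and ten.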

Distributing content in this way achieves at least two goals. First, the number of users who can view a given title is not limited to the number that can be served by a single set of SPNs, but by the bandwidth of all the SPN sets taken together. Therefore, only one copy of each content title is required. The tradeoff is a limit on the number of new viewers of a given title that can be launched each second, which is far less of a constraint than the wasted space and management overhead of redundant storage. A second goal is to increase the overall reliability of the ICE (100). The failure of a single drive is masked by the real-time regeneration of its content using the parity drive, similar to a redundant array of independent disks (RAID). The failure of an SPN (103) is masked by the fact that it contains one drive from each of several RAID sets, each of which continues to operate. Users connected to a failed SPN are very quickly taken over by shadow processes running on other SPNs. In the event of failure of a disk drive or of an entire SPN, the operator is notified to repair or replace the failed equipment. When a missing subfragment is rebuilt by the user process, it is transmitted back to the SPN that would have provided it, where it is cached in RAM (as it would have been had it been read from the local disk). This avoids wasting the time of other user processes in performing the same rebuild for a popular title, since subsequent requests are filled from RAM as long as the subfragment is popular enough to remain cached.
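
The regeneration of a missing subfragment can be illustrated with a minimal sketch. It assumes XOR parity, which the text does not state explicitly (it only draws the comparison with RAID); the function name `reconstruct` is hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def reconstruct(subfragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing subfragment (marked None) from the survivors.

    With XOR parity (an assumption), any one of the five subfragments is the
    XOR of the other four, whether it is a data or the parity subfragment.
    """
    missing = [i for i, s in enumerate(subfragments) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can mask only one missing subfragment")
    rebuilt = list(subfragments)
    if missing:
        survivors = [s for s in subfragments if s is not None]
        rebuilt[missing[0]] = reduce(xor_bytes, survivors)
    return rebuilt
```

Under this assumption a single parity subfragment masks exactly one failure per fragment, which matches the single-drive and single-SPN failure cases described above.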

The goal of a user process (UP) running on each user SPN (103) is to gather the subfragments from its own disk plus the corresponding four subfragments from other user SPNs to assemble a fragment of video content for delivery. User SPNs are distinguished from one or more management (MGMT) SPNs, which are configured in the same way but perform different functions, as described further below. A pair of redundant MGMT SPNs is contemplated to enhance reliability and performance. The gathering and assembling functions performed by each UP are carried out many times on behalf of many users on each user SPN (103). As a consequence, there is a significant amount of data traffic between the user SPNs (103). The typical Ethernet protocol, with packet collision detection and retries, would otherwise be overwhelmed. Typical protocols are designed for random transmissions and depend on slack time between those events, so that approach is not used. In the ICE (100), collisions are avoided by using a full-duplex, fully switched architecture and by managing bandwidth carefully. Most communication is done synchronously. The switch (101) itself is managed in a synchronous manner, as described further below, so that the transmissions are coordinated. Because it is determined which SPN (103) gets to transmit and when, ports are not overwhelmed with more data than they can handle during a given period. Indeed, data is first gathered in the memory (109) of the user SPNs (103), and its transfer is then managed synchronously. As part of the orchestration, there are status signals between the user SPNs (103). Unlike the actual content going to the end user, the data size for signaling between the user SPN units is quite small.

The length of each subfragment (approximately 512 KB, where "K" is 1024) would otherwise overwhelm any buffering available in the GbE switch (101) if subfragments were allowed to be transmitted randomly or asynchronously. Transmitting this much information takes about 4 milliseconds (ms), and it is desirable to ensure that several ports do not try to transmit to a single port simultaneously. Therefore, as described further below, the switch (101) is managed in a manner that causes it to operate synchronously, with all ports fully utilized under full load conditions.
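
The 4 ms figure can be checked with a short calculation. The sketch below assumes the nominal Gigabit Ethernet line rate of 1 Gbps and ignores framing and protocol overhead; the function names are illustrative only.

```python
SUBFRAGMENT_BYTES = 512 * 1024        # approximately 512 KB, with K = 1024
PORT_BITS_PER_SECOND = 1_000_000_000  # nominal GbE line rate, overhead ignored


def transmit_time_ms(size_bytes: int, bits_per_second: int) -> float:
    """Time in milliseconds to push size_bytes through one port."""
    return size_bytes * 8 / bits_per_second * 1000


def drain_time_ms(senders: int) -> float:
    """Time for one destination port to absorb one subfragment from each sender."""
    return senders * transmit_time_ms(SUBFRAGMENT_BYTES, PORT_BITS_PER_SECOND)
```

One subfragment takes roughly 4.19 ms under these assumptions. If several sources sent to the same port at once, the surplus would have to sit in switch buffers for a multiple of that time, which is the situation the synchronous management is meant to avoid.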

The redundant directory process that manages the file system (or virtual file system, VFS) is responsible for reporting where a given content title is stored when it is requested by a user. It is also responsible for allocating the required storage space when a new title is to be loaded. All allocations are made in whole fragments, each of which is composed of five subfragments. Space on each disk drive is managed within the drive by Logical Block Address (LBA). A subfragment is stored on a disk drive in contiguous sectors or LBA addresses. The capacity of each disk drive in the ICE (100) is represented by its maximum LBA address divided by the number of sectors per subfragment.
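
The capacity rule in the last sentence can be written out as a small sketch. The 512-byte sector size is an assumption (the text does not give one), and the helper names are hypothetical.

```python
SECTOR_BYTES = 512              # assumed sector size; not stated in the text
SUBFRAGMENT_BYTES = 512 * 1024  # approximately 512 KB per subfragment
SECTORS_PER_SUBFRAGMENT = SUBFRAGMENT_BYTES // SECTOR_BYTES


def drive_capacity_in_subfragments(max_lba: int) -> int:
    """Number of whole subfragments a drive can hold: max LBA / sectors per subfragment."""
    return max_lba // SECTORS_PER_SUBFRAGMENT


def subfragment_lba_range(slot: int) -> range:
    """Contiguous LBA range occupied by the subfragment stored in a given slot."""
    start = slot * SECTORS_PER_SUBFRAGMENT
    return range(start, start + SECTORS_PER_SUBFRAGMENT)
```

Because every subfragment occupies the same number of contiguous sectors, a slot index and a starting LBA are interchangeable ways of naming a location on the drive.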

Each title map or "directory entry" contains a list indicating where the fragments of its title are stored and, more specifically, where each subfragment of each fragment is located. In the embodiment shown, each item in the list representing a subfragment contains an SPNID identifying a specific user SPN (103), a disk drive number (DD#) identifying a specific disk drive (111) of the identified user SPN (103), and a subfragment pointer (or Logical Block Address, LBA), packed as a 64-bit value. Each directory entry contains a subfragment list for approximately half an hour of content at the nominal 4 Mbps. This equals 450 fragments, or 2250 subfragments. Each directory entry is approximately 20 KB, including ancillary data. When a UP running on an SPN requests a directory entry, the entire entry is sent and stored locally for the corresponding user. Even if an SPN supports 1000 users, only 20 MB of memory is consumed by the local lists or directory entries.
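
One way to picture the 64-bit list item is the sketch below. The field widths (16 bits for the SPNID, 8 for the DD#, 40 for the LBA) are purely hypothetical, chosen only so that they sum to 64; the text says the three fields are packed into a 64-bit value but does not give the layout.

```python
from typing import Tuple

# Hypothetical field widths; the source only states that the item is 64 bits.
SPNID_BITS = 16
DRIVE_BITS = 8
LBA_BITS = 40


def pack_locator(spnid: int, drive: int, lba: int) -> int:
    """Pack SPNID, DD#, and LBA into a single 64-bit integer."""
    for value, bits in ((spnid, SPNID_BITS), (drive, DRIVE_BITS), (lba, LBA_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field does not fit in its assumed width")
    return (spnid << (DRIVE_BITS + LBA_BITS)) | (drive << LBA_BITS) | lba


def unpack_locator(packed: int) -> Tuple[int, int, int]:
    """Recover (spnid, drive, lba) from a packed 64-bit locator."""
    lba = packed & ((1 << LBA_BITS) - 1)
    drive = (packed >> LBA_BITS) & ((1 << DRIVE_BITS) - 1)
    spnid = packed >> (DRIVE_BITS + LBA_BITS)
    return spnid, drive, lba


FRAGMENTS_PER_ENTRY = 450
SUBFRAGMENTS_PER_ENTRY = FRAGMENTS_PER_ENTRY * 5  # 2250 subfragments
LOCATOR_LIST_BYTES = SUBFRAGMENTS_PER_ENTRY * 8   # 8 bytes per 64-bit item
```

The locator list alone comes to 18,000 bytes, which is consistent with the figure of roughly 20 KB per directory entry once ancillary data is included.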

The ICE (100) maintains a database of all titles available to a user. This list includes the local optical disk library, real-time network programming, and titles at remote locations for which license and transport arrangements have been made. The database contains all the metadata for each title, including management information (licensing period, bit rate, resolution, etc.) as well as information of interest to the user (producer, director, cast, crew, author, etc.). When the user makes a selection, a directory of a virtual file system (VFS) (209) (Figure 2) is queried to determine whether the title is already loaded in the disk array. If not, a loading process (not shown) is initiated for that piece of content, and the UP is notified, if necessary, of when it will be available for viewing. In most cases, the latency is no greater than the mechanical latency of the optical disk retrieval robot (not shown), or about 30 seconds.

The information stored on the optical disk (not shown) includes all the metadata (which is read into the database when the disk is first loaded into the library), as well as the compressed digital video and audio representing the title and all the information that can be extracted in advance about those data streams. For example, it contains pointers to all relevant information in the data streams, such as clock values and time stamps. It is already divided into subfragments, with the parity subfragment precalculated and stored on the disk. In general, anything that can be done in advance to save loading time and processing overhead is included on the optical disk.

The resource management system includes a dispatcher (not shown), which a UP consults to receive a start time for its data stream (usually within milliseconds of the request). The dispatcher ensures that the load on the system remains even, that latency is minimized, and that at no time does the bandwidth required within the ICE (100) exceed what is available. Whenever a user requests a stop, pause, fast forward, rewind, or other operation that interrupts the flow of their stream, its bandwidth is deallocated and a new allocation is made for any new service requested (for example, a fast-forward stream).
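
The bandwidth bookkeeping described here could look something like the following sketch. It models only the rule that allocated bandwidth never exceeds what is available, not the assignment of start times; the class `BandwidthLedger` and its methods are hypothetical and do not come from the source.

```python
from typing import Dict


class BandwidthLedger:
    """Tracks per-stream bandwidth so the total never exceeds the capacity."""

    def __init__(self, capacity_mbps: float) -> None:
        self._capacity = capacity_mbps
        self._streams: Dict[str, float] = {}

    @property
    def allocated(self) -> float:
        """Total bandwidth currently allocated across all streams."""
        return sum(self._streams.values())

    def allocate(self, stream_id: str, mbps: float) -> bool:
        """Grant the allocation only if it fits within the remaining capacity."""
        if stream_id in self._streams or self.allocated + mbps > self._capacity:
            return False
        self._streams[stream_id] = mbps
        return True

    def release(self, stream_id: str) -> None:
        """Deallocate a stream, for example on stop, pause, or rewind."""
        self._streams.pop(stream_id, None)
```

A switch from normal play to fast forward would then be a `release` of the old stream followed by an `allocate` for the new one.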

Figure 2 is a logical block diagram of a portion of the ICE (100) illustrating a synchronized data transfer system (200) implemented according to an embodiment of the present invention. The switch (101) is shown coupled to several exemplary SPNs (103), including a first user SPN (201), a second user SPN (203), and a management (MGMT) SPN (205). As noted previously, many SPNs (103) are coupled to the switch (101); only two user SPNs (201, 203) are shown for purposes of illustrating the present invention, and each is physically implemented just like any SPN (103) described previously. The MGMT SPN (205) is physically implemented just like any other SPN (103), but generally performs management functions rather than user-specific functions. The SPN (201) illustrates certain functions and the SPN (203) illustrates other functions of each user SPN (103). It is understood, however, that each user SPN (103) is configured to perform similar functions, so that the functions (and processes) described for the SPN (201) are also provided on the SPN (203), and vice versa.

As described above, the switch (101) operates at 1 Gbps per port, so that each subfragment (approximately 512 KB) takes about 4 ms to pass from one SPN to another. Each user SPN (103) executes one or more user processes (UPs), each of which supports a downstream user. When a new fragment of a title is needed to refill a user output buffer (not shown), the next five subfragments on the list are requested from the other user SPNs storing those subfragments. Because many UPs potentially request multiple subfragments at substantially the same time, the subfragment transmission duration would otherwise overwhelm the buffering capacity of almost any GbE switch for a single port, let alone for the whole switch. This is true for the switch (101) shown. If subfragment transmission is not managed, all five subfragments for each UP could be returned simultaneously, overwhelming the bandwidth of the output port. It is desirable to tighten the timing of the transmissions of the SPNs of the ICE (100) so that the most critical data is transmitted first, and intact.

The SPN (201) is shown executing a UP (207) servicing a corresponding downstream user. The user requests a title (for example, a movie), and the request is forwarded to the UP (207). The UP (207) transmits a title request (TR) to the VFS (209) (described further below) located on the MGMT SPN (205). The VFS (209) returns a directory entry (DE) to the UP (207), which stores the DE locally, shown at (211). The DE (211) includes a list locating each subfragment of the title (SC1, SC2, etc.), each entry including the SPNID identifying a specific user SPN (103), the disk drive number (DD#) identifying a specific disk drive (111) of the identified SPN (103), and an address or LBA giving the specific location of the subfragment on the identified disk drive. The SPN (201) initiates a time-stamped read request (TSRR) for each subfragment in the DE (211), one at a time. In the ICE (100), the requests are made immediately and directly. In other words, the SPN (201) begins making the requests for the subfragments immediately and directly to the specific user SPNs (103) storing the data. In the configuration shown, the requests are made in the same manner even if the data is stored locally. In other words, even if the requested subfragment resides on a local disk drive of the SPN (201), the SPN sends the request via the switch (101) as though the subfragment were remotely located. The network is the location that may be configured to recognize that a request is being sent from an SPN to the same SPN. It is simpler to handle all cases the same way, especially in larger installations, in which a request is less likely to actually be local.

Although the requests are sent immediately and directly, the subfragments are each returned in a fully managed manner. Each TSRR is sent to the specific user SPN using the SPNID and includes the DD# and LBA so that the target user SPN can retrieve and return the data. The TSRR may further include any other identification information sufficient to ensure that the requested subfragment is properly returned to the appropriate requester and to enable the requester to identify the subfragment (for example, a UP identifier to distinguish among multiple UPs executing on the destination SPN, a subfragment identifier to distinguish among the subfragments of each data fragment, etc.). Each TSRR also includes a time stamp (TS) identifying the specific time at which the original request was made. The TS identifies the priority of the request for purposes of synchronous transmission, where priority is based on time such that earlier requests have higher priority. When received, the returned subfragments of the requested title are stored in a local title memory (213) for further processing and delivery to the user who requested the title.
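
The contents of a TSRR and its time-based priority can be sketched as a small data structure. The field names, the `build_requests` helper, and the injectable `clock` callable are all hypothetical; the source specifies only that a TSRR carries the SPNID, DD#, LBA, a time stamp, and optional identifying information.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass(order=True)
class TSRR:
    """Time-stamped read request; ordering uses the time stamp only."""
    ts: float                                    # time the original request was made
    spnid: int = field(compare=False)            # target user SPN
    drive: int = field(compare=False)            # DD# on the target SPN
    lba: int = field(compare=False)              # location of the subfragment
    requester: int = field(compare=False)        # SPN the data must return to
    up_id: int = field(compare=False)            # which UP on the requesting SPN
    subfragment_id: int = field(compare=False)   # position within the request list


def build_requests(
    entry: Iterable[Tuple[int, int, int]],
    requester: int,
    up_id: int,
    clock: Callable[[], float],
) -> List[TSRR]:
    """Create one TSRR per (spnid, drive, lba) item, stamping each as it is issued."""
    return [
        TSRR(ts=clock(), spnid=s, drive=d, lba=l,
             requester=requester, up_id=up_id, subfragment_id=i)
        for i, (s, d, l) in enumerate(entry)
    ]
```

Because comparison is restricted to `ts`, sorting a list of `TSRR` objects places the earliest, and therefore highest-priority, request first.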

The user SPN (203) illustrates the operation of a transfer process (TP) (215) and supporting functions that run on each user SPN (for example, (201), (203)) to receive the TSRRs and return the requested subfragments. The TP (215) includes, or otherwise interfaces with, a storage process (not shown) that interfaces with the local disk drives (111) on the SPN (203) to request and access the stored subfragments. The storage process may be implemented in any desired manner, such as a state machine or the like, and may be a separate process interposed between the TP (215) and the local disk drives (111), as known to those skilled in the art. As shown, the TP (215) receives one or more TSRRs from one or more UPs running on other user SPNs (103) and stores each request in a read request queue (RRQ) (217) in its local memory (109). The RRQ (217) stores a list of requests for subfragments SCA, SCB, etc. The disk drive that stores the requested subfragments removes the corresponding requests from the RRQ (217), sorts them in physical order, and then executes each read in the sorted order. Accesses to subfragments on each disk are managed in groups. Each group is sorted in physical order according to an “elevator seek” operation (one sweep from low to high, the next sweep from high to low, and so on, so that the disk head sweeps back and forth across the disk surface, pausing to read the next sequential subfragment). Requests for successful reads are stored in a successful read queue (SRQ) (218) sorted in timestamp (TS) order. Requests for failed reads (if any) are stored in a failed read queue (FRQ) (220), and the failure information is forwarded to a network management system (not shown) that determines the error and the appropriate corrective action. Note that in the configuration shown the queues (217), (218) and (220) store request information rather than the actual subfragments.
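The queue handling described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the `ReadRequest` fields, the function names and the `read_ok` callback are hypothetical, and the physical position of a subfragment is assumed to be represented by a single block address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRequest:
    ts: int   # timestamp (TS) carried by the read request (assumed integer)
    lba: int  # assumed physical position of the subfragment on the local disk
    dst: int  # assumed identifier of the requesting SPN


def elevator_batches(batches):
    """Sort each group of requests by physical position, alternating the
    sweep direction from one group to the next (low to high, then high to low)."""
    ordered = []
    ascending = True
    for batch in batches:
        ordered.append(sorted(batch, key=lambda r: r.lba, reverse=not ascending))
        ascending = not ascending
    return ordered


def complete_reads(executed, read_ok):
    """Split executed reads into a successful read queue kept in timestamp
    order and a failed read queue. Only request information is queued."""
    srq = sorted((r for r in executed if read_ok(r)), key=lambda r: r.ts)
    frq = [r for r in executed if not read_ok(r)]
    return srq, frq
```

Sorting each group by position keeps head movement monotonic within a sweep, while the successful read queue is re-sorted by timestamp because later scheduling depends on request age rather than disk position.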

Each subfragment that is successfully read is placed in memory reserved for an LRU cache of recently requested subfragments. For each retrieved subfragment, the TP (215) creates a corresponding message (MSG) that includes the TS for the subfragment, the source (SRC) of the subfragment (for example, the SPNID from which the subfragment is being transmitted and its physical memory location, along with any other identifying information), and the destination (DST) SPN to which the subfragment is to be transmitted (for example, the SPN (201)). As shown, the SRQ (218) includes messages MSGA, MSGB, etc., for subfragments SCA, SCB, etc., respectively. After the requested subfragments have been read and cached, the TP (215) sends the corresponding MSGs to a synchronized switch manager (SSM) (219) running on the MGMT SPN (205).

The SSM (219) receives and prioritizes multiple MSGs received from the TPs of the user SPNs and eventually sends a transmission request (TXR) to the TP (215) identifying one of the MSGs in its SRQ (218), such as by a message identifier (MSGID) or the like. When the SSM (219) sends a TXR to the TP (215) with a MSGID identifying a subfragment in the SRQ (218), the request entry is moved from the SRQ (218) to a network transfer process (NTP) (221), which builds the packets used to transfer the subfragment to the destination user SPN (where “moved” denotes removing the request from the SRQ (218)). The order in which subfragment request entries are removed from the SRQ (218) is not necessarily sequential, even though the list is in timestamp order, because only the SSM (219) determines the proper order. The SSM (219) sends a TXR to every other SPN (103) that has at least one subfragment to send, unless the subfragment is to be sent to a UP on an SPN (103) already scheduled to receive a subfragment of equal or higher priority, as described further below. The SSM (219) then sends a single transmission command (TX CMD) to all user SPNs (103). The TP (215) instructs the NTP (221) to transmit the subfragment to the requesting UP of the user SPN (103) in response to the TX CMD sent by the SSM (219). In this manner, each SPN (103) that has received a TXR from the SSM (219) transmits simultaneously to another requesting user SPN (103).
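The two-step handshake on the sending node (a TXR selects an entry, the TX CMD releases it) can be sketched as follows. The class and method names are hypothetical, and the `send` callback stands in for whatever the NTP does to packetize and transmit; the sketch only illustrates the ordering of events described above.

```python
class TransferProcess:
    """Minimal model of a TP holding a successful read queue (SRQ)."""

    def __init__(self):
        self.srq = {}        # MSGID -> queued entry, in arrival (timestamp) order
        self.pending = None  # entry moved out of the SRQ, waiting for TX CMD

    def on_read_ok(self, msgid, entry):
        # A successful read adds an entry; a MSG would be sent to the SSM here.
        self.srq[msgid] = entry

    def on_txr(self, msgid):
        # The SSM picks the entry by MSGID, not necessarily the oldest one.
        self.pending = self.srq.pop(msgid)

    def on_tx_cmd(self, send):
        # Transmit only if a TXR was received for this period.
        if self.pending is not None:
            send(self.pending)
            self.pending = None
```

Keying the queue by MSGID reflects the point that removal order is decided by the SSM rather than by queue position.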

The VFS (209) on the MGMT SPN (205) manages the list of titles and their locations in the ICE (100). In typical computer systems, directories (data information) usually reside on the same disk as the data. In the ICE (100), however, the VFS (209) is centrally located to manage the distributed data, since the data for each title is distributed across multiple disks in a disk array, which are in turn distributed across multiple user SPNs (103). As described above, the disk drives (111) on the user SPNs (103) primarily store the subfragments of the titles. The VFS (209) includes identifiers for the location of each subfragment by SPNID, DD# and LBA, as described above. The VFS (209) also includes identifiers for other parts of the ICE (100) that are external, such as optical storage. When a user requests a title, a full set of directory information (IDs/addresses) is made available to the UP executing on the user SPN (103) that accepted the user's request. From there, the task is to transfer the subfragments from the disk drives to memory (buffers) and move them through the switch (101) to the requesting user SPN (103), which assembles a full fragment in a buffer, delivers it to the user, and repeats until done.
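As a rough illustration of the directory information handed to the UP, the sketch below models a subfragment location as an (SPNID, DD#, LBA) triple and groups the locations of one title by the node that stores them. The `Location` type, the dictionary layout and the function name are assumptions made for this example; the text does not specify how the VFS stores its directory.

```python
from collections import defaultdict
from typing import NamedTuple


class Location(NamedTuple):
    spnid: int  # node that stores the subfragment
    dd: int     # disk drive number (DD#) on that node
    lba: int    # logical block address on that drive


def requests_by_node(directory, title):
    """Group the subfragment locations of one title by storing node,
    keeping each subfragment's index within the title."""
    per_node = defaultdict(list)
    for index, loc in enumerate(directory[title]):
        per_node[loc.spnid].append((index, loc.dd, loc.lba))
    return dict(per_node)
```

With such a grouping, a UP could address its read requests directly to the nodes holding the data without consulting the central directory again for each subfragment.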

The SSM (219) creates a list of “ready” messages in timestamp order in a ready message (RDY MSG) list (223). The order in which the messages are received from the TPs on the user SPNs (103) is not necessarily timestamp order, but they are organized in TS order in the RDY MSG list (223). Just before the next set of transfers, the SSM (219) scans the RDY MSG list (223) starting with the earliest timestamp. The SSM (219) first identifies the earliest TS in the RDY MSG list (223) and generates and sends the corresponding TXR message to the TP (215) of the user SPN (103) storing the corresponding subfragment, to initiate the pending transfer of that subfragment. The SSM (219) continues scanning the list (223) for each subsequent subfragment in TS order, generating TXR messages for each subfragment whose source and destination are not already involved in a pending subfragment transfer. For each TX CMD sent to all user SPNs (103), each user SPN (103) transmits only one subfragment at a time and receives only one subfragment at a time, although it can do both simultaneously. For example, if a TXR message is sent to the TP of SPN #10 to schedule a pending subfragment transfer to SPN #2, then SPN #10 cannot simultaneously send another subfragment. SPN #10 can, however, simultaneously receive a subfragment from another SPN. Furthermore, SPN #2 cannot simultaneously receive another subfragment while receiving the subfragment from SPN #10, although SPN #2 can simultaneously transmit to another SPN because of the full-duplex nature of each port of the switch (101).
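The selection rule just described amounts to a greedy scan in timestamp order that admits a message only if neither its source nor its destination is already committed for the coming period. A minimal sketch follows; the `Msg` fields and the function name are hypothetical, and timestamps are assumed to be comparable integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    msgid: str
    ts: int   # timestamp copied from the read request
    src: int  # SPN holding the subfragment
    dst: int  # SPN that requested it


def select_for_next_period(rdy_msgs):
    """Return the messages that would receive a TXR for the next period."""
    sending, receiving = set(), set()
    selected = []
    for msg in sorted(rdy_msgs, key=lambda m: m.ts):
        if msg.src in sending or msg.dst in receiving:
            continue  # node already sends or already receives this period
        sending.add(msg.src)
        receiving.add(msg.dst)
        selected.append(msg)
    return selected
```

Separate sets for senders and receivers capture the full-duplex property: a node may appear once in each set during the same period, as SPN #10 and SPN #2 do in the example above.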

The SSM (219) continues scanning the RDY MSG list (223) until all user SPNs (103) have been accounted for or the end of the RDY MSG list (223) is reached. Each entry in the RDY MSG list (223) corresponding to a TXR message is eventually removed from the RDY MSG list (223) (either when the TXR message is sent or after the transfer is completed). When the last transfer of the previous period has finished, the SSM (219) sends a TX CMD packet that signals all user SPNs (103) to begin the next round of transmissions. Each transfer occurs synchronously within a period of approximately 4 to 5 ms for the specific configuration illustrated. During each transfer round, additional MSGs are sent to the SSM (219) and new TXR messages are sent to the user SPNs (103) to schedule the next round of transmissions, and the process is repeated. The period between successive TX CMDs is approximately equal to the time needed to transmit all of the bits of a subfragment, including packet overhead and interpacket delay, plus a period to clear any caching that may have occurred in the switch during the transmission of the subfragment, typically 60 microseconds (μs), plus a period to account for any jitter caused by a delay in recognition of the TX CMD by an individual SPN, typically less than 100 μs.
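The period budget can be checked with simple arithmetic. In the sketch below the subfragment size (512 KiB) and the wire overhead fraction are hypothetical values chosen only for illustration, since this passage does not state them; the 1 Gbit/s rate follows the Gigabit Ethernet switch named in the claims, and the 60 μs and 100 μs terms are the typical figures given above, with 100 μs used as an upper bound.

```python
def tx_cmd_period_s(subfragment_bytes, link_bits_per_s,
                    wire_overhead_fraction=0.0,
                    switch_clear_s=60e-6, jitter_s=100e-6):
    """Approximate time between successive TX CMDs, in seconds."""
    # Bits on the wire: payload plus an assumed fractional packet overhead.
    wire_bits = subfragment_bytes * 8 * (1.0 + wire_overhead_fraction)
    return wire_bits / link_bits_per_s + switch_clear_s + jitter_s


# Hypothetical 512 KiB subfragment on a 1 Gbit/s port, 4% assumed overhead.
period = tx_cmd_period_s(512 * 1024, 1e9, wire_overhead_fraction=0.04)
```

Under these assumed values the result is about 4.5 ms, which is in line with the 4 to 5 ms range quoted for the illustrated configuration; different subfragment sizes or overheads would shift it accordingly.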

In one embodiment, a duplicate or mirrored MGMT SPN (not shown) mirrors the primary MGMT SPN (205), so that the SSM (219), the VFS (209) and the dispatcher are each duplicated on a pair of redundant dedicated MGMT SPNs. In one embodiment, the synchronizing TX CMD broadcast acts as a synchronization pulse (a heartbeat) indicating the health of the MGMT SPN (205). The synchronization pulse is a signal to the secondary MGMT SPN that all is well. In the absence of the synchronization pulse, the secondary MGMT SPN takes over all management functions within a predetermined period of time, such as, for example, 5 ms.
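The takeover behaviour of the secondary MGMT SPN resembles a watchdog timer driven by the TX CMD broadcasts. The following sketch is one possible reading under stated assumptions: the class and method names are hypothetical, time is passed in explicitly as seconds, and the 5 ms figure from the text is used as the silence threshold.

```python
class SecondaryMgmtWatchdog:
    """Tracks TX CMD pulses from the primary and decides when to take over."""

    def __init__(self, timeout_s=0.005):
        self.timeout_s = timeout_s
        self.last_pulse_s = None  # time of the most recent TX CMD seen
        self.active = False       # True once the secondary has taken over

    def on_tx_cmd(self, now_s):
        # Every TX CMD from the primary doubles as a health signal.
        self.last_pulse_s = now_s

    def poll(self, now_s):
        # Take over if the primary has been silent for longer than the timeout.
        if (not self.active and self.last_pulse_s is not None
                and now_s - self.last_pulse_s > self.timeout_s):
            self.active = True
        return self.active
```

Because the TX CMD is already broadcast every period, reusing it as the health signal adds no extra traffic; the threshold only needs to exceed the normal spacing between TX CMDs.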

Although the present invention has been described in considerable detail with reference to certain versions thereof, other versions and variations are possible and are contemplated within the invention.

Claims

1. Synchronized data transfer system (100) comprising:

a series of processor nodes (103);

a network core switch (101) coupled to said series of processor nodes to enable communication between said series of processor nodes;

a series of storage devices (111) distributed across said series of processor nodes and storing a series of titles, each title being divided into a series of subfragments (113) that are distributed across said series of storage devices;

a series of transfer processes (215), each executed on a corresponding processor node of said series of processor nodes and operative to send a message (MSG) to a synchronous switch management process (219) for each subfragment to be transmitted from a local storage device of a source processor node (203) to a destination processor node, each message including a source node identifier (SRC) that identifies one of said series of processor nodes as said source processor node and a destination node identifier (DST) that identifies one of said series of processor nodes as said destination processor node; and

said synchronous switch management process, executed on at least one of said series of processor nodes (205), which periodically sends a transmission command (TX CMD) to said series of processor nodes to start each of a series of sequential transmission periods, which receives a series of messages (MSG) and, before each transmission period, selects from said series of messages so as to ensure that each processor node sends at most one subfragment and receives at most one subfragment during a following period, and which sends a series of transmission requests (TXR) corresponding to the selected messages; and

wherein each transfer process that has sent at least one message and has received a transmission request (TXR) from said synchronous switch management process identifying a corresponding subfragment sends said corresponding subfragment (SC) during the following transmission period initiated by a transmission command.

2. Synchronized data transfer system according to claim 1, wherein each of said series of messages comprises a timestamp (TS), and wherein said synchronous switch management process prioritizes said series of messages based on timestamp order and sends said series of transmission requests in timestamp order.

3. Synchronized data transfer system according to claim 2, further comprising:

a series of user processes (207), each executed on a corresponding processor node of said series of processor nodes and operative to send a series of timestamped read requests (TSRR); and

wherein each transfer process incorporates a timestamp (TS) from a corresponding timestamped read request into a corresponding message (MSGA).

4. Synchronized data transfer system according to claim 3, wherein said synchronous switch management process organizes said series of messages into a ready message list (223) in timestamp order, scans said ready message list in timestamp order just before each of said series of sequential transmission periods, and selects messages based on timestamp priority.

5. Synchronized data transfer system according to claim 4, wherein said synchronous switch management process selects a message if the identified source processor node has not already been selected to transmit a subfragment during the following transmission period and if the identified destination processor node has not already been selected to receive a subfragment during said following transmission period.

6. Synchronized data transfer system according to claim 1, further comprising:

each of said series of transfer processes storing received subfragment read requests in a read request queue (217), each subfragment read request indicating a locally stored subfragment (SCA);

each of said series of storage devices reading the subfragments identified in a local read request queue in physical order;

each of said series of processor nodes listing the subfragments successfully read by a corresponding storage device in a successful read queue (218); and

each of said series of transfer processes sending a message (MSG) to said synchronous switch management process for each entry in a corresponding successful read queue (218).

7. Synchronized data transfer system according to claim 6, wherein each of said subfragment read requests comprises a timestamped read request (TSRR), wherein the entries in each of said successful read queues are listed in timestamp order, and wherein each transfer process (215) sends a message for each entry in a corresponding successful read queue in timestamp order.

8. Synchronized data transfer system according to claim 6, further comprising:

each of said series of transfer processes (215) removing an entry from a corresponding successful read queue, said entry being associated with a subfragment identified by a corresponding transmission request; and

a series of network transfer processes (221), each executed on a corresponding processor node of said series of processor nodes (103) and each operative to build network packets used to transfer an identified subfragment to a destination processor node in response to a transmission command.

9. Synchronized data transfer system according to claim 1, wherein said network switch (101) comprises a Gigabit Ethernet switch with a series of ports, and wherein each of said processor nodes is coupled to a corresponding port of said network switch.

10. Synchronized data transfer system according to claim 1, wherein said series of processor nodes comprises a management node (205) executing said synchronous switch management process (219).

11. Synchronized data transfer system according to claim 1, wherein said series of processor nodes comprises a first management node (205) executing said synchronous switch management process and a second management node executing a mirrored synchronous switch management process.

12. Method for synchronously transferring distributed subfragments of data between a series of processor nodes (103) coupled (105) to a network switch (101), with a series of storage devices (111) distributed across said series of processor nodes and storing a series of titles, each title being divided into a series of subfragments (113) that are distributed across said series of storage devices, the method comprising:

periodically issuing, by a management process (219) executed on at least one of the processor nodes, a transmission command (TX CMD) to said series of processor nodes to start each of a series of sequential transmission periods;

sending to the management process, by each processor node (203) that has at least one subfragment (SC) to send, a message (MSG) for each subfragment (SCA) to be sent from a local storage device of that node (203) to a destination node, each message identifying one of the series of processor nodes as the source processor node (SRC) and one of the series of processor nodes as the destination processor node (DST);

selecting, by the management process before each transmission period, messages received from the processor nodes so as to ensure that each processor node identified as a source processor node sends at most one subfragment during the following transmission period and that each processor node identified as a destination processor node receives at most one subfragment during the following transmission period;

sending, by the management process, a series of transmission requests (TXR), each transmission request being sent to a processor node (203) that has sent a corresponding message that has been selected; and

transmitting, by each processor node (203) that receives a transmission request (TXR), the subfragment identified (MSGID) by the received transmission request to a destination processor node in response to the next transmission command.

13. Method according to claim 12, further comprising:

before said sending of a message for each subfragment to be sent, timestamping (TS) each message;

said selecting comprising prioritizing based on timestamp order (223); and

said sending of a series of transmission requests comprising sending the transmission requests in timestamp order.

14. Method according to claim 13, further comprising:

sending, by at least one processor node (201), a series of timestamped read requests (TSRR); and

wherein said timestamping of each message comprises incorporating a timestamp (TS) from a received timestamped read request (TSRR) into a corresponding message (MSG1).

15. Method according to claim 14, further comprising:

organizing, by the management process (219), the received messages into a ready message list (223) in timestamp order; and

scanning, by the management process (219), the ready message list (223) in timestamp order just before each transmission period.

16. Method according to claim 15, wherein said scanning comprises selecting a message if the identified source processor node has not already been selected to transmit a subfragment during the following transmission period and if the identified destination processor node has not already been selected to receive a subfragment during the following transmission period.

17. Method according to claim 16, wherein said scanning is complete when the entire ready message list has been scanned, or when all of the processor nodes have been selected to transmit a subfragment, or when all of the processor nodes have been selected to receive a subfragment.

18. Method according to claim 12, further comprising:

storing received subfragment read requests in a read request queue (217), each subfragment read request indicating a request for a locally stored subfragment (SCA);

reading, by a local disk drive (111), the subfragments identified in the read request queue in physical order;

listing entries for successfully read subfragments in a successful read queue (218); and

wherein said sending of a message for each subfragment to be sent comprises sending a message for each entry in said successful read queue.

19. Method according to claim 18, wherein each subfragment read request comprises a timestamped read request (TSRR), wherein said listing of entries for successfully read subfragments in a successful read queue comprises listing the entries in timestamp order, and wherein said sending of a message for each entry in said successful read queue comprises sending the messages in timestamp order.

20. Method according to claim 18, further comprising:

removing from the successful read queue an entry associated with a subfragment identified by a corresponding transmission request (TXR); and

building (221) network packets (SC) used to transfer the identified subfragment to a destination processor node (DST) in response to a transmission command (TX CMD).

21. Method according to claim 12, further comprising executing the management process (219) on a first management node (205) and executing a mirrored management process on a mirrored management node that mirrors the first management node.
