
HK1118110A - A caching engine in a messaging system - Google Patents


Info

Publication number
HK1118110A
HK1118110A (Application HK08109004.7A)
Authority
HK
Hong Kong
Prior art keywords
message
messaging
caching
messages
engine
Prior art date
Application number
HK08109004.7A
Other languages
Chinese (zh)
Inventor
Barry J. Thompson
Kul Singh
Pierre Fraval
Original Assignee
Tervela, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tervela, Inc.
Publication of HK1118110A


Description

Cache engine in a messaging system
Reference to previously filed application
The present application claims priority from U.S. Provisional Application Serial No. 60/641,988, entitled "Event Router System And Method," filed on January 6, 2005, and U.S. Provisional Application Serial No. 60/688,983, entitled "Hybrid Feed Handlers And Latency Measurement," filed on June 8, 2005, both of which are incorporated by reference.
This application is related to U.S. Patent Application Ser. No. 11/316,778 (attorney docket No. 50003-…).
Technical Field
The present invention relates to data messaging (messaging), and more particularly to a caching engine in a messaging system having a publish and subscribe (hereinafter "publish/subscribe") middleware architecture.
Background
The increasing levels of performance required by data messaging infrastructures have forced the development of networking infrastructures and protocols. Basically, data distribution involves various data sources and destinations, as well as various types of interconnect architectures and communication modes between the data sources and destinations. Examples of existing data messaging architectures include hub-and-spoke (hub-and-spoke), peer-to-peer, and store-and-forward.
With a hub-and-spoke system configuration, all communications are transmitted through the hub, which often results in performance bottlenecks when throughput is high. Thus, such messaging systems create latency. One way to circumvent this bottleneck is to deploy more servers and distribute the network load among them. However, this architecture presents scalability and operational issues. Compared to systems with hub-and-spoke configurations, systems with peer-to-peer configurations place unnecessary stress on applications to process and filter data, and are only as fast as their slowest consumer or node. Systems with store-and-forward configurations, in turn, store data before forwarding it to the next node in the path in order to provide persistence. Storage operations are typically implemented by indexing and writing messages to a storage disk, which can create a performance bottleneck. Furthermore, as the volume of messages increases, the indexing and writing tasks may become quite slow, possibly introducing additional latency.
To provide data consistency, these store-and-forward systems must provide the ability to recover from any logical or physical disaster without data loss. This is typically accomplished using remote disk mirroring or database replication techniques. The challenge for such an implementation is to always ensure data consistency between the primary and secondary sites while having low latency. One option is to implement a synchronous solution in which each block of data written at the primary site is considered complete after it is mirrored at the secondary site. A problem with this kind of synchronization implementation is that it affects the overall performance of the message layer. An alternative is to implement an asynchronous approach. However, in the case of this approach, the challenge to avoid data loss or corruption is to maintain data consistency while a disaster occurs. Another challenge is to ensure ordering of data updates.
Existing data messaging architectures share some deficiencies. A common deficiency is that data messaging in existing architectures relies on software residing at the application layer. This means that the messaging infrastructure is subject to OS (operating system) queuing and network I/O (input/output), which can create performance bottlenecks. Another common deficiency is that existing architectures use data transfer protocols statically rather than dynamically, even though other protocols may be more appropriate in some situations. Some examples of common protocols include routable multicast, broadcast, or unicast. In fact, the Application Programming Interfaces (APIs) in existing architectures are not designed to switch between transport protocols in real time.
In addition, network configuration decisions are typically made at deployment time and are typically defined to optimize a set of network and messaging conditions under certain assumptions. The limitations associated with static (fixed) configurations preclude real-time dynamic network reconfiguration. In other words, existing architectures are configured for a particular transport protocol that does not always fit all network data transport load conditions, and therefore, existing architectures are always unable to handle changing or increasing load capacity demands in real time.
Furthermore, existing messaging architectures use routable multicast to transport data across a network when data messaging is destined for a particular recipient or group of recipients. However, in a system set up for multicast, there is a limit on the number of multicast groups that can be used to distribute data, and as a result, the messaging system may end up sending data to destinations that have not subscribed to it (i.e., clients that are not subscribers). This increases the data processing load and the drop rate at the client due to data filtering. A client that for any reason becomes overloaded and cannot keep up with the data flow eventually drops incoming data and later requires retransmission. Retransmissions have an impact on the overall system because all clients receive the duplicate transmissions and all clients reprocess the incoming data. Retransmission may therefore cause a multicast storm and may eventually crash the entire system.
When the system is set up for unicast messaging as one method of reducing the drop rate, the messaging system may experience bandwidth saturation due to data replication. For example, if more than one client subscribes to a given topic of interest, the messaging system must deliver the data to each subscriber, and in effect the system sends a separate copy of the data to each subscriber. Although this solves the problem of clients filtering out non-subscribed data, unicast transmission is not scalable and is therefore basically unsuitable for situations where a large group of clients subscribes to specific data or where consumption patterns overlap heavily.
Another common deficiency of existing architectures is that protocol conversions are numerous and slow. This is due to IT (information technology) band-aid policies in the field of enterprise application integration (EAI), where more and more new technologies are integrated with legacy systems.
Accordingly, there is a need to improve data messaging system performance in a number of areas, such as speed, resource allocation, and latency.
Disclosure of Invention
The present invention is based, in part, on the foregoing observations and on the insight that these deficiencies can be addressed with better results using different approaches. These observations have led to the development of an end-to-end message publish/subscribe architecture for high-volume, low-latency messaging with guaranteed-delivery quality of service through data caching. Thus, a messaging infrastructure with such an architecture (a publish/subscribe middleware system) also includes a Caching Engine (CE) with caching and storage services, as will be described in more detail later.
Generally, a Messaging Appliance (MA) receives and routes messages. When tightly coupled with a CE, it first records all or a subset of the routed messages by sending a copy to the CE. The recorded messages then remain available for a predetermined period of time, during which any component in the messaging system may request their retransmission, thereby providing combined guaranteed-while-connected and guaranteed-while-disconnected qualities of service and a partial data publication service.
To support this service, the CE is designed to keep up with the forwarding rate of the MA. For example, a CE is designed to have a high-throughput connection between the MA and the CE for pushing messages as fast as possible, a high-throughput and smart caching mechanism for inserting messages into and replaying messages from the back-end CE database, and high-throughput persistent storage. One of the considerations in this design is to reduce the latency of replay requests.
Thus, for the purposes of the present invention, as shown and broadly described herein, an exemplary system includes a caching engine, a messaging appliance, and an interface medium. The caching engine includes: a message layer operative to send and receive messages; a caching layer having an indexing service operative to index received messages and to maintain master images of partially published messages, a storage device, and a storage service operative to store all or a subset of the received messages in the storage device; one or more physical channel interfaces for carrying received and transmitted messages; and a message transport layer with channel management for controlling the sending and receiving of messages over each of the one or more physical channel interfaces. The physical medium between the messaging appliance and the caching engine is network fabric independent and may be configured as Ethernet, a memory-based direct connection, or Infiniband.
In addition, the aforementioned system may be implemented with a provisioning and management system linked via the interface medium and configured to exchange management messages with each messaging appliance. The caching engine configuration is communicated via management messages from the P & M system and via the MA directly connected to the caching engine. The caching engine effectively acts as another neighbor in the neighbor-based messaging architecture.
Various methods of utilizing a caching engine as described above can provide quality of service in messaging. One such method is performed in a caching engine having a message transport layer, a management message layer, and a caching layer with indexing and storage services and associated storage. The method comprises the following steps: receiving data and management messages through the message transport layer; and forwarding the management messages to the management message layer and the data messages to the caching layer, wherein message retrieval request messages forwarded to the management message layer are routed to the caching layer. The method further comprises: indexing each data message in the indexing service, the indexing being topic-based; and storing the data message in the storage based on the indexing, wherein the data message is held in the storage for a predetermined period of time during which it is available for retransmission in response to a message retrieval request message.
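As an informal illustration of this flow (a minimal sketch, not the patent's implementation; the class, field names, and retention value are assumptions), the following dispatches received messages between the management and caching layers, indexes data messages by topic, and serves retrieval requests from a retention window:

```python
import time
from collections import defaultdict

RETENTION_SECONDS = 3600  # "predetermined period of time" (assumed value)

class CachingEngine:
    """Minimal sketch of the receive/index/store/retransmit method."""

    def __init__(self):
        # Topic-based index: topic -> list of (arrival_time, message), oldest first.
        self._store = defaultdict(list)

    def on_message(self, message):
        # The message transport layer hands every received message here.
        if message["kind"] == "management":
            return self._handle_management(message)
        self._index_and_store(message)

    def _handle_management(self, message):
        # Retrieval requests are routed onward to the caching layer; other
        # management messages would be handled by the management message layer.
        if message.get("op") == "retrieve":
            return self.retransmit(message["topic"])

    def _index_and_store(self, message):
        self._store[message["topic"]].append((time.time(), message))

    def retransmit(self, topic):
        # Serve a message retrieval request from the retention window.
        cutoff = time.time() - RETENTION_SECONDS
        return [m for (t, m) in self._store[topic] if t >= cutoff]

ce = CachingEngine()
ce.on_message({"kind": "data", "topic": "NYSE.RTF.IBM", "payload": "quote"})
replayed = ce.on_message({"kind": "management", "op": "retrieve", "topic": "NYSE.RTF.IBM"})
print(len(replayed))  # 1
```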
Because the data messages are either complete data messages or partially published data messages, and each data message has an associated topic, the indexing service maintains a master image of each complete data message. Then, for a received data message that is only partially complete, the indexing service compares the received data message to the most recent master image of the complete message whose topic corresponds to that of the partially published message, to determine how the master image should be updated. Both the partially published message and the master image are indexed, and both are available for retransmission.
These caching engines may be configured and deployed as fault-tolerant pairs of primary and secondary CEs, or as fault-tolerant groups of more than two CE nodes. If two or more CEs are logically linked to each other, they subscribe to the same data and thereby maintain a unique and consistent view of the subscribed data. Note that, much like the Application Programming Interfaces (APIs), a CE's subscription to data is topic-based. In the event of data loss, a CE may request replay of the lost data from the other CE members in its fault-tolerant group. Data synchronization between CEs in the same fault-tolerant group is parallelized by the messaging fabric, which intelligently and efficiently forwards multiple copies of the subscribed messaging traffic to all caching engine instances via the MAs. As a result, this enables asynchronous data consistency for fault-tolerant and disaster-recovery deployments, where data synchronization and persistence are performed and guaranteed through the messaging network fabric, rather than through the use of storage/disk mirroring or database replication techniques.
These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, appended claims, and accompanying drawings as will be described hereinafter.
Drawings
The accompanying drawings incorporated in and forming a part of the specification illustrate various aspects of the present invention and, together with the description, serve to explain the principles of the invention. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to the same or like elements.
FIG. 1 illustrates an end-to-end middleware architecture in accordance with the principles of the present invention.
Fig. 1a is a diagram illustrating an overlay network.
FIG. 2 is a diagram illustrating an enterprise infrastructure implemented using an end-to-end middleware architecture in accordance with the principles of the present invention.
Fig. 3 illustrates the architecture of a channel-based messaging system.
Fig. 4 illustrates one possible topic-based message format.
Fig. 5 illustrates a topic-based message routing and routing table.
Fig. 6 shows an interface for communication between an MA and a CE.
FIG. 7 is a block diagram illustrating a CE (caching engine) configured according to one embodiment of the invention.
FIG. 8 shows a fault tolerant configuration with primary and secondary caching engines, and shows the different stages in case of failure.
Detailed Description
Before turning to the details of various embodiments according to the principles and aspects of the present invention, the following is a brief description of some terms that may be used throughout the description. Note that this description is for clarity only, giving the reader an understanding of how these terms might be used; it does not limit these terms to the contexts in which they are used, nor does it limit the scope of the claims accordingly.
The term "middleware" is used as a general term in the computer industry for any programming that is coordinated between two separate, usually pre-existing programs. Generally, middleware programs provide messaging services so that different applications can communicate. Linking different applications together systematically, often through the use of middleware, is known as Enterprise Application Integration (EAI). However, in this context, "middleware" may be a broader term used in the context of messaging between sources and destinations and facilities deployed to enable such messaging; thus, the middleware architecture, alone or in combination with the one described below, covers networking and computer hardware and software components that enable efficient data messaging. Further, the term "messaging system" or "middleware system" may be used in the context of a publish/subscribe system in which a messaging server manages message routing between publishers and subscribers. In fact, the paradigm of publish/subscribe in messaging middleware is extensible and thus a powerful model.
The term "client" may be used in the context of a client-server application or the like. In one example, a client is a system or application that registers with a middleware system using an Application Programming Interface (API) to subscribe to information and receive data delivered by the middleware system. The API inside the middleware architecture boundary is a client; and the external client is any publish/subscribe system (or external data destination) that does not use the API, and the message is to be converted by a protocol (to be described later) in order to communicate with it.
The term "external data source" may be used in the context of data distribution and message publish/subscribe systems. In one example, an external data source is considered a system or application, located within or external to an enterprise private network, that publishes messages using one of the commonly used protocols or its own messaging protocol. One example of an external data source is a market data exchange that publishes stock market offers, which are distributed to traders via a middleware system. Another example of an external data source is transactional data. Note that in a typical implementation of the present invention, which will be described in more detail later, the middleware architecture employs its unique native protocol into which data from external data sources is translated once it enters the middleware system domain, thereby avoiding the multi-protocol transformations typical of conventional systems.
The term "external data destination" is also used in the context of data distribution and message publish/subscribe systems. For example, an external data destination is a system or application located within or outside an enterprise private network that subscribes to information routed via a local/global network. One example of an external data destination may be the aforementioned market data exchange that processes a transaction order issued by a trader. Another example of an external data destination is transactional data. Note that in the aforementioned middleware architecture, messages destined for an external data destination are translated from a native protocol to an external protocol associated with the external data destination.
From the description herein, it can be appreciated that the present invention can be implemented in various ways using Cache Engines (CEs) that are each implemented in various configurations in a middleware architecture. The description thus begins with an example of the end-to-end middleware architecture shown in fig. 1.
This exemplary architecture combines a number of beneficial features, including: common messaging concepts, APIs, fault tolerance, provisioning and management (P&M), qualities of service (QoS: combined, best-effort, guaranteed-while-connected, guaranteed-while-disconnected, etc.), persistent caching for guaranteed-delivery QoS, management of namespaces and security services, a publish/subscribe ecosystem (core, ingress, and egress components), transport-transparent messaging, neighbor-based messaging (a model that is a hybrid of hub-and-spoke, peer-to-peer, and store-and-forward, and that can propagate subscriptions to all neighbors using a subscription-based routing protocol), a late-binding mode, partial publishing (publishing only changed information, as opposed to all data), and dynamic allocation of network and system resources when necessary. As will be described later, the publish/subscribe system advantageously incorporates a fault-tolerant design of the middleware architecture. Note that the core MA portion of the publish/subscribe ecosystem uses the aforementioned native messaging protocol (native to the middleware system), while the ingress and egress portions, the edge MAs, translate to and from the native protocol, respectively.
In addition to publish/subscribe system components, the diagram of FIG. 1 also shows the logical connections and communications between them. As can be seen from the figure, the middleware architecture shown is that of a distributed system. In a system having this architecture, logical communication between two distinct physical components is established using a message stream and an associated message protocol. The message stream contains one of two types of messages: management and data messages. Management messages are used to manage and control the different physical components, manage subscriptions to data, and so on. Data messages are used to transfer data between a source and a destination, and in typical publish/subscribe messaging, there are multiple senders and multiple recipients of data messages.
With the structural configuration and logical communication shown, the distributed publish/subscribe system with this middleware architecture is designed to perform a variety of logical functions. One logical function is message protocol translation, which is advantageously performed at the edge Messaging Appliance (MA) components. A second logical function is to route messages from publishers to subscribers. Note that these messages are routed throughout the publish/subscribe network. Thus, the routing function is performed by each MA through which messages propagate, i.e., from the edge MAs 106a-b (or APIs) to the core MAs 108a-c, from one core MA to another, and finally to an edge MA (e.g., 106b) or an API 110a-b. The APIs 110a-b communicate with the applications 112_1-n via an interprocess communication bus (sockets, shared memory, etc.).
A third logical function is to store messages for different types of guaranteed-delivery quality of service, including, for example, guaranteed-while-connected and guaranteed-while-disconnected. A fourth function is to deliver these messages to subscribers. As shown, the APIs 110a-b deliver messages to the subscribing applications 112_1-n.
In this publish/subscribe middleware architecture, system configuration functions, as well as other management and system performance monitoring functions, are managed by the P&M systems 102, 104. Configuration includes the physical and logical configuration of the publish/subscribe middleware system networks and components. Monitoring and reporting includes monitoring the health of all network and system components and reporting the results to a log, automatically or on demand. The P&M system performs its configuration, monitoring, and reporting functions via management messages. In addition, the P&M system allows a system administrator to define a message namespace associated with each of the messages routed through the publish/subscribe network. Accordingly, the publish/subscribe network may be physically and/or logically divided into namespace-based subnets.
The P & M system manages the publish/subscribe middleware system with one or more MAs. These MAs are deployed either as edge MAs or core MAs depending on their role in the system. Edge MAs are similar in most respects to core MAs, except that they include protocol translation engines that translate messages from external protocols to native protocols and vice versa. Thus, in general, the boundaries of the publish/subscribe middleware architecture (i.e., the end-to-end publish/subscribe middleware system boundaries) are characterized by its edges in which the MAs 106a-b and APIs 110a-b exist; and within these boundaries, there are core MAs 108 a-c.
Note that the system architecture is not limited to a particular restricted geographic area, and in fact, the system architecture is designed to transcend regional or national boundaries, even across continents. In this case, an edge MA in one network may communicate with an edge MA in another network that is geographically distant via existing networking infrastructure.
In a typical system, the core MAs 108a-c route messages issued within the system to edge MAs or APIs (e.g., APIs 110a-b). In particular, the routing graph among the core MAs is designed for maximally efficient, low-latency routing. Furthermore, routing between core MAs can be changed dynamically in real time. For a given messaging path through multiple nodes (core MAs), real-time changes in routing are based on one or more metrics, including network utilization, total end-to-end latency, traffic, network delay, loss, and jitter.
Alternatively, instead of dynamically selecting the best-performing path from two or more different paths, the MA may perform multi-path routing based on message replication, sending the same message through all paths. The MAs located at the aggregation points of the different paths drop the duplicated messages, forwarding only the first-arriving message. This routing method has the advantage of optimizing the messaging infrastructure for low latency; its disadvantage is that the infrastructure requires more network bandwidth to carry the replicated traffic.
The edge MA has the ability to convert incoming messages from any external message protocol to the native message protocol of the middleware system, and to convert outbound messages from the native message protocol to an external protocol. That is, as messages enter the publish/subscribe network domain (ingress), external protocols are translated to the native (e.g., Tervela™) message protocol; and when messages leave the publish/subscribe network domain (egress), the native protocol is translated to an external protocol. Another function of the edge MA is to deliver published messages to subscribed external data destinations.
In addition, both the edge and core MAs 106a-b and 108a-c are capable of storing messages before forwarding them. One way in which this functionality may be implemented is by using Caching Engines (CEs) 118a-b. One or more CEs may be connected to the same MA. In theory, the APIs are not considered to have such store-and-forward capabilities, although in practice an API 110a-b may store messages before delivering them to an application, and may store messages received from an application before delivering them to a core MA, an edge MA, or another API.
When an MA (edge or core) has an active connection to a CE, it forwards all or a subset of the routed messages to the CE, which writes them to a storage area to achieve persistence. The recorded messages remain available for retransmission, upon request, for a predetermined period of time. Examples of features implemented this way are data replay, partial publishing, and various quality-of-service levels. Partial publishing is effective in reducing network and client load because it requires only updated information to be sent, rather than all information.
To illustrate how a routing graph may implement routing, several examples of publish/subscribe routing paths are shown in fig. 1. In this illustration, the middleware architecture of the publish/subscribe network provides five or more communication paths between publishers and subscribers.
The first communication path links an external data source to an external data destination. Published messages received from the external data sources 114_1-n are translated to the native (e.g., Tervela™) message protocol and then routed by the edge MA 106a. One route by which native protocol messages may be forwarded from the edge MA 106a is to the external data destination 116n; this path is referred to as communication path 1a. In this case, the native protocol messages are converted into external protocol messages suitable for the external data destination. Another route by which native protocol messages may be routed, from the edge MA 106b, is internally through the core MA 108b; this path is referred to as communication path 1b. Along this path, the core MA 108b routes the native messages to the edge MA 106a. Before the edge MA 106a routes the native protocol messages to the external data destination 116_1, however, it converts them to the external message protocol suitable for the external data destination 116_1. It can be seen that this communication path does not require an API to route messages from publishers to subscribers. Thus, if the publish/subscribe system is used only for external source-to-destination communication, the system need not include an API.
Another communication path, referred to as communication path 2, links the external data source 114n to an application using the API 110b. Published messages received from the external data source are translated into the native message protocol at the edge MA 106a and then routed by the edge MA to the core MA 108a. From the first core MA 108a, the messages are routed through another core MA 108c to the API 110b. From the API, the messages are delivered to the subscribing applications (e.g., 112_2). Because the communication path is bidirectional, in another example messages may follow the reverse path from the subscribing applications 112_1-n to the external data destination 116n. In each instance, the core MAs receive and route native protocol messages, while the edge MAs receive external or native protocol messages and route native or external protocol messages, respectively (the edge MA translates such external message protocols to and from the native message protocol). Each edge MA may route ingress messages to both native protocol channels and external protocol channels. As a result, each edge MA can route ingress messages to both external and internal clients simultaneously, with internal clients consuming native protocol messages and external clients consuming external protocol messages. This capability enables the messaging infrastructure to integrate seamlessly and smoothly with legacy applications and systems.
Another communication path, referred to as communication path 3, links two applications, both of which utilize APIs 110 a-b. At least one of these applications publishes a message or subscribes to a message. Delivery of published messages to or from a subscribing application is accomplished using an API located at the edge of the publish/subscribe network. When an application subscribes to a message, one of the core or edge MAs routes the message to the API, which then notifies the subscribing application when data is ready to be delivered to them. Messages published from an application are sent via the API to the core MA 108c to which the API is "registered".
Note that by "registering" (logging) to a MA, the API becomes logically connected to the MA. The API initiates a connection to the MA by sending a registration ("login" request) message to the MA. After registration, the API may subscribe to a particular topic of interest by sending its subscription message to the MA. Topics are used for publish/subscribe messaging to define shared access domains and targets for messages, and thus subscribing to one or more topics allows for receiving and transmitting messages with such topic annotations. P&M sends periodic authorization updateTo the MAs in the network, each MA updates its own table accordingly. Thus, if an API is found to be authorized to subscribe to a particular topic (the MA verifies the authorization of the API using the routing authorization table), the MA activates a logical connection to the API. Then, if the API is properly registered with the core MA 108c, the core MA 108c routes the data to the second API 110, as shown. In other examples, the core MA 108b may route messages through additional one or more core MAs (not shown) that route messages to the API 110b, which the AP 110b then delivers to the subscribing application 1121-n
It can be seen that communication path 3 does not require the presence of an edge MA, since it does not involve any external data message protocol. In one embodiment giving an example of this communication path, an enterprise system is configured with a news server that publishes to employees the latest news on a variety of topics. To receive news, employees subscribe to topics of interest to them via a news-browser application that utilizes an API.
Note that the middleware architecture allows subscription to one or more topics. Further, this architecture allows a single subscription request to cover a group of related topics by permitting wildcards in the topic notation.
Yet another communication path, referred to as communication path 4, is one of the multiple paths associated with the P & M systems 102 and 104, each of which links the P & M to one of the MAs in the publish/subscribe network middleware architecture. The messages that go back and forth between the P & M system and each MA are management messages that are used to configure and monitor the MA. In one system configuration, the P & M system communicates directly with the MA. In another system configuration, the P & M system communicates with some MAs through other MAs. In yet another configuration, the P & M system may communicate directly or indirectly with the MA.
In a typical implementation, the middleware architecture may be deployed on a network having switches, routers, and other networking devices, and it employs channel-based messaging capable of communicating over any type of physical medium. One exemplary implementation of such network-configuration-independent, channel-based messaging is an IP-based network. In this environment, all communication between the publish/subscribe physical components is performed over UDP (User Datagram Protocol), and transport reliability is achieved by the message transport layer. Figure 1a illustrates an overlay network in accordance with the present principles.
As shown, overlay communications 1, 2, and 3 may occur between the three core MAs 208a-c via switches 214a-c, routers 216, and subnets 218 a-c. In other words, these communication paths may be built on top of an underlying network comprising networking infrastructure such as subnets, switches and routers, and as noted above, such an architecture may span a large geographic area (different countries or even different continents).
The foregoing and other end-to-end middleware architectures in accordance with the principles of the present invention may be implemented in a variety of enterprise infrastructures in a variety of business environments. One such implementation is illustrated in fig. 2.
In this enterprise infrastructure, a market data distribution plant 312 is built on top of the publish/subscribe network for communicating data from the various market data exchanges 320_1-n to traders (applications not shown). Such an overlay solution relies on the underlying network to provide the interconnections, for example between MAs and between the MAs and the P&M systems. Delivery to the APIs 310_1-n is based on application subscriptions. Using this infrastructure, traders utilizing the applications (not shown) place orders back to the market data exchanges 320_1-n through the publish/subscribe network (via the core MAs 308a-b and the edge MA 306b).
Logically, the physical components of the publish/subscribe network are built on top of a messaging layer similar to layers 1 to 4 of the Open Systems Interconnection (OSI) reference model. Layers 1 to 4 of the OSI model are the physical layer, the data link layer, the network layer and the transport layer, respectively.
Thus, in one embodiment of the invention, a publish/subscribe network may be deployed directly into an underlying network/fabric by, for example, inserting one or more messaging line cards into all or a subset of the network switches and routers. In another embodiment of the invention, the publish/subscribe network may be deployed as a mesh overlay network (where all physical components are connected to each other). For example, a full mesh network of 4 MAs is one in which each MA is connected to each of its 3 peer MAs. In a typical implementation, the publish/subscribe network is a mesh network of the following components: one or more external data sources and/or destinations, one or more provisioning and management (P&M) systems, one or more Messaging Appliances (MAs), one or more optional Caching Engines (CEs), and one or more optional Application Programming Interfaces (APIs).
Throughout the publish/subscribe network, communication is accomplished using a native message protocol that is independent of the underlying transport logic. This is why the architecture is called a transport-transparent, channel-based messaging architecture.
Fig. 3 illustrates the channel-based messaging architecture 320 in more detail. In general, each communication path between a messaging source and destination is considered a messaging channel. Each channel 326_1-n is established over a physical medium using an interface 328_1-n between a channel source and a channel destination. Each such channel is established for a specific messaging protocol, such as the native (e.g., Tervela™) messaging protocol or another protocol. Only edge MAs (those that manage the ingress and egress of the publish/subscribe network) utilize external channel message protocols. Based on the channel message protocol, the channel management layer 324 determines whether incoming and outgoing messages require protocol translation. At each edge MA, if the channel message protocol of an incoming message differs from the native protocol, the channel management layer 324 performs protocol translation by sending the message through a Protocol Translation Engine (PTE) 332 before passing it to the native message layer 330. Likewise, at each edge MA, if the native message protocol of an outgoing message differs from the channel message protocol (external message protocol), the channel management layer 324 sends the message through the Protocol Translation Engine (PTE) 332 before routing it to the transport channels 326_1-n. Thus, each channel pairs the physical medium interfaces 328_1-n with the particular network and transport logic associated with that physical medium, and with the message components or fragments.
In other words, the channel manages the OSI transport through physical layers 322. Optimization of channel resources is performed on a per-channel basis (e.g., message-density optimization for the physical medium based on consumption patterns, including bandwidth, message size distribution, channel destination resources, and channel health statistics). No particular type of fabric is required, because the communication channel is network configuration independent; virtually any fabric medium will work, e.g., ATM, Infiniband, or Ethernet.
Incidentally, message segmentation or reassembly may be required when, for example, a single message is divided into multiple frames or multiple messages are packed into a single frame. Message segmentation or reassembly is performed before the message is delivered to the channel management layer.
Fig. 3 further illustrates a number of possible channel implementations in a network with a middleware architecture. In one implementation 340, the communication is performed via a network-based channel using multicast over an ethernet-switched network that serves as the physical medium for such communication. In this implementation, the source sends a message from its IP address via its UDP port to a group of destinations (defined as an IP multicast address) with their associated UDP ports. In a variation 342 of this implementation, the communication between the source and destination is implemented using UDP unicast over an ethernet switched network. The source sends a message from its IP address via its UDP port to a selected destination having a UDP port at its corresponding IP address.
In another implementation 344, the channel is established over an Infiniband interconnect using a native Infiniband transport protocol, where the Infiniband fabric is the physical medium. In this implementation, the channel is node-based, and the communication between the source and destination is node-based using their respective node addresses. In yet another implementation 346, the channel is memory based, such as RDMA (remote direct memory access), and is referred to herein as a Direct Connection (DC). With this type of channel, messages are sent directly from the source machine to the memory of the destination machine, bypassing CPU processing to handle messages from the NIC to the application memory space, and possibly avoiding the network overhead of encapsulating messages into network packets.
As for the native protocol, one approach utilizes the aforementioned native Tervela™ message protocol. Conceptually, the Tervela™ messaging protocol is similar to an IP-based protocol. Each message contains a message header and a message payload. The message header contains a number of fields, one of which carries topic information. As described above, topics are used by clients to subscribe to a shared information domain.
Fig. 4 illustrates one possible topic-based message format. As shown, the message includes a header 370 and bodies 372 and 374, the bodies including the payload. Two types of messages, namely data and management messages, are shown, the two types having different message bodies and payload types. The header includes fields for source and destination namespace identifiers, source and destination session identifiers, a topic sequence number, and timestamps; additionally, it includes a topic notation field, which is preferably of variable length.
A topic may be defined as a tag-based string, e.g., T1.T2.T3.T4, where T1, T2, T3, and T4 are variable-length strings. In one example, a topic may be defined as NYSE.RTF.IBM 376, which string is the topic notation of a message containing IBM stock real-time quotes. In some implementations, the topic notation in a message may be encoded or mapped to a key, which may be one or more integer values. Each topic would then be mapped to a unique key, and a mapping database between topics and keys would be maintained by the P&M system and updated to all MAs over the wire. As a result, the MA is able to return the associated unique key for the topic field of a message when an API subscribes to or publishes on a topic.
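For illustration, the sketch below shows one way such a topic-to-key mapping could behave (the class name and sequential key-assignment scheme are assumptions; the patent only requires that each topic map to a unique key):

```python
class TopicKeyMap:
    """Hypothetical topic <-> integer-key mapping maintained by the P&M system."""

    def __init__(self):
        self._topic_to_key = {}
        self._key_to_topic = {}

    def key_for(self, topic):
        # Assign the next integer key the first time a topic is seen.
        if topic not in self._topic_to_key:
            key = len(self._topic_to_key) + 1
            self._topic_to_key[topic] = key
            self._key_to_topic[key] = topic
        return self._topic_to_key[topic]

    def topic_for(self, key):
        return self._key_to_topic[key]

m = TopicKeyMap()
assert m.key_for("NYSE.RTF.IBM") == 1   # first topic seen
assert m.key_for("NYSE.RTF.IBM") == 1   # stable on repeat lookups
assert m.topic_for(1) == "NYSE.RTF.IBM"
```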
Preferably, the subscription format follows the same format as the message topic. However, the subscription format also supports wildcards that match any topic substring, or pattern matching against a topic regular expression. The processing that maps wildcards to actual topics can be tied to the P&M system or handled by the MA, depending on the complexity of the wildcard or pattern-matching request.
The pattern matching may follow the rules provided in the examples below.
Example #1: A string with wildcard T1.*.T3.T4 will match T1.T2A.T3.T4 and T1.T2B.T3.T4, but not T1.T2.T3.T4.T5.
Example #2: A string with wildcard T1.*.T3.T4.* will not match T1.T2A.T3.T4 or T1.T2B.T3.T4, but will match T1.T2.T3.T4.T5.
Example #3: A string with wildcard T1.*.T3.T4.[*] (the fifth element is optional) will match T1.T2A.T3.T4, T1.T2B.T3.T4, and T1.T2.T3.T4.T5, but not T1.T2.T3.T4.T5.T6.
Example #4: A string with wildcard T1.T2*.T3.T4 will match T1.T2A.T3.T4 and T1.T2B.T3.T4, but not T1.T5A.T3.T4.
Example #5: A string with wildcard T1.*.T3.T4.> (any number of trailing elements) will match T1.T2A.T3.T4, T1.T2B.T3.T4, T1.T2.T3.T4.T5, and T1.T2.T3.T4.T5.T6.
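A compact way to realize these rules is to translate each subscription into a regular expression. The sketch below is one such translation, an assumption consistent with examples #1 through #5 above rather than a specification from the patent:

```python
import re

def subscription_to_regex(pattern):
    """Translate a topic subscription into a compiled regular expression.

    Wildcard grammar assumed from examples #1-#5:
      *    matches exactly one topic element
      X*   matches one element beginning with X
      [*]  optionally matches one element
      >    matches any number (zero or more) of trailing elements
    """
    parts = []
    for i, element in enumerate(pattern.split(".")):
        sep = r"\." if i > 0 else ""
        if element == ">":
            parts.append(r"(\.[^.]+)*" if i > 0 else r"[^.]+(\.[^.]+)*")
        elif element == "[*]":
            parts.append(r"(\.[^.]+)?" if i > 0 else r"[^.]*")
        elif element == "*":
            parts.append(sep + r"[^.]+")
        elif element.endswith("*"):
            parts.append(sep + re.escape(element[:-1]) + r"[^.]*")
        else:
            parts.append(sep + re.escape(element))
    return re.compile("^" + "".join(parts) + "$")

# The five examples above, expressed as checks:
assert subscription_to_regex("T1.*.T3.T4").match("T1.T2A.T3.T4")
assert not subscription_to_regex("T1.*.T3.T4").match("T1.T2.T3.T4.T5")
assert subscription_to_regex("T1.*.T3.T4.*").match("T1.T2.T3.T4.T5")
assert subscription_to_regex("T1.*.T3.T4.[*]").match("T1.T2A.T3.T4")
assert not subscription_to_regex("T1.*.T3.T4.[*]").match("T1.T2.T3.T4.T5.T6")
assert subscription_to_regex("T1.T2*.T3.T4").match("T1.T2B.T3.T4")
assert not subscription_to_regex("T1.T2*.T3.T4").match("T1.T5A.T3.T4")
assert subscription_to_regex("T1.*.T3.T4.>").match("T1.T2A.T3.T4")
assert subscription_to_regex("T1.*.T3.T4.>").match("T1.T2.T3.T4.T5.T6")
```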
Fig. 5 illustrates topic-based message routing. As shown, a topic may be defined as a tag-based string, e.g., T1.T2.T3.T4, where T1, T2, T3, and T4 are variable-length strings. As can be seen, incoming messages with a particular topic notation 400 are selectively routed to communication channels 404, and routing determinations are made based on a routing table 402. The mapping of topic subscriptions to channels defines the routes and is used to deliver messages throughout the publish/subscribe network. The superset of all these routes, or mappings between subscriptions and channels, defines the routing table. The routing table is also referred to as a subscription table. The subscription table for routing with string-based topics can be constructed in a number of ways, but it is preferably configured to optimize its size and routing lookup speed. In one implementation, the subscription table may be defined as a dynamic hash-map structure, while in another implementation it may be arranged in a tree structure, as shown in the diagram in fig. 5.
The tree includes nodes (e.g., T1, ..., T10) connected by edges, where each substring of a topic subscription corresponds to a node in the tree. The channels mapped to a given subscription are stored on the subscription's leaf node, with each leaf node holding a list of the channels from which the topic subscription came (i.e., through which subscription requests were received). The list indicates which channels should receive a copy of any message whose topic notation matches the subscription. As shown, the message routing lookup takes a message topic as input and then walks the tree with each substring of that topic to locate the different channels associated with the incoming message topic. For example, T1.T2.T3.T4.T5 is directed to channels 1, 2, and 3; T1.T2.T3 to channel 4; T1.T6.T7.T*.T9 to channels 4 and 5; T1.T6.T7.T8.T9 to channel 1; and T1.T6.T7.T*.T10 to channel 5.
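The sketch below illustrates such a tree-structured subscription table with a single-element wildcard node (written here as "*"; the class names and traversal logic are assumptions for illustration, not the patent's implementation):

```python
class SubscriptionNode:
    """One node of the tree-structured subscription (routing) table."""
    def __init__(self):
        self.children = {}     # substring -> SubscriptionNode
        self.channels = set()  # channels stored at this leaf

class SubscriptionTable:
    """Minimal sketch of topic-based routing lookup over a subscription tree."""

    def __init__(self):
        self.root = SubscriptionNode()

    def subscribe(self, topic, channel):
        node = self.root
        for part in topic.split("."):
            node = node.children.setdefault(part, SubscriptionNode())
        node.channels.add(channel)

    def route(self, topic):
        # Collect every channel whose subscription matches the topic;
        # a "*" child matches any single substring.
        matches = set()
        def walk(node, parts):
            if not parts:
                matches.update(node.channels)
                return
            head, rest = parts[0], parts[1:]
            if head in node.children:
                walk(node.children[head], rest)
            if "*" in node.children:
                walk(node.children["*"], rest)
        walk(self.root, topic.split("."))
        return matches

table = SubscriptionTable()
table.subscribe("T1.T2.T3.T4.T5", "channel-1")
table.subscribe("T1.T6.T7.*.T9", "channel-5")
print(table.route("T1.T6.T7.T8.T9"))  # {'channel-5'}
```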
Although the structure of the routing table is chosen to optimize routing-table lookups, lookup performance also depends on the search algorithm used to find the topic subscriptions that match an incoming message topic. The routing table structure should therefore be able to accommodate such an algorithm, and vice versa. One way to reduce the size of the routing table is to allow the routing algorithm to selectively propagate subscriptions throughout the publish/subscribe network. For example, if a subscription is a subset of another subscription that has already been propagated (e.g., a portion of the full string), there is no need to propagate the subset subscription, because the MA already has information for a superset of that subscription.
Based on the foregoing, the preferred message routing protocol is a topic-based routing protocol in which authorizations are expressed in a mapping between subscribers and corresponding topics. Authorizations are specified per subscriber or per group/category of subscribers, indicating which messages a subscriber has the right to consume or which messages a producer (publisher) may produce (publish). These authorizations are defined in the P&M system, transmitted to all MAs in the publish/subscribe network, and then used by the MAs to create and update their routing tables.
All messages routed in the publish/subscribe network are received or sent on a particular channel. The MAs use these channels to communicate with all other physical components in the publish/subscribe network. However, sometimes these interfaces are interrupted or the destination cannot keep up with the load. In these situations, or other similar situations, the message may be recalled from the storage device and retransmitted. Thus, an MA may be effectively associated with a Cache Engine (CE) whenever store and forward functionality is required. In addition, because reliability, availability, and consistency are often necessary in enterprise operations, a publish/subscribe middleware system may be designed for fault tolerance, with many of its components deployed as fault tolerant systems.
For example, MAs may be deployed as fault-tolerant MA pairs, with a first MA referred to as a primary MA and a second MA referred to as a secondary MA or fault-tolerant MA (ft MA). Then, for store and forward operations, the CE (cache engine) may be connected to the primary or secondary core/edge MA. When a primary or secondary MA has an active connection to a CE, it forwards all or a subset of the routed messages to the CE, which writes them to a storage area to achieve persistence. These messages are then available for retransmission when requested for a predetermined period of time. In addition, as shown in FIG. 2, CEs may be deployed as fault tolerant CE pairs, where a secondary CE takes over a primary CE upon failure.
As shown in fig. 6, the CE is directly connected to the MA via a physical medium, and it is designed to provide features of a store-and-forward architecture in a high-capacity and low-latency messaging environment. Next, FIG. 7 is a block diagram illustrating a CE configured according to one embodiment of the present invention.
The CE 700 performs a number of functions. For message data persistence, one function includes receiving the data messages forwarded by the MA, indexing them by different message header fields, and storing them in the storage area 710. Another function includes responding to message retrieval requests from the MA and retransmitting messages that were lost or not received (and thus requested again by a client).
In general, the CE is built on top of the same logical layers as the MA. However, its native (e.g., Tervela™) messaging layer is greatly simplified: because all messages are processed locally at the CE and delivered to its management message layer 714 or its caching layer 702, as opposed to being routed to another physical component in the publish/subscribe network, no routing engine logic is required. As previously described, management messages are typically used for management purposes, apart from retrieval requests, which are forwarded to the caching layer 702. All data messages are forwarded to the caching layer, which first indexes the messages with topic-based indexes using an indexing service 712 and then passes them to a storage service 708 for storage in the storage area 710 (e.g., RAID, disk, etc.). All data messages are kept in the storage area 710, often redundant persistent storage, for a predefined period of time. The indexing service 712 is responsible for "garbage collection" activities and informs the storage service 708 when expired data messages need to be discarded from the storage area.
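The division of labor between the two services might look like the following sketch (hypothetical names; it models only the behavior described above, with the indexing service telling the storage service which expired messages to discard):

```python
import time
from collections import deque

class StorageService:
    """Hypothetical storage service 708: stores message blocks by key."""
    def __init__(self):
        self.blocks = {}
    def store(self, key, message):
        self.blocks[key] = message
    def discard(self, key):
        self.blocks.pop(key, None)

class IndexingService:
    """Hypothetical indexing service 712 with time-based garbage collection."""
    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.index = {}  # topic -> deque of (arrival_time, storage_key)

    def record(self, topic, storage_key, now=None):
        now = time.time() if now is None else now
        self.index.setdefault(topic, deque()).append((now, storage_key))

    def collect_garbage(self, storage, now=None):
        # Inform the storage service when expired messages must be discarded.
        now = time.time() if now is None else now
        for queue in self.index.values():
            while queue and now - queue[0][0] > self.retention:
                _, key = queue.popleft()
                storage.discard(key)

storage = StorageService()
idx = IndexingService(retention_seconds=3600)
storage.store("k1", b"msg")
idx.record("NYSE.RTF.IBM", "k1", now=0)
idx.collect_garbage(storage, now=7200)  # one hour past the retention window
assert "k1" not in storage.blocks
```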
The CE may be a software-based solution or an embedded solution. More specifically, the CE may be configured as a software application running on an Operating System (OS) in a high-end server. Such a server may include a high performance NIC (network interface card) to increase the data transfer rate to and from the MA. In another configuration, the CE is an embedded solution for accelerating network I/O (input/output) to and from the MA and accelerating storage I/O to and from the storage area. Such embedded solutions may be designed to efficiently stream data to one or more disks. Thus, for overall improved performance, CE implementations are designed to maximize the MA-CE-storage data transfer rate and to minimize the retrieval latency of requested messages.
For example, to maximize data transfer between the MA and the CE, their communication link may be implemented as a direct 10 Gigabit/s Ethernet fiber-optic interconnect or any other high-throughput, low-latency interconnect, such as Myrinet. Also, to increase throughput on the link, the CE may encapsulate as many messages as possible into a single large frame. In addition, a software-based CE may communicate with the MA via remote direct memory access, which bypasses the CPU (central processing unit) and OS, thereby maximizing throughput and minimizing latency. The CE also distributes disk I/O among multiple storage devices in order to maximize storage I/O efficiency. In one implementation, the CE uses a combination of distributed database logic and distributed high-performance redundant storage techniques. In addition, to minimize retrieval latency for requested messages, one implementation of the CE uses RAM (random access memory) to keep the index and the most recently or most frequently retrieved messages in memory before flushing messages to the storage device.
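As a small illustration of the frame-encapsulation idea, the greedy packer below fills each frame up to an assumed size limit; a real implementation would also add length prefixes or another delimiting scheme so the receiver can split each frame back into messages:

```python
def pack_frames(messages, max_frame_bytes=8192):
    """Greedily pack serialized messages into frames no larger than
    max_frame_bytes (an assumed limit), amortizing per-frame overhead."""
    frames, current, size = [], [], 0
    for msg in messages:
        if current and size + len(msg) > max_frame_bytes:
            frames.append(b"".join(current))
            current, size = [], 0
        current.append(msg)
        size += len(msg)
    if current:
        frames.append(b"".join(current))
    return frames

msgs = [b"x" * 3000, b"y" * 3000, b"z" * 3000]
print([len(f) for f in pack_frames(msgs)])  # [6000, 3000]
```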
When interfacing with the MA, the CE handles two types of messages: normal, or complete, data messages, and incomplete, or partially published, data messages. Specifically, when the indexing service 712 of the CE 700 receives a partially published message, it compares the message against the last known complete message of the same topic, which is also referred to as the master image of the partially published message. The indexing service 712 maintains a master image in RAM (not shown) for every complete message. A partially published message (a message update carrying new values) replaces the old values in the message's master image while leaving values it does not update unchanged. Like any other data message, partially published messages are indexed and available for retransmission. Also, like any other message recorded by the CE, the master image may be used for retransmission, except that it may be provided as a different message type, or its message header flag may carry a different value indicating that it is the master image. In fact, the master image may be valuable to applications: using their respective APIs, such applications may request the master image of a partially published message stream at any given time, and thereafter receive the partially published message updates.
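A minimal sketch of this master-image update, assuming messages are represented as field dictionaries (the field names below are illustrative, not from the patent):

```python
master_images = {}  # topic -> dict of field -> latest value (kept in RAM)

def apply_partial(topic, partial_fields):
    """Merge a partially published message into the topic's master image.

    Fields present in the partial message replace the old values; fields
    absent from it are left unchanged.
    """
    image = master_images.setdefault(topic, {})
    image.update(partial_fields)
    return dict(image)

apply_partial("NYSE.RTF.IBM", {"bid": 81.10, "ask": 81.15, "last": 81.12})
updated = apply_partial("NYSE.RTF.IBM", {"last": 81.14})  # partial update
assert updated == {"bid": 81.10, "ask": 81.15, "last": 81.14}
```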
In order to provide combined guaranteed-while-connected and guaranteed-while-disconnected quality of service (QoS), the messaging network fabric must always provide data persistence and integrity. To provide a fault-tolerant persistent caching solution, the caching engines may be configured and deployed as fault-tolerant pairs consisting of a primary and a secondary CE, or as fault-tolerant groups consisting of more than two CE nodes. If two or more caching engines are logically linked to each other via subscriptions to the same topics, they subscribe to the same data and thus maintain a unique and consistent view of the subscribed data. In the event of data loss, a caching engine may request retransmission of the lost data from the other caching engine members in its fault-tolerant group. Data synchronization between caching engines in the same fault-tolerant group is parallelized by the messaging fabric, which intelligently and efficiently forwards multiple copies of the subscribed messaging traffic to all caching engine instances via the MAs. As a result, this enables asynchronous data consistency for fault-tolerant and disaster-recovery deployments, where data synchronization and persistence are performed and guaranteed through the messaging network fabric, as opposed to using storage/disk mirroring or database replication techniques.
One benefit of utilizing the messaging network fabric for redundancy and data consistency is the reduced bandwidth consumed by synchronization traffic: only the data is synchronized between caching engines, as opposed to data plus indexes (for database replication) and/or disk storage overhead (for remote disk mirroring). A second benefit is that message ordering is resolved, because the messaging layer already guarantees the order of messages for any given subscription.
For further illustration, fig. 8 shows a messaging appliance and caching engine fault-tolerant pair configuration and describes the failover process of an API from a primary MA to a secondary MA.
Before the CE goes down, i.e., at stage #1, both caching engines receive the same subscribed messaging traffic because they both subscribe to the same topics. When the primary caching engine fails (event #2), the MA detects the failure and fails over to the secondary MA (which takes over for the primary MA), which in turn causes the API to fail over to the secondary MA as well. At a later time, the primary caching engine resumes operation (event #3); it restarts its subscriptions and, when data is received, detects data loss on all of its subscriptions. The missing data is requested by sending one or more retransmission requests per subscription to the secondary caching engine. The data synchronization phase then begins between the primary and secondary caching engines, using the messaging logic.
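One plausible way for the recovering caching engine to detect loss is via gaps in the per-topic sequence numbers carried in message headers (described with fig. 4). The sketch below is an assumption about how such gap detection could work; the patent does not specify this logic:

```python
def find_gaps(expected_next, received_seqs):
    """Return the missing sequence-number ranges for one topic subscription.

    expected_next: the next sequence number the caching engine expected
    before going down; received_seqs: sorted sequence numbers seen since
    restart.
    """
    gaps, cursor = [], expected_next
    for seq in received_seqs:
        if seq > cursor:
            gaps.append((cursor, seq - 1))
        cursor = max(cursor, seq + 1)
    return gaps

# The recovering primary CE had seen up through seq 99 (next expected: 100)
# and, after restart, receives 104, 105, and 107 on the same topic.
assert find_gaps(100, [104, 105, 107]) == [(100, 103), (106, 106)]
# Each gap becomes one retransmission request to the secondary caching engine.
```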
In one embodiment of the invention, the data synchronization traffic follows synchronization path #1 through the messaging network, as shown in FIG. 8. The path may be configured not to exceed a predefined message rate or a predefined bandwidth. This can be critical in disaster-recovery configurations where the primary and secondary caching engines are located at different geographic sites connected by reduced-bandwidth inter-site links, such as WAN links or dedicated fiber connections.
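A predefined message-rate or bandwidth cap on the synchronization path is commonly enforced with a token bucket; the sketch below assumes that technique, which the description above does not itself prescribe.

```python
# Token-bucket cap for the synchronization path. The bucket technique is an
# assumption; the description only states that the path can be limited to a
# predefined message rate or bandwidth.
import time

class SyncPathLimiter:
    def __init__(self, bytes_per_sec, burst_bytes):
        self.rate = bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.stamp = time.monotonic()

    def try_send(self, size):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.stamp) * self.rate)
        self.stamp = now
        if size <= self.tokens:
            self.tokens -= size
            return True   # within the configured bandwidth cap
        return False      # defer: keeps the inter-site link below its cap

limiter = SyncPathLimiter(bytes_per_sec=1_000_000, burst_bytes=64_000)
print(limiter.try_send(1500))  # True: a 1500-byte message fits the budget
```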
Alternatively, in another embodiment of the invention, the data synchronization traffic travels over an alternate high-speed interconnect, either a direct link or a switch such as Infiniband or Myrinet, to isolate the synchronization traffic from regular messaging traffic. Such an alternate synchronization path #2 may serve as the primary or the backup link for synchronization traffic. The link may be statically configured as a dedicated synchronization path, or it may be selected dynamically in real time based on the total load of the messaging network fabric. Either the caching engine or the messaging appliance may decide to move the synchronization traffic off the messaging network fabric and onto the alternate synchronization path.
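One possible reading of this real-time path selection is a simple load threshold, sketched below; the 0.75 threshold and the decision rule are assumed examples, since the description leaves the selection criteria open.

```python
# Illustrative real-time choice between synchronization path #1 (messaging
# fabric) and path #2 (dedicated interconnect); the 0.75 load threshold and
# the rule itself are assumed examples, not a specified algorithm.

def choose_sync_path(fabric_load, dedicated_available=True, threshold=0.75):
    """Return the path the synchronization traffic should take."""
    if dedicated_available and fabric_load >= threshold:
        return "path #2 (dedicated interconnect)"  # isolate sync traffic
    return "path #1 (messaging fabric)"

print(choose_sync_path(0.40))  # fabric has headroom -> path #1
print(choose_sync_path(0.90))  # fabric congested   -> offload to path #2
```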
When synchronization is complete, event #4, the primary CE is ready to take over. At this point, the primary MA may become active again or remain passive until the secondary CE and/or MA fails.
In summary, the present invention provides a new method for transporting messages, and more particularly, an end-to-end publish/subscribe middleware architecture with fault tolerant persistent caching capability that improves the efficiency of messaging systems, simplifies the manageability of caching solutions, and reduces recovery latency for various levels of guaranteed delivery quality of service. Although the present invention has been described in considerable detail with reference to certain preferred versions thereof, other versions are possible. Therefore, the spirit and scope of the appended claims should not be limited to the description of the preferred versions contained herein.

Claims (53)

1. A messaging system, comprising:
one or more applications;
a plurality of messaging appliances operative to receive and route messages, including to and from the applications; and
a plurality of caching engines arranged in a fault tolerant configuration, wherein one or more caching engines are connected to each designated messaging appliance among the plurality of messaging appliances, and wherein each of the plurality of caching engines respectively subscribes to a topic and is logically linked to any of the designated messaging appliances connected to caching engines respectively subscribing to the same topic to provide redundancy such that all caching engines of a group of caching engines subscribing to the same topic receive the same message data and maintain a consistent, synchronized view of all message traffic related to the topic.
2. A messaging system as in claim 1, having a messaging network fabric for routing the message traffic, the messaging network fabric comprising the plurality of messaging appliances and being operative to provide the consistent, synchronized view via the messaging network fabric or, if a direct connection exists between caching engines, via that direct connection, wherein real-time failover is determined by a messaging appliance or a caching engine based on the load of the messaging network fabric.
3. A messaging system as in claim 2, wherein the direct connection comprises a high speed direct connection or a switch.
4. A messaging system as in claim 3, wherein the high speed direct connection comprises an Infiniband or Myrinet interconnect.
5. A messaging system as in claim 1, wherein to maintain a consistent synchronized view, each caching engine is operative to obtain the message data using a predefined bandwidth and/or message rate.
6. A messaging system as claimed in claim 1, operative such that, when one or more caching engines fail, any other caching engine that remains active and is connected to the same messaging appliance takes over for the failed caching engine, and such that, if no active caching engine remains or when the failure involves the messaging appliance itself, another messaging appliance logically linked to a caching engine of the failed messaging appliance takes over for the failed messaging appliance, wherein any takeover is transparent to the one or more applications logically connected to the failed caching engine and/or messaging appliance.
7. The messaging system of claim 6, further operative to cause any failed cache engine that has recovered to retrieve lost data by requesting another cache engine that remains active to send it the lost data.
8. A messaging system as in claim 1, wherein each caching engine has:
a message layer operative to send and receive messages,
a caching layer with an indexing service that operates to first index received messages and to maintain a mirror of received partially published messages,
a storage device and a storage service, the storage service operative to store all or a subset of received messages in the storage device,
one or more physical channel interfaces for carrying received and transmitted messages, and
a message transport layer with channel management for controlling the sending and receiving of messages over each of the one or more physical channel interfaces.
9. A messaging system as in claim 8, wherein the storage device in each caching engine is operative to allow stored received messages to temporarily remain available for retransmission upon request from the caching engine.
10. A messaging system as in claim 1, further comprising a messaging fabric and a provisioning and management system linked to the messaging appliances via the messaging fabric and configured for exchanging management messages with each messaging appliance.
11. A messaging system as in claim 1, wherein each messaging appliance is further operative to perform routing of messages by dynamically selecting a message transmission protocol and a message routing path.
12. A messaging system as in claim 1, wherein the messaging fabric comprises an interconnect that is a channel-based, fabric-independent physical medium.
13. A messaging system as in claim 12, wherein the interconnect is configured as Ethernet, a memory-based direct connection, or Infiniband.
14. A messaging system as in claim 12, wherein the interconnect is configured to operate as a direct 10 Gigabit Ethernet fiber optic interconnect or a Myrinet interconnect for high throughput and low latency.
15. A messaging system as in claim 1, wherein the message is constructed with a schema and a payload that are separated from each other when a message enters the messaging system and combined when a message leaves the messaging system.
16. A messaging system as in claim 10, wherein the messages and administrative messages have a topic-based format, each message having a header and a payload, the header including a topic field in addition to source and destination namespace identification fields.
17. A messaging system as in claim 1, wherein the message comprises a subscription message having a topic field with a variable-length string containing any number of wildcard characters, a wildcard character matching any topic substring provided the matched topic and the subscription message have the same number of topic substrings.
18. A messaging system as in claim 1, wherein the caching engine is operative to provide quality of service functions including message data store and forward functions.
19. A messaging system as in claim 8, wherein the storage associated with each caching engine comprises a plurality of storage devices operative for distributed message input/output.
20. A messaging system as in claim 8, wherein the message layer in each cache engine comprises a management message layer operative to process management messages.
21. A messaging system as in claim 8, wherein the message layer in each caching engine is operative to retrieve the requested message from the caching layer and to format the received message with a header field and a payload.
22. The messaging system of claim 8, wherein the caching layer further comprises a random access memory, and wherein the indexing service is further operative to maintain the mirror in the random access memory.
23. A messaging system as in claim 8, wherein the image of each partially published message received and maintained by the caching layer includes the updated values together with the old values not changed by the updates.
24. A messaging system as in claim 9, wherein the time period during which the message is temporarily held in the storage device so as to be available for retransmission is predetermined.
25. A messaging system as in claim 8, wherein the storage device is a redundant persistent storage device.
26. A messaging system as in claim 1, provided as a software-based or embedded configuration.
27. A messaging system as in claim 1, embedded in a software application running on an operating system.
28. A messaging system as in claim 1, wherein the consistent, synchronized view of the messaging traffic enables the messaging system to provide a messaging quality of service that includes one or a combination of partial publication, merging, guaranteed-while-connected delivery, and guaranteed-while-disconnected delivery.
29. A method for providing quality of service in a messaging system, comprising:
arranging a messaging network fabric having a plurality of messaging appliances;
arranging a plurality of caching engines in a fault tolerant configuration, wherein one or more caching engines are connected to each designated messaging appliance among the plurality of messaging appliances;
logically linking each of the plurality of caching engines, by subscription topic, to any of the designated messaging appliances to which one or more other caching engines subscribing to the same topic as such caching engine are connected, to provide redundancy; and
for each group of caching engines subscribing to the same topic, synchronizing all caching engines in the group such that all caching engines in the group receive the same message data and maintain a consistent, synchronized view of all message traffic related to such topic, and wherein such synchronization enables the provision of messaging quality of service.
30. The method of claim 29, wherein the messaging quality of service includes partial publish, merge, guaranteed-while-connected, and guaranteed-while-disconnected messaging.
31. The method of claim 29, further comprising, when one or more caching engines fail, taking over for the failed caching engine by any other caching engine that is connected to the same messaging appliance and remains active, and, if no active caching engine remains or when the failure involves the messaging appliance itself, taking over for the failed messaging appliance by another messaging appliance logically linked to the failed caching engine.
32. The method of claim 29, further comprising interfacing between each of the cache engines and one or more applications via their respective designated messaging appliances, wherein any failover is transparent to the one or more applications logically connected to the failed cache engine and/or messaging appliance.
33. The method of claim 29, wherein maintaining a consistent, synchronized view is accomplished via the messaging network fabric or, if there is a direct connection between caching engines, via such a direct connection, wherein real-time failover is decided by the messaging appliance or caching engine based on the load of the messaging network fabric.
34. A method for providing quality of service with a caching engine, comprising:
in a caching engine having a message transport layer, a management message layer, and a caching layer with an indexing service and associated storage, performing the steps of:
receiving data and management messages through the message transport layer;
forwarding the management message to the management message layer and the data message to the caching layer, wherein message retrieval request messages forwarded to the management message layer are routed to the caching layer;
indexing the data message in the indexing service, the indexing being topic-based; and
storing the data message in a storage based on the indexing, wherein the data message is maintained in the storage for a predetermined period of time during which the data message is available for retransmission in response to a message retrieval request message.
35. A method for providing quality of service using a caching engine as claimed in claim 34, wherein the data message is a complete data message or a partially published data message.
36. A method for providing quality of service utilizing the caching engine of claim 35, wherein each data message has an associated topic, wherein the indexing service maintains a master image of each complete data message, and for a received data message that is a partially published message, the indexing service compares the received data message against the most recent master image of a complete message having the same associated topic as the partially published message to determine how the master image should be updated.
37. A method for providing quality of service using a caching engine as claimed in claim 35, wherein the partially published messages are indexed and available for retransmission.
38. A method for providing quality of service utilizing the caching engine of claim 36, wherein the master image is indexed and available for retransmission.
39. A caching engine in a messaging system, comprising:
a message layer operative to send and receive messages;
a caching layer having an indexing service operative to first index received messages and to keep a mirror of received partially published messages, and a storage device and a storage service, the storage service operative to store all or a subset of received messages in the storage device, where the messages are temporarily kept available for retransmission when requested;
one or more physical channel interfaces for transmitting received and transmitted messages; and
a message transport layer with channel management for controlling the sending and receiving of messages over each of the one or more physical channel interfaces.
40. The cache engine of claim 39, deployed as part of a fault-tolerant pair or group of cache engines and having fault-tolerance capability, wherein a secondary cache engine takes over for the primary cache engine when a failure occurs.
41. A cache engine as defined in claim 39, wherein the message layer comprises a management message layer operative to process management messages.
42. A cache engine as defined in claim 39, wherein the message layer is operative to retrieve the requested message from the cache layer and to format the received message with a header field and a payload.
43. A cache engine as defined in claim 39, wherein the caching layer further comprises a random access memory, and wherein the indexing service is further operative to maintain the mirror in the random access memory.
44. A cache engine as defined in claim 39, wherein the image of each partially published message received and maintained by the caching layer includes the updated values together with the old values not changed by the updates.
45. A cache engine as defined in claim 39, wherein the time period during which the message is temporarily held in the storage device so as to be available for retransmission is predetermined.
46. A cache engine according to claim 39, wherein the storage device is a redundant persistent storage device.
47. A cache engine according to claim 39, provided as a software-based or embedded configuration.
48. A cache engine as recited in claim 39, embedded in a software application running on an operating system.
49. A cache engine as defined in claim 39, operative to provide partial data publication services and guaranteed-while-connected and guaranteed-while-disconnected message delivery quality of service.
50. The cache engine of claim 39, wherein the storage device comprises a plurality of storage devices operative for distributed message input/output.
51. A messaging system as in claim 1, further comprising a provisioning and management system operative to perform management operations for the caching engines.
52. A messaging system as in claim 1, further comprising one or more application programming interfaces operative to allow the application to publish and subscribe in a native message format.
53. A messaging system as in claim 1, further comprising one or more protocol translation engines associated with any of the messaging appliances and operative to allow the application to publish and subscribe in external message formats.
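As a non-authoritative illustration of the wildcard matching recited in claim 17, the following sketch matches a subscription topic against a message topic only when both split into the same number of substrings; the "." separator and "*" wildcard are assumptions, since the claim names no specific characters.

```python
# A non-authoritative reading of claim 17's wildcard matching: a
# subscription matches a topic when both split into the same number of
# substrings and every non-wildcard substring is equal. The "." separator
# and "*" wildcard are assumptions; the claim names no specific characters.

def topic_matches(subscription, topic, sep=".", wildcard="*"):
    sub_parts = subscription.split(sep)
    top_parts = topic.split(sep)
    if len(sub_parts) != len(top_parts):  # substring counts must agree
        return False
    return all(s == wildcard or s == t for s, t in zip(sub_parts, top_parts))

print(topic_matches("NYSE.*.trade", "NYSE.IBM.trade"))  # True
print(topic_matches("NYSE.*.trade", "NYSE.IBM"))        # False: count differs
```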