IBM and Globus Announcement - Convergence of Grid and Web Services

After progressing along somewhat independent paths of evolution and maturity, Web Services, as prodded by the vision from IBM, and the Grid, as implemented by the Globus Toolkit, are converging. Among the blogs that accompanied the Jan. 20, 2004 announcement by a broad standards body of WS-Notification and WS-Resource, which included marketing spin from Globus and from IBM, there emerged an important new specification: WS-ResourceProperties.

As will be explained in the following sections, the B2BPO framework is PUB/SUB messaging from its core architecture on out. It implements both the Globus tools (GridFTP) and Web Services (Apache Axis), and provides for extensive collaboration between the respective APIs. For those evaluating Web Services related software and researching the degree of concordance with emerging standards, the section below on "implementing WS Resource in B2BPO" should be of particular interest. (TBD)

B2BPO Channel Data Command (CDC) - as WS Resource

Definition - CDCs

CDCs are mobile, descriptive, and stateful resources that collaborate with the core B2BPO channel communication services. Those services are responsible for getting connections between computer network nodes that are located at different companies and linked in a virtual BPN over the Internet. In computer terms, channels are special facilities connecting components, as in the link between a CPU and a storage device. Because there are numerous companies in a supply chain, each with requirements for exchanging partners' data, DataChannels are important both as software design abstractions and as physical entities operating at Layer 4 (the transport layer) of the network stack.

Internetworking and data interchange operations of the B2BPO framework use resource descriptors that are distinct from the actual services being used. These descriptors cover both the what and the how of allocating physical connections for the desired transport protocol, and they describe the file system properties used in moving B2B data between partners. To complete the data transfer work, they collaborate with 2 other principal features of the software architecture:

  • The PostOffice-based Web services listen for and then accept client "calls": formatted SOAP Web Service "request" messages that convey, in message attachments, the encapsulated business data component, itself an implementation of any of a number of flexible data formats.
  • A context factory capable of looking at the PUB/SUB properties surrounding an intention to perform a specific TopicAction ( PUBLISH "Shipping DC Departure - Newark DE DC" ) before taking steps to "bootstrap" the environment in which the Web Service will eventually be performed.

Context is a somewhat complex topic but, in general, it is similar to Servlet Contexts. A B2BPO ChannelContext Factory is able to marshal properties from the intended data movement between partners and to answer questions like:
  • WHERE did the request originate?
  • WAS the request from the Data Source, saying "I have data I would like to Publish"?
  • WAS the request from the Intended Target, saying "Please check to see if there is any NEW DATA for me"?
  • WILL the service actually be performed synchronously here at the Post Office in response to the request?
  • WILL a message bearing a Channel Data Command (CDC) be sent out to another node from the Post Office, instructing the run-time on the remote node to perform a "WebService CALL" process after encapsulating the CDC inside a request message envelope?
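
The questions above can be sketched as a small context object built by a factory method. This is only an illustration: the class, enum, and field names are hypothetical, and the rule used to choose between synchronous execution and forwarding a CDC is an assumption, since the text does not state how B2BPO makes that decision.

```java
import java.util.Locale;

// Hypothetical sketch: names and the decision rule are assumptions, not the B2BPO API.
final class ChannelContextSketch {

    enum Origin { DATA_SOURCE, INTENDED_TARGET }

    enum Execution { SYNCHRONOUS_AT_POST_OFFICE, FORWARD_CDC_TO_REMOTE_NODE }

    static final class ChannelContext {
        final String originHost;   // WHERE the request originated
        final Origin origin;       // WAS it a publisher or a subscriber checking for data
        final Execution execution; // WILL it run here or be forwarded as a CDC

        ChannelContext(String originHost, Origin origin, Execution execution) {
            this.originHost = originHost;
            this.origin = origin;
            this.execution = execution;
        }
    }

    static ChannelContext create(String originHost, String topicAction, boolean dataAttached) {
        boolean publish = topicAction.trim().toUpperCase(Locale.ROOT).startsWith("PUBLISH");
        Origin origin = publish ? Origin.DATA_SOURCE : Origin.INTENDED_TARGET;
        // Assumed rule: a request that carries its data is served in place;
        // otherwise a CDC is forwarded so the remote node performs the call.
        Execution execution = dataAttached
                ? Execution.SYNCHRONOUS_AT_POST_OFFICE
                : Execution.FORWARD_CDC_TO_REMOTE_NODE;
        return new ChannelContext(originHost, origin, execution);
    }

    public static void main(String[] args) {
        ChannelContext ctx = create("198.51.100.7",
                "PUBLISH \"Shipping DC Departure - Newark DE DC\"", true);
        System.out.println(ctx.originHost + " / " + ctx.origin + " / " + ctx.execution);
    }
}
```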

Think of what FTP does. It gets a data connection between 2 computers and processes "get" or "put" commands on the connection. Processing each of these commands results in the movement of a file (push or pull) across the data connection. A list of properties used in a typical FTP session would be ( IP, userID, password, GET or PUT, fileName, remoteFileName ). If you extend the protocol list beyond FTP to include SSH, SOAP, HTTP, and GridFTP, there is not a big change in the list of properties needed to define a connection and to move data on it.

If you were to wrap all these properties (keys and values) in a chunk of XML, it would be very similar to a Channel Data Command. So, the definition of a Channel Data Command is an XML node describing every property needed to build data connections between machines and to move files on the connection.

When new data is created, 2 sets of CD commands control the data distribution that runs the full pub/sub circuit. A single CD publish command moves data from the client machine where the data was created to the PostOffice. A set of CD subscribe commands dispatches copies of the data from the PO to subscribers, with one command used for each subscriber to the Topic.
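
To make the idea of wrapping connection properties in XML concrete, here is a minimal sketch that uses only the standard Java XML APIs. The element and attribute names (channelDataCommand, property, key, value) and the sample values are hypothetical. They are not the actual B2BPO schema, which the text does not show.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: element and attribute names are illustrative only.
final class ChannelDataCommandSketch {

    /** Wraps the connection and transfer properties in a single XML node. */
    static String toXml(String protocol, Map<String, String> properties) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("channelDataCommand");
            root.setAttribute("protocol", protocol);
            doc.appendChild(root);
            for (Map.Entry<String, String> entry : properties.entrySet()) {
                Element property = doc.createElement("property");
                property.setAttribute("key", entry.getKey());
                property.setAttribute("value", entry.getValue());
                root.appendChild(property);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the command XML", e);
        }
    }

    public static void main(String[] args) {
        // The property list mirrors the typical FTP session described in the text.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("ip", "192.0.2.10");
        props.put("userID", "partnerA");
        props.put("password", "example-only");
        props.put("operation", "PUT");
        props.put("fileName", "invoice-001.xml");
        props.put("remoteFileName", "/inbox/invoice-001.xml");
        System.out.println(toXml("FTP", props));
    }
}
```

Switching the protocol argument to SSH or GridFTP leaves the property list nearly unchanged, which is the point the text makes about the small difference between protocols.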

Business Semantic - CDC Controls the movement of data

A lot of data is out there. With a simple list of destinations, using this software, you can easily control and redirect data in your organization or among organizations. Whenever you create data, an automatic process gets the data to every other system that needs it. Data is created and then linked to a dataset event. Used as a wrapper for files (dataSet) or XML (xmlNodeSet), the event's capabilities include directing a series of transport operations affecting the data's subsequent location on the network. Aware of the business semantic of the wrapped data - InvoiceDataSet for example - the event initiates all the infrastructure connections and file movement commands required to get the invoice dataSet sent to every other financial system where it will be needed. Two things are needed to accomplish this:

  1. List of Destinations - where is the data needed
  2. Transport Process - move the data to destination(s)

Channel Data Commands accomplish the 2nd item by controlling the movement of data from a source to a target. A data distribution center, or PostOffice, is the first stop for the new data, and then the PO forwards the data to any subscribers that have expressed their interest in the data event. Following standard messaging behavior, the data distribution topic (Topic) is the mechanism that binds interested subscribers to the Topic designated in the properties of the data event.

Anywhere that you create data in your business, you can issue the data event by running a simple Java client program (an application) and supplying it with 2 command-line arguments covering the Data Topic and the Data Location. The location may have components such as FileName or XML-RootNodeName.

In the Post Office server, data movement commands are invoked in response to 2 kinds of client requests. In the first kind, the client posts the actual data. In the second, more involved scenario, the client sends the Channel Data Command but no data, in which case the server's act of executing the CDC includes the responsibility to initiate the connection on which the data will be pulled to the Post Office. In the simple scenario, as long as the Topic is known, any process that writes data can publish the data by getting an event data wrapper and then firing the event. Clients that generate new data and want to move that data up to the PO signal an "I have data" event.
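
The 2 kinds of client requests, and the fan-out to subscribers that follows either one, can be sketched as below. All names here are hypothetical, and the "commands" are plain strings that only describe what a real CDC would do. No network transfer is performed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Post Office request handling; not the B2BPO implementation.
final class PostOfficeSketch {

    private final Map<String, List<String>> subscribersByTopic = new HashMap<>();

    void subscribe(String topic, String subscriberHost) {
        subscribersByTopic.computeIfAbsent(topic, k -> new ArrayList<>()).add(subscriberHost);
    }

    /** First kind of request: the client posts the data itself, so only the fan-out remains. */
    List<String> onDataPosted(String topic, String dataLocation) {
        List<String> commands = new ArrayList<>();
        List<String> hosts = subscribersByTopic.getOrDefault(topic, Collections.<String>emptyList());
        for (String host : hosts) {
            // One subscribe command per subscriber to the Topic.
            commands.add("SEND " + dataLocation + " TO " + host);
        }
        return commands;
    }

    /** Second kind: the client sends only a command, so the Post Office pulls the data first. */
    List<String> onCommandOnly(String topic, String sourceHost, String dataLocation) {
        List<String> commands = new ArrayList<>();
        commands.add("PULL " + dataLocation + " FROM " + sourceHost);
        commands.addAll(onDataPosted(topic, dataLocation));
        return commands;
    }

    public static void main(String[] args) {
        PostOfficeSketch postOffice = new PostOfficeSketch();
        postOffice.subscribe("invoices", "finance-a.example");
        postOffice.subscribe("invoices", "finance-b.example");
        for (String command : postOffice.onCommandOnly("invoices", "client-1.example", "invoice-001.xml")) {
            System.out.println(command);
        }
    }
}
```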

Architecture - general topic includes commands

Architecture - Feature List

Description Comment
data events trigger things
Client application program creates/fires events
Browser also may be used to fire events - HTTP Request wrapper on an event
handlers consume the data events - standard Java Event propagation model
event handlers dispatch commands to do useful things
event handlers dispatch ChannelData commands
command dispatch decoupled from modules that process (consume) the command
Servlets consume CD commands (HTTP context only)
Thread pools consume CD commands (Client application program context)
protocol determines details for actual data transport ( such as FTP )
Servlets/thread Pools rely on protocol Implementations for data transport
Protocol Implementation sits behind Java Interfaces
Actual data movement via List of alternative implementations (FTP, GridFTP, HTTP, SOAP, SSH)
Very easy client-side runtime to publish the "data-is-created" event - Simple program creates the Java Event
Data transparency whether it's in Files or in XML Nodes - Other B2B data specifications, like RSS for XML data, impose needless requirements on their items, such as the enclosure element's explicit attribute values for length and MIME type. B2BPO, in contrast, seamlessly accepts any of 6 high-level Java DataSource interfaces with no requirements for extra properties.
Local data can be wrapped and streamed as a Remote dataSource
systems running on other hosts have transparent access to Remote data
serialization and "on-the-wire" encoding inherited from the default type mappings of Apache Axis 1.1
Configuration for encryption using Deployment Descriptor - no recompiles
Data Distribution from a single "Published Event" - Configurable for each "Subscriber" ( connection type and security )
Public, interface-based APIs allow alternate implementations, each binding to the desired protocol ( FTP, SSH, SOAP )
Post Office architected to a Public API based on XML RPC (totally open for clients via .NET or J2EE) - J2EE compliant Server
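
Several of the features listed above (protocol implementations behind Java interfaces, alternative implementations, configuration without recompiles) fit a familiar pattern that can be sketched as follows. The interface name, the registry, and the "transport.protocol" property key are all hypothetical, and the stub transports only return a description of the move rather than doing real network I/O.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical sketch: the interface, registry, and config key are assumptions.
final class TransportFactorySketch {

    /** The interface that command consumers would code against. */
    interface DataTransport {
        String move(String source, String target);
    }

    private static final Map<String, Supplier<DataTransport>> REGISTRY = new HashMap<>();

    static {
        // Stub implementations: each only describes the transfer it would perform.
        REGISTRY.put("FTP", () -> (src, dst) -> "FTP: " + src + " -> " + dst);
        REGISTRY.put("SSH", () -> (src, dst) -> "SSH: " + src + " -> " + dst);
        REGISTRY.put("SOAP", () -> (src, dst) -> "SOAP: " + src + " -> " + dst);
    }

    /** Picks the implementation named in configuration, so a protocol change needs no recompile. */
    static DataTransport fromConfig(Properties config) {
        String protocol = config.getProperty("transport.protocol", "FTP");
        Supplier<DataTransport> supplier = REGISTRY.get(protocol);
        if (supplier == null) {
            throw new IllegalArgumentException("No transport registered for " + protocol);
        }
        return supplier.get();
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("transport.protocol", "SSH");
        DataTransport transport = fromConfig(config);
        System.out.println(transport.move("invoice-001.xml", "finance-a.example:/inbox"));
    }
}
```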

Events and Listeners as PUB SUB Implementation

Anywhere in the business, on any server, data is created and then linked to a special kind of event: the data-has-been-created-event. Utilizing Listeners - standard modules designed to respond to events - this software launches activities that move the new data to any and all systems where it is needed. The event-based paradigm means that no scripting or batch jobs (JCL) are needed; the software simply responds to the event and accomplishes the data distribution. A data distribution center, or PostOffice, is the first stop for the new data, and then the PO forwards the data to any subscribers that have expressed their interest in the event. Following standard messaging behavior, the data distribution Topic is the mechanism that binds listeners to events.

Anywhere that data is created in your business, you can run a simple client program (an application with 4 lines of computer code) in order to trigger the data event. Event handlers then create Channel Data (CD) commands. CD commands wrap the process for getting connections and for moving data. Command consumers initiate all the infrastructure connections and file movement commands to get the data everywhere that it's supposed to go. Depending on the context, these commands are consumed either by a Servlet or by a Thread Pool. The actual file transport protocol is determined by a config file declaration listing what type of Factory to use for the implementation of the CD command consumer.
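
The event flow described above follows the standard Java event model, and a minimal sketch of it looks like this. The event class, listener interface, and topic registry are hypothetical stand-ins for the B2BPO classes, and the "command" consumed by the thread pool is just a string recorded in a list.

```java
import java.util.ArrayList;
import java.util.EventListener;
import java.util.EventObject;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of topic-bound listeners; not the B2BPO classes.
final class DataEventSketch {

    static final class DataCreatedEvent extends EventObject {
        private static final long serialVersionUID = 1L;
        final String topic;
        final String location;

        DataCreatedEvent(Object source, String topic, String location) {
            super(source);
            this.topic = topic;
            this.location = location;
        }
    }

    interface DataCreatedListener extends EventListener {
        void dataCreated(DataCreatedEvent event);
    }

    private final Map<String, List<DataCreatedListener>> listenersByTopic = new HashMap<>();

    void addListener(String topic, DataCreatedListener listener) {
        listenersByTopic.computeIfAbsent(topic, k -> new ArrayList<>()).add(listener);
    }

    /** The Topic binds listeners to events: only listeners on the event's topic are called. */
    void fire(DataCreatedEvent event) {
        List<DataCreatedListener> listeners = listenersByTopic.get(event.topic);
        if (listeners == null) {
            return;
        }
        for (DataCreatedListener listener : listeners) {
            listener.dataCreated(event);
        }
    }

    /** Fires one event and returns the commands consumed by the thread pool. */
    static List<String> demo(String topic, String location) {
        DataEventSketch bus = new DataEventSketch();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<String> consumed = new CopyOnWriteArrayList<>();
        // The handler turns the event into a command and hands it to the pool.
        bus.addListener(topic, e -> pool.execute(() -> consumed.add("MOVE " + e.location)));
        bus.fire(new DataCreatedEvent(bus, topic, location));
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(consumed);
    }

    public static void main(String[] args) {
        // Two command-line arguments, as in the text: the Data Topic and the Data Location.
        String topic = args.length > 0 ? args[0] : "invoices";
        String location = args.length > 1 ? args[1] : "/data/invoice-001.xml";
        System.out.println(demo(topic, location));
    }
}
```

Because the handler only dispatches the command, the module that consumes it (a thread pool here, a Servlet in the HTTP context) can be swapped without touching the code that fires the event.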

Context Factory

A couple of diagrams explaining the activities of the dispatcher in response to the receipt of published data by the PostOffice give additional appreciation of Context Factories. In the center of this picture are a Lookup on Channel Data and a Work Queue, both of which collaborate with ChannelContext in assuring the proper distribution of data to Subscriber(s) once data arrives at the PostOffice. Bound to the Topic on which the data was published and uploaded to the PO is a List of Subscriber(s), each represented by a CDC describing the what and the how of delivering their data. The CDC, the "What and the How", the Data Dispatcher, and the idea of a Context all intersect in the Post Office activities near Caption #7, "File Distribution", on the middle left of the picture in the link above. As the dispatcher iterates over each subscriber, a ContextFactory provides some of the classes shown to the left and center of this picture. The yellow description boxes for the following objects will help in understanding what is going on as the dispatcher responsible for each subscription goes about the work of forwarding data to the Subscribers. Objects to see: ( ChannelContext, ChannelDataCommand, ExchangeSession ).

TBD: section from legal on the 2 scenarios, as a practical example of why they matter in practice

Architecture - Java Source for selected Programs

Program - Description
Java Client program - data-has-been-created-event; Client application program creates and fires the event
FileTopicHandler - following the event-based paradigm, performs actions in response to data events
Servlet Program - CD Command consumer; note that constructor args include Request and NetworkFileExchangeType, an instance of a Channel Data Command
CreateCDCommand - Servlet; UI requests to create and process a CD command to move data call on this program