Migrating the Web toward Distributed Objects

Dan Larner, larner@parc.xerox.com

Copyright 1996 Xerox Corporation -- All Rights Reserved

Abstract

Although current Web technologies have been doing a remarkable job in providing information exchange, they are often a collection of ad hoc mechanisms being pushed to limits for which they were not originally intended. Object-oriented design approaches have some distinct advantages when dealing with complexity. They provide for explicit separation of the concepts of interface, communication and implementation, resulting in more reliable, manageable and extensible systems. When extended to distributed object systems, these concepts are bolstered by transparency of location and language. Distributed object systems seem to be a fine foundation for creating complex distributed systems, Web services of the future in particular. However, one cannot ignore the past, and any new infrastructure must support the existing mechanisms for a lengthy migratory period. Inter-Language Unification (ILU), a distributed object system from Xerox PARC, is sufficiently flexible to provide such migratory functionality. ILU allows existing services and distributed object services to each appear native to the other; in fact, interoperation between ILU and existing HTTP clients and servers is already in place.

The Current State of the Web

The Web today provides a great deal of power. Using this medium, the gathering and exchange of information has exploded over the past few years. The Web not only offers a tremendous tool for the present; indications are that its use will only expand in the future.

However, Web users are experiencing the effects of the limitations of the technologies underlying the Web's services. Access can be problematic due to clogged servers and bandwidth limitations. Integration of the Web with other user tools is spotty at best, and non-existent at worst. A peek under the covers reveals that all this power is being provided by myriad collections of ad hoc mechanisms, custom implementations, and protocols (often quickly modified for the moment) that were never really designed with this kind of growth in mind. An excellent critique of the current Web can be found in [1].

At the most fundamental level, all of the activity on the Web is aimed at performing operations on distributed collections of resources [2]. Many of today's Web technologies provide little, if any, separation between a resource's 'interface' (i.e., the specification of operations that it supports), the communication 'protocols' used to convey these operations from point to point, and the 'implementation' (i.e. the actual program that carries out the computation associated with the operations). The separation of these aspects is well recognized for its central importance to the construction of reliable, maintainable, composable and extensible systems, especially in the face of tackling problems of increasing complexity. These same concepts brought object-oriented programming to mainstream design, caused the layered design of communication protocols, and produced the notion of an API. Without explicit manifestations of these concepts, systems become fragile, and difficult to understand and modify, which, in turn, limits their life and usefulness, and reduces return on investment.

Distributed Object Systems

In object-oriented programming systems, these separate concepts of interface, communication, and implementation are central themes. A distributed object system promotes the advantages of object-oriented programming languages to a network wide context, making it a natural foundation for well designed distributed systems such as the Web. It does this primarily by providing transparency of location, and transparency of implementation language. It provides the freedom to "Use the most appropriate tools for the task".

Location transparency allows a programmer to perform operations on objects without specific regard to whether the object is local to the program, in a different process, or on a distant machine. No special syntax distinguishes local from remote calls, and the semantics of a call are consistent no matter what the case. The distributed object system deals with all the low-level details of establishing network connections, marshaling arguments, un-marshaling return values, etc. This frees the programmer to concentrate on the real problem at hand rather than all the low-level plumbing. Location transparency allows systems to be built using different systems for different parts. Each part can run where it makes the most sense (e.g., a driver can run close to a particular peripheral, a math-intensive routine can run on a high-speed processor, etc.). Location transparency also enables the creation of systems which are more reliable, by facilitating the design of redundant functionality. For example, in the event that an existing server goes off-line, the system can arrange to contact a different server without the client's knowledge.

Language transparency allows a system to be constructed whose parts may be implemented in different programming languages. That is, a program written in one language, say C++ or Python, can perform operations on objects written in another language, say Java or Lisp. The program does this without even knowing that the other object's implementation is non-native. This is accomplished by using precise descriptions of object interfaces written in a language neutral interface specification language. Language-specific compilers input these descriptions and output language specific 'stub' code that allows programs written in a particular language to operate with objects fitting those descriptions. This concept of an interface specification language makes it especially easy to provide object oriented interfaces for legacy code - the object's implementation simply converts method invocations to calls to the existing code. This allows legacy systems to continue to provide value in new situations. Language transparency pushes the potential for code reuse to even greater levels, readily admitting a choice in the most appropriate language for various parts of the overall task.
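The legacy-encapsulation idea can be sketched roughly as follows. This is only an illustration in Python, not ILU-generated stub code: the names legacy_word_count, TextStats and LegacyTextStats are hypothetical, and the abstract class merely stands in for the role an interface description would play.

```python
from abc import ABC, abstractmethod


def legacy_word_count(text):
    """Stand-in for pre-existing procedural code that should not be rewritten."""
    return len(text.split())


class TextStats(ABC):
    """Hypothetical interface; callers depend only on this declaration."""

    @abstractmethod
    def count_words(self, text: str) -> int:
        ...


class LegacyTextStats(TextStats):
    """Implementation that simply converts method invocations into legacy calls."""

    def count_words(self, text: str) -> int:
        return legacy_word_count(text)


stats = LegacyTextStats()
result = stats.count_words("objects wrap legacy code")
print(result)
```

A caller written against TextStats need not know that the work is done by older code, which is the property the paragraph above relies on.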

Any technology that enables the creation of new and better systems comes at a cost, and distributed object systems are no exception. Location can't be totally transparent in today's systems because of its timing implications. Calls to remote systems take roughly an order of magnitude longer than calls between processes on a single system, and these in turn take roughly an order of magnitude longer than calls within a process. Thus, at the system architecture level, it's still important to partition functions with a high degree of cohesion into units which are colocated.

Because of the parallelism often induced in distributed systems, more attention must be paid to resource allocation and sharing. In addition, if not accounted for in the design, a failure on one node can bring the entire system to a halt. If replication is introduced to make the design more fail-safe, then issues of data replication and degree of consistency come into play. Separation of computation into multiple processes and machines and languages also makes debugging more difficult.

Whether the benefits of distributed object systems outweigh the difficulties they bring with them depends, of course, on the situation. No technology is a substitute for careful system design and engineering, but distributed object systems should enable the creation of systems which were not easily possible before. Distributed object system technology can provide a well-defined, strong, extensible foundation for future Web technologies.

These tenets are supported by the efforts of the Object Management Group (OMG) [3]. The OMG is a non-profit consortium of over 600 developers and users, including many major forces in the computing industry. It promotes the use of object technologies for the development of distributed computing systems by developing and standardizing common architectural frameworks for object-oriented applications.

One of the OMG's specified frameworks is the Common Object Request Broker Architecture (CORBA). It specifies a foundational infrastructure that lets objects operate with one another. Mappings of these concepts to major programming languages are defined, as are common communication protocols to be used by Object Request Brokers (ORBs). This allows interactions to be independent of the actual platforms and languages used to implement the objects. Compliance with the CORBA standard makes objects portable and interoperable across tools supplied by different vendors. Additional OMG specifications, at various levels of standardization, include Common Object Services (e.g., Naming, Events, etc.), Common Facilities, and more.

ILU

Inter-Language Unification (ILU) from Xerox PARC [4] builds on the CORBA distributed object standard from the OMG, but offers a number of advantages over other distributed object systems that make it particularly attractive as a foundation for new Web technologies. Further arguments for the use of ILU for the Web can be found in [5].


Pragmatics & Migration

Although distributed object systems can provide a foundation for Web technologies with a life far into the future, one would be naïve to think they can replace the existing services in one fell swoop. A means must be provided for a period of migration to any new approach. During this period, the distributed object implementations of Web services must be accessible to existing clients, and clients constructed with the new approach must be able to interact with existing services.

For example, a Web browser must be able to contact an object specified with a URL, and perform operations such as HTTP's GET, HEAD, and POST operations [6]. Similarly, a client based on distributed objects must be able to accept a URL and see it as an object that supports these operations. Leveraging the capabilities of both existing Web services and distributed object systems together in a freely interactive manner immensely bolsters their usefulness.

Approaches to the integration of the Web with distributed object systems have focused on bridging techniques such as [7], and/or CGI replacement/enhancement, such as [8]. The approach described in this paper is different in that it incorporates HTTP support directly into the ORB as a general communication protocol, and additionally defines a particular object type designed to allow interaction with existing Web services.

Treating Web resources as Objects

In the remainder of this paper, the focus is on the support provided for HTTP/1.0 in ILU. Other protocols, such as FTP can be addressed in a similar vein, as, presumably, can other distributed object systems. ILU support for HTTP/1.0 became available in ILU 2.0 alpha8.

The key to providing the desired integration lies in the similarity between the URLs used in HTTP and the String Binding Handles (SBHs) ILU uses to identify objects. An SBH specifies everything that is needed to identify and contact a particular object, just as a URL does for Web resources. Consider an example: a full SBH for an object in one of ILU's sample programs is:

ilu:TimingTest.figtree.parc.xerox.com/0;ilu%3Ac0FGHuC8UdTAO+ETWz1Nl5RjWa3;sunrpc_2_0x61a78_1139713249@sunrpcrm=tcp_13.1.100.126_1588

The breakdown of this SBH is (where the initial /, ; and @ are simply token dividers):

Now let's take a look at a URL used with HTTP, for example http://www.parc.xerox.com:80/index.html. Here we have:

A URL specifies the host, protocol, transport/port, and resource identifier. These map to direct analogs in an SBH. Consider what happens if, in ILU, we create an ilu_Server entity on host www.parc.xerox.com at TCP port 80, that 'understands' HTTP. Then any Web browser that tries to access the URL http://www.parc.xerox.com:80/index.html will end up sending its HTTP request to that ilu_Server. The ilu_Server can treat the /index.html as an object identifier, and invoke the appropriate method (GET, HEAD, or POST) on the object that has that identifier. This method's implementation is determined by the programmer, and different derived types of HTTP resources can have different method implementations.
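The dispatch just described can be sketched in a few lines of Python. This is an illustration only and does not use ILU: the Resource and Greeting classes, the objects table and the dispatch function are hypothetical names, and requests and responses are reduced to simple tuples.

```python
from urllib.parse import urlsplit


class Resource:
    """Hypothetical base type offering the three HTTP/1.0 methods."""

    def GET(self, request):
        return (200, "generic resource")

    def HEAD(self, request):
        # Same status as GET, but no body.
        status, _ = self.GET(request)
        return (status, "")

    def POST(self, request):
        return (501, "POST not implemented")


class Greeting(Resource):
    """Derived type with its own GET implementation."""

    def GET(self, request):
        return (200, "hello from /index.html")


# Object identifier (the URL path) -> object held by the server.
objects = {"/index.html": Greeting()}


def dispatch(method, url, request=None):
    """Treat the URL path as an object identifier and invoke the named method."""
    path = urlsplit(url).path
    target = objects.get(path)
    if target is None:
        return (404, "no object with identifier " + path)
    if method not in ("GET", "HEAD", "POST"):
        return (501, "unsupported method " + method)
    return getattr(target, method)(request)


parts = urlsplit("http://www.parc.xerox.com:80/index.html")
print(parts.hostname, parts.port, parts.path)
print(dispatch("GET", "http://www.parc.xerox.com:80/index.html"))
```

The host and port select the server, and the remaining path selects the object within it; the derived type changes what GET returns without changing how the call is routed.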

Methods might perform a simple task, such as packing a file's contents into an entity body, emulating typical document retrieval. Methods may instead perform complex computations to dynamically determine response content, based on the composition of the request and other outside information (e.g., previous requests, real-world values, database content, interaction with other services, etc.). The latter is what is typically done with current CGI approaches; when using ILU, however, the method is efficiently invoked within the same process. (Note that one possible method implementation could be calling out to existing CGI programs when necessary. This is an example of the legacy code encapsulation mentioned previously.)

The return value from the call on the object can then be packaged up in an HTTP response, and sent back to the Web Browser. A natural question to ask at this point is "Doesn't that mean that all the objects that could possibly be referenced have to be loaded up and ready to go all the time?" The answer is no. ILU allows functionality to be associated with a server that will create/load objects dynamically as needed. (A simple Web Server program, webserver, included in the httest test suite mentioned below illustrates on-the-fly object creation.)
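On-the-fly object creation of this kind might look roughly like the following Python sketch. It is not ILU's mechanism or API: LazyServer, FileResource and make_resource are hypothetical names, chosen only to show a server that consults a factory the first time an object identifier is requested.

```python
class LazyServer:
    """Hypothetical server that creates objects the first time they are named."""

    def __init__(self, factory):
        self._factory = factory   # callable: object id -> object, or None
        self._objects = {}        # objects created so far

    def lookup(self, object_id):
        obj = self._objects.get(object_id)
        if obj is None:
            # Not loaded yet: ask the factory to create it now.
            obj = self._factory(object_id)
            if obj is not None:
                self._objects[object_id] = obj
        return obj


class FileResource:
    """Placeholder object representing one served document."""

    def __init__(self, object_id):
        self.object_id = object_id


created = []


def make_resource(object_id):
    created.append(object_id)     # record each creation for inspection
    return FileResource(object_id)


server = LazyServer(make_resource)
first = server.lookup("/index.html")
second = server.lookup("/index.html")
print(first is second, created)
```

Only objects that are actually referenced are ever created, and a second request for the same identifier reuses the existing object.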

On the other hand, an ILU application needs to be able to treat a resource being served up by an existing Web server as if it were an object, with GET, HEAD, and POST methods. In ILU, as in other distributed object systems, non-local objects are represented by 'surrogate' objects (other systems may call them 'proxies'). The job of a surrogate is to act as a stand-in for the actual object, forwarding any calls it receives to the actual remote object, and returning any results accordingly. (Again, this is grossly oversimplified, but conceptually correct.)

So if the surrogate knows how to use HTTP as its communication protocol, it can format up an HTTP request embodying the arguments passed to the call on the object, send it over to the server, get back the response, and package that up in the form that the caller expects. In ILU, a function that creates a surrogate object from an ILU SBH has been enhanced to also accept a URL and return the appropriate type of surrogate. Now, an ILU application can think of the world as objects, whether they be real objects in the distributed object sense, or objects whose implementations are actually supplied by existing HTTP servers.
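A surrogate of this kind can be sketched in Python as below. The sketch makes several assumptions that are not part of ILU: HttpSurrogate and fake_transport are hypothetical names, the transport is a plain callable injected so the example runs without a network, and only the body-less GET and HEAD calls are shown.

```python
from urllib.parse import urlsplit


class HttpSurrogate:
    """Hypothetical stand-in for a resource served by an ordinary HTTP server."""

    def __init__(self, url, transport):
        parts = urlsplit(url)
        self.host = parts.hostname
        self.port = parts.port or 80
        self.path = parts.path or "/"
        self._transport = transport   # callable(host, port, bytes) -> bytes

    def _invoke(self, method, headers):
        # Format the call as an HTTP/1.0 request.
        lines = [f"{method} {self.path} HTTP/1.0"]
        lines += [f"{name}: {value}" for name, value in headers]
        raw = ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
        # Forward it, then repackage the reply for the caller.
        reply = self._transport(self.host, self.port, raw)
        head, _, body = reply.partition(b"\r\n\r\n")
        status = int(head.split(b" ", 2)[1])
        return status, body

    def GET(self, headers=()):
        return self._invoke("GET", headers)

    def HEAD(self, headers=()):
        return self._invoke("HEAD", headers)


sent = []


def fake_transport(host, port, raw):
    """Test double that records the request and returns a canned response."""
    sent.append((host, port, raw))
    return b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"


doc = HttpSurrogate("http://pundit.parc.xerox.com/simple.txt", fake_transport)
status, body = doc.GET([("User-Agent", "sketch/0.1")])
print(status, body)
```

The caller sees an ordinary object with GET and HEAD methods; the fact that the implementation lives in an HTTP server is confined to the surrogate.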

A test suite called httest (one of the examples that now comes with ILU) illustrates Web Browser-to-ILU, ILU-to-Web Server, and general ILU-to-ILU communication over HTTP. The latter is of interest not as a general means of object method invocation (there are more efficient protocols for this), but rather because it allows general object interaction to be carried on between systems separated by firewalls that let HTTP pass through. A brief description of the httest example and sample output are in the Appendix.

HTTP in ILU

An ILU application that wishes to interact with an existing Web resource must be able not only to get an object (a surrogate, actually) representing the resource; it must also have some means for specifying the HTTP headers and entity body that should be sent with the request. Similarly, an ILU server functioning as an HTTP-accessible Web resource must be able to set status, header and entity body content.

Arbitrary programmer interpretations of these HTTP components cannot, in general, be mapped into HTTP. A specific signature is needed for the GET, HEAD, and POST methods, so that the ILU implementation of the HTTP protocol knows how to map arguments into actual HTTP format. In addition, we need a way to distinguish these methods, which are meant to be used with existing Web services, from other methods that may happen to have the same name but different signatures.

We address this need by defining a specific base object type with declarations for structuring the arguments to, and return values from, the GET, HEAD, and POST operations. Any GET, HEAD, or POST operation invoked on an object that is an instance of this base type (or an instance of a type derived directly or indirectly from this base type) has a particular signature that ILU knows how to map to HTTP. This base type is only a slight modification of the type defined in the ILU Requester work [9]. (The modification is basically to omit the "connection" argument; this sort of information can quite easily be contained in a normal header name/value field.) An abbreviated definition of this type is shown below using ILU's Interface Specification Language.


(*-------------------- Header related Types ----------------------- *)

TYPE field-name = ilu.CString;              (* a header field-name  *)
TYPE field-value = ilu.CString;             (* a header field-value *)
TYPE optional-field-value = OPTIONAL field-value; (* value optional *)

TYPE Header = RECORD                              (* message header *)
    name  : field-name,
    value : optional-field-value
  END;

TYPE HTTPHeader = Header;
TYPE HTTPHeaders = SEQUENCE of HTTPHeader;       (* all the headers *)

(* -------------------- Entity Body related Types ----------------- *)

TYPE EntityBody = SEQUENCE of BYTE;              (* the entity body *)
TYPE OptionalEntityBody = OPTIONAL EntityBody;   (* bodies optional *)

(* -------------------- Request URI related Types ----------------- *)

TYPE RequestURI = ilu.CString;

(* -------------------- Full Request Types ------------------------ *)

TYPE Request = RECORD               (* 'mostly' a http full request *)
    URI      : RequestURI,              
(* This can be the absoluteURI or abs_path uri - including params,
queries, etc. (if it's the full absoluteURI or abs_path, then the
scheme, netpath, and path portion of this should be http:, the
netpath should agree with the server id, and the path the same as
the object ID although this isn't checked), OR more commonly it can
be just the params, queries, e.g. ;foo;bar?zap *)

    headers : HTTPHeaders,              
(* the general, request and entity headers NOTE: if the user didn't
supply a Content-Length header, ILU's http will automatically put
in a Content-Length header if an Entity body is supplied. Note that
when responding to a HEAD method then (since there is no body) the
user should supply a Content-Length header. *)

    body     : OptionalEntityBody       (* may or may not be a body *)

  END;

(* -------------------- Response related Types -------------------- *)

TYPE StatusCode = ENUMERATION  (* some possible status return codes *)
    OK = 200,
    Created = 201,
    etc.
  END;

TYPE Response = RECORD                      (* a http full response *)
    status  : StatusCode,        (* status of servicing the request *)
    headers : HTTPHeaders,  (* general, response and entity headers *)
    body    : OptionalEntityBody        (* may or may not be a body *)
  END;


(* -------------------- Resource Object --------------------------- *)

TYPE Resource = OBJECT   (* object that knows standard http methods *)
  TYPEID Ilu_Http_1_0_resource_object

   (* std. http 1.0 methods, each takes request, & returns response *)
  METHODS       

    GET  (request: Request) : Response, 
    HEAD (request: Request) : Response,
    POST (request: Request) : Response

  END;

Thus, a method named GET, HEAD, or POST, invoked on an object that is a direct or indirect instance of this type, automatically has its Request and Response mapped to/from HTTP in a manner compatible with existing Web services. The fairly straightforward mapping is coarsely described in Table 1 below:

Table 1: ILU HTTP Interface to HTTP Mapping

ILU HTTP Interface                       HTTP Protocol
---------------------------------------  ---------------------------------------
ILU method name                          Method name in Request's Request-Line
(If using a proxy server, scheme +       Request-URI in Request's Request-Line
location of object +) ILU Object ID +
any params/queries present in the
Request.URI field
Request.headers                          Request-Headers
Request.body                             Entity-Body in Request
Response.status                          Status-Code and Reason-Phrase in
                                         Response's Status-Line
Response.headers                         Response-Headers
Response.body                            Entity-Body in Response




The implementation automatically inserts a Content-Length header when one is needed and can be computed, and takes care of the colon separators between header names and values. It also copes with older servers that sometimes omit the CR from the required CRLF line termination.
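The response side of this mapping, together with the two conveniences just mentioned, can be approximated in Python as follows. This is a sketch under stated assumptions rather than ILU's implementation: serialize_response and parse_header_block are hypothetical helpers, headers are modeled as (name, value) pairs, and the reason phrase is passed in explicitly.

```python
import re


def serialize_response(status, reason, headers, body=None):
    """Render a Response-like record as HTTP/1.0 bytes."""
    headers = list(headers)
    has_length = any(name.lower() == "content-length" for name, _ in headers)
    if body is not None and not has_length:
        # Supply Content-Length only when the caller did not.
        headers.append(("Content-Length", str(len(body))))
    lines = [f"HTTP/1.0 {status} {reason}"]
    # The colon separator is added here, not by the caller.
    lines += [f"{name}: {value}" for name, value in headers]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + (body or b"")


def parse_header_block(raw):
    """Split a header block into (name, value) pairs, accepting CRLF or bare LF."""
    pairs = []
    for line in re.split(rb"\r?\n", raw):
        if not line:
            continue
        name, _, value = line.partition(b":")
        pairs.append((name.decode("ascii"), value.strip().decode("ascii")))
    return pairs


wire = serialize_response(200, "OK", [("Content-Type", "text/plain")], b"hi\n")
parsed = parse_header_block(b"Server: old/0.1\nContent-type: text/plain\r\n")
print(wire)
print(parsed)
```

In the second call the first header line ends in a bare LF, as an older server might send it, and is still parsed correctly.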

For other situations, i.e., general ILU-to-ILU communication that just happens to be occurring over HTTP, the mapping is still consistent with the HTTP protocol, but a more general format is used. ILU specific information such as the ilu_Server ID is placed in a header, and the marshaling of arguments is done entirely within the entity body. In keeping with some idea of human readability, marshaled arguments, with the exception of potentially huge byte-vectors, are encoded as readable ASCII strings - e.g. 3.1416 encodes as "3.1416". Readers concerned about efficiency should note that for general ILU-ILU communication, another protocol such as ONC RPC is a much better choice than the current HTTP implementation. The HTTP protocol implementation could, however, be easily changed to use a more efficient encoding, similar to what's used in ONC RPC for example.


Directions

First, the implementation of HTTP in ILU currently supports only HTTP/1.0. At the time of this writing, HTTP/1.1 was on the horizon, and when actual implementations appear, the ILU support should be extended as necessary to work with, and take advantage of, the new version.

Second, the C and C++ language mappings in the current ILU implementation require that the entire response be in memory before it can begin to be passed across the wire. While this is not a problem in situations involving relatively small content, if many large messages are to be processed, a significant copying overhead results. As a result, we must address various approaches to accommodate these transfer needs by allowing method arguments and return values to be indirectly referenced (e.g., via pipes).

Finally, HTTP is only one of many protocols in use on the Web. FTP, Gopher, News, etc. also fit the fundamental pattern of performing operations on distributed collections of resources. Just as ILU was extended so HTTP-based resources could be viewed as distributed objects and vice versa, ILU could be extended to embrace these other protocols as well.

References

[1] W3Objects: Bringing Object-Oriented Technology to the Web, David Ingham, Mark Little, Steve Caughey, Santosh Shrivastava, The World Wide Web Journal, Issue 1, Dec 95, O'Reilly, http://www.w3.org/pub/WWW/Journal/1/ingham.141/paper/141.html

[2] WWW and OOP, Dan Connolly, http://www.w3.org/pub/WWW/OOP/Activity.html

[3] Object Management Group Home Page, http://www.omg.org/

[4] Inter-Language Unification, Xerox PARC, ftp://parcftp.parc.xerox.com/pub/ilu/ilu.html

[5] Why ILU? -- OOP and the Web, Dan Connolly, http://www.w3.org/pub/WWW/OOP/WhyILU.html

[6] Hypertext Transfer Protocol -- HTTP/1.0, T. Berners-Lee, R. Fielding, H. Frystyk, RFC 1945, http://ds.internic.net/rfc/rfc1945.txt

[7] A Web of Distributed Objects, Owen Rees, Nigel Edwards, Mark Madsen, Mike Beasley, Ashley McClenaghan, The World Wide Web Journal, Issue 1, Dec 95, O'Reilly, http://www.w3.org/pub/WWW/Journal/1/rtor.085/paper/085.html

[8] CorbaWeb: A Generic Object Navigator, Philippe Merle, Christophe Gransart, Jean-Marc Geib, Fifth International World Wide Web Conference, May 6-10, 1996, Paris, France, http://www5conf.inria.fr/fich_html/papers/P33/Overview.html

[9] The ILU Requester: Object Services in HTTP Servers, Paul Everitt, W3C Informational Draft 07-Mar-96, http://www.w3.org/pub/WWW/TR/WD-ilu-requestor

Appendix - httest Example - Description and Output

The httest example contains 2 programs, htserver and htclient, used to test and demonstrate the use of the HTTP protocol within ILU. These programs show


htserver overview

The htserver program creates 2 objects;

Each object is serviced by its own ilu server (that's just how it was written) and each object is also 'Published' using ILU's simple publish and lookup functions.

Usage: htserver [port_number [ HOSTNAME [verbose] ] ]

htclient overview

The htclient program accepts a URL as its first argument. This is treated as an identifier for an http_Resource object (which may reside inside an existing Web server). GET, HEAD, and POST calls are made on this object and the results displayed. (If the URL is literally NIL, then this test is skipped.)

If the second argument is provided, then htclient assumes that it should call the GET, HEAD, and POST methods on the httpderived_obj0 implemented by htserver, as well as call its flipcase method, passing the second argument to it. Results of these calls are displayed.

Usage: htclient HttpURL [[string_to_flipcase] [ HOSTNAME ]]

Use of Proxy Servers

If your site requires use of proxy servers for access outside your site, and if you wish to try running operations across this 'firewall', then set the environment variable ILU_HTTP_PROXY_INFO to be the hostname of your proxy server, followed by a colon (:) and by the port number of the proxy (e.g., wwwproxy.my.site.com:8000).
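For illustration, the host:port format of this variable can be read as in the Python sketch below. The variable name and the example value come from the text above; the function proxy_from_env and its error handling are hypothetical and do not describe how ILU itself parses the setting.

```python
import os


def proxy_from_env(environ=os.environ):
    """Return (host, port) from ILU_HTTP_PROXY_INFO, or None when it is unset."""
    info = environ.get("ILU_HTTP_PROXY_INFO")
    if not info:
        return None
    # Split on the last colon: hostname before it, port number after it.
    host, sep, port = info.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {info!r}")
    return host, int(port)


proxy = proxy_from_env({"ILU_HTTP_PROXY_INFO": "wwwproxy.my.site.com:8000"})
print(proxy)
```

As Table 1 notes, when a proxy is in use the Request-URI carries the scheme and location of the object in addition to its identifier, since the proxy must know where to forward the request.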

Examples of Running

Note: Only abbreviated output for GET operations between ILU and existing services is shown. ILU automatically adds Content-Length headers where required - these are not shown in the program's output.

1. To illustrate ILU operating with / obtaining an existing Web server document

>htclient http://pundit.parc.xerox.com/simple.txt

[The Request sent to the Web Server]
---------------- Resource Test --------------------
Request: (Note all values are shown between >< s)
URI = >http://pundit.parc.xerox.com/simple.txt<
Number of headers = >1<
Header 0
        field-name = >User-Agent<
        optional-field-value = >ILU-HTTP-Object-Client/1.0<
Body is:
>Sample Request Body Bytes<

[The response from the Web server]
---------------------------------------------------
Calling GET on http_obj ---------------------------
Response: (Note all values are shown between >< s)
Status = >200<
Number of headers = >7<
Header 0
        field-name = >Server<
        optional-field-value = >HTTPS/0.96<
Header 1
        field-name = >Allow<
        optional-field-value = >GET HEAD POST<
Header 2
        field-name = >MIME-version<
        optional-field-value = >1.0<
Header 3
        field-name = >Content-type<
        optional-field-value = >text/plain<
Header 4
        field-name = >Date<
        optional-field-value = >Thursday, 18-Apr-96 4:12:27 GMT<
Header 5
        field-name = >Last-modified<
        optional-field-value = >Thursday, 18-Apr-96 4:20:9 GMT<
Header 6
        field-name = >Content-length<
        optional-field-value = >75<
Body is:
>This is the first line of simple.txt
This is the last line of simple.txt
<

2. To illustrate an existing Web Browser accessing the htserver supplied object

[Begin the server program]
>htserver 80 pundit verbose

[Now a browser is asked to retrieve the URL
http://pundit.parc.xerox.com/http_obj0]

------------------------------------------
_server_http_Resource_GET called
Request: (Note all values are shown between >< s)
URI = >/http_obj0<
Number of headers = >4<
Header 0
        field-name = >Connection<
        optional-field-value = >Keep-Alive<
Header 1
        field-name = >User-Agent<
        optional-field-value = >Mozilla/2.0 (WinNT; I)<
Header 2
        field-name = >Host<
        optional-field-value = >pundit.parc.xerox.com<
Header 3
        field-name = >Accept<
        optional-field-value = >image/gif, image/x-xbitmap,
image/jpeg,  
image/pjpeg, */*<
Body is:
>NIL<


[The Browser's display now contains]

server_http_Resource_GET

3. To illustrate ILU interacting with ILU using HTTP as the means of General Object Method Invocation

        htserver 2718 pundit t

        htclient http://pundit.parc.xerox.com:2718/http_obj0 FlipMyCase

or to show raising an exception

        htclient http://pundit.parc.xerox.com:2718/http_obj0 raiseerror