For End-to-end Network Services - TESLA: A Transparent, Extensible Session-Layer Architecture

Page created by Brandon Green
 
CONTINUE READING
T ESLA: A Transparent, Extensible Session-Layer Architecture
                         for End-to-end Network Services

                         Jon Salz                                         Alex C. Snoeren
            MIT Laboratory for Computer Science                  University of California, San Diego
                    jsalz@lcs.mit.edu                                  snoeren@cs.ucsd.edu
                                             Hari Balakrishnan
                                     MIT Laboratory for Computer Science
                                              hari@lcs.mit.edu

   Session-layer services for enhancing functionality and       • Encryption services for sealing or signing ¤ows.
improving network performance are gaining in impor-             • General-purpose compression over low-bandwidth
tance in the Internet. Examples of such services in-              links.
clude connection multiplexing, congestion state shar-
ing, application-level routing, mobility/migration sup-         • Traf£c shaping and policing functions.
port, and encryption. This paper describes T ESLA, a             These examples illustrate the increasing importance of
transparent and extensible framework allowing session-        session-layer services in the Internet—services that oper-
layer services to be developed using a high-level ¤ow-        ate on groups of ¤ows between a source and destination,
based abstraction. T ESLA services can be deployed            and produce resulting groups of ¤ows using shared code
transparently using dynamic library interposition and         and sometimes shared state.
can be composed by chaining event handlers in a graph
structure. We show how T ESLA can be used to imple-              Authors of new services such as these often imple-
ment several session-layer services including encryption,     ment enhanced functionality by augmenting the link, net-
SOCKS, application-controlled routing, ¤ow migration,         work, and transport layers, all of which are typically im-
and traf£c rate shaping, all with acceptably low perfor-      plemented in the kernel or in a shared, trusted interme-
mance degradation.                                            diary [12]. While this model has suf£ced in the past,
                                                              we believe that a generalized, high-level framework for
                                                              session-layer services would greatly ease their develop-
1     Introduction                                            ment and deployment. This paper argues that Internet
                                                              end hosts can bene£t from a systematic approach to de-
Modern network applications must meet several increas-        veloping session-layer services compared to the largely
ing demands for performance and enhanced function-            ad-hoc point approaches used today, and presents T ESLA
ality. Much current research is devoted to augmenting         (a Transparent, Extensible Session Layer Architecture),
the transport-level functionality implemented by stan-        a framework that facilitates the development of session-
dard protocols as TCP and UDP. Examples abound:               layer services like the ones mentioned above.
                                                                 Our work with T ESLA derives heavily from our own
    • Setting up multiple connections between a source        previous experience developing, debugging, and deploy-
      and destination to improve the throughput of a sin-     ing a variety of Internet session-layer services. The earli-
      gle logical data transfer (e.g., £le transfers over     est example is the Congestion Manager (CM) [4], which
      high-speed networks where a single TCP connec-          allows concurrent ¤ows with a common source and des-
      tion alone does not provide adequate utilization [2,    tination to share congestion information, allocate avail-
      15]).                                                   able bandwidth, and adapt to changing network condi-
                                                              tions. Other services include Resilient Overlay Networks
    • Sharing congestion information across connections
                                                              (RON) [3], which provides application-layer routing in
      sharing the same network path.
                                                              an overlay network, and the Migrate mobility architec-
    • Application-level routing, where applications route     ture [23, 24], which preserves end-to-end communica-
      traf£c in an overlay network to the £nal destination.   tion across relocation and periods of disconnection.
    • End-to-end session migration for mobility across          Each these services was originally implemented at the
      network disconnections.                                 kernel level, though it would be advantageous (for porta-
bility and ease of development and deployment) to have        new network services. This paper describes the design
them available at the user level. Unfortunately this tends    and implementation of T ESLA, a generic framework for
to be quite an intricate process. Not only must the im-       development and deployment of session-layer services.
plementation specify the internal logic, algorithms, and      T ESLA consists of a set of C++ application program in-
API, but considerable care must be taken handling the         terfaces (APIs) specifying how to write these services,
details of non-blocking and blocking sockets, interpro-       and an interposition agent that can be used to instantiate
cess communication, process management, and integrat-         these services for use by existing applications.
ing the API with the application’s event loop. The end re-
sult is that more programmer time and effort is spent set-
ting up the session-layer plumbing than in the service’s      2    Related Work
logic itself. Our frustrations with the development and
implementation of these services at the user level were       Today’s commodity operating systems commonly al-
the prime motivation behind T ESLA and led to three ex-       low the dynamic installation of network protocols on
plicit design goals.                                          a system-wide or per-interface basis (e.g., Linux ker-
   First, it became apparent to us that the standard BSD      nel modules and FreeBSD’s netgraph), but these exten-
sockets API is not a convenient abstraction for program-      sions can only be accessed by the super-user. Some
ming session-layer services. Sockets can be duplicated        operating systems, such as SPIN [5] and the Exoker-
and shared across processes; operations on them can be        nel [12], push many operating system features (like net-
blocking or non-blocking on a descriptor-by-descriptor        work and £le system access) out from the kernel into
basis; and reads and writes can be multiplexed using sev-     application-speci£c, user-con£gurable libraries, allow-
eral different APIs (e.g., select and poll). It is undesir-   ing ordinary users £ne-grained control. Alternatively,
able to require each service author to implement the en-      extensions were developed for both operating systems to
tire sockets API. In response, T ESLA exports a higher        allow applications to de£ne application-speci£c handlers
level of abstraction to session services, allowing them to    that may be installed directly into the kernel (Plexus [14]
operate on network ¤ows (rather than simply socket de-        and ASHs [27]).
scriptors) and treat ¤ows as objects to be manipulated.          Operating systems such as Scout [21] and x-
   Second, there are many session services that are re-       kernel [16] were designed explicitly to support sophis-
quired ex post facto, often not originally thought of by      ticated network-based applications. In these systems,
the application developer but desired later by a user. For    users may even rede£ne network or transport layer proto-
example, the ability to shape or police ¤ows to conform       col functions in an application-speci£c fashion [6]. With
to a speci£ed peak rate is often useful, and being able       T ESLA, our goal is to bring some of the power of these
to do so without kernel modi£cations is a deployment          systems to commodity operating systems in the context
advantage. This requires the ability to con£gure session      of session-layer services.
services transparent to the application. To do this, T ESLA      In contrast to highly platform-dependent systems such
uses an old idea—dynamic library interposition [9]—           as U-Net [26] and Alpine [11], T ESLA does not attempt
taking advantage of the fact that most applications today     to allow users to replace or modify the system’s network
on modern operating systems use dynamically linked li-        stack. Instead, it focuses on allowing users to dynami-
braries to gain access to kernel services. This does not,     cally extend the protocol suite by dynamically compos-
however, mean that T ESLA session-layer services must         ing additional end-to-end session-layer protocols on top
be transparent. On the contrary, T ESLA allows services       of the existing transport- and network-layer protocols,
to de£ne APIs to be exported to enhanced applications.        achieving greater platform independence and usability.
   Third, unlike traditional transport and network layer         T ESLA’s modular structure, comprising directed
services, there is a great diversity in session services      graphs of processing nodes, shares commonalities with
as the examples earlier in this section show. This im-        a number of previous systems such as x-kernel, the Click
plies that application developers can bene£t from com-        modular router [19], and UNIX System V streams [22].
posing different available services to provide interesting    Unlike the transport and network protocols typically con-
new functionality. To facilitate this, T ESLA arranges for    sidered in these systems, however, we view session-layer
session services to be written as event handlers, with a      protocols as an extension of the application itself rather
callback-oriented interface between handlers that are ar-     than a system-wide resource. This ensures that they are
ranged in a graph structure in the system.                    subject to the same scheduling, protection, and resource
  We argue that a generalized architecture for the de-        constraints as the application, minimizing the amount of
velopment and deployment of session-layer functionality       effort (and privileges) required to deploy services.
will signi£cantly assist in the implementation and use of         To avoid making changes to the operating system or to
the application itself, T ESLA transparently interposes it-
                                                                    Input flow
self between the application and the kernel, intercepting        from upstream                                     Zero or more
and modifying the interaction between the application                              Flow handler                    output flows
                                                                                                                 to downstreams
and the system—acting as an interposition agent [18].
It uses dynamic library interposition [9] to modify the
interaction between the application and the system. This
technique is popular with user-level £le systems such as      Figure 1: A ¤ow handler takes as input one network ¤ow
IFS [10] and Ufo [1], and several libraries that provide      and generates zero or more output ¤ows.
speci£c, transparent network services such as SOCKS
[20], Reliable Sockets [29], and Migrate [23, 24]. Each                                                               Upstream

of these systems provides only a speci£c service, how-
ever, not an architecture usable by third parties.                   Application                  Application

   Conductor [28] traps application network operations                      f                             f
and transparently layers composable “adaptors” on TCP
                                                                 TESLA                        TESLA
connections, but its focus is on optimizing ¤ows’ per-
formance characteristics, not on providing arbitrary ad-              Encryption                  Encryption
ditional services. Similarly, Protocol Boosters [13] pro-
                                                                                                          g
poses interposing transparent agents between communi-
cation endpoints to improve performance over particu-                       g                     Migration
lar links (e.g., compression or forward error correction).
While Protocol Boosters were originally implemented
in the FreeBSD and Linux kernel, they are an excel-                                      h1     h2                   hn

lent example of a service that could be implemented in                 C library                     C library
a generic fashion using T ESLA. Thain and Livny re-
cently proposed Bypass, a dynamic-library based inter-
                                                                                                                   Downstream
position toolkit for building split-execution agents com-
monly found in distributed systems [25]. However, be-
                                                              Figure 2: Two T ESLA stacks. The encryption ¤ow han-
cause of their generality, none of these systems provides
                                                              dler implements input ¤ow f with output ¤ow g. The mi-
any assistance in building modular network services, par-
                                                              gration ¤ow handler implements input ¤ow g with output
ticularly at the session layer. To the best of our knowl-
                                                              ¤ows h 1 . . . hn .
edge, T ESLA is the £rst interposition toolkit to speci£-
cally support generic session-layer network services.
                                                              tions on the byte stream, such as transparent ¤ow migra-
                                                              tion, encryption, compression, etc.
3   Architecture                                                 A ¤ow handler, illustrated in Figure 1, is explicitly de-
                                                              £ned and constructed to operate on only one input ¤ow
As discussed previously, many tools exist that, like          from an upstream handler (or end application), and is
T ESLA, provide an interposition (or “shim”) layer be-        hence devoid of any demultiplexing operations. Concep-
tween applications and operating system kernels or li-        tually, therefore, one might think of a ¤ow handler as
braries. However, T ESLA raises the level of abstraction      dealing with traf£c corresponding to a single socket only
of programming for session-layer services. It does so by      (as opposed to an interposition layer coded from scratch,
introducing a ¤ow handler as the main object manipu-          which must potentially deal with operations on all open
lated by session services. Each session service is imple-     £le descriptors). A ¤ow handler generates zero or more
mented as an instantiation of a ¤ow handler, and T ESLA       output ¤ows, which map one-to-one to downstream han-
takes care of the plumbing required to allow session ser-     dlers (or the network send routine).1 While a ¤ow han-
vices to communicate with one another.                        dler always has one input ¤ow, multiple ¤ow handlers
   A network ¤ow is a stream of bytes that all share the      may coexist in a single process, so they may easily share
same logical source and destination (generally identi£ed      global state. (We will expound on this point as we further
by source and destination IP addresses, source and desti-     discuss the T ESLA architecture.)
nation port numbers, and transport protocol). Each ¤ow           The left stack in Figure 2 illustrates an instance of
handler takes as input a single network ¤ow, and pro-         T ESLA where stream encryption is the only enabled han-
duces zero or more network ¤ows as output. Flow han-          dler. While the application’s I/O calls appear to be
dlers perform some particular operations or transforma-       reading and writing plaintext to some ¤ow f , in reality
T ESLA intercepts these I/O calls and passes them to the          class ¤ow handler {
                                                                     protected:
encryption handler, which actually reads and writes ci-                 ¤ow handler *plumb(int domain, int type);
phertext on some other ¤ow g. The right stack in Fig-                   handler* const upstream;
ure 2 has more than two ¤ows. f is the ¤ow as viewed                    vector downstream;
by the application, i.e., plaintext. g is the ¤ow between
                                                                       public:
the encryption handler and the migration handler, i.e., ci-              // DOWNSTREAM methods
phertext. h1 , h2 , . . . , hn are the n ¤ows that a migration           virtual int connect(address);
¤ow handler uses to implement ¤ow g. (The migration                      virtual int bind(address);
                                                                         virtual pair accept();
handler initially opens ¤ow h 1 . When the host’s network
                                                                         virtual int close();
address changes and h1 is disconnected, the migration                    virtual bool write(data);
handler opens ¤ow h 2 to the peer, and so forth.) From                   virtual int shutdown(bool r, bool w);
the standpoint of the encryption handler, f is the input                 virtual int listen(int backlog);
                                                                         virtual address getsockname();
¤ow and g is the output ¤ow. From the standpoint of the                  virtual address getpeername();
migration handler, g is the input ¤ow and h 1 , h2 , . . . , hn          virtual void may avail(bool);
are the output ¤ows.                                                     virtual string getsockopt(. . .);
                                                                         virtual int setsockopt(. . .);
   Each T ESLA-enabled process has a particular handler                  virtual int ioctl(. . .);
con£guration, an ordered list of handlers which will be
used to handle ¤ows. For instance, the con£guration for                  // UPSTREAM methods (’from’ is a downstream ¤ow)
                                                                         virtual void connected(¤ow handler *from, bool success);
the application to the right in Figure 2 is “encryption;
                                                                         virtual void accept ready(¤ow handler *from);
migration.” Flows through this application will be en-                   virtual bool avail(¤ow handler *from, data);
cryption and migration-enabled.                                          virtual void may write(¤ow handler *from, bool may);
                                                                  };

3.1    Interposition                                              Figure 3: An excerpt from the ¤ow handler class de£-
                                                                  nition. address and data are C++ wrapper classes for
To achieve the goal of application transparency, T ESLA           address and data buffers, respectively.
acts as an interposition agent at the C-library level. We
provide a wrapper program, tesla, which sets up the
environment (adding our libtesla.so shared library to             instantiates the next downstream handler, namely migra-
LD PRELOAD) and invokes an application, e.g.:                     tion. Its plumb method creates a T ESLA-internal handler
                                                                  to implement an actual TCP socket connection.
      tesla +crypt -key=sec.gpg +migrate ftp mars
                                                                     To send data to a downstream ¤ow handler, a ¤ow
  This would open an FTP connection to the host named             handler invokes the latter’s write method, which in turn
mars, with encryption (using sec.gpg as the private key)          typically performs some processing and invokes its own
and end-to-end migration enabled.                                 downstream handler’s write method. Downstream han-
   We refer to the libtesla.so library as the T ESLA stub,        dlers communicate with upstream ones via callbacks,
since it contains the interposition agent but not any of the      which are invoked to make events available to the up-
handlers themselves (as we will discuss later).                   stream ¤ow.
                                                                    Flow handler methods are asynchronous and event-
                                                                  driven—each method call must return immediately,
3.2    The ¤ow handler API                                        without blocking.
Every T ESLA session service operates on ¤ows and is
implemented as a derived class of ¤ow handler, shown              3.3      Handler method semantics
in Figure 3. To instantiate a downstream ¤ow handler, a
¤ow handler invokes its protected plumb method. T ESLA            Many of the virtual ¤ow handler methods presented in
handles plumb by instantiating handlers which appear af-          Figure 3—write, connect, getpeername, etc.—have se-
ter the current handler in the con£guration.                      mantics similar to the corresponding non-blocking func-
   For example, assume that T ESLA is con£gured to per-           tion calls in the C library. (A handler class may override
form compression, then encryption, then session migra-            any method to implement it specially, i.e., to change the
tion on each ¤ow. When the compression handler’s con-             behavior of the ¤ow according to the session service the
structor calls plumb, T ESLA responds by instantiating the        handler provides.) These methods are invoked by a han-
next downstream handler, namely the encryption han-               dler’s input ¤ow. Since, in general, handlers will prop-
dler; when its constructor in turn calls plumb, T ESLA            agate these messages to their output ¤ows (i.e., down-
stream), these methods are called downstream methods          write call which returns a number of bytes or an EA-
and can be thought of as providing an abstract ¤ow ser-       GAIN error message (we will explain this design deci-
vice to the upstream handler.                                 sion shortly). Our interface lacks a read method; rather,
  (One downstream method has no C-library analogue:           we use a callback, avail, which is invoked by a down-
may avail informs the handler whether or not it may           stream handler whenever bytes are available. The up-
make any more data available to its upstream handler.         stream handler handles the bytes immediately, generally
We further discuss this method in Section 3.5.)               by performing some processing on the bytes and passing
                                                              them to its own upstream handler.
   In contrast the £nal four methods are invoked by the
handler’s output ¤ows; each has a from argument identi-          A key difference between ¤ow handler semantics and
fying which downstream handler is invoking the method.        the C library’s I/O semantics is that, in ¤ow handlers,
Since typically handlers will propagate these messages to     writes and avails are guaranteed to complete. In partic-
their input ¤ow (i.e., upstream), these methods are called    ular, a write or avail call returns false, indicating fail-
upstream methods and can be thought of callbacks which        ure, only when it will never again be possible to write
are invoked by the upstream handler.                          to or receive bytes from the ¤ow (similar to the C li-
                                                              brary returning 0 for write or read). Contrast this to the
  • connected is invoked when a connection request            C library’s write and read function calls which (in non-
    (i.e., a connect method invocation) on a downstream       blocking mode) may return an EAGAIN error, requiring
    has completed. The argument is true if the request        the caller to retry the operation later. Our semantics make
    succeeded or false if not.                                handler implementation considerably simpler, since han-
                                                              dlers do not need to worry about handling the common
  • accept ready is invoked when listen has been called       but dif£cult situation where a downstream handler is un-
    and a remote host attempts to establish a connec-         able to accept all the bytes it needs to write.
    tion. Typically a ¤ow handler will respond to this
                                                                 Consider the simple case where an encryption han-
    by calling the downstream ¤ow’s accept method to
                                                              dler receives a write request from an upstream handler,
    accept the connection.
                                                              performs stream encryption on the bytes (updating its
  • avail is invoked when a downstream handler has re-        state), and then attempts to write the encrypted data to the
    ceived bytes. It returns false if the upstream handler    downstream handler. If the downstream handler could re-
    is no longer able to receive bytes (i.e., the connec-     turn EAGAIN, the handler must buffer the unwritten en-
    tion has shut down for reading).                          crypted bytes, since by updating its stream encryption
                                                              state the handler has “committed” to accepting the bytes
  • may write, analogous to may avail, informs the han-
                                                              from the upstream handler. Thus the handler would have
    dler whether or not it may write any more data
                                                              to maintain a ring buffer for the unwritten bytes and reg-
    downstream. This method is further discussed in
                                                              ister a callback when the output ¤ow is available for writ-
    Section 3.5.
                                                              ing.
   Many handlers, such as the encryption handler, per-           Our approach (guaranteed completion) bene£ts from
form only simple processing on the data ¤ow, have ex-         the observation that given upstream and downstream
actly one output ¤ow for one input ¤ow, and do not mod-       handlers that support guaranteed completion, it is easy to
ify the semantics of other methods such as connect or         write a handler to support guaranteed completion.2 This
accept. For this reason we provide a default implemen-        is an inductive argument, of course—there must be some
tation of each method which simply propagates method          blocking mechanism in the system, i.e., completion can-
calls to the downstream handler (in the case of down-         not be guaranteed everywhere. Our decision to impose
stream methods) or the upstream handler (in the case of       guaranteed completion on handlers isolates the complex-
upstream methods).                                            ity of blocking in the implementation of T ESLA itself (as
                                                              described in Section 5.1), relieving handler authors of
   We further discuss the input and output primitives,
                                                              having to deal with multiplexing between different ¤ows
timers, and blocking in the next few sections.
                                                              (e.g., with select).3

3.4    Input/output semantics
                                                              3.5    Timers and ¤ow control
Since data input and output are the two fundamental op-
erations on a ¤ow, we shall describe them in more de-         As we have mentioned before, T ESLA handlers are
tail. The write method call writes bytes to the ¤ow. It re-   event-driven, hence all ¤ow handler methods must return
turns a boolean indicating success, unlike the C library’s    immediately. Clearly, however, some ¤ow control mech-
anism is required in ¤ow handlers—otherwise, in an ap-       3.7    Process management
plication where the network is the bottleneck, the appli-
cation could submit data to T ESLA faster than T ESLA
could ship data off to the network, requiring T ESLA to      Flow handlers comprise executable code and runtime
buffer a potentially unbounded amount of data. Sim-          state, and in designing T ESLA there were two obvious
ilarly, if the application were the bottleneck, T ESLA       choices regarding the context within which the ¤ow han-
would be required to buffer all the data ¤owing upstream     dler code would run. The simplest approach would place
from the network until the application was ready to pro-     ¤ow handlers directly within the address space of appli-
cess it.                                                     cation processes; T ESLA would delegate invocations of
                                                             POSIX I/O routines to the appropriate ¤ow handler meth-
   A ¤ow handler may signal its input ¤ow to stop send-
                                                             ods. Alternatively, ¤ow handlers could execute in sepa-
ing it data, i.e., stop invoking its write method, by in-
                                                             rate processes from the application; T ESLA would man-
voking may write(false); it can later restart the handler
                                                             age the communication between application processes
by invoking may write(true). Likewise, a handler may
                                                             and ¤ow handler processes.
throttle an output ¤ow, preventing it from calling avail,
by invoking may avail(false). Flow handlers are required        The former, simpler approach would not require any
to respect may avail and may write requests from down-       interprocess communication or context switching, mini-
stream and upstream handlers. This does not compli-          mizing performance degradation. Furthermore, this ap-
cate our guaranteed-completion semantics, since a han-       proach would ensure that each ¤ow handler has the same
dler which (like most) lacks a buffer in which to store      process priority as the application using it, and that there
partially-processed data can simply propagate may writes     are no inter-handler protection problems to worry about.
upstream and may avails downstream to avoid receiving           Unfortunately, this approach proves problematic in
data that it cannot handle.                                  several important cases. First, some session services
   A handler may need to register a time-based call-         (e.g., shared congestion management as in CM) require
back, e.g., to re-enable data ¤ow from its upstream or       state to be shared across ¤ows that may belong to differ-
downstream handlers after a certain amount of time has       ent application processes, and making each ¤ow handler
passed. For this we provide a timer facility allowing han-   run linked with the application would greatly complicate
dlers to instruct the T ESLA event loop to invoke a call-    the ability to share session-layer state across them. Sec-
back at a particular time.                                   ond, signi£cant dif£culties arise if the application pro-
                                                             cess shares its ¤ows (i.e., the £le descriptors correspond-
                                                             ing to the ¤ows) with another process, either by creat-
3.6     Handler-speci£c services                             ing a child process through fork or through £le descrip-
                                                             tor passing. Ensuring the proper sharing semantics be-
Each ¤ow handler exposes (by de£nition) the                  comes dif£cult and expensive. A fork results in two iden-
¤ow handler interface, but some handlers may need            tical T ESLA instances running in the parent and child,
to provide additional services to applications that wish     making it rather cumbersome to ensure that they coordi-
to support them speci£cally (but still operate properly if   nate correctly. Furthermore, maintaining the correct se-
T ESLA or the particular handler is not available or not     mantics of asynchronous I/O calls like select and poll is
enabled). For instance, the end-to-end migration handler     very dif£cult in this model.
can “freeze” an application upon network disconnection,         For these reasons, we choose to separate the context
preserving the process’s state and re-creating it upon       in which T ESLA ¤ow handlers run from the application
reconnection. To enable this functionality we introduced     processes that generate the corresponding ¤ows. When
the ioctl method to ¤ow handler. A “Migrate-aware”           an application is invoked through the tesla wrapper (and
application (or, in the general case, a “T ESLA-aware”       hence linked against the stub), the stub library creates
application) may use the ts ioctl macro to send a control    or connects to a master process, a process dedicated to
message to a handler and receive a response:                 executing handlers.
      struct freeze params t params = { . . . };                Each master process has its own handler con£gura-
      struct freeze return t ret;                            tion (determined by the user, as we shall describe later).
      int ret = ts ioctl(fd, “migrate”, MIGRATE FREEZE,      When the application establishes a connection, the mas-
          &params, sizeof params, &ret, sizeof ret);         ter process determines whether the con£guration con-
   We also provide a mechanism for handlers to send          tains any applicable handlers. If so, the master instan-
events to the application asynchronously; e.g., the migra-   tiates the applicable handlers and links them together in
tion handler can be con£gured to notify the application      what we call a T ESLA instance.
when a ¤ow has been migrated between IP addresses.             The application and master process communicate via
Application Process                    Master Process #1                       Master Process #2

                                               Instance #1A                             Instance #2A

              Unmodified    stub
              Application                                     E                              CM
                                                   M
                                                                                                            stub
                                                              E
                                                                                             CM
                                                                     stub
                              to C library
                                               Instance #1B
                                                                                             CM

                                                   M          E

                                                                                                              to C library
                                                                                               Shared CM state
                                                                                       CM flow across flows
                                                                                       handlers
      Upstream                                                                                            Downstream

Figure 4: Possible interprocess data ¤ow for an application running under T ESLA. M is the migration handler, E is
encryption, and CM is the congestion manager handler.

a T ESLA-internal socket for each application-level ¤ow,          ager handler) must be trusted.
and all actual network operations are performed by the               In general, T ESLA support for setuid and setgid ap-
master process. When the application invokes a socket             plications, which assume the protection context of the
operation, T ESLA traps it and converts it to a message           binary iself rather than the user who invokes them, is a
which is transmitted to the master process via the mas-           tricky affair: one cannot trust a user-provided handler
ter socket. The master process returns an immediate re-           to behave properly within the binary’s protection con-
sponse (this is possible since all T ESLA handler opera-          text. Even if the master process (and hence the ¤ow
tions are non-blocking, i.e., return immediately).                handlers) run within the protection context of the user,
   Master processes are forked from T ESLA-enabled ap-            user-provided handlers might still modify network se-
plications and are therefore linked against the T ESLA            mantics to change the behavior of the application. Con-
stub also. This allows T ESLA instances to be chained, as         sider, for example, a setuid binary which assumes root
in Figure 4: an application-level ¤ow may be connected            privileges, performs a lookup over the network to de-
to an instance in a £rst master process, which is in turn         termine whether the user is a system administrator and,
connected via its stub to an instance in another master           if so, allows the user to perform an administrative op-
process. Chaining enables some handlers to run in a pro-          eration. If a malicious user provides ¤ow handlers that
tected, user-speci£c context, whereas handlers that re-           spoof a positive response for the lookup, the binary may
quire a system-scope (like system-wide congestion man-            be tricked into granting him or her administrative privi-
agement) can run in a privileged, system-wide context.            leges.
                                                                    Nevertheless, many widely-used applications (e.g.,
                                                                  ssh) are setuid, so we do require some way to support
3.8     Security considerations                                   them. We can make the tesla wrapper setuid root, so that
                                                                  it is allowed to add libtesla.so to the LD PRELOAD envi-
Like most software components, ¤ow handlers can po-               ronment variable even for setuid applications. The tesla
tentially wreak havoc within their protection context if          wrapper then invokes the requested binary, linked against
they are malicious or buggy. Our chaining approach al-            libtesla.so, with the appropriate privileges. libtesla.so
lows ¤ow handlers to run within the protection context            creates a master process and begins instantiating han-
of the user to minimize the damage they can do; but in            dlers only once the process resets its effective user ID
any case, ¤ow handlers must be trusted within their pro-          to the user’s real user ID, as is the case with applications
tection contexts. Since a malicious ¤ow handler could             such as ssh. This ensures that only network connections
potentially sabotage all connections in the same master           within the user’s protection context can be affected (ma-
process, any ¤ow handlers in master processes intended            liciously or otherwise) by handler code.
to have system-wide scope (such as a Congestion Man-
4     Example handlers                                         class crypt handler : public ¤ow handler {
                                                                   des3 cfb64 stream in stream, out stream;

We now describe a few ¤ow handlers we have imple-                   public:
mented. Our main goal is to illustrate how non-trivial                crypt handler(init context& ctxt) : ¤ow handler(ctxt),
                                                                        in stream(des3 cfb64 stream::ENCRYPT),
session services can be implemented easily with T ESLA,                 out stream(des3 cfb64 stream::DECRYPT)
showing how its ¤ow-oriented API is both convenient                   {}
and useful. We describe our handlers for transparent use
of a SOCKS proxy, traf£c shaping, encryption, compres-                bool write(data d) {
                                                                          // encrypt and pass downstream
sion, and end-to-end ¤ow migration.                                       return downstream[0]−>write(out stream.process(d));
   The compression, encryption, and ¤ow migration han-                }
dlers require a similarly con£gured T ESLA installation               bool avail(¤ow handler *from, data d) {
on the peer endpoint, but the remaining handlers are use-                 // decrypt and pass upstream
ful even when communicating with a remote application                     return upstream−>avail(this, in stream.process(d));
that is not T ESLA-enabled.                                           }
                                                               };

                                                                Figure 5: A transparent encryption/decryption handler.
4.1    Traf£c shaping
It is often useful to be able to limit the maximum
                                                               host. When its connected method is invoked, it does not
throughput of a TCP connection. For example, one
                                                               pass it upstream, but rather negotiates the authentication
might want prevent a background network operation
                                                               mechanism with the server and then passes it the actual
such as mirroring a large software distribution to im-
                                                               destination address, as speci£ed by the SOCKS protocol.
pact network performance. T ESLA allows us to provide
a generic, user-level traf£c-shaping service as an alterna-       If the SOCKS server indicates that it was able to es-
tive to building shaping directly into an application (as in   tablish the connection with the remote host, then the
the rsync --bwlimit option) or using an OS-level packet-       SOCKS handler invokes connected(true) on its upstream
£ltering service (which would require special setup and        handler; if not, it invokes connected(false). The upstream
superuser privileges).                                         handler, of course, is never aware that the connection is
                                                               physically to the SOCKS server (as opposed to the desti-
   The traf£c shaper keeps count of the number of bytes it
                                                               nation address originally provided to connect).
has read and written during the current 100-millisecond
timeslice. If a write request would exceed the outbound           We utilize the SOCKS server to provide transparent
limit for the timeslice, then the portion of the data which    support for Resilient Overlay Networks [3], or RON, an
could not be written is saved, and the upstream han-           architecture allowing a group of hosts to route around
dler is throttled via may write. A timer is created to no-     failures or inef£ciencies in a network. We provide a
tify the shaper at the end of the current timeslice so it      ron handler to allow a user to connect to a remote host
can continue writing and unthrottle the upstream handler       transparently through a RON. Each RON server exports
when appropriate. Similarly, once the inbound bytes-per-       a SOCKS interface, so ron handler can use the same
timeslice limit is met, any remaining data provided by         mechanism as socks handler to connect to a RON host
avail is buffered, downstream ¤ows are throttled, and a        as a proxy and open an indirect connection to the remote
timer is registered to continue reading later.                 host. In the future, ron handler may also utilize the end-
                                                               to-end migration of migrate handler (presented below in
   We have also developed latency handler, which de-
                                                               Section 4.4) to enable RON to hand off the proxy con-
lays each byte supplied to or by the network for a user-
                                                               nection to a different node if it discover a more ef£cient
con£gurable amount of time (maintaining this data in an
                                                               route from the client to the peer.
internal ring buffer). This handler is useful for simulating
the effects of network latency on existing applications.

                                                               4.3       Encryption and compression
4.2    SOCKS and application-level routing
                                                               crypt handler is a triple-DES encryption handler for TCP
Our SOCKS handler is functionally similar to existing          streams. We use OpenSSL [8] to provide the DES im-
transparent SOCKS libraries [7, 17], although its imple-       plementation. Figure 5 shows how crypt handler uses
mentation is signi£cantly simpler. Our handler overrides       the T ESLA ¤ow handler API. Note how simple the
the connect handler to establish a connection to a proxy       handler is to write: we merely provide alternative im-
server, rather than a connection directly to the requested     plementations for the write and avail methods, routing
data £rst through a DES encryption or decryption step         tion creates a socket, T ESLA intercepts the socket library
(des3 cfb64 stream, a C++ wrapper we have written for         call and sends a message to the master process, inquir-
OpenSSL functionality).                                       ing whether the master has any handlers registered for
   Our compression handler is very similar: we merely         the requested domain and type. If not, the master returns
wrote a wrapper class (analogous to des3 cfb64 stream)        a negative acknowledgement and the application process
for the freely available zlib stream compression library.     simply uses the socket system call to create and return
                                                              the request socket.
                                                                  If, on the other hand, there is a handler registered for
4.4    Session migration                                      the requested domain and type, the master process in-
                                                              stantiates the handler class. Once the handler’s construc-
We have used T ESLA to implement transparent support
                                                              tor returns, the master process creates a pair of connected
in the Migrate session-layer mobility service [23]. In
                                                              UNIX-domain sockets, one retained in the master and the
transparent mode, Migrate preserves open network con-
                                                              other passed to the application process. The application
nections across changes of address or periods of discon-
                                                              process notes the received £lehandle (remembering that
nection. Since TCP connections are bound to precisely
                                                              it is a T ESLA-wrapped £lehandle) and returns it as result
one remote endpoint and do not survive periods of dis-
                                                              of the socket call.
connection in general, Migrate must synthesize a logi-
cal ¤ow out of possibly multiple physical connections (a         Later, when the application invokes a socket API call,
new connection must be established each time either end-      such as connect or getsockname, on a T ESLA-wrapped
point moves or reconnects). Further, since data may be        £lehandle, the wrapper function informs the master pro-
lost upon connection failure, Migrate must double-buffer      cess, which invokes the corresponding method on the
in-¤ight data for possible re-transmission.                   handler object for that ¤ow. The master process returns
                                                              this result to the stub, which returns the result to the ap-
   This basic functionality is straightforward to provide
                                                              plication.
in T ESLA. We simply create a handler that splices its
input ¤ow to an output ¤ow and conceals any mobil-               In general the T ESLA stub does not need to specially
ity events from the application by automatically initiat-     handle reads, writes, or multiplexing on wrapped sock-
ing a new output ¤ow using the new endpoint locations.        ets. The master process and application process share
The handler stores a copy of all outgoing data locally        a socket for each application-level ¤ow, so to handle a
in a ring buffer and re-transmits any lost bytes after re-    read, write, or select, whether blocking or non-blocking,
establishing connectivity on a new output ¤ow. Incoming       the application process merely uses the corresponding
data is trickier: because received data may not have been     unwrapped C library call. Even if the operation blocks,
read by the application before a mobility event occurs,       the master process can continue handling I/O while the
Migrate must instantiate a new ¤ow immediately after          application process waits.
movement but continue to supply the buffered data from           Implementing fork, dup, and dup2 is quite simple.
the previous ¤ow to the application until it is completely    Having a single ¤ow available to more than one appli-
consumed. Only then will the handler begin delivering         cation process, or as more than one £lehandle, presents
data received on the subsequent ¤ow(s).                       no problem: application processes can use each copy of
   Note that because Migrate wishes to conceal mobility       the £lehandle as usual. Once all copies of the £lehan-
when operating in transparent mode, it is important that      dle are shut down, the master process can simply detect
the handler be able to override normal signaling mecha-       via a signal that the £lehandle has closed and invokes the
nisms. In particular, it must intercept connection failure    handler’s close method.
messages (the “connection reset” messages typically ex-
perienced during long periods of disconnection) and pre-
vent them from reaching the application, instead taking       5.1    top handler and bottom handler
appropriate action to manage and conceal the changes in
endpoints. In doing so, the handler also overrides the get-   We have described at length the interface between han-
sockname() and getpeername() calls to return the origi-       dlers, but we have not yet discussed how precisely han-
nal endpoints, irrespective of the current location.          dlers receives data and events from the application. We
                                                              have stated that every handler has an upstream ¤ow
                                                              which invokes its methods such as connect, write, etc.;
5     Implementation                                          the upstream ¤ow of the top-most handler for each ¤ow
                                                              (e.g., the encryption handler in Figure 2) is a special ¤ow
The T ESLA stub consists largely of wrapper functions for     handler called top handler.
sockets API functions in the C library. When an applica-        When a T ESLA master’s event loop detects bytes ready
to deliver to a particular ¤ow via the application/master-     service implementation would perform similarly to the
process socket pair which exists for that ¤ow, it invokes      application-internal agent.
the write method of top handler, which passes the write           We provide two benchmarks. In bandwidth(s), a client
call on to the £rst real ¤ow handler (e.g., encryption).       establishes a TCP connection to a server and reads data
Similarly, when a connect, bind, listen, or accept mes-        into a s-byte buffer until the connection closes, measur-
sage appears on the master socket, the T ESLA event loop       ing the number of bytes per second received. (The server
invokes the corresponding method of the top handler for        writes data s bytes at a time as well.) In latency, a client
that ¤ow.                                                      connects to a server, sends it a single byte, waits for an
   Similarly, each ¤ow handler at the bottom of the            1-byte response, and so forth, measuring the number of
T ESLA stack has a bottom handler as its downstream.           round-trip volleys per second.
For each non-callback method—i.e., connect, write,                We run bandwidth and latency under several con£gu-
etc.—bottom handler actually performs the correspond-          rations:
ing network operation.
  top handlers and bottom handlers maintain buffers, in         1. The benchmark program running (a) unmodi£ed
case a handler writes data to the application or network,          and (b) with triple-DES encryption performed di-
but the underlying system buffer is full (i.e., sending data       rectly within the benchmark application.
asynchronously to the application or network results in
                                                                2. The benchmark program, with each connection
an EAGAIN condition). If this occurs in a top handler, the
                                                                   passing through TCP proxy servers both on the
top handler requests via may avail that its downstream
                                                                   client and server hosts. In (a), the proxy servers
¤ow stop sending it data using avail; similarly, if this oc-
                                                                   perform no processing on the data; in (b) the proxy
curs in a bottom handler, it requests via may write that its
                                                                   servers encrypt and decrypt traf£c between the two
upstream ¤ow stop sending it data using write.
                                                                   hosts.
                                                                3. Same as 2(a) and 2(b), except with proxy servers
5.2    Performance                                                 listening on UNIX-domain sockets rather than TCP
                                                                   sockets. This is similar to the way T ESLA works in-
When adding a network service to an application, we                ternally: the proxy server corresponds to the T ESLA
may expect to incur one or both of two kinds of per-               master process.
formance penalties. First, the service’s algorithm, e.g.,
                                                                4. The benchmark program running under T ESLA with
encryption or compression, may require computational
                                                                   (a) the dummy handler enabled or (b) the crypt han-
resources. Second, the interposition mechanism, if
                                                                   dler enabled. (dummy is a simple no-op handler that
any, may introduce overhead such as additional mem-
                                                                   simply passes data through.)
ory copies, interprocess communication, scheduling con-
tention, etc., depending on its implementation. We refer          We can think of the (a) con£gurations as providing a
to these two kinds of performance penalty as algorithmic       completely trivial transparent service, an “identity ser-
and architectural, respectively.                               vice” with no algorithm (and hence no algorithmic over-
   If the service is implemented directly within the appli-    head) at all. Comparing benchmark 1(a) with the other
cation, e.g., via an application-speci£c input/output indi-    (a) con£gurations isolates the architectural overhead of a
rection layer, then the architectural overhead is probably     typical proxy server (i.e., an extra pair of socket streams
minimal and most of the overhead is likely to be algorith-     through which all data must ¤ow) and the architectural
mic. Services implemented entirely within the operating        overhead of T ESLA (i.e., an extra pair of socket streams
system kernel are also likely to impose little in the way      plus any overhead incurred by the master processes).
of architectural overhead. However, adding an interpo-         Comparing the (a) benchmarks, shown on the left of Fig-
sition agent like T ESLA outside the contexts of both the      ure 6, with the corresponding (b) benchmarks, shown on
application and the operating system may add signi£cant        the right, isolates the algorithmic overhead of the service.
architectural overhead.                                           To eliminate the network as a bottleneck, we £rst ran
   To analyze T ESLA’s architectural overhead, we con-         the bandwidth test on a single host (a 500-MHz Pentium
structed several simple tests to test how using T ESLA af-     III running Linux 2.4.18) with the client and server con-
fects latency and throughput. We compare T ESLA’s per-         necting over the loopback interface. Figure 6 shows these
formance to that of a tunneling, or proxy, agent, where        results. Here the increased demand of interprocess com-
the application explicitly tunnels ¤ows through a local        munication is apparent. For large block sizes (which we
proxy server, and to an application-internal agent. Al-        would expect in bandwidth-intensive applications), in-
though we do not consider it here, we expect an in-kernel      troducing TCP proxies, and hence tripling the number
100                                                                                                 3
                                   Internal                                                                                              Internal

                                                                                                      3-DES Throughput (MB/s)
                                   TCP Proxy                                                                                             TCP Proxy

    Null Throughput (MB/s)
                             80
                                   UNIX Proxy                                                                                            UNIX Proxy
                                   TESLA                                                                                         2       TESLA
                             60

                             40
                                                                                                                                 1

                             20

                              0                                                                                                  0
                                   1          16   256          4K      16K   64K                                                        1          16     256          4K       16K     64K
                                                   Block size (bytes)                                                                                      Block size (bytes)

                                                   Figure 6: Results of the bandwidth test on a loopback interface.

of TCP socket streams involved, reduces the throughput
by nearly two thirds. T ESLA does a little better, since it
                                                                                                                           15
uses UNIX-domain sockets instead of TCP sockets.                                                                                         Internal
                                                                                                                                         TCP Proxy

                                                                                      Null Throughput (MB/s)
   In contrast, Figure 7 shows the results of running
                                                                                                                                         UNIX Proxy
bandwidth with the client and server (both 500-MHz Pen-
                                                                                                                           10            TESLA
tium IIIs) connected via a 100-BaseT network.4 Here
neither T ESLA nor a proxy server incurs a signi£cant
throughput penalty. For reasonably large block sizes ei-
                                                                                                                                5
ther the network (at about 89 Mb/s, or 11 MB/s) or the
encryption algorithm (at about 3 MB/s) becomes the bot-
tleneck; for small block sizes the high frequency of small
                                                                                                                                0
application reads and writes is the bottleneck.                                                                                          1          16       256           4K      16K         64K
                                                                                                                                                            Block size (bytes)
   We conclude that on relatively slow machines and very
fast networks, T ESLA—or any other IPC-intensive in-
terposition mechanism—may cause a decrease in peak                                   Figure 7: Results of the bandwidth test on a 100-BaseT
throughput, but when used over typical networks, or                                  network. Only the null test is shown; triple-DES perfor-
when used to implement nontrivial network services,                                  mance was similar to performance on a loopback inter-
T ESLA causes little or no throughput reduction.                                     face (the right graph of Figure 6).

   The latency benchmark yields different results, as il-
lustrated in Figure 8: since data rates never exceed a few
kilobytes per second, the computational overhead of the
algorithm itself is negligible and any slowdown is due
exclusively to the architectural overhead. We see almost                                                                        8000                                                     Internal
no difference between the speed of the trivial service and                                                                                                                               TCP Proxy
                                                                                      Round-trips per second

the nontrivial service. Introducing a TCP proxy server on                                                                                                                                UNIX Proxy
                                                                                                                                6000                                                     TESLA
each host incurs a noticeable performance penalty, since
data must now ¤ow over a total of three streams (rather
than one) from client application to server application.                                                                        4000
T ESLA does a little better, as it uses UNIX-domain sock-
ets rather than TCP sockets to transport data between                                                                           2000
the applications and the T ESLA masters; under Linux,
UNIX-domain sockets appear to have a lower latency
than the TCP sockets used by the proxy server. We con-                                                                               0
                                                                                                                                                    Null                         3-DES
sider this performance hit quite acceptable: T ESLA in-                                                                                                        Handler type
creases the end-to-end latency by only tens of microsec-
onds (4000 round-trips per second versus 7000), whereas                              Figure 8: Results of the latency test over a 100-BaseT
network latencies are typically on the order of millisec-                            network. The block size is one byte.
onds or tens of milliseconds.
6   Conclusion                                                ship, by Intel Corp., and by IBM Corp. under a university
                                                              faculty award.
This paper has outlined the design and implementation           The authors would like to thank Dave Andersen, Stan
of T ESLA, a framework to implement session layer ser-        Rost, and Michael Wal£sh of MIT for their comments.
vices in the Internet. These services are gaining in impor-
tance; examples include connection multiplexing, con-
gestion state sharing, application-level routing, mobil-      References
ity/migration support, compression, and encryption.
                                                               [1] A LEXANDROV, A. D., I BEL , M., S CHAUSER , K. E.,
   T ESLA incorporates three principles to provide a               AND S CHEIMAN , C. J. Ufo: A personal global £le sys-
transparent and extensible framework: £rst, it exposes             tem based on user-level extensions to the operating sys-
network ¤ows as the object manipulated by session ser-             tem. ACM TOCS 16, 3 (Aug. 1998), 207–233.
vices, which are written as ¤ow handlers without deal-         [2] A LLMAN , M., K RUSE , H., AND O STERMANN , S. An
ing with socket descriptors; second, it maps handlers to           application-level solution to TCP’s inef£ciencies. In
processes in a way that enables both sharing and protec-           Proc. WOSBIS ’96 (Nov. 1996).
tion; and third, it can be con£gured using dynamic li-         [3] A NDERSEN , D. G., BALAKRISHNAN , H., K AASHOEK ,
brary interposition, thereby being transparent to end ap-          M. F., AND M ORRIS , R. T. Resilient overlay networks.
plications.                                                        In Proc. ACM SOSP ’01 (Oct. 2001), pp. 131–145.
   We showed how T ESLA can be used to design several          [4] BALAKRISHNAN , H., R AHUL , H. S., AND S ESHAN , S.
interesting session layer services including encryption,           An integrated congestion management architecture for In-
SOCKS and application-controlled routing, ¤ow migra-               ternet hosts. In Proc. ACM SIGCOMM ’99 (Sept. 1999),
tion, and traf£c rate shaping, all with acceptably low per-        pp. 175–187.
formance degradation. In T ESLA, we have provided a            [5] B ERSHAD , B. N., S AVAGE , S., PARDYAK , P., S IRER ,
powerful, extensible, high-level C++ framework which               E. G., B ECKER , D., F IUCZYNSKI , M., C HAMBERS , C.,
makes it easy to develop, deploy, and transparently use            AND E GGERS , S. Extensibility, safety and performance
                                                                   in the SPIN operating system. In Proc. ACM SOSP ’95
session-layer services. We are pleased to report that other
                                                                   (Dec. 1995), pp. 267–284.
researchers have recently found T ESLA useful in deploy-
ing their own session layer services, and even adopted it      [6] B HATTI , N. T., AND S CHLICHTING , R. D. A system for
as a teaching tool.                                                constructing con£gurable high-level protocols. In Proc.
                                                                   ACM SIGCOMM ’95 (Aug. 1995), pp. 138–150.
   Currently, to use T ESLA services such as compression,
                                                               [7] C LOWES , S. tsocks: A transparent SOCKS proxying li-
encryption, and migration that require T ESLA support on           brary. http://tsocks.sourceforge.net/.
both endpoints, the client and server must use identical
                                                               [8] C OX , M. J., E NGELSCHALL , R. S., H ENSON , S., AND
con£gurations of T ESLA, i.e., invoke the tesla wrapper
                                                                   L AURIE , B. Openssl: The open source toolkit for
with the same handlers speci£ed on the command line.               SSL/TLS. http://www.openssl.org/.
We plan to add a negotiation mechanism so that once
                                                               [9] C URRY, T. W. Pro£ling and tracing dynamic library us-
mutual T ESLA support is detected, the endpoint can dy-
                                                                   age via interposition. In Proc. Summer USENIX ’94 (June
namically determine the con£guration based on the in-              1994), pp. 267–278.
tersection of the sets of supported handlers. We also plan
                                                              [10] E GGERT, P. R., AND PARKER , D. S. File systems in user
to add per-¤ow con£guration so that the user can specify
                                                                   space. In Proc. Winter USENIX ’93 (Jan. 1993), pp. 229–
which handlers to apply to ¤ows based on ¤ow attributes            240.
such as port and address. Finally, we plan to explore the
                                                              [11] E LY, D., S AVAGE , S., AND W ETHERALL , D. Alpine:
possibility of moving some handler functionality directly
                                                                   A user-level infrastructure for network protocol develop-
into the application process or operating system kernel to         ment. In Proc. 3rd USITS (Mar. 2001), pp. 171–183.
minimize T ESLA’s architectural overhead.
                                                              [12] E NGLER , D. R., K AASHOEK , M. F., AND O’T OOLE
  T ESLA is available for download at http://nms.lcs.              J R ., J. Exokernel: An operating system architecture for
mit.edu/software/tesla/.                                           application-level resource management. In Proc. ACM
                                                                   SOSP ’95 (Dec. 1995), pp. 251–266.
                                                              [13] F ELDMEIER , D. C., M C AULEY, A. J., S MITH , J. M.,
Acknowledgments                                                    BAKIN , D. S., M ARCUS , W. S., AND R ALEIGH , T. M.
                                                                   Protocol boosters. IEEE JSAC 16, 3 (Apr. 1998), 437–
This work was funded by NTT Inc. under the NTT-MIT                 444.
research collaboration, by Acer Inc., Delta Electronics       [14] F IUCZYNSKI , M. E., AND B ERSHAD , B. N. An extensi-
Inc., HP Corp., NTT Inc., Nokia Research Center, and               ble protocol architecture for application-speci£c network-
Philips Research under the MIT Project Oxygen partner-             ing. In Proc. USENIX ’96 (Jan. 1996), pp. 55–64.
[15] G EVROS , P., R ISSO , F., AND K IRSTEIN , P. Analysis        Notes
     of a method for differential TCP service. In Proc. IEEE
     GLOBECOM ’99 (Dec. 1999), pp. 1699–1708.                         1
                                                                        It is important not to take the terms upstream and
[16] H UTCHINSON , N. C., AND P ETERSON , L. L. The x-             downstream too literally in terms of the ¤ow of actual
     kernel: An architecture for implementing network proto-       bytes; rather, think of T ESLA as the session layer on the
     cols. IEEE Transactions on Software Engineering 17, 1
                                                                   canonical network stack, with upstream handlers placed
     (Jan. 1991), 64–76.
                                                                   closer to the presentation or application layer and down-
[17] I NFERNO N ETTVERK A/S. Dante: A free SOCKS im-               stream handlers closer to the transport layer.
     plementation. http://www.inet.no/dante/.
                                                                      2
[18] J ONES , M. B. Interposition agents: Transparently inter-          The inverse—“given upstream and downstream han-
     posing user code at the system interface. In Proc. ACM        dlers that do not support guaranteed completion, it is
     SOSP ’93 (Dec. 1993), pp. 80–93.                              easy to write a handler that does not support guaranteed
[19] KOHLER , E., M ORRIS , R., C HEN , B., JANNOTTI , J.,         completion”—is not true, for the reason described in the
     AND K AASHOEK , M. F. The Click modular router. ACM           previous paragraph.
     TOCS 18, 3 (Aug. 2000), 263–297.
                                                                       3
                                                                         The guaranteed-completion semantics we discuss in
[20] L EECH , M., G ANIS , M., L EE , Y., K URIS , R., KOBLAS ,
                                                                   this section apply to the T ESLA ¤ow handler API only,
     D., AND J ONES , L. SOCKS Protocol Version 5. IETF,
     Mar. 1996. RFC 1928.
                                                                   not to standard socket I/O functions (read, write, etc.).
                                                                   T ESLA makes sure that the semantics of POSIX I/O
[21] M OSBERGER , D., AND P ETERSON , L. L. Making paths           functions remain unchanged, for transparency’s sake.
     explicit in the Scout operating system. In Proc. OSDI ’96
     (Oct. 1996), pp. 153–167.                                        4
                                                                        Note the anomaly with small block sizes: on our
[22] R ICHIE , D. M. A stream input-output system. AT&T Bell       test machines, using a UNIX-domain socket to connect
     Laboratories Technical Journal 63, 8 (Oct. 1984), 1897–       the application to the intermediary process (whether a
     1910.                                                         proxy server or the T ESLA master) incurs a severe per-
[23] S NOEREN , A. C. A Session-Based Architecture for In-         formance hit. Recon£guring T ESLA to use TCP sockets
     ternet Mobility. PhD thesis, Massachusetts Institute of       internally (rather than UNIX-domain sockets) increases
     Technology, Dec. 2002.                                        T ESLA’s performance to that of the TCP proxy server.
[24] S NOEREN , A. C., BALAKRISHNAN , H., AND                      Since this anomaly does not seem speci£c to T ESLA,
     K AASHOEK , M. F. Reconsidering Internet Mobil-               and occurs only in a somewhat unrealistic situation—a
     ity. In Proc. HotOS-VIII (May 2001), pp. 41–46.               high-bandwidth application using a 1- or 16-byte block
[25] T HAIN , D., AND L IVNY, M. Multiple bypass: Interposi-       size—we do not examine it further.
     tion agents for distributed computing. Cluster Computing
     4, 1 (Mar. 2001), 39–47.
[26]   VON   E ICKEN , T., BASU , A., B UCH , V., AND VOGELS ,
       W. U-Net: A user-level network interface for parallel and
       distributed computing. In Proc. ACM SOSP ’95 (Dec.
       1995), pp. 40–53.
[27] WALLACH , D. A., E NGLER , D. R., AND K AASHOEK ,
     M. F. ASHs: Application-speci£c handlers for high-
     performance messaging. In Proc. ACM SIGCOMM ’96
     (Aug. 1996), pp. 40–52.
[28] YARVIS , M., R EIHER , P., AND P OPEK , G. J. Conductor:
     A framework for distributed adaptation. In Proc. HotOS-
     VII (Mar. 1999), pp. 44–51.
[29] Z ANDY, V. C., AND M ILLER , B. P. Reliable network
     connections. In Proc. ACM/IEEE Mobicom ’02 (Atlanta,
     Georgia, Sept. 2002), pp. 95–106.
You can also read