Mono versus .Net: A Comparative Study of Performance for Distributed Processing

Page created by Megan Chandler
 
CONTINUE READING
Mono versus .Net: A Comparative Study of
             Performance for Distributed Processing
               Gregory J Blajian                                        Maurice Eggen
                 *Roger Eggen                                             Gerald Pitts
        Computer and Information Sciences                       Department of Computer Science
           University of North Florida                                 Trinity University
             Jacksonville, FL USA                                   San Antonio, TX USA
              Phone 904.620.1326                                      Phone 210.999.7487
               Fax 904.620.2988                                        Fax 210.999.7477
                  ree@unf.edu                                       meggen@cs.trinity.edu

Abstract: Microsoft has released .NET, a platform        execution so that subsequent executions are at native
dependent standard for the C# programming                code speed. At a higher level, many languages have
language. Sponsored by Ximian/Novell, Mono, the          compilers that target MSIL for execution in the CLR
open source development platform based on the .NET       whereas the JRE is essentially a Java only
framework, has been developed to be a platform           environment. This paper compares the cross-platform
independent version of the C# programming                CLR implementation, Mono, with Microsoft’s CLR
environment. While .NET is platform dependent,           as applied to parallel distributed computing.
Mono allows developers to build Linux and cross-
platform applications. Mono’s .NET implementation
is based on the ECMA standards for C#. This paper
                                                         2 Fundamentals
examines both of these programming environments
with the goal of evaluating the performance                        As with any application performance study,
characteristics of each. Testing is done with various    a reasonable set of performance criteria must be
algorithms. We also assess the trade-offs associated     established in the beginning. This involves the
with using a cross-platform versus a platform            selection of hardware platform, the operating systems
dependent implementation.                                to be used and the performance measuring code
                                                         requirements. The hardware selection was simply
                                                         based on reasonable availability. The research
Keywords: distributed processing, algorithm              facility is a cluster of six Dell personal computers. [4]
efficiency, native code                                  Each personal computer had an Intel Pentium III
                                                         550MHz processor, the same motherboards, BIOS’s,
1 Introduction                                           the same built in Network Interface Cards, and either
                                                         128MB of RAM or 256MB of RAM. The study was
         This research compares the relative             not meant to test memory management or memory
efficiency of Mono, [1] a cross-platform                 performance so the differences in available memory
implementation of C# with Microsoft’s .Net               would does not impact the results in any significant
environment. [2] Microsoft introduced .Net and the       or measurable way.
Common Language Runtime, CLR, as a runtime                         The next choice was that of operating
environment similar to Sun’s Java Runtime                systems. The choice of Microsoft Windows ™ XP
Environment, JRE. [3] CLR is an environment that         Professional [5] with Service Pack 2 seemed the
accepts an intermediate language, MSIL, which is         appropriate choice for one of the two sets of tests
interpreted at execution time similar to the byte code   since it is the incumbent operating system on which
used by JRE.                                             the first implementation of the CLR runs and for
         While both languages use an intermediate        which the CLR was designed. The expectation was
language, there are some distinct differences between    that the Microsoft implementation of the CLR should
Microsoft’s CLR and Sun’s Java Virtual Machine,          easily outperform any CLR implemented by a
JVM. In Microsoft’s implementation, the MSIL is          competitor. The choice of alternative operating
actually compiled to managed native code on the first    systems was Fedora Core 4 Test 3. [6] Arguably, this
                                                         choice may not have been the best of those Linux
operating systems available, but the choice seemed          4 O(n log2 n)
quite reasonable given the strong ties to Redhat’s
previous commercial offerings. [7] The choice of a
linux operating system was not expected to impose                     Interestingly, an O(n log2 n) algorithm does
any penalties on the study due to the performance           not benefit nearly as much from creating subsets of
characteristics of the operating system. After all, the     the problem set for a distributed environment. In
operating systems would be run on the identical             fact, the contribution of the merge sort to reconstitute
hardware and were expected to perform in a similar          a distributed O(n log2 n) sort can be significant in the
manner regardless of the operating system installed.        overall processing time required for even reasonably
         A variety of algorithms exhibiting varied          sized problem sets. This must be kept in mind when
performance characteristics were chosen to challenge        viewing the results, but it does not invalidate the
the distributed capabilities of each environment.           original assumption that the amount of time required
Using data sets of varying size and algorithms              to move the data between machines would be a
requiring     different    execution    times     tests     reasonably close approximation of the time required
communication as well as runtime efficiency. A              to perform the heap sort on each subset of the overall
bubble sort algorithm, O(n2), requires more processor       problem set. O(n log2 n) to sort dominates O(n) to
time and correspondingly less communication time.           merge.
The second algorithm was chosen to more evenly
align compute and data transfer times. A heap sort          5 O(n)
algorithm running in O(n log2 n) was used. The last
algorithm is a simple exhaustive search algorithm,
O(n) that would place more burden on the network                     The exhaustive search algorithm, O(n), has a
performance than on processors performance.                 somewhat interesting problem when in a distributed
                                                            environment. The problem set is only required to
                                                            move one way and the result set is expected to be
3 O(n2)                                                     significantly smaller than the outbound set. Like the
                                                            O(n log2 n) algorithm, this does not impact the results
          If an O(n2) algorithm receives double the         or invalidate the initial assumption that more time
amount of data then execution time increases four           would is required to marshal and un-marshal the
times, thus the notation O(n2). In a distributed            problem set across the network than is required to
environment, O(n2) algorithm will take more                 perform the algorithm.
execution time when the problem size, n, grows
larger; however, if the problem is split either to
execute in separate threads on a symmetric
                                                            6 Results
multiprocessor (SMP) or to execute as a thread on
separate machines, the problem no longer grows in a                 The machines used in this study were
strictly O(n2) manner. For instance, let us assume a        configured in the fashion indicated in Figure 6.1.
problem of size n is run in a single thread on a singe      Those objects marked S indicate the Servers, the one
machine; the execution time will take O(n2). Now            marked C is the Client machine and the one marked
assume the same problem set is divided by k                 Sw is the 10/100Mbps switch. All the NICs in the
processors, where each subset is distributed, and the       machines were 100Mbps and operated at this
same algorithm is applied to each subset. One can
easily see that the time is now k * (n/k)2 plus the time
to distribute and to reconstitute the full resulting set.
For the purposes of the study the resulting set is
reconstituted using an ordered merge sort in O(n)
time. At this point it becomes difficult to determine
how much performance benefit is gained simply by
decreasing the size of each calculated element and
how much is due to the CLR environment in which it
is running. However, both environments are
organized in exactly the same manner so that the
relative performance characteristics are meaningful.
Linux O(n^2)

                                                                   14:24.00
                                                                   12:57.60
                                                                   11:31.20                                                       1024
                                                                                                                                  2048
                                                                   10:04.80
                                                                                                                                  4096
                                                                   08:38.40
                                                                                                                                  8192

                                                           Tim e
                                                                   07:12.00
                                                                                                                                  16384
                                                                   05:45.60
                                                                                                                                  32768
                                                                   04:19.20
                                                                                                                                  65536
                                                                   02:52.80
                                                                                                                                  131072
                                                                   01:26.40
                                                                   00:00.00
                                                                              18   15     12     9           6       3       1
                                                                                          Number of Threads

                                                                                        Figure 6.1.2

                                                         negligible increases in time required for 6 or more
                    Figure 6.1                           threads. It is only when the larger array sizes must be
                                                         sorted with 3 or fewer threads that the expected n2
speed for both sets of tests. The client machine         behavior becomes clearly evident.
shared the same hardware characteristics as the                   Figure 6.1.3 contains the results for the same
servers.                                                 environment using the O(n log2 n) algorithm.

6.1 The Mono Results                                                                     Linux O(nLogn)

                                                                   00:05.18
          The first sets of tests to be conducted were                                                                           1024
the Linux tests using Mono. Fedora Core 4 Test 3                   00:04.32
                                                                                                                                 2048
operating system was on each server and the client in              00:03.46                                                      4096
the default package mode for a personal desktop.                                                                                 8192
                                                           Tim e

Next, the Mono CLR environment, version 1.1.7.1,                   00:02.59                                                      16384

was installed on all seven machines. Figure 6.1.2
                                                                                                                                 32768
                                                                   00:01.73
shows the results obtained running the n2 algorithm in
                                                                                                                                 65536
                                                                                                                                 131072
the Mono CLR on the Fedora Core 4 Test 3                           00:00.86
                                                                                                                                 262144
machines.                                                          00:00.00                                                      524288

          One may note that even as the problem set                           18   15    12      9       6       3       1       1048576

size gets to the reasonably large value of 131072                                        Number of Threads

integers, all of the problem sizes exhibit
                                                                                        Figure 6.1.3
                                                                  The O(n log2 n) graph shows a somewhat
                                                         expected behavior as a result of the merge sort
                                                         combined with the network and algorithm processing
                                                         requirements for each machine. Note that the times
                                                         required actually decrease as the number of threads
                                                         approaches the number of physical machines and that
                                                         for the smaller array sizes even beyond that point.
                                                         The smaller array sizes actually suffer a significant
                                                         amount in time lost due to the splitting and
                                                         reconstitution of the array. However, as the array size
                                                         approaches a significant value the benefit of the
                                                         distributed model begins to manifest itself.
                                                                  Figure 6.1.4 shows the results for the same
                                                         environment using the O(n) algorithm. It is certainly
                                                         not to be unexpected that the O(n)
Windows O(n^2)
                                 Linux O(n)
                                                                                05:02.40
         00:01.04
                                                                                04:19.20
                                                           1024                                                                    1024
         00:00.86
                                                           2048                 03:36.00                                           2048
         00:00.69                                          4096                                                                    4096
                                                           8192                 02:52.80                                           8192

                                                                       T im e
  Time

         00:00.52                                          16384
                                                                                02:09.60                                           16384
                                                           32768
         00:00.35                                          65536                                                                   32768
                                                                                01:26.40                                           65536
                                                           131072
         00:00.17
                                                           262144
                                                                                00:43.20                                           131072
         00:00.00                                          524288
                    18   15    12      9      6    3   1   1048576              00:00.00
                               Number of Threads                                           18   15     12      9      6    3   1
                                                                                                       Number of Threads
                              Figure 6.1.4
                                                                                                     Figure 6.2.1
algorithm will display near linear behavior in a
negative direction since one would not expect to gain
any benefit from either splitting an array up for O(n)               Figure 6.2.2 contains the results of the .Net CLR on
processing via threads or via threads in combination                 Windows for the O(n log2 n) algorithm.
with a distributed system. That being said, some
benefit may be gained if the array size is large
enough that the time required to process each subset                                                 Windows O(nLogn)
of the array begins to outweigh the cost of
transferring the array across the network. This would                           00:03.46

seem to be born out for array sizes 524288 and more                             00:03.02                                           1024
so for array size 1048576.                                                      00:02.59                                           2048
                                                                                                                                   4096
                                                                                00:02.16
6.2 The .Net Results                                                                                                               8192
                                                                       T im e

                                                                                00:01.73                                           16384
         As one would expect the results for the                                00:01.30                                           32768
windows set of tests were expected to be significantly                                                                             65536
                                                                                00:00.86
better than the results indicated in the Mono CLR                                                                                  131072
environment and the results of the .Net CLR tests                               00:00.43                                           262144
certainly seemed to bear that out at first glance.                              00:00.00                                           524288
Unfortunately, things are not always what they seem.                                       18   15     12      9      6    3   1   1048576
The results from the O(n2) algorithm on the Windows                                                    Number of Threads
operating system using the .Net CLR are illustrated in
Figure 6.2.1.
                                                                                                     Figure 6.2.2

                                                                     And lastly, Figure 6.2.3 contains the results of the
                                                                     O(n) algorithm.
Windows O(n)                                                       Linux versus Windows (Native)

          12:00:01 AM                                                            900
                                                                                 800
          12:00:01 AM                                         1024
                                                                                 700
          12:00:01 AM                                         2048
                                                                                 600
                                                              4096
          12:00:00 AM                                                            500                                                              Linux

                                                                          Time
                                                              8192
  Tim e

          12:00:00 AM                                                            400                                                              Windows
                                                              16384
                                                              32768              300
          12:00:00 AM
                                                              65536              200
          12:00:00 AM
                                                              131072             100
          12:00:00 AM                                         262144                 0
                                                              524288                     1024   2048   4096     8192   16384 32768 65536 131072
          12:00:00 AM
                        18   15    12     9      6    3   1   1048576                                         Size of Data Set

                                   Number ofThreads

                              Figure 6.2.3                                                                  Figure 6.3.1

                                                                                 Please note the vast difference indicated in
          You may note that the results look quite a bit
                                                                        the two operating systems even utilizing native code.
different when compared to the Linux results using
                                                                        At 131072 integers in the array the Fedora Core 4
the Mono CLR. The O(n2) algorithm is pretty
                                                                        Test 3 operating system performed more than 3 times
similar, but a look at the actual numbers unveils a
                                                                        slower than the exact same code in Windows.
significant difference. At this point a suspicion may
be dawning that things are not as they should be for
this comparison. After all the times required for the                   6.4 Combined Results
O(n) algorithm on Windows are completely linear
whereas they were not on the Linux platform. The                                 The following three figures indicate the
O(n log2 n) results while they are reasonably similar                   direct comparison between .Net and Mono. In the
are again quite a bit faster. This leads to some                        figures, N = .Net and M = Mono. Figure 6.4.1
speculation that there may be other factors at work.                    contains the O(n2) results, while figure 6.4.2 contains
It is also important to note that there was more                        the O(n log2 n) results and finally figure 6.4.3
difference in the results than these graphs indicate.                   contains the O(n) results. The chart indicating the
The Linux machines all demonstrated a much quicker                      direct comparison N = .Net and M = Mono.
propensity to go to the hard drive cache/swap space
as the problem size grew which led to wild                                                                  .Net vs Mono O(n^2)                   1024 N
fluctuations in the results. These results were
                                                                                                                                                  1024 M
discarded as not pertinent since it was not intended                                14:24.00                                                      2048 N
for the study to validate the efficiency of the memory                              12:57.60
management, but the effect could not be entirely                                                                                                  2048 M
                                                                                    11:31.20
ignored either.                                                                                                                                   4096 N
                                                                                    10:04.80                                                      4096 M
                                                                                    08:38.40                                                      8192 N
6.3 Native Code
                                                                           T im e

                                                                                    07:12.00                                                      8192 M
                                                                                    05:45.60                                                      16384 N
         Since the results seemed to indicate that the                                                                                            16384 M
.Net CLR outperformed the Mono CLR written and                                      04:19.20
                                                                                                                                                  32768 N
supported by operating system vendor Novell [8] by                                  02:52.80
such a wide margin the suspicion led to the native
                                                                                                                                                  32768 M
                                                                                    01:26.40
code testing of each operating system. Windows                                      00:00.00
                                                                                                                                                  65536 N

was tested using a bare O(n2) bubble sort algorithm                                             18     15      12       9        6   3     1      65536 M
written in C/C++ and compiled using Microsoft                                                                  Number of Threads
                                                                                                                                                  131072 N
Visual Studio 6 in release mode. The Fedora Core 4                                                                                                131072 M
Test 3 operating system was tested using the exact                                                          Figure 6.4.1.
same code compiled using GCC, version 3.4.4,
without debugging. The results are indicated in
Figure 6.3.1.
1024 N     to .Net. In fact, if one was considering such a move
                                        .Net vs Mono O(nLogn)
                                                                                1024 M     they may get even more performance than they may
                                                                                2048 N     otherwise expect.
                                                                                           If one is familiar with Java and RMI, the Java
           00:05.18
                                                                                2048 M

           00:04.32                                                             4096 N     equivalent of .Net/Mono distributed processing, it is
                                                                                4096 M     noteworthy to point out that the Mono/.Net was
           00:03.46                                                             8192 N     easier to implement than the same functionality in
                                                                                8192 M
                                                                                           Java.
  T im e

           00:02.59                                                             16384 N
                                                                                16384 M
           00:01.73
                                                                                32768 N    8 Future Directions
                                                                                32768 M
           00:00.86
                                                                                                     It is easy to argue that these results more
                                                                                65536 N
                                                                                65536 M
           00:00.00
                                                                                131072 N
                                                                                           than justify further study.      Of the free Linux
                                                                                           operating systems available, one could try Debian,
                         18        15         12        9       6       3   1
                                                                                131072 M
                                                                                           Ubuntu, Knoppix, FreeBSD, or another not listed
                                             Number of Threads
                                                                                262144 N
                                                                                           here, possibly even one or more commercial variants
                                                                                           of linux, such as Suse/Novell or Redhat. One should
                                             Figure 6.4.2                                  consider the direct comparison of native code speeds
                                                                                           before pursuing CLR.
                                                                                1024 N               Another possible direction in this would be
                                             .Net vs Mono O(n)
                                                                                1024 M     to compare both the Mono CLR versus .Net CLR on
           12:00:01 AM
                                                                                2048 N     the Windows operating system. Mono will run on
                                                                                2048 M     Windows, by doing so will eliminate operating
           12:00:01 AM                                                          4096 N
                                                                                           system impact.
                                                                                                     A third possible study would entail a
                                                                                4096 M
                                                                                8192 N
                                                                                           comparison of the Mono CLR with Java. Both could
           12:00:01 AM
                                                                                8192 M
                                                                                           be run on a Linux platform and neither would
  Tim e

           12:00:01 AM                                                          16384 N
                                                                                16384 M    necessarily gain an operating system advantage. It
           12:00:00 AM
                                                                                32768 N    may even be reasonable to expect that one could
           12:00:00 AM
                                                                                32768 M    compare the two in the same clustered environment
                                                                                65536 N    as this study did.
           12:00:00 AM                                                          65536 M
                              18        15         12       9       6   3   1   131072 N
                                               Number of Threads                131072 M   9 References
                                                                                262144 N

                                             Figure 6.4.3                                  [1] What is Mono? (31 Jan 2006). Retrieved
                                                                                           February 16, 2006, from http://www.mono-
7 Conclusions                                                                              project.com/Main_Page

                                                                                           [2] .NET Highlights. (December 27, 2005).
         So where does this leave the results as a
                                                                                           Retrieved February 20, 2006 from
function of the study? Clearly the use of the Fedora
                                                                                           http://www.microsoft.com/net/default.mspx
Core 4 Test 3 operating system severely handicapped
the Mono CLR as a test environment. It would seem,
                                                                                           [3] Java.sun.com: The Source for Java Developers.
however, that not all is as bad as it may at first
                                                                                           (2006). Retrieved February 24, 2006 from
appear. Looking at the time required to run the O(n2)
                                                                                           http://java.sun.com/
algorithm in the Mono CLR with 131072 integers in
the array, 774.64s, and comparing that to the length
                                                                                           [4] Dell Products and Services. (2006). Retrieved
of time it took to run the same number of integers
                                                                                           February 1, 2006 from http://www.dell.com/
using native code, 825.14s, it would seem that the
Mono CLR is indeed MORE efficient than even
                                                                                           [5] Windows XP Editions. (226). Retrieved February
native code.       Though the operating system
                                                                                           1, 2006 from
handicapped the results such that truly direct
                                                                                           http://www.microsoft.com/windowsxp/default.mspx
comparisons seem unreasonable to take, there is
certainly no reason, given the results of Mono versus
Native code, to discount Mono as a viable alternative
[6] Fedora. (March 20, 2006). Retrieved February
21, 2006 from http://fedora.redhat.com/

[7] Red Hat. (March 20, 2006). Retrieved February
21, 2006 from http://redhat.com/

[8] Novell: Software for the Open Enterprise. (2006).
Retrieved February 19, 2006 from
http://www.novell.com/
You can also read