XBOX Performances


                  XNA


       Part of the slide set taken from:
“Understanding XNA Framework Performance”
              Shawn Hargreaves
                GameFest 2007

        • Overview
               – XNA Architecture
                 • Context Switch and Batches
               – XBOX CPUs
                 • Limitations and Threading
               – XBOX GPU
               – About Profiling



CGL slideset                         2
Windows Architecture
               User programs cannot directly access hardware



                      Operating System    Graphics         Graphics
                      (supervisor mode)    Driver          Hardware




                      Game Executable       D3D
                      (user mode)           D3DX




XBOX Architecture
               Consoles typically just run everything directly in supervisor mode



                              Operating System        Graphics            Graphics
                                                       Driver             Hardware




                              Game Executable



  • No mode transitions = reduced overhead
  • Small batches are less expensive than on Windows


Xbox 360 Architecture
                   Xbox 360 hypervisor enforces security


    Hypervisor     Game Executable         D3D             Graphics
                   (supervisor mode)       D3DX            Hardware




    • Hypervisor ensures only signed memory pages can
      execute
    • Games are signed during certification
    If only signed code can execute, how is a
    dynamically jitted runtime even possible?

Xbox 360 Architecture
                   Xbox 360 hypervisor enforces security


    Hypervisor     XNA Framework            D3D            Graphics
                   (supervisor mode)        D3DX           Hardware


                   Managed Game            Managed Graphics
                   (user mode)             Device


• Managed code cannot directly call D3D or D3DX
• User-to-supervisor transitions are expensive
       – 4 microseconds per system call
• A command buffer batches up API calls so that many of them
  share a single system call

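To get a feel for why batching matters, the following sketch converts the 4 microseconds per system call quoted above into a share of the frame time. The 60 frames per second target and the number of calls are assumptions chosen for illustration, and SystemCallBudget is a hypothetical helper, not part of the XNA Framework.

```csharp
// Hypothetical helper: estimates how much of a frame is spent on
// user-to-supervisor transitions.
public static class SystemCallBudget
{
    // Cost of one system call, as quoted on the slide.
    public const double MicrosecondsPerSystemCall = 4.0;

    // Time available for one frame, in microseconds.
    public static double FrameBudgetMicroseconds(double framesPerSecond)
    {
        return 1000000.0 / framesPerSecond;
    }

    // Fraction of the frame (0.0 to 1.0) consumed by system calls alone.
    public static double FractionOfFrame(int systemCallsPerFrame, double framesPerSecond)
    {
        double spent = systemCallsPerFrame * MicrosecondsPerSystemCall;
        return spent / FrameBudgetMicroseconds(framesPerSecond);
    }
}
```

Under these assumptions, SystemCallBudget.FractionOfFrame(1000, 60.0) shows that 1000 unbatched calls cost 4 ms, roughly a quarter of a 16.7 ms frame, before any real work is done.
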
Batchable APIs
               These APIs are currently batched into a single system call

    Assigning to:
    •      VertexShader
    •      PixelShader
    •      VertexDeclaration
    •      IndexBuffer
    •      RenderState
    •      SamplerStates
    •      Textures
    •      DepthStencilBuffer
    •      Viewport
    •      ScissorRectangle
    •      ClipPlanes
    •      Effect.CurrentTechnique

    Calling:
    •      Effect Begin/End
    •      EffectPass Begin/End
    •      Effect.CommitChanges
    •      EffectParameter.SetValue
    •      VertexStream.SetSource
    •      Set*ShaderConstant
    •      StateBlock Capture/Apply
    •      SetRenderTarget
    •      Draw[Indexed]Primitives
    •      DrawUser[Indexed]Primitives
           –  If the primitive count is small
    •      Clear
    •      Resolve

Nasty Unbatchable APIs
                      These APIs currently require one system call each

 •       Present
 •       Creating or destroying graphics resources
 •       *.SetData, *.GetData
 •       DrawUser[Indexed]Primitives
        –  If the primitive count is large
 •       Reading from:
        –  VertexShader
        –  PixelShader
        –  RenderState
        –  SamplerStates
        –  Textures
        –  Get*ShaderConstant
        –  EffectParameter.GetValue


Cached Managed State
                   These can be read without any system call at all


 •       DisplayMode
 •       Viewport
 •       VertexDeclaration
 •       VertexStream
 •       IndexBuffer
 •       Effect.CurrentTechnique




XBOX CPUs
        • XBOX will run your code 4 to 6 times slower than on
          PC…
               – JIT compiler can’t reorder instructions
                  • Stalls are added to maintain sync
                  • A cache miss might cost thousands of cycles
               – Floating point is new to the .NET Compact Framework
                  • JIT compiler can’t use AltiVec instructions
               – Operators are ‘pass-by-value’
                  • Function call + overhead due to copy!
               – Garbage collector is not generational
                  • Runs after every 1 MB allocated
                  • Runs if an out-of-memory exception occurs

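The "pass-by-value" point can be sketched with a small vector type. Vec3 below is a hypothetical stand-in written for this example (it is not the XNA Vector3), but its Add method follows the same ref/ref/out shape as the XNA math helpers: the operator copies both operands in and the result out, while the ref/out version works on the caller's storage directly.

```csharp
// Hypothetical 3-component vector used only for illustration.
public struct Vec3
{
    public float X, Y, Z;

    public Vec3(float x, float y, float z)
    {
        X = x; Y = y; Z = z;
    }

    // Operator form: 'a' and 'b' are copied into the call and the
    // result struct is copied back to the caller.
    public static Vec3 operator +(Vec3 a, Vec3 b)
    {
        return new Vec3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }

    // ref/out form: the arguments are passed by reference, so no
    // struct copies are made for the operands or the result.
    public static void Add(ref Vec3 a, ref Vec3 b, out Vec3 result)
    {
        result.X = a.X + b.X;
        result.Y = a.Y + b.Y;
        result.Z = a.Z + b.Z;
    }
}
```

Both forms compute the same sum. In an inner loop, Vec3.Add(ref a, ref b, out sum) replaces sum = a + b.
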
XBOX CPUs

        • Almost no inlining
               – Automated, fixed rules; a method is inlined only if:
                  • It has 16 bytes of IL or less
                  • It has no branching (typically an “if”)
                  • It has no local variables
                  • It has no exception handlers
                  • It has no 32-bit floating-point arguments or return value
                  • With more than one argument, the arguments are
                    accessed in order from lowest to highest (as seen
                    in the IL)
                  • It is not virtual (virtual methods are never inlined)
               – Solution? Manual inlining...
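
A minimal sketch of what manual inlining looks like in practice. Clamp01 and the Inlining class are hypothetical names invented for this example. Under the rules listed above, Clamp01 would not be inlined (it branches and takes a 32-bit float argument), so the second loop pastes its body in by hand.

```csharp
public static class Inlining
{
    // Small helper that breaks the inlining rules above:
    // it contains branches and has a float argument and return value.
    public static float Clamp01(float v)
    {
        if (v < 0f) return 0f;
        if (v > 1f) return 1f;
        return v;
    }

    // Readable version: one method call per element.
    public static void ClampAllWithCalls(float[] values)
    {
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = Clamp01(values[i]);
        }
    }

    // Manually inlined version: same logic, no call in the hot loop.
    public static void ClampAllInlined(float[] values)
    {
        for (int i = 0; i < values.Length; i++)
        {
            float v = values[i];
            if (v < 0f) v = 0f;
            else if (v > 1f) v = 1f;
            values[i] = v;
        }
    }
}
```

The two loops produce identical results. The inlined one trades readability for fewer calls, so it is best kept to code that profiling shows to be hot.
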

XBOX CPUs

        • 3 Hardware Cores
               – Cache + Register
        • 6 hardware threads (2 per core)
               – 0 and 2 are reserved for the XNA Framework
               – 1, 3, 4, 5 are free for your code
               – 4 and 5 run on the same core (shared cache!)




XBOX Multithreading
  • Xbox 360 does not automatically schedule threads
    across multiple cores
  • You must explicitly assign threads to cores
          – Thread.SetProcessorAffinity()… see the twiki
  • Current Xbox 360 ThreadPool is not optimized
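
A minimal sketch of pinning a worker thread to one of the free hardware threads. WorkerThreads and its members are hypothetical names for this example. The call to SetProcessorAffinity only exists in the Xbox 360 version of the framework, so it is wrapped in #if XBOX, on the assumption that the project defines the XBOX symbol as the standard XNA Xbox 360 project templates do. On Windows the thread simply runs wherever the OS schedules it.

```csharp
using System;
using System.Threading;

public static class WorkerThreads
{
    // Hardware threads left free for game code, per the CPU slide.
    public static readonly int[] FreeHardwareThreads = { 1, 3, 4, 5 };

    public static bool IsFreeHardwareThread(int index)
    {
        return Array.IndexOf(FreeHardwareThreads, index) >= 0;
    }

    // Starts 'work' on a new thread pinned to the given hardware thread.
    public static Thread StartOnHardwareThread(int hardwareThread, ThreadStart work)
    {
        if (!IsFreeHardwareThread(hardwareThread))
            throw new ArgumentOutOfRangeException("hardwareThread");

        Thread thread = new Thread(delegate()
        {
#if XBOX
            // Xbox 360 only: called from inside the thread being pinned.
            Thread.CurrentThread.SetProcessorAffinity(new int[] { hardwareThread });
#endif
            work();
        });
        thread.Start();
        return thread;
    }
}
```

Since hardware threads 4 and 5 share a core and its cache, one plausible layout is to place two workers that touch the same data there and keep unrelated work on 1 and 3.
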




XBOX Multithreading
  • GraphicsDevice is somewhat thread-safe
          –    Cannot render from more than one thread at a time
          –    Can create resources and SetData while another thread renders
  • ContentManager is not thread-safe
          –    Ok to have multiple instances, but only one per thread
  • Input is not threadable
          –    Windows games must read input on the main game thread
  • Audio and networking are thread-safe



XBOX GPU

        • GPU is the only piece of hardware you
          have real access to
        • Expensive computations should be done
          on it




Profiling on Xbox 360
               XNA Framework Remote Performance Monitor for Xbox 360


  • Provides basic garbage collector
    information
  • Can tell if you have a GC problem, but not
    usually enough to diagnose the cause
  • Shows the number of system calls
  • Not much help for identifying
    computational bottlenecks


Tricks

        • Don’t use the operators defined on math types
               – Use ref/out methods (Vector3.Add(ref a, ref b, out result))
               – Do your computations component-wise
        • Do not instantiate classes at runtime!
               – Every allocation brings the next garbage collection closer
               – Use structs instead
        • Consider moving floating-point-heavy tasks
          to the GPU
        • Pay attention to API calls
               – Unbatchable ones cost one system call each
        • Consider GPU instancing
          (rendering of multiple small models)
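
The first two tricks can be combined in one sketch: a struct array allocated once, updated component-wise in place. Particle and ParticleSystem are hypothetical types invented for this example. The point is only that the per-frame Update creates no objects and calls no operators.

```csharp
// Hypothetical particle stored as a struct: an array of these is one
// block of memory rather than many separately allocated objects.
public struct Particle
{
    public float X, Y, Z;      // position
    public float VX, VY, VZ;   // velocity
}

public static class ParticleSystem
{
    // Allocate once, at load time, and reuse the array every frame.
    public static Particle[] Create(int capacity)
    {
        return new Particle[capacity];
    }

    // Per-frame update: component-wise math written directly on the
    // array elements, so nothing is allocated and no structs are copied.
    public static void Update(Particle[] particles, int count, float dt)
    {
        for (int i = 0; i < count; i++)
        {
            particles[i].X += particles[i].VX * dt;
            particles[i].Y += particles[i].VY * dt;
            particles[i].Z += particles[i].VZ * dt;
        }
    }
}
```

A game would call ParticleSystem.Create once during loading and ParticleSystem.Update from its per-frame update method, keeping a separate count of live particles instead of creating and discarding objects.
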