XBOX Performances XNA - Part of the slide set taken from: Understanding XNA Framework Performance Shawn Hargreaves GameFest 2007

 
XBOX Performances

                  XNA

       Part of the slide set taken from:
       "Understanding XNA Framework Performance"
              Shawn Hargreaves
                GameFest 2007
XBOX Performances

        • Overview
               – XNA Architecture
                 • Context Switch and Batches
               – XBOX CPUs
                 • Limitations and Threading
               – XBOX GPU
               – About Profiling

CGL slideset                         2
Windows Architecture
               User programs cannot directly access hardware

                      Operating System    Graphics         Graphics
                      (supervisor mode)    Driver          Hardware

                      Game Executable       D3D
                      (user mode)           D3DX

XBOX Architecture
               Consoles typically just run everything directly in supervisor mode

                              Operating System         Graphics            Graphics
                                                        Driver             Hardware

                              Game Executable

  • No mode transitions = reduced overhead
  • Small batches less expensive than on Windows

Xbox 360 Architecture
                   Xbox 360 hypervisor enforces security

                   Game Executable         D3D             Graphics
    Hypervisor
                   (supervisor mode)       D3DX            Hardware

    • Hypervisor ensures only signed memory pages can
      execute
    • Games are signed during certification
    If only signed code can execute, how is a
    dynamically jitted runtime even possible?

Xbox 360 Architecture
                   Xbox 360 hypervisor enforces security

                   XNA Framework            D3D            Graphics
    Hypervisor
                   (supervisor mode)        D3DX           Hardware

                   Managed Game            Managed
                                           Graphics
                   (user mode)              Device

• Managed code cannot directly call D3D or D3DX
• User to supervisor transitions are expensive
       – 4 microseconds per system call
• Command buffer batches up API calls
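
As a rough illustration of why batching matters, the sketch below estimates what share of a frame goes to user to supervisor transitions, using the 4 microseconds per call figure above. The frame rate, the call count, and the names SystemCallBudget and FractionOfFrame are assumptions made for this example; none of them come from the XNA Framework.

```csharp
using System;

// Hypothetical helper for back-of-the-envelope estimates; not an XNA API.
public static class SystemCallBudget
{
    // Cost of one user-to-supervisor transition, taken from the slide above.
    public const double MicrosecondsPerCall = 4.0;

    // Fraction of a single frame consumed by the given number of system calls.
    public static double FractionOfFrame(int callsPerFrame, double framesPerSecond)
    {
        double frameMicroseconds = 1000000.0 / framesPerSecond;
        return callsPerFrame * MicrosecondsPerCall / frameMicroseconds;
    }
}
```

For instance, 1000 unbatched calls per frame at an assumed 60 frames per second cost 4000 microseconds, which is 24% of the frame before any real work is done.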
Batchable APIs
               These APIs are currently batched into a single system call

    Assigning to:                              Calling:
    •      VertexShader                        •   Effect Begin/End
    •      PixelShader                         •   EffectPass Begin/End
    •      VertexDeclaration                   •   Effect.CommitChanges
    •      IndexBuffer                         •   EffectParameter.SetValue
    •      RenderState                         •   VertexStream.SetSource
    •      SamplerStates                       •   Set*ShaderConstant
    •      Textures                            •   StateBlock Capture/Apply
    •      DepthStencilBuffer                  •   SetRenderTarget
    •      Viewport                            •   Draw[Indexed]Primitives
    •      ScissorRectangle                    •   DrawUser[Indexed]Primitives
    •      ClipPlanes                              –   If the primitive count is small
    •      Effect.CurrentTechnique             •   Clear
                                               •   Resolve
Nasty Unbatchable APIs
                      These APIs currently require one system call each

 •       Present
 •       Creating or destroying graphics resources
 •       *.SetData, *.GetData
 •       DrawUser[Indexed]Primitives
        –      If the primitive count is large
 •       Reading from:
        –  VertexShader
        –  PixelShader
        –  RenderState
        –  SamplerStates
        –  Textures
 •       Get*ShaderConstant
 •       EffectParameter.GetValue

Cached Managed State
                   These can be read without any system call at all

 •       DisplayMode
 •       Viewport
 •       VertexDeclaration
 •       VertexStream
 •       IndexBuffer
 •       Effect.CurrentTechnique

XBOX CPUs
        • XBOX will run your code 4 to 6 times slower than on
          PC…
               – JIT compiler can’t reorder instructions
                  • Stalls are added to maintain sync.
                  • A cache miss might cost thousands of cycles
               – Floating point is new to the .NET Compact Framework (.NetCF)
                  • JIT compiler can’t use AltiVec instructions
               – Operators are ‘pass-by-value’
                  • Function call + overhead due to copy!
               – Garbage collector is not generational
                  • Runs after every 1 MB allocated
                  • Also runs when an out-of-memory condition occurs
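
Because the collector runs after every 1 MB allocated and is not generational, one common way to keep it quiet is to allocate everything up front and reuse it. The sketch below is a minimal fixed-capacity pool under that assumption. Particle and ParticlePool are hypothetical names invented for this example, not XNA types.

```csharp
using System;

// Hypothetical example type; any class created repeatedly during gameplay would do.
public sealed class Particle
{
    public float X, Y, Age;
}

// Minimal fixed-capacity pool: every instance is allocated once, in the constructor.
public sealed class ParticlePool
{
    private readonly Particle[] items;   // free instances live in [0, freeCount)
    private int freeCount;

    public ParticlePool(int capacity)
    {
        items = new Particle[capacity];
        for (int i = 0; i < capacity; i++)
            items[i] = new Particle();
        freeCount = capacity;
    }

    public int FreeCount { get { return freeCount; } }

    // Returns a recycled, reset instance, or null when the pool is exhausted.
    public Particle Rent()
    {
        if (freeCount == 0)
            return null;
        Particle p = items[--freeCount];
        p.X = 0f; p.Y = 0f; p.Age = 0f;
        return p;
    }

    // Hands an instance back so that a later Rent can reuse it.
    public void Return(Particle p)
    {
        if (freeCount == items.Length)
            throw new InvalidOperationException("Pool is already full.");
        items[freeCount++] = p;
    }
}
```

After the constructor has run, Rent and Return allocate nothing, so steady-state gameplay using this pool adds no pressure on the collector.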
XBOX CPUs

        • Almost no inlining
               – Automated, fixed rules:
                  • 16 bytes of IL or less
                  • No branching (typically an "if")
                  • No local variables
                  • No exception handlers
                  • No 32-bit floating point arguments or return value
                  • If the method has more than one argument, the
                    arguments must be accessed in order from lowest to
                    highest (as seen in the IL)
                  • Virtual methods are never inlined
               – Solution? Manual inlining...
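
Manual inlining can be sketched as follows. The helper takes 32-bit floating point arguments, so under the rules listed above it would not be inlined by the JIT; the second loop pastes its body in by hand. ManualInlining, LengthSquared, SumWithCalls and SumInlined are hypothetical names chosen for this example.

```csharp
using System;

public static class ManualInlining
{
    // Small helper. Under the rules above it would not be inlined,
    // because it takes 32-bit floating point arguments.
    private static float LengthSquared(float x, float y)
    {
        return x * x + y * y;
    }

    // Straightforward version: one method call per element.
    public static float SumWithCalls(float[] xs, float[] ys)
    {
        float total = 0f;
        for (int i = 0; i < xs.Length; i++)
            total += LengthSquared(xs[i], ys[i]);
        return total;
    }

    // Manually inlined version: the helper body is written directly in the loop.
    public static float SumInlined(float[] xs, float[] ys)
    {
        float total = 0f;
        for (int i = 0; i < xs.Length; i++)
        {
            float x = xs[i];
            float y = ys[i];
            total += x * x + y * y;
        }
        return total;
    }
}
```

Both methods compute the same sum. The trade-off is readability, so this is usually worth doing only in inner loops that profiling has shown to be hot.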

XBOX CPUs

        • 3 Hardware Cores
               – Cache + Register
        • 6 Hardware Threads (2 per core)
               – 0 and 2 reserved for XNA
               – 1, 3, 4, 5 free
               – 4, 5 on same core (shared cache!)

XBOX Multithreading
  • Xbox 360 does not automatically schedule threads
    across multiple cores
  • You must explicitly assign threads to cores
          – Thread.SetProcessorAffinity()… see the twiki
  • Current Xbox 360 ThreadPool is not optimized
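
A minimal sketch of explicit thread placement, assuming the Xbox 360 build of the framework exposes SetProcessorAffinity on Thread as an instance method taking an array of hardware thread indices, and that the XBOX conditional compilation symbol is defined for Xbox builds. WorkerThreads and StartOnHardwareThread are hypothetical names for this example. On other platforms the affinity call is compiled out and the index is ignored.

```csharp
using System;
using System.Threading;

public static class WorkerThreads
{
    // Starts 'work' on a background thread. On Xbox 360 the thread first pins
    // itself to the requested hardware thread; elsewhere the index is ignored.
    public static Thread StartOnHardwareThread(int hardwareThread, ThreadStart work)
    {
        Thread thread = new Thread(delegate()
        {
#if XBOX
            // Assumed to exist only in the Xbox 360 build of the framework.
            Thread.CurrentThread.SetProcessorAffinity(new int[] { hardwareThread });
#endif
            work();
        });
        thread.IsBackground = true;
        thread.Start();
        return thread;
    }
}
```

Following the thread list given earlier, a caller would pass 1, 3, 4 or 5, keeping in mind that 4 and 5 share a core.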

XBOX Multithreading
  • GraphicsDevice is somewhat thread-safe
          –    Cannot render from more than one thread at a time
          –    Can create resources and SetData while another thread renders
  • ContentManager is not thread-safe
          –    OK to have multiple instances, but only one per thread
  • Input is not threadable
          –    Windows games must read input on the main game thread
  • Audio and networking are thread-safe

XBOX GPU

        • The GPU is the only piece of hardware you
          have real access to
        • Expensive computations should be done
          on it

Profiling on Xbox 360
               XNA Framework Remote Performance Monitor for Xbox 360

  • Provides basic garbage collector
    information
  • Can tell if you have a GC problem, but not
    usually enough to diagnose the cause
  • Shows the number of system calls
  • Not much help for identifying
    computational bottlenecks

Tricks

        • Don’t use type-defined (overloaded) operators
               – Use ref/out methods (.Add(ref a, ref b, out result))
               – Do your computations component-wise
        • Do not allocate class instances during gameplay!
               – Use structs instead
        • Consider moving expensive floating-point tasks
          to the GPU
        • Pay attention to API calls
        • Consider GPU instancing
          (rendering of multiple small models)
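
The first two tricks can be sketched with a small value type. Vec3 below is a hypothetical stand-in for a math struct such as XNA's Vector3, and Integrator is an invented example class. The three Step methods produce the same result; they differ only in how many copies and method calls each element costs.

```csharp
using System;

// Hypothetical stand-in for a math struct such as XNA's Vector3.
public struct Vec3
{
    public float X, Y, Z;

    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }

    // Operator form: both operands are copied in and the result is copied out.
    public static Vec3 operator +(Vec3 a, Vec3 b)
    {
        return new Vec3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }

    // ref/out form: operands are passed by reference, the result is written in place.
    public static void Add(ref Vec3 a, ref Vec3 b, out Vec3 result)
    {
        result.X = a.X + b.X;
        result.Y = a.Y + b.Y;
        result.Z = a.Z + b.Z;
    }
}

public static class Integrator
{
    // Advances each position by its velocity using the overloaded operator.
    public static void StepWithOperator(Vec3[] pos, Vec3[] vel)
    {
        for (int i = 0; i < pos.Length; i++)
            pos[i] = pos[i] + vel[i];
    }

    // Same update through the ref/out method: no struct copies for the arguments.
    public static void StepWithRefOut(Vec3[] pos, Vec3[] vel)
    {
        for (int i = 0; i < pos.Length; i++)
            Vec3.Add(ref pos[i], ref vel[i], out pos[i]);
    }

    // Same update component-wise: no method call at all.
    public static void StepComponentWise(Vec3[] pos, Vec3[] vel)
    {
        for (int i = 0; i < pos.Length; i++)
        {
            pos[i].X += vel[i].X;
            pos[i].Y += vel[i].Y;
            pos[i].Z += vel[i].Z;
        }
    }
}
```

Because Vec3 is a struct, the arrays hold the values inline, so none of the Step methods allocate on the heap.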