Integer Boltzmann Machines for Future Generation Digital Annealing Units - AQC 2021


 Mohammad Bagherbeik¹, Kouichi Kanda²,
 Hirotaka Tamura³, Ali Sheikholeslami¹

¹Dept. of Electrical & Computer Engineering, University of Toronto, Canada
 ²Fujitsu Limited, Kawasaki, Japan
 ³DXR Laboratories, Kawasaki, Japan

 Adiabatic Quantum Computing Conference 2021

 Integer Boltzmann Machines for Future DAUs 1
Outline
 Motivation and Background
 ◼ Quadratic Assignment Problems
 ◼ Local Search Algorithms

 Integer Boltzmann Machine

 Parallel Tempering Solver Design
 ◼ Algorithm and SIMD
 ◼ Load Balancing

 Benchmark Results

 Conclusion

Integer Assignment Problems
 • Class of NP-hard combinatorial optimization problems with integer variables
 ◼ Quadratic Assignment Problem (QAP) [1]
 • Simple formulation but difficult to solve even with modern CPU and GPU hardware

 Example 1: Travelling Salesman
 [Figure: start and visiting locations → travel itinerary (min. total distance)]

 Example 2: FPGA Block Placement
 [Figure: CLB netlist with blocks a-f → place → FPGA placement (min. route cost)]

 Goal
 • Create an integer Boltzmann Machine variant to accelerate search for QAP
 • Presentation covers formulation for symmetric, bias-less QAPs, but concepts are
 directly transferable to asymmetric and biased problems
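QAP Formulation
 [Figure: facilities and flows are assigned to locations L1-L4 and their distances; the
 assignment is shown as an integer permutation ϕ and as the equivalent binary matrix X]

 Example assignment (n = 4), integer and binary forms:
 ϕ1 = 2    X row 1: 0 1 0 0
 ϕ2 = 3    X row 2: 0 0 1 0
 ϕ3 = 4    X row 3: 0 0 0 1
 ϕ4 = 1    X row 4: 1 0 0 0

 • Given $n$ facilities with flows $F = (f_{i,j})$ and $n$ locations with distances $D = (d_{k,l})$
 • Find a permutation $\phi = (\phi_1, \dots, \phi_n) \in S_n$, where $S_n$ is the set of all possible permutations
 ◼ $\phi$ assigns each facility $i$ to a unique location $\phi_i$ such that the total cost of the
 assignment (1) is minimized:

 $$C(\phi) = \sum_{i=1}^{n} \sum_{j=1}^{n} f_{i,j}\, d_{\phi_i,\phi_j} \tag{1}$$

 $$x_{i,j} = \begin{cases} 1, & \phi_i = j \\ 0, & \text{otherwise} \end{cases} \qquad X = (x_{i,j}) \in \{0,1\}^{n \times n} \tag{2}$$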
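
 A minimal sketch of (1) and (2), assuming small made-up flow and distance matrices (symmetric, zero diagonal). Function names are illustrative, and locations are 0-indexed, so the example permutation ϕ = (2, 3, 4, 1) becomes [1, 2, 3, 0]. It evaluates the same assignment from the integer form and from the binary form.

```python
def qap_cost(F, D, phi):
    """Cost (1): sum over i, j of f[i][j] * d[phi[i]][phi[j]]."""
    n = len(phi)
    return sum(F[i][j] * D[phi[i]][phi[j]] for i in range(n) for j in range(n))

def one_hot(phi):
    """Binary form (2): x[i][j] = 1 if facility i is assigned to location j."""
    n = len(phi)
    return [[1 if phi[i] == j else 0 for j in range(n)] for i in range(n)]

def qap_cost_binary(F, D, X):
    """Same cost written over the n x n binary matrix X."""
    n = len(X)
    return sum(F[i][j] * D[k][l] * X[i][k] * X[j][l]
               for i in range(n) for j in range(n)
               for k in range(n) for l in range(n))

# Hypothetical example data (not from the presentation)
F = [[0, 3, 0, 2],
     [3, 0, 1, 0],
     [0, 1, 0, 4],
     [2, 0, 4, 0]]
D = [[0, 1, 2, 3],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [3, 2, 1, 0]]

phi = [1, 2, 3, 0]
cost_integer = qap_cost(F, D, phi)
cost_binary = qap_cost_binary(F, D, one_hot(phi))
```

 Both forms give the same cost; the binary form needs four nested loops, while the integer form needs two.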

Local Search
1. Propose new states as perturbations of the current state
 ◼ Exchange locations of 2 facilities

2. Calculate difference in cost function ($\Delta C$) between previous and proposed state

3. Decide whether to accept proposed state based on some criterion
 ◼ Stochastic Approach (e.g. Simulated Annealing):
 Use a varying temperature variable ($T$) to inject randomness into acceptance of
 moves via the Metropolis-Hastings acceptance probability generated using (3)
 • Proposal Acceptance Rate ($P$)
 • $T \to \infty$: $P \to 100\%$, all proposals are accepted regardless of $\Delta C$ value
 • $T \to 0$: $P \to 0\%$, search turns greedy

 $$P = \min\{1, \exp(-\Delta C / T)\} \tag{3}$$
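
 The acceptance rule (3) can be sketched as follows. This is an illustrative helper, not the authors' code; the function names are assumptions and $T > 0$ is required.

```python
import math
import random

def acceptance_probability(delta_c, temperature):
    """Metropolis criterion (3): min{1, exp(-delta_c / T)}, for T > 0."""
    if delta_c <= 0:
        return 1.0  # improving or cost-neutral moves are always accepted
    return math.exp(-delta_c / temperature)

def accept(delta_c, temperature, rng=random):
    """Draw a uniform random number and compare it against the probability."""
    return rng.random() < acceptance_probability(delta_c, temperature)
```

 At high temperature almost every cost-increasing move passes; at low temperature only improving moves do, which is the greedy limit.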
 
Local Search: QAP Exchange
 • Exchange locations of two facilities $a$, $b$
 • Change in cost calculated via (4)
 • Implemented in software via simple loop(s) in $O(n)$ time

 $$\Delta C = \sum_{k=1,\, k \neq a,b}^{n} 2\,(f_{b,k} - f_{a,k})(d_{\phi_a,\phi_k} - d_{\phi_b,\phi_k}) \tag{4}$$

 Example: exchanging the locations of facilities 1 and 4
 Before                       After
 ϕ1 = 2    0 1 0 0            ϕ1 = 1    1 0 0 0
 ϕ2 = 3    0 0 1 0            ϕ2 = 3    0 0 1 0
 ϕ3 = 4    0 0 0 1            ϕ3 = 4    0 0 0 1
 ϕ4 = 1    1 0 0 0            ϕ4 = 2    0 1 0 0
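
 A sketch of the $O(n)$ loop behind (4), checked against a full cost recomputation. It assumes symmetric flow and distance matrices with zero diagonals, as stated for this presentation; the data and function names are made up for illustration, and indices are 0-based.

```python
def qap_cost(F, D, phi):
    """Full O(n^2) cost, used here only as a reference."""
    n = len(phi)
    return sum(F[i][j] * D[phi[i]][phi[j]] for i in range(n) for j in range(n))

def exchange_delta(F, D, phi, a, b):
    """Cost change (4) for swapping the locations of facilities a and b."""
    pa, pb = phi[a], phi[b]
    delta = 0
    for k in range(len(phi)):
        if k == a or k == b:
            continue  # the sum in (4) excludes k = a and k = b
        delta += 2 * (F[b][k] - F[a][k]) * (D[pa][phi[k]] - D[pb][phi[k]])
    return delta

# Hypothetical symmetric, zero-diagonal example data
F = [[0, 3, 0, 2], [3, 0, 1, 0], [0, 1, 0, 4], [2, 0, 4, 0]]
D = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]

phi = [1, 2, 3, 0]       # before: facility 1 at location 2, ..., facility 4 at location 1
swapped = [0, 2, 3, 1]   # after exchanging facilities 1 and 4

delta = exchange_delta(F, D, phi, 0, 3)
reference = qap_cost(F, D, swapped) - qap_cost(F, D, phi)
```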

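Boltzmann Machines
 [Figure: a fully connected 1D BM with neurons x1 ... xN and weights wi,j; each neuron sums
 its weighted inputs and bias bi into a local field hi and updates xi using the temperature T
 and rand(). Beside it, a 2D BM drawn as an n × n grid of neurons x1,1 ... xn,n]

 1D: $N$ neurons
 State: $x \in \{0,1\}^{N}$
 Cost/Energy:
 $$E(x) = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{i,j}\, x_i x_j - \sum_{i=1}^{N} b_i x_i \tag{5}$$
 Neuron local fields:
 $$h_i(x) = \sum_{j=1}^{N} w_{i,j}\, x_j + b_i \tag{6}$$

 2D: $n \times n$ neurons
 State: $X \in \{0,1\}^{n \times n}$
 Cost/Energy:
 $$E(X) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{l=1}^{n} w_{i,j;k,l}\, x_{i,j} x_{k,l} - \sum_{i=1}^{n} \sum_{j=1}^{n} b_{i,j}\, x_{i,j} \tag{7}$$
 Neuron local fields:
 $$h_{i,j}(X) = \sum_{k=1}^{n} \sum_{l=1}^{n} w_{i,j;k,l}\, x_{k,l} + b_{i,j} \tag{8}$$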
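
 To make the role of the local field concrete, here is a small sketch of (5) and (6) for the 1D case. It assumes symmetric weights with a zero diagonal; under that assumption, flipping bit $i$ changes the energy by $-(1 - 2x_i)\,h_i$, a standard identity that is not stated on the slide. The weights, biases and function names are made up.

```python
def energy(W, b, x):
    """Energy (5) of binary state x."""
    n = len(x)
    pair = sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return -0.5 * pair - sum(b[i] * x[i] for i in range(n))

def local_field(W, b, x, i):
    """Local field (6) of neuron i."""
    return sum(W[i][j] * x[j] for j in range(len(x))) + b[i]

def flip_delta(W, b, x, i):
    """Energy change from flipping bit i, read off the local field in O(1)."""
    return -(1 - 2 * x[i]) * local_field(W, b, x, i)

def flipped(x, i):
    """Copy of x with bit i inverted."""
    y = list(x)
    y[i] = 1 - y[i]
    return y

# Hypothetical symmetric, zero-diagonal weights and biases
W = [[0, 1, -2],
     [1, 0, 3],
     [-2, 3, 0]]
b = [0.5, -1.0, 0.0]
x = [1, 0, 1]
```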

BM vs Integer BM for QAP [2,3]

 BM
 ◼ State: X, n facilities × n locations, n² binary bits (64 kbits at n = 256)
 ◼ Data storage: W matrix over fully connected bits, 1.6G floats at n = 256
 ◼ Searching the state-space, exchange 2 facilities (a ↔ b): 4 sequential spin-flips
 from S1 to S5, passing through infeasible states S2, S3, S4

 Int BM
 ◼ State: ϕ, n integers (2,048 bits at n = 256)
 ◼ Data storage: F, D matrices, 2 × 64k floats at n = 256
 ◼ Searching the state-space, exchange 2 facilities (a ↔ b): 1 exchange operation
 from S1 to S5

 [Figure: Cost(X) over the binary search state-space, including the infeasible states
 S2-S4, compared with Cost(ϕ) over the integer search state-space]

 • 2D BM requires penalty terms to enforce one-hot constraints on rows/columns of $X$
 • Using an Integer BM with intrinsic exchange moves (4-bit-flip equivalent) simplifies
 problem formulation, speeds up search, and significantly reduces memory requirements
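
 The "4-bit-flip equivalent" and the state sizes above can be checked with a few lines. This is an illustrative sketch with assumed helper names; the 8 bits per integer is an assumption that matches the 2,048-bit figure at n = 256.

```python
def one_hot(phi):
    """n x n binary matrix for permutation phi (0-indexed locations)."""
    n = len(phi)
    return [[1 if phi[i] == j else 0 for j in range(n)] for i in range(n)]

def exchange(phi, a, b):
    """Return a copy of phi with the locations of facilities a and b swapped."""
    new_phi = list(phi)
    new_phi[a], new_phi[b] = new_phi[b], new_phi[a]
    return new_phi

def hamming(X, Y):
    """Number of bit positions in which two binary matrices differ."""
    return sum(x != y for row_x, row_y in zip(X, Y) for x, y in zip(row_x, row_y))

phi = [1, 2, 3, 0]
flips = hamming(one_hot(phi), one_hot(exchange(phi, 0, 3)))

n = 256
bits_binary = n * n    # one bit per element of X
bits_integer = n * 8   # assumes 8 bits per integer, enough to index 256 locations
```

 A single exchange changes exactly four bits of X; doing those four flips one at a time would pass through states that violate the one-hot constraints.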

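Int BM QAP Formulation
 Starting point, the exchange cost (4):
 $$\Delta C = \sum_{k=1,\, k \neq a,b}^{n} 2\,(f_{b,k} - f_{a,k})(d_{\phi_a,\phi_k} - d_{\phi_b,\phi_k}) \tag{4}$$

• Remove conditions on the summation loop iterator
• Introduces extra terms within the sum
• Compensate via additional terms
 $$\Delta C = \sum_{k=1}^{n} 2\,(f_{b,k} - f_{a,k})(d_{\phi_a,\phi_k} - d_{\phi_b,\phi_k}) + 4 f_{a,b}\, d_{\phi_a,\phi_b} \tag{9}$$

• Create vector of flow row differences
 $$\Delta F = F_{b,*} - F_{a,*} \tag{10}$$

• Create vector of permutation ordered distance row differences
 $$\Delta D^{X} = (D_{\phi_a,*} - D_{\phi_b,*})\, X^{\top} \tag{11}$$

• Reformulate using a dot product
 $$\Delta C = 2\, \Delta F \cdot \Delta D^{X} + 4 f_{a,b}\, d_{\phi_a,\phi_b} \tag{12}$$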
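
 The steps above can be sketched as follows: build the two difference vectors, take their dot product, and add the compensation term. The sketch assumes symmetric, zero-diagonal matrices; the data and function names are illustrative, and the permutation-ordered distance differences are formed by indexing with ϕ rather than by an explicit product with the binary matrix.

```python
def qap_cost(F, D, phi):
    """Full O(n^2) cost, used as a reference."""
    n = len(phi)
    return sum(F[i][j] * D[phi[i]][phi[j]] for i in range(n) for j in range(n))

def delta_dot(F, D, phi, a, b):
    """Exchange cost via the dot-product form (12)."""
    n = len(phi)
    pa, pb = phi[a], phi[b]
    d_f = [F[b][k] - F[a][k] for k in range(n)]              # (10)
    d_d = [D[pa][phi[k]] - D[pb][phi[k]] for k in range(n)]  # (11), permutation ordered
    dot = sum(u * v for u, v in zip(d_f, d_d))
    return 2 * dot + 4 * F[a][b] * D[pa][pb]                 # (12)

def exchange(phi, a, b):
    new_phi = list(phi)
    new_phi[a], new_phi[b] = new_phi[b], new_phi[a]
    return new_phi

# Hypothetical symmetric, zero-diagonal example data
F = [[0, 3, 0, 2], [3, 0, 1, 0], [0, 1, 0, 4], [2, 0, 4, 0]]
D = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
phi = [1, 2, 3, 0]
```

 Removing the loop condition makes the sum a plain dot product over contiguous vectors, which is the form that suits SIMD instructions.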

Integer BM: Local Fields
• Generate cache values at start of search in $O(n^3)$ time
 $$H = F X D^{\top} = (h_{i,j}) \in \mathbb{R}^{n \times n} \tag{13}$$

• Elements in $H$ correspond to bits in $X$
• Element $h_{i,j}$ is cost of assigning a facility $i$ to location $j$

 [Figure: X and H drawn side by side as 4 × 4 grids with columns L1-L4; each hi,j lines
 up with the bit xi,j]

• Reformulate dot product as sum of 4 elements taking $O(1)$ time
 $$\Delta F \cdot \Delta D^{X} = h_{a,\phi_b} + h_{b,\phi_a} - h_{a,\phi_a} - h_{b,\phi_b} \tag{14}$$

 $$\Delta C = 2\, \Delta F \cdot \Delta D^{X} + 4 f_{a,b}\, d_{\phi_a,\phi_b} \;\Longrightarrow\; \Delta C = 2\,(h_{a,\phi_b} + h_{b,\phi_a} - h_{a,\phi_a} - h_{b,\phi_b}) + 4 f_{a,b}\, d_{\phi_a,\phi_b} \tag{15}$$
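
 A sketch of the cache (13) and the $O(1)$ lookup (15), assuming symmetric, zero-diagonal matrices. The cache is built by indexing with ϕ, which gives the same values as the triple matrix product; the names and data are illustrative.

```python
def local_fields(F, D, phi):
    """Cache (13): h[i][j] = sum over k of f[i][k] * d[j][phi[k]]."""
    n = len(phi)
    return [[sum(F[i][k] * D[j][phi[k]] for k in range(n)) for j in range(n)]
            for i in range(n)]

def delta_cached(F, D, H, phi, a, b):
    """Exchange cost (15) from four cached local fields."""
    pa, pb = phi[a], phi[b]
    return 2 * (H[a][pb] + H[b][pa] - H[a][pa] - H[b][pb]) + 4 * F[a][b] * D[pa][pb]

def qap_cost(F, D, phi):
    n = len(phi)
    return sum(F[i][j] * D[phi[i]][phi[j]] for i in range(n) for j in range(n))

def exchange(phi, a, b):
    new_phi = list(phi)
    new_phi[a], new_phi[b] = new_phi[b], new_phi[a]
    return new_phi

# Hypothetical symmetric, zero-diagonal example data
F = [[0, 3, 0, 2], [3, 0, 1, 0], [0, 1, 0, 4], [2, 0, 4, 0]]
D = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
phi = [1, 2, 3, 0]
H = local_fields(F, D, phi)
```

 The cache costs $n^2$ extra values, and in exchange each proposal is evaluated with four reads and one multiply instead of a loop over all facilities.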

Int BM: Local Field Updates
 • Update local fields 1 row at a time using floating point fused-multiply-adds
 ◼ $O(n^2)$
 • Can skip rows corresponding to 0 elements in $\Delta F$
 • Significant speed-ups if $F$ is sparse or has a particular pattern that leads to
 sparse $\Delta F$ vectors

 $$H \mathrel{+}= \Delta F^{\top} \otimes \Delta D \tag{16}$$

 [Figure: row i of H is incremented by δfi × (δd1, δd2, δd3, ..., δdn); rows with δfi = 0
 are skipped]
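
 A sketch of the rank-1 update (16) applied after an accepted exchange. It assumes symmetric matrices and reads the distance row difference as unpermuted and taken with the pre-exchange ϕ; that reading is an assumption, checked here by rebuilding the cache from scratch. Names and data are illustrative.

```python
def local_fields(F, D, phi):
    """Cache built from scratch: h[i][j] = sum over k of f[i][k] * d[j][phi[k]]."""
    n = len(phi)
    return [[sum(F[i][k] * D[j][phi[k]] for k in range(n)) for j in range(n)]
            for i in range(n)]

def update_local_fields(F, D, H, phi, a, b):
    """Apply (16) to H in place, then perform the exchange on phi."""
    n = len(phi)
    pa, pb = phi[a], phi[b]
    d_d = [D[pa][j] - D[pb][j] for j in range(n)]  # distance row difference
    for i in range(n):
        d_f = F[b][i] - F[a][i]                    # i-th element of the flow difference
        if d_f == 0:
            continue                               # rows with a zero factor are skipped
        row = H[i]
        for j in range(n):
            row[j] += d_f * d_d[j]                 # one multiply-add per element
    phi[a], phi[b] = pb, pa

# Hypothetical symmetric, zero-diagonal example data
F = [[0, 3, 0, 2], [3, 0, 1, 0], [0, 1, 0, 4], [2, 0, 4, 0]]
D = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
phi = [1, 2, 3, 0]
H = local_fields(F, D, phi)
update_local_fields(F, D, H, phi, 0, 3)
```

 The inner loop is the part that would map onto fused-multiply-add vector instructions in a compiled implementation.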

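Parallel Tempering
 • Uses M unique search-states/replicas with unique temperatures, $T = (T_i) \in \mathbb{R}^{M}$
 • Replicas perform Local Search at their assigned temperature
 ◼ Can swap temperatures with other replicas with probability calculated via (17)
 ◼ Allows for an escape mechanism from local minima

 $$P_{swap} = \min\left\{1, \exp\left(\left(\frac{1}{T_i} - \frac{1}{T_{i+1}}\right)(C_i - C_{i+1})\right)\right\} \tag{17}$$

 [Figure: replica cost versus iterations (time) until the global min is found; replicas
 spread between Tmin and Tmax over the scaled cost landscape (cost versus state)]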
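
 A minimal sketch of the swap rule (17) and one pass over neighbouring temperature levels. The bookkeeping (a `ladder` of temperatures and a `replica_at` list saying which replica holds each level) is an assumption made for illustration, not the presented PT controller.

```python
import math
import random

def swap_probability(cost_i, cost_j, temp_i, temp_j):
    """Probability (17) of swapping temperatures between neighbouring replicas."""
    exponent = (1.0 / temp_i - 1.0 / temp_j) * (cost_i - cost_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)

def swap_sweep(costs, ladder, replica_at, rng=random):
    """Try one swap per pair of neighbouring temperature levels.

    ladder[l] is the temperature of level l (ascending);
    replica_at[l] is the index of the replica currently at that level.
    """
    for level in range(len(ladder) - 1):
        r_cold, r_hot = replica_at[level], replica_at[level + 1]
        p = swap_probability(costs[r_cold], costs[r_hot],
                             ladder[level], ladder[level + 1])
        if rng.random() < p:
            replica_at[level], replica_at[level + 1] = r_hot, r_cold

costs = [10.0, 5.0]    # replica 0 is worse than replica 1
ladder = [1.0, 2.0]    # level 0 is the colder temperature
replica_at = [0, 1]
swap_sweep(costs, ladder, replica_at)
```

 When the colder replica has the higher cost the swap is always accepted, so good states drift toward low temperatures while poor ones are reheated.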

Parallel Tempering Solver System
 • Experimental setup:
 ◼ 64-core AMD 3990X CPU with AVX2 support
 ◼ Coded in C++ and compiled with GCC-9
 • Float32 used to store $F$, $D$, $H$
 ◼ Allows for fused-multiply-add instructions
 • Each engine uses 32 replicas in combination with a PT controller
 ◼ Each replica is run on a dedicated CPU core
 ◼ 2 engines are run in parallel to utilize all 64 cores
 ◼ Runs until a target cost is found or a 5-minute time-out limit is reached

 [Figure: 32-replica PT engine. QAP data (n, F, D, T0, I) feeds Replica 1 ... Replica 32,
 each running SLS on its own core (core 1 ... core 32) and holding C, Cmin, H, ϕ, ϕmin;
 the PT controller exchanges C and T with the replicas]
Load Balancing Threads
 [Figure: per-thread timelines for ReplicaT1 ... ReplicaT32, without load balancing and
 load balanced; legend: idle, ΔC calc. (I), ΔC calc. (Ibal), update]

 • Varying proposal acceptance rates of replicas at different temperatures cause thread
 imbalance when using Parallel Tempering

 • Track average time-per-iteration at each temperature and scale iterations at each
 temperature to try to equalize thread run-times
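
 One possible way to scale the iteration counts is sketched below. The slides do not give the exact rule; equalizing every thread to the mean estimated run-time is an assumption, as are the function name and the sample timings.

```python
def balanced_iterations(avg_time_per_iter, base_iterations):
    """Per-temperature iteration counts with roughly equal estimated run-times.

    avg_time_per_iter[l] is the measured average time of one iteration at
    temperature level l; base_iterations is the unbalanced count (I).
    """
    mean_time = sum(avg_time_per_iter) / len(avg_time_per_iter)
    budget = base_iterations * mean_time  # target run-time for every thread
    return [max(1, round(budget / t)) for t in avg_time_per_iter]

# Hypothetical timings: hotter replicas accept more moves, so they update more often
avg_time = [1.0, 2.0, 4.0]
iterations = balanced_iterations(avg_time, 100)
run_times = [t * i for t, i in zip(avg_time, iterations)]
```

 Slow levels get fewer iterations and fast levels get more, so the threads reach the next temperature-swap point at about the same time.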

QAP Benchmark Results
 • Avg. Time-to-Solution (TtS) across 100 independent, randomly seeded runs
 with 99% confidence interval for 19 difficult QAP instances
 ◼ Instances from [4,5,6]

 • ParEOTS [7]: Parallel Extremal Optimization + Tabu Search
 ◼ Best performing published QAP solver
 ◼ 128 cores on a CPU cluster (8 × 16-core AMD 6376 CPUs)
 ◼ TtS reported as average across 10 runs with 5-minute time-out

 • 57x faster than ParEOTS on average
 ◼ Max speed-up of 353x

Conclusion
 Extended BMs to support integer variables with one-hot constraints
 ◼ Removes need for penalty tuning on user side

 Added support for weight-generation using submatrices for problem
 topologies such as QAP
 ◼ Larger problem support due to lower memory requirements

 Integer BM-based CPU solution is 57x faster on average, and up to 353x faster,
 than ParEOTS, the best performing published QAP solver, across difficult QAP instances

Future Work
 Accelerate search relative to high-performance multi-core CPU system
 through dedicated hardware

Acknowledgements
 The authors would like to thank Fujitsu Ltd. and Fujitsu Consulting
 (Canada) Inc. for providing financial support and technical expertise on
 this research and to CMC Microsystems for access to CAD tools used in
 this research
Thank You!
 Looking forward to your questions at the live session
References
 [1] Lawler, E.L.: The quadratic assignment problem. Manage. Sci. 9(4), 586–599 (1963).
 [2] Bagherbeik, M., Ashtari, P., Mousavi, S.F., Kanda, K., Tamura, H., Sheikholeslami, A.: A permutational
 Boltzmann machine with parallel tempering for solving combinatorial optimization problems. In: International
 Conference on Parallel Problem Solving from Nature, pp. 317–331. Springer (2020).
 [3] Bagherbeik, M., Sheikholeslami, A.: Caching and vectorization schemes to accelerate local search
 algorithms for assignment problems. In: IEEE Congress on Evolutionary Computation (to appear) (2021).
 [4] Burkard, R.E., Karisch, S.E., Rendl, F.: QAPLIB - a quadratic assignment problem library. J. Global Optim.
 10(4), 391–403 (1997).
 [5] Palubeckis, G.: An algorithm for construction of test cases for the quadratic assignment problem.
 Informatica Lith. Acad. Sci. 11, 281–296 (2000).
 [6] Drezner, Z., Hahn, P.M., Taillard, E.D.: Recent advances for the quadratic assignment problem with
 special emphasis on instances that are difficult for metaheuristic methods. Ann. Oper. Res. 139(1), 65–94
 (2005).
 [7] Munera, D., Diaz, D., Abreu, S.: Hybridization as cooperative parallelism for the quadratic assignment
 problem. In: International Workshop on Hybrid Metaheuristics. Springer, Cham (2016).
