Parallel Simulated Annealing to Solve TSP
Ajinkya Wakhale, Sushant Chaudhari, Pavan kumar Hanumantharaya
University of Maryland, Baltimore County, Department of CSEE
1000 Hilltop Circle, Baltimore, Maryland, USA, 21227
December 2017
[email protected] [email protected] [email protected]
Abstract— We implement serial and parallel algorithms for solving the Travelling Salesman Problem (TSP) using the simulated annealing methodology. We propose to solve the TSP by applying the Message Passing Interface (MPI) in order to reduce the overall time required to calculate an optimal solution with a parallel simulated annealing approach. In our algorithm, the number of operational executions is proportional to the number of available nodes in the cluster. This case study analyses and compares the performance of the serial and parallel approaches for solving the TSP. It illustrates the decrease in execution time for calculating the best path and shows that the parallel approach is best suited to high-end machines.
Keywords— Simulated Annealing, Travelling Salesman Problem, Parallel Algorithm, Bluespeed Cluster, Multi-core systems.
I. INTRODUCTION
Simulated annealing is a probabilistic technique, named after the heat-treatment process for metals, for finding the approximate global optimum of a given function. Temperature is the most vital variable in the annealing process, and it determines the overall energy state of the process. Simulated annealing provides a solution for optimization problems by transferring mathematical insights from the physics domain to real-world optimization problems. Applied to the TSP, simulated annealing starts with an arbitrary initial tour and moves from one tour to a neighbouring tour in each iteration. The result is a tour of near-optimal reduced length.
The Traveling Salesman Problem (TSP) is one of the classic problems in combinatorial optimization research. "Given a set of cities and the distance between every pair of cities, the problem is to find the shortest possible route that visits every city exactly once and returns to the starting city" [18]. Its solution is widely used in diverse fields such as job sequencing, X-ray crystallography, computer wiring, VLSI chip fabrication, and vehicle routing [16]. Any improved algorithm for finding optimized solutions to this problem can be adapted to an entire class of NP-complete problems.
MPI is a communication protocol for programming parallel computers. MPI provides a standard set of base routines that can be implemented efficiently on parallel computer hardware in a distributed system. It supports both point-to-point and collective communication, and it is one of the most widely used models in high-performance computing [13]. MPI provides essential communication, virtual topology, and synchronization functionality between sets of different processes.
In this paper, we implement both serial and parallel versions of a TSP solver using simulated annealing principles. We demonstrate a significant reduction in the execution time needed to find the best path when using MPI as the communication interface between nodes, and we analyze the performance results of both implemented algorithms.
In the rest of the paper, we discuss related work; give an overview of TSP, simulated annealing, and the parallel communication interface; explain the pseudocode and implementation of both the serial and parallel algorithms; discuss several parameters and benchmarks; compare serial performance with parallel; and finally present our conclusions.
II. RELATED WORK
We briefly review selected related work and previous
algorithms used for solving TSP using simulated annealing
method.
In 1983, Kirkpatrick [11] introduced simulated annealing and gave basic applications to obtaining solutions for optimization problems. The aim was to grow a crystal lattice in the form of highly ordered, defect-free silicon for semiconductor manufacturing. The material was 'annealed' to achieve the end product, and the analogous computational process is termed 'simulated annealing'. It is a heuristic approach based on the Monte Carlo iterative solution method.
The TSP is a combinatorial optimization problem modelled as a graph problem: find the shortest possible route through the cities. Simulated annealing is powerful and highly effective in solving optimization problems such as the TSP.
In 1995, David S. Johnson and Lyle A. McGeoch [17] applied simulated annealing to the TSP and gave a detailed analysis of their results and of possible speedup techniques. They used neighbourhood pruning and a low initial temperature, and concluded that combining neighbourhood pruning with a low starting temperature yields better results for simulated annealing on the TSP.
In the past few years, there have been many pieces of research and studies on different applications of the TSP in diverse fields. In 2009, Xu Zhihong et al. [14] implemented an ant colony algorithm to solve the TSP using simulated annealing. They proposed parallel methods to implement the ant colony algorithm using MPI and analyzed the performance difference between the parallel algorithm and the traditional algorithm. In conclusion, they proposed a hybrid of the ant colony and parallel simulated annealing approaches to solve the TSP, yielding low-cost and highly efficient results.
III. OVERVIEW
The Travelling Salesman Problem (TSP) is NP-hard, which means that no known polynomial-time algorithm gives us the best solution to every TSP instance. Although it is difficult to compute the best solution, we often only want a good solution. Because an exact solution can take an inordinate amount of computing time and resources, we focus on finding a solution that takes less time and fewer computing resources. To find a good solution quickly, we often use stochastic optimization methods: we select randomly generated routes and incrementally improve them in some manner to find a better route. How a route changes depends on the algorithm used to solve a problem like the TSP.
A common approach to solving the TSP is to think of solutions as existing within a landscape, where we can move from one solution to another in search of better solutions. One methodology is the hill climbing method. Hill climbing is a greedy algorithm: it first picks a place to start, then repeatedly takes a step that is uphill in the landscape, and stops when there are no more uphill steps. This means that hill climbing is quick to reach the top of a hill in the landscape, but the hill reached depends on the initial starting position. The biggest hill in the solution landscape is the global maximum, and the top of any other hill is a local maximum. Standard hill climbing gets stuck at the top of a local maximum.
Simulated annealing is a modified version of hill climbing that has a better chance of finding the global maximum in the solution landscape. Whereas the hill climbing algorithm sometimes gets stuck at a locally optimal solution, simulated annealing addresses this problem by moving in both directions in the solution landscape, i.e., both uphill and downhill. So simulated annealing is basically hill climbing with the ability to go downhill.
In simulated annealing, each iteration creates a neighbour by changing some parameters of the solution and then checks the neighbour's function value against the previous solution. To find an approximate solution for a problem like the TSP, the number of iterations required is huge. This significantly burdens the resources and increases the running time of the problem. Hence, a parallelized version of simulated annealing yields better results in terms of the total time taken for an input. To effectively parallelize an algorithm across processes, selecting the inter-process communication model is very important. One of the best lightweight models for inter-process communication is MPI. It is a low-level parallel programming library, but it provides several built-in methods that help distribute work in parallel among the nodes in the cluster. In MPI, the application communicates with other processes using messages in order to perform a task.
IV. IMPLEMENTATION
Serial SA Algorithm for TSP
Input:
n = number of cities
T0 = initial temperature
Tk = temperature at the k-th iteration of simulated annealing
City coordinates (s): set of given city coordinates in a tour
Neighbour state (s′): tour obtained by a random swap of two cities in the city array
City cost matrix (C): total Euclidean distance between all the cities
Difference in cost (Δ): the relative difference in cost between s and s′
Cooling constant (α): the rate at which the temperature is decreased in each iteration:
Tk+1 = αTk, where α is a constant between 0 and 1
Probability function (P): probability of selecting a costlier state, used to avoid staying at a local maximum:
P(Δ, Tk) = e^(−Δ/Tk) for Δ > 0, Tk > 0
P(Δ, Tk) = 1 for Δ <= 0, Tk > 0
When Δ > 0, for any given Tk, P is greater for smaller values of Δ.
Output:
B = best solution.
time = total time for calculating the approximate best solution.
1) Choose a random tour s and define T0 and α
2) Create a new state s′ by randomly swapping two cities in the city array
3) Compute Δ = C(s′) − C(s)
   a. If Δ <= 0, then s = s′
   b. If Δ > 0, then assign s = s′ with probability P(Δ, Tk)
   i. Compute Tk+1 = αTk and increment k
4) Repeat steps 2 and 3, keeping track of the best solution, until the terminating conditions are met.
Terminating conditions: The algorithm terminates once we reach the goal cost and the specified minimum temperature. The terminating condition plays a very important role in simulated annealing, since we need to stop as soon as we reach the global optimum. Hence, the selection of the cooling factor and the initial temperature is very important.
We have implemented the parallel TSP solver using the simulated annealing approach with MPI as the communication protocol between the processes. The algorithm for the parallel version is as follows:
Parallel SA Algorithm for TSP
Input: The input remains the same as in the serial implementation; only the computation and communication happen over MPI channels.
1) Initialize MPI
2) Choose a random tour s and define T0 and α
3) Create a new state s′ by randomly swapping two cities in the city array
4) Select the root process and divide the workload:
if (Process_Id == 0):
// this is the master process
MPI_Scatter(city_distance, size_per_process, MPI_INT, weights_per_process, size_per_process, MPI_INT, 0, MPI_COMM_WORLD);
5) Compute Δ = C(s′) − C(s)
   a. If Δ <= 0, then s = s′
   b. If Δ > 0, then assign s = s′ with probability P(Δ, Tk)
   i. Compute Tk+1 = αTk and increment k
6) Gather the local minima calculated in each process back at the root:
MPI_Gather(min_weight, cities_per_process, MPI_INT, global_min_weight, cities_per_process, MPI_INT, 0, MPI_COMM_WORLD);
7) Pick the best minimum solution among all the local minima gathered from the worker processes.
The terminating conditions remain the same as in the serial implementation in each process.
The time consumed to calculate the best path in the parallel implementation depends on the number of available processes in the cluster. As the number of processes increases, the time consumed decreases. Once the number of processes exceeds a threshold value, however, the expected time increases because of the additional communication between the processes. Hence, an optimal choice of the number of processes is vital.
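The scatter/gather skeleton of steps 4–7 can be sketched as a complete MPI program. For brevity, each rank here just takes the minimum of its scattered slice of weights as a stand-in for a full local annealing run; the slice size and variable names mirror the pseudocode above but are our illustrative assumptions, not the authors' code:

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int size_per_process = 4;              /* illustrative slice size */
    int total = size_per_process * nprocs;
    int *city_distance = NULL;
    if (rank == 0) {                       /* master fills the weights */
        city_distance = malloc(total * sizeof(int));
        for (int i = 0; i < total; i++)
            city_distance[i] = rand() % 1000 + 1;
    }

    /* Step 4: root scatters an equal slice of weights to every process. */
    int *weights_per_process = malloc(size_per_process * sizeof(int));
    MPI_Scatter(city_distance, size_per_process, MPI_INT,
                weights_per_process, size_per_process, MPI_INT,
                0, MPI_COMM_WORLD);

    /* Step 5 (stand-in): each rank computes its local minimum; in the
     * full algorithm this is a local simulated annealing run. */
    int min_weight = INT_MAX;
    for (int i = 0; i < size_per_process; i++)
        if (weights_per_process[i] < min_weight)
            min_weight = weights_per_process[i];

    /* Step 6: gather every local minimum back at the root. */
    int *global_min_weight = NULL;
    if (rank == 0)
        global_min_weight = malloc(nprocs * sizeof(int));
    MPI_Gather(&min_weight, 1, MPI_INT,
               global_min_weight, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Step 7: root picks the best among the gathered local minima. */
    if (rank == 0) {
        int best = INT_MAX;
        for (int p = 0; p < nprocs; p++)
            if (global_min_weight[p] < best)
                best = global_min_weight[p];
        printf("global minimum weight: %d\n", best);
        free(global_min_weight);
        free(city_distance);
    }
    free(weights_per_process);
    MPI_Finalize();
    return 0;
}
```

With an MPI installation this compiles with `mpicc` and runs with, e.g., `mpirun -np 4 ./a.out`.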
V. DISCUSSION
Consider fig. 1 [15], which shows two local minima A and B. B is the global minimum in this case, and our goal is to reach the global minimum while avoiding getting stuck at the local minimum A. If we place a ball randomly at any point in the landscape and allow it to travel only downhill, it has comparable probabilities of reaching A or B. To increase the probability that the ball reaches B, we can use shaking, which simulates the temperature in the entire process of reaching the goal.
fig 1. Energy landscape
Shaking helps the ball cross from A to B, but we need to make sure that the shaking is not so furious that the ball moves in the opposite direction, further away from the global minimum. So the intensity of shaking is large in the initial stages and is reduced as we approach the goal. This increases the probability that the ball moves from A to B and reduces the probability that it moves from B to A. We use the same idea in choosing the parameters for implementing the parallel version of the algorithm.
We choose the initial temperature T0 = ΔEmax, where ΔEmax is the maximum difference in cost that exists between any two neighbouring tours. This helps us ensure that all possible paths are explored until the temperature reduces to the final low temperature. We choose the final temperature to be very low but nonzero, so that the chance of accepting a worse solution becomes negligible. We select a suitably low final temperature such that the algorithm reaches a state where it can no longer accept a better or a worse solution.
The cooling rate should allow the algorithm to run enough iterations at each temperature to help the system stabilise. We therefore reduce the temperature geometrically with the help of a cooling constant. In our algorithm, we use fewer iterations at higher temperatures and more iterations at lower temperatures. This is important so that local optima are explored completely and not confused with the global optimum.
VI. DEMONSTRATION SYSTEM ARCHITECTURE
We tested both the serial and parallel versions of our code on an IBM Minsky cluster, also called Bluespeed. Bluespeed consists of two IBM Power S822LC nodes, the building block of IBM's next-generation 200-petaflop Summit system. Each node integrates dual POWER8+ processors with 10 cores at 3.0 GHz.
The cluster has a capacity of 30 terabytes of flash storage; each node has 5 terabytes of internal storage and 1 TB of RAM. The CPU architecture is 64-bit POWER8 with 2 sockets, 10 cores per socket, and 8 threads per core. The total processor-to-memory bandwidth is 115 GB/s [19].
VII. ANALYSIS AND RESULTS
Our experiments show that when the number of processes is one, the time taken by the serial and parallel algorithms is almost the same. But as we increase the number of processes, the execution time of the parallel algorithm reduces significantly.
To analyze the overall complexity and compare the results of the serial and parallel versions, we used weak scaling and strong scaling.
In weak scaling, we run both the serial and parallel versions on different input sizes and compare their running times. In our case, we vary the input by increasing the total number of cities given to both programs. For the parallel version, we fixed the total number of processes at 8 and then analyzed the results.
fig 2. Weak scaling
From fig 2, it can be inferred that the total running time reduces significantly for the parallel version. When the number of cities increases above 15,000, the speedup is almost 3 to 4 times.
In strong scaling, we analyze the parallel version of the program while keeping the total number of cities constant at 5000. We then compare the total running time of the program as the total number of processors increases.
fig 3. Strong scaling
Fig. 3 demonstrates that the overall running time reduces significantly for the parallel program as we increase the total number of processes. The reduction in total running time levels off as we approach a higher number of processes, and the running time increases again when the total number of processes reaches 20. This is primarily because, as the number of processes increases, the overhead of inter-process communication grows.
VIII. CONCLUSION
In this paper, we have presented and implemented both serial and parallel approaches to solving the TSP using simulated annealing. We have sped up the execution of the whole simulated annealing algorithm by using MPI for parallel communication, and we have designed the MPI structure so that it can easily be replaced with any other parallel communication interface. We have shown an analysis and performance comparison of the serial and parallel implementations. We have illustrated that, with an equal distribution of city groups and proper initial values for simulated annealing, the time required for generating the best tour decreases as the number of processes in the cluster grows, up to a threshold. Our experimental results on the Bluespeed cluster show that our parallel TSP algorithm is very fast compared to other approaches.
Acknowledgement
We are grateful to Dr. Alan Sherman for giving us
timely feedback and helpful comments.
REFERENCES
[1] P. Jaillet. Probabilistic Traveling Salesman Problems. Ph.D. Dissertation, MIT, Cambridge, MA, USA, 1985.
[2] D. J. Bertsimas and L. Howell. Further Results on the Probabilistic Traveling Salesman Problem. European Journal of Operational Research 65, 68–95, 1993.
[3] Neill E. Bowler, Thomas M. A. Fink, and Robin C. Ball. Characterization of the probabilistic traveling salesman problem. Physical Review E, 68:036703, 2003.
[4] M. Jünger, G. Reinelt, and G. Rinaldi, "The traveling salesman problem," in Network Models, M. Ball, T. Magnanti, C. Monma, and G. Nemhauser, Eds. Amsterdam, The Netherlands: North-Holland, 1995, pp. 225–330.
[5] Delin Luo, Lixiao Zhang, and Zhihui Xu (August 2011). "Heuristic simulated annealing genetic algorithm for Travelling Salesman Problem". Retrieved on October 08, 2017.
[6] L. Bianchi. Ant colony optimization and local search for the probabilistic traveling salesman problem: a case study in stochastic combinatorial optimization. PhD thesis, Univ. Libre de Bruxelles, Brussels, Belgium, 2006.
[7] C. S. Jeong and M. H. Kim (n.d.). "Full Parallel Simulated Annealing for Travelling Salesman Problem". Retrieved on October 08, 2017.
[8] Yang Li, Aimin Zhou, and Guixu Zhang (2011). "Simulated Annealing with Probabilistic Neighborhood for Travelling Salesman Problems".
[9] Xiaojun Liu, Bin Zhang, and Fangying Du (2014). "Integrating Relative Coordinates with Simulated Annealing to Solve a Travelling Salesman Problem".
[10] Miroslaw Malek, Mohan Guruswamy, and Mihir Pandya (1989). "Serial and Parallel Simulated Annealing and Tabu Search Algorithms for Travelling Salesman Problems".
[11] Rob A. Rutenbar (n.d.). "Simulated Annealing Algorithms: An Overview". Retrieved on October 08, 2017.
[12] Rajesh Matai, Surya Singh, and Murari Lal Mittal (2010). "Traveling Salesman Problem: an Overview of Applications, Formulations, and Solution Approaches," in Traveling Salesman Problem, Theory and Applications.
[13] Sayantan Sur, Matthew Koop, and Dhabaleswar Panda (2006). "High-performance and scalable MPI over InfiniBand with reduced memory usage: an in-depth performance analysis".
[14] Xu Zhihong, Song Bo, and Guo Yanyan (2009). "Using Simulated Annealing and Ant Colony Hybrid Algorithm to Solve Travelling Salesman Problem".
[15] G. E. Hinton and T. J. Sejnowski, Learning and relearning in Boltzmann machines, in: Parallel Distributed Processing, Vol. 1 (MIT Press, 1986).
[16] "Applications of Combinatorial Optimization," talk at the 13th International Mathematical Programming Symposium, Tokyo, 1988.
[17] David S. Johnson and Lyle A. McGeoch (1995). "The Traveling Salesman Problem: A Case Study in Local Optimization".
[18] "Travelling Salesman Problem" (n.d.). Retrieved from Wikipedia.
[19] CHMPR LAB at UMBC, https://chmpr.umbc.edu/bluespeed