Particle Swarm Optimization

Particle Swarm models the swarming behaviour of bees and other animals.

We maintain a swarm (population) of particles (solutions). Each particle has velocity, which is tempered over time. The particles have knowledge of the global best known solution that the swarm has discovered, and also the local best known solution that the individual particle has discovered. They are attracted (think magnetism or gravity) to each of these historical points according to weighting parameters.

The algorithm is as follows. As usual, I’ve simplified this a little for readability.  Have a look a the pseudocode on Wikipedia for a more complete description.

  1. Given fitness function “f”
  2. For each of n particles,
    1. Select a starting point xᵢ at random (uniformly distributed) within the solution space.
    2. Assign initial velocity vᵢ at random.  This is chosen almost identically to how xᵢ is chosen, except that it may also be negative in each direction. The velocity is a vector with the dimensionality of the decision space.
  3. For each particle,
    1. Update the particle position, given xᵢ = xᵢ + vᵢ 
    2. If f(xᵢ) < f(p), replace with xᵢp is the best known solution discovered by this particle.
    3. If f(xᵢ) < f(g), replace with xᵢ. g is the best known solution discovered globally by the swarm.
    4. Update the particle velocity, given vᵢ = ω vi + φp rp (pixi) + φg rg (gxi). rp and rg are random numbers between 0 and 1. ω, φp, φg are parameters chosen by the user.
  4. Continue (from step 3) until termination criteria is reached.

There were several parameters used above, I’ll define them now.

  • ω governs the reduction of particle velocity over time.
  • φp, is a weighting factor, governing how much the particle is attracted to p, the “particle best” solution.
  • φg, also a weighting factor, governs how much the particle is attracted to g, the “global best” solution.

You will have noted that the initial velocity is quite high. Particles will initially be zipping right from one side of the decision space to the other. The velocity is slowed over time according to parameter ω, favouring exploration early on, and exploitation later on.

We predict that PSO will perform poorly on problems of high decision space dimensionality, where the metaphor of velocity becomes strained.

We predict that PSO will perform poorly in a discretized solution space, where the number of outcomes in each dimension are few. Again, the metaphor of velocity becomes strained here.

Further Reading

References