Particle Swarm Optimization

Particle Swarm models the swarming behaviour of bees and other animals.

We maintain a swarm (population) of particles (solutions). Each particle has velocity, which is tempered over time. The particles have knowledge of the global best known solution that the swarm has discovered, and also the local best known solution that the individual particle has discovered. They are attracted (think magnetism or gravity) to each of these historical points according to weighting parameters.

The algorithm is as follows. As usual, I’ve simplified this a little for readability.  Have a look a the pseudocode on Wikipedia for a more complete description.

  1. Given fitness function “f”
  2. For each of n particles,
    1. Select a starting point xᵢ at random (uniformly distributed) within the solution space.
    2. Assign initial velocity vᵢ at random.  This is chosen almost identically to how xᵢ is chosen, except that it may also be negative in each direction. The velocity is a vector with the dimensionality of the decision space.
  3. For each particle,
    1. Update the particle position, given xᵢ = xᵢ + vᵢ 
    2. If f(xᵢ) < f(p), replace with xᵢp is the best known solution discovered by this particle.
    3. If f(xᵢ) < f(g), replace with xᵢ. g is the best known solution discovered globally by the swarm.
    4. Update the particle velocity, given vᵢ = ω vi + φp rp (pixi) + φg rg (gxi). rp and rg are random numbers between 0 and 1. ω, φp, φg are parameters chosen by the user.
  4. Continue (from step 3) until termination criteria is reached.

There were several parameters used above, I’ll define them now.

  • ω governs the reduction of particle velocity over time.
  • φp, is a weighting factor, governing how much the particle is attracted to p, the “particle best” solution.
  • φg, also a weighting factor, governs how much the particle is attracted to g, the “global best” solution.

You will have noted that the initial velocity is quite high. Particles will initially be zipping right from one side of the decision space to the other. The velocity is slowed over time according to parameter ω, favouring exploration early on, and exploitation later on.

We predict that PSO will perform poorly on problems of high decision space dimensionality, where the metaphor of velocity becomes strained.

We predict that PSO will perform poorly in a discretized solution space, where the number of outcomes in each dimension are few. Again, the metaphor of velocity becomes strained here.

Further Reading

References

 

Random Search

Going way back to basics today with a look at Random Search. In this simple algorithm, the solution space is sampled purely at random, with no attempt to learn from previous samples.

The approach is described as follows.

  • Given fitness function “f”
  • Select a starting point x at random within the solution space.
  • Let best = x.
  • Sample a new position x at random within the solution space.
  • If f(x) < f(best), replace best with x.
  • Continue until termination criteria is reached.

In the worst case, random search may perform worse than full enumeration. The reason for this is that this search method has no memory, and will certainly sample the same “x” multiple times.

This type of search, sometimes called “true random search” or “pure random search”, is often used as a point of reference in literature.

Variants

  • Rastrigin (1963) presents an improved variant FSSRS which samples the previous best around a hyperspace of fixed radius “r”.

Further Reading

References

Rastrigin, L.A (1963), “About Convergence of Random Search Method in Extremal Control of Multi-Parameter Systems” L.A. Rastrigin (mathnet.ru, includes PDF, russian language).

Stochastic Optimization

stochastic
/stəˈkastɪk/
adjective (technical)

Further Reading