Coding & Optimisation

Richard has a long history of writing highly efficient implementations of algorithms, and of refactoring existing algorithms to improve their performance. Optimisation can often improve performance across multiple platforms, but it is more frequently carried out for a specific target processor or platform.

We have experience optimising for x86 processors, as well as for embedded processors and media processors (such as the NXP Nexperia range of devices). Some examples of the kinds of algorithms and systems that we have optimised are:

  • digital video decoders (MPEG2, MPEG4 ASP and H.264),
  • user interfaces and user interface management systems,
  • 2D and 3D graphics libraries,
  • image processing techniques,
  • system and hardware drivers (e.g. hard disk, IO, network and TCP/IP).

The optimisation techniques used will depend on the particular problem and target platform, but there are some general categories of approach that can be employed:

Algorithm analysis
With any optimisation it is important to understand the inner workings of the system thoroughly, including its inputs, its outputs and the mathematical background. That understanding can point to a number of different optimisation techniques, for example a more efficient implementation of the core algorithm, or caching of interim results.
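As an illustration of interim result caching, here is a minimal sketch in Python. It assumes a transform whose cosine table depends only on the block size, so the table can be computed once and reused. The names `dct_basis` and `dct_1d` are hypothetical and are not taken from any of the projects listed above.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def dct_basis(n):
    """Return the n x n DCT-II cosine table, computed once per block size."""
    return tuple(
        tuple(math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n))
        for u in range(n)
    )


def dct_1d(samples):
    """Unnormalised 1-D DCT-II that reuses the cached cosine table."""
    basis = dct_basis(len(samples))  # cache hit after the first call per size
    return [sum(s * c for s, c in zip(samples, row)) for row in basis]
```

Every call to `dct_1d` after the first one for a given block size skips the trigonometric work entirely, which is typically where the time goes in a naive implementation.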
Empirical analysis
When optimising, it is imperative to have empirical measurements of the system. There are often surprises, where unlikely parts of the system take an unexpectedly long time to complete. It may seem obvious, but targeting these parts first usually yields a substantial increase in performance.
Platform targeting
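One simple way to gather such measurements is to accumulate wall-clock time per stage and then rank the stages. The sketch below assumes that coarse, per-stage timing is enough to find the surprises; the names `timed`, `stage_seconds` and `hottest_first` are hypothetical. A real project would more likely use the platform's own profiler.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Total seconds spent in each named stage (hypothetical bookkeeping).
stage_seconds = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the wall-clock time of the enclosed block to the stage's total."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def hottest_first():
    """Return (stage, seconds) pairs, most expensive stage first."""
    return sorted(stage_seconds.items(), key=lambda kv: kv[1], reverse=True)
```

Wrapping each stage of a pipeline in `with timed("stage name"):` and then reading `hottest_first()` gives a ranked list of where to look first.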
Key to optimising for embedded processors, media processors and DSPs is understanding the underlying platform, including its instruction set, memory system and development tools. By exploiting the advantages of the platform (e.g. any SIMD operations) whilst avoiding its pitfalls (e.g. unexpected cache behaviour), a well-optimised solution can be found.
Non-computational & hardware optimisation
Optimisation is often considered only for the mathematical and computational parts of the system. However, in real systems time is often wasted waiting for hardware: whether for the memory system to refill cache lines, for IO to complete, or for dedicated hardware to finish. By intelligently scheduling the use of hardware, this wasted time can often be minimised.
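A common form of this scheduling is to overlap IO with computation, so that the next block of data is being fetched while the current one is processed. The following is a minimal sketch, assuming the caller supplies two hypothetical callables: `read_chunk(i)`, which blocks on IO, and `process(data)`, which does the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def process_with_prefetch(read_chunk, process, n_chunks):
    """Process chunks in order while the next chunk is read in the background."""
    results = []
    if n_chunks <= 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_chunk, 0)  # start the first read
        for i in range(n_chunks):
            data = pending.result()  # wait only if the read is not done yet
            if i + 1 < n_chunks:
                pending = io.submit(read_chunk, i + 1)  # prefetch next chunk
            results.append(process(data))  # compute overlaps with that read
    return results
```

With this arrangement the total time tends towards the larger of the IO time and the compute time, rather than their sum. On an embedded target the same idea would usually be expressed with DMA and double buffering rather than threads.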
Multi-core, multi-threading
As modern processor ICs are given more cores, the ability to spread an algorithm across them is of increasing importance. In these cases, the traditional optimisation techniques have to be combined with an understanding of how the algorithm can be split across multiple cores, along with the particular characteristics of multi-processing on the target platform.
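As a sketch of how such a split might look, the example below divides an image into contiguous bands of rows and hands each band to a worker. It assumes the per-row work is independent, which is what makes the split safe. The names `band_ranges`, `brighten_band` and `brighten_parallel` are hypothetical. Note that in CPython a thread pool will not speed up CPU-bound pure-Python code; a `ProcessPoolExecutor` offers the same `submit` interface and could be passed in instead.

```python
from concurrent.futures import ThreadPoolExecutor


def band_ranges(n_rows, n_workers):
    """Split range(n_rows) into contiguous, near-equal (start, stop) bands."""
    n_workers = max(1, min(n_workers, n_rows))
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for i in range(n_workers):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def brighten_band(rows, delta):
    """Per-band work: add delta to every pixel, saturating at 255."""
    return [[min(255, p + delta) for p in row] for row in rows]


def brighten_parallel(image, delta, executor, n_workers):
    """Submit one band per worker, then merge the results in row order."""
    bands = [image[a:b] for a, b in band_ranges(len(image), n_workers)]
    futures = [executor.submit(brighten_band, band, delta) for band in bands]
    out = []
    for future in futures:
        out.extend(future.result())
    return out
```

For example, `with ThreadPoolExecutor(4) as ex: brighten_parallel(image, 10, ex, 4)` processes four bands concurrently. Giving each worker a contiguous band is a deliberate choice: it keeps the memory each worker touches together, which usually suits caches better than interleaving rows.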
Offload to dedicated hardware
In some systems the required speed cannot be reached using a general-purpose processor on its own. Offloading work from the processor can seem a daunting prospect, but there are now several practical ways to do it. An FPGA may be the best option, or perhaps a modern graphics card, making use of the processing power of its GPU.