AGENIUM SCALE


Agenium Scale is an innovative company located on the Plateau de Saclay in France (the "European Silicon Valley") that provides software solutions for high performance computing and complex systems. Its expertise covers many business areas as well as the entire software development chain, including deep knowledge of processors and computation architectures. At Agenium Scale we work in all business areas that require computation: automotive, space, rail, finance, telecoms, aeronautics and defence.

Computations are found in many technical areas such as:

  • Image processing: Agenium Scale's engineers can design image processing algorithms from scratch. We can also optimize existing algorithms by taking advantage of the parallelism levels provided by modern processors.
  • Predictive maintenance: systems composed of tens or hundreds of subsystems require well-designed algorithms to analyze the data coming from these subsystems as quickly as possible and in real time. Agenium Scale can help you in this matter across several businesses.
  • Machine learning: machine learning algorithms are found in many businesses and give very good results. Agenium Scale's specialty in this area is to optimize neural network inference, and more generally machine learning algorithms, in order to speed up their execution and run them on embedded systems.
  • Complex systems: many businesses require specific hardware that must be connected to a computation system. Making all of these components interact within an HPC system is difficult. Agenium Scale's expertise will allow you to design and industrialize such a solution.
  • Real time: an increasing number of systems need to run in real time, whether for safety or technical reasons. In either case the software running on such systems must be optimized. The expertise of Agenium Scale's engineers in HPC can be applied in this domain to make your algorithms meet real-time constraints.

 

We propose a wide variety of services including:

  • Code modernization and refactoring: we can modernize your C++98 or Python 2 code to use the latest versions of those languages (C++14, C++17, Python 3). This lets you revisit its data structures and its API in order to use the features of those languages' standard libraries. It also allows you to use the latest compilers and take advantage of their optimization passes (see the sketch after this list).
  • Diagnosis and optimization of existing software: we analyze your code to establish a diagnosis and offer recommendations. This can be followed by optimizations at high level (data structures, serialization) as well as at low level (MPI, SIMD, multi-threading).
  • Porting source code from one language to another: we translate code, which may come from a prototype, in order to make it more robust and reduce its running times. For example we can convert MATLAB or Python into C/C++.
  • Custom software and applications: we start from scratch and build custom software to answer your needs. This software can be stand-alone or part of a bigger system, communicate with standard and custom-made hardware, and run on embedded systems.
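
As a purely illustrative sketch of the modernization service mentioned above (the routine and its names are invented for this example, they are not taken from a customer project), here is what moving a small C++98 function to C++14 can look like:

    #include <numeric>
    #include <vector>

    // C++98 style: explicit iterator type and a hand-written accumulation loop.
    double sum_of_squares_98(const std::vector<double> &v) {
      double s = 0.0;
      for (std::vector<double>::const_iterator it = v.begin(); it != v.end(); ++it) {
        s += (*it) * (*it);
      }
      return s;
    }

    // C++14 style: a standard algorithm plus a lambda states the intent directly
    // and gives recent compilers more room for their optimization passes.
    double sum_of_squares_14(const std::vector<double> &v) {
      return std::accumulate(v.begin(), v.end(), 0.0,
                             [](double acc, double x) { return acc + x * x; });
    }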

 

We also write libraries to help HPC engineers write well-optimized programs. You can check us out on GitHub. Our main library is NSIMD, a vectorization library that abstracts SIMD programming. It was designed to exploit the maximum power of processors at a low development cost. NSIMD provides C89, C++98, C++11 and C++14 APIs. All APIs allow writing generic code. Binary compatibility is guaranteed by the fact that only a C ABI is exposed; the C++ API only wraps the C calls.
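
To give a flavor of the C++ API, here is a minimal sketch of adding two float arrays with packs. It is written from memory against the advanced C++ API (nsimd::pack, nsimd::loadu, nsimd::storeu, nsimd::len) and should be checked against the documentation in the GitHub repository:

    #include <nsimd/nsimd-all.hpp>

    // The same source compiles unchanged for SSE, AVX, AVX-512, NEON or SVE.
    void add_arrays(float *dst, const float *a, const float *b, int n) {
      typedef nsimd::pack<float> pack_t;
      int step = nsimd::len(pack_t());        // number of floats per SIMD register
      int i = 0;
      for (; i + step <= n; i += step) {
        pack_t va = nsimd::loadu<pack_t>(&a[i]);
        pack_t vb = nsimd::loadu<pack_t>(&b[i]);
        nsimd::storeu(&dst[i], va + vb);      // operator+ maps to the SIMD addition
      }
      for (; i < n; i++) {                    // scalar tail for the remaining elements
        dst[i] = a[i] + b[i];
      }
    }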

The list of supported SIMD instruction sets follows:

  • Intel:
    - SSE 2
    - SSE 4.2
    - AVX
    - AVX 2
    - AVX-512 as found on KNLs
    - AVX-512 as found on Xeon Skylake CPUs
  • Arm:
    - NEON 128 bits as found on ARMv7 CPUs
    - NEON 128 bits as found on AArch64 CPUs
    - SVE

Support for the following architectures is on the way:

  • NVIDIA
    - CUDA
    - HIP
  • AMD GPUs
    - HIP
  • PowerPC
    - VSX
    - VMX

Part of the library is open-sourced on GitHub (https://github.com/agenium-scale/nsimd) and can be downloaded and tested at will thanks to its MIT license.

A small part of it is distributed as a proprietary binary priced at €49.90 per user, which can be purchased at https://store.agenium-scale.com/en/. It contains, among others:

  • trigonometric functions
  • inverse trigonometric functions
  • hyperbolic functions
  • inverse hyperbolic functions
  • exponentials
  • logarithms

We have put NSIMD into GROMACS to demonstrate its potential. GROMACS is a versatile package for molecular dynamics, i.e. it simulates the Newtonian equations of motion for systems with hundreds to millions of particles. It is heavily used in the HPC community to benchmark supercomputers and has become a reference in this area.

As GROMACS is already fully optimized software, our goal is to obtain similar running times, and we do. This also proves the claims of NSIMD, namely low development cost for high performance and portability: we replaced nearly 11,000 lines of GROMACS code with 4,700 lines of NSIMD code.

We also work for the French Army and use NSIMD as the base library for our neural network inference engine. Its C++ API allows us to write all layer kernels once and obtain better performance than Caffe on Intel workstations and Arm mobile devices (such as smartphones). We speed up neural networks using quantization and fixed-point arithmetic, both of which are supported by NSIMD.
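
As a rough illustration of the quantization idea (this is generic example code, not the inference engine's actual kernels), here is how a tensor of floats can be mapped to symmetric 8-bit fixed point and back:

    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <vector>

    // Toy symmetric 8-bit quantization: real value ~= scale * q with q in [-127, 127].
    struct QuantizedTensor {
      std::vector<int8_t> q;
      float scale;
    };

    QuantizedTensor quantize(const std::vector<float> &x) {
      float max_abs = 0.0f;
      for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
      QuantizedTensor out;
      out.scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
      out.q.reserve(x.size());
      for (float v : x) out.q.push_back((int8_t)std::lround(v / out.scale));
      return out;
    }

    // Convolutions and matrix products are then done on int8/int32 values and only
    // the final result is scaled back to floating point.
    float dequantize(const QuantizedTensor &t, std::size_t i) {
      return t.scale * (float)t.q[i];
    }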

We have encountered, several times, very well optimized code written for one specific CPU using its vendor-specific API. This becomes a problem when upgrading hardware, even within the same vendor: many people buy newer Xeons that are AVX-512 capable while their code was written for older, AVX-only Xeons. That is when our translator comes into play. It is a Clang-based program that takes your C/C++ code as input and chases down all vendor-specific code. The output is C/C++ code in which calls to vendor APIs have been replaced by portable NSIMD code. This program saves you roughly 80% of the translation time. The resulting code is portable and uses the latest SIMD capabilities.
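
To make the before/after concrete, here is the kind of rewrite the translator performs on an AVX-only loop. The intrinsics in the first function are real Intel intrinsics, while the NSIMD replacement is a hand-written sketch and may differ slightly from what the tool actually generates (scalar tails are omitted for brevity):

    // Before: tied to AVX, processes 8 floats per iteration, cannot run elsewhere.
    #include <immintrin.h>

    void scale_avx(float *x, float k, int n) {
      __m256 vk = _mm256_set1_ps(k);
      for (int i = 0; i + 8 <= n; i += 8) {
        _mm256_storeu_ps(&x[i], _mm256_mul_ps(_mm256_loadu_ps(&x[i]), vk));
      }
    }

    // After (sketch): the same loop expressed with NSIMD, portable across the
    // instruction sets listed above and ready for AVX-512 without a rewrite.
    #include <nsimd/nsimd-all.hpp>

    void scale_nsimd(float *x, float k, int n) {
      typedef nsimd::pack<float> pack_t;
      pack_t vk(k);                           // broadcast the scalar to a full pack
      int step = nsimd::len(pack_t());
      for (int i = 0; i + step <= n; i += step) {
        nsimd::storeu(&x[i], nsimd::loadu<pack_t>(&x[i]) * vk);
      }
    }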
