Anaconda 5.0 with Intel Distribution for Python



To try the new anaconda environments, use one of these commands on the login console or in your job scripts:

module load anaconda/5.0.0_py3 # for Python 3.6+ environment
module load anaconda/5.0.0 # for Python 2.7+ environment

If you simply use “module load anaconda” without a version number, be aware that this always loads the latest version of the module. At the moment, it loads anaconda/5.0.0_py3 instead of anaconda/4.2.0_py3. This is why we strongly discourage loading the module without an explicit version number at the end.
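Because an unversioned module load can silently switch a job to a different Python, a job script can also check at start-up which interpreter it is actually running under. The following is a minimal sketch; the function name check_interpreter and the expected version (3, 6), which assumes the script was written for anaconda/5.0.0_py3, are illustrative choices and not something provided by the modules.

```python
import sys

# Illustrative assumption: this script was written for anaconda/5.0.0_py3.
EXPECTED_VERSION = (3, 6)


def check_interpreter(expected=EXPECTED_VERSION):
    """Return (ok, description) for the Python interpreter running this script."""
    actual = tuple(sys.version_info[:2])
    description = "%s (Python %d.%d)" % (sys.executable, actual[0], actual[1])
    return actual == tuple(expected), description


ok, description = check_interpreter()
# Only print a warning, so that this check never aborts a job by itself.
print("Running under " + description + ("" if ok else "  <-- not the expected version"))
```

In a real job script you might prefer to stop with sys.exit() when the check fails, rather than only printing a warning.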

The new anaconda environments are kept as close to the earlier ones as possible in terms of the available packages. However, the set-up of the new modules includes some improvements, described below.

The high-performance Intel Distribution for Python environment

The new anaconda modules now come with an additional environment containing the Intel Distribution for Python (IDP). While the MKL math libraries are widely known and have been part of the default Anaconda distribution for a while now, the IDP includes much additional work by Intel engineers, such as the pyDAAL library and other performance-optimized builds. A very informative introduction can be found in this presentation.

You can activate the idp environment with “source activate idp” from both the Python 2.7 and the Python 3.6 anaconda modules. An example for Python 3.6:

module load anaconda/5.0.0_py3 # module with Python 3.6
source activate idp # Intel Distribution for Python
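To confirm from inside a script that it is really running in the idp environment, a heuristic check can look at two things. Both are assumptions rather than documented guarantees of these modules: that source activate sets the CONDA_DEFAULT_ENV environment variable to the environment's name (or, with some conda versions, its full path), and that the Intel interpreter's sys.version string mentions "Intel Corporation". The function name in_idp_environment is hypothetical.

```python
import os
import sys


def in_idp_environment(environ=None, version=None):
    """Heuristically check whether the idp conda environment is active.

    Assumptions (not guaranteed by the modules): `source activate` sets
    CONDA_DEFAULT_ENV, and the Intel interpreter's sys.version mentions
    "Intel Corporation".
    """
    environ = os.environ if environ is None else environ
    version = sys.version if version is None else version
    # Depending on the conda version, CONDA_DEFAULT_ENV holds a name or a full path.
    env_name = os.path.basename(environ.get("CONDA_DEFAULT_ENV", "").rstrip("/"))
    return env_name == "idp" and "Intel Corporation" in version


print("idp environment active: %s" % in_idp_environment())
```

The optional environ and version arguments exist only so that the check can be tried out with made-up values; in normal use it is called without arguments.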

When you do this for the first time, please test that your code is compatible with the libraries in this environment. There have been a fair number of changes in the core libraries of the Python ecosystem (numpy, scipy, pandas, scikit-learn, etc.) and in the newest version of Python 3. Additionally, the IDP environment contains an optimized Intel build of the Python interpreter itself. Once you have made sure that your software works with the library versions in the IDP environment, you may consider using the idp environment as your default for maximum performance. If you are starting a new Python project, you may also consider developing it in this environment. Unlike the Intel® Parallel Studio XE suite, which is a paid offering, the Intel Distribution for Python is part of Anaconda and is distributed free of charge.
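A simple first step in such a compatibility test is to record which versions of the core libraries each environment provides, for example by running the same script once in the root environment and once after source activate idp, and comparing the output. This is a minimal sketch; the helper report_versions and the package list are illustrative, and note that scikit-learn is imported as sklearn.

```python
import importlib

# Import names of the core libraries mentioned above (scikit-learn imports as sklearn).
CORE_PACKAGES = ["numpy", "scipy", "pandas", "sklearn"]


def report_versions(packages=CORE_PACKAGES):
    """Map each package to its __version__, "unknown" if it has none, or None if unusable."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Treat any failure to import (missing or broken build) as unavailable.
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in sorted(report_versions().items()):
    print("%-8s %s" % (name, version if version is not None else "not installed"))
```

Catching every exception, not only ImportError, is deliberate here: a package that is installed but fails to load is itself a compatibility finding worth reporting.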

Software you develop using Intel's high-performance builds of Python libraries such as numpy and scipy should be compatible with the non-Intel builds of the corresponding packages. However, if you decide to use libraries developed principally by Intel, such as DAAL or TBB, please check that their licensing model suits your project.
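One practical way to check that your results do not depend on a particular build is to compare numerical outputs across environments with a tolerance rather than with exact equality. Optimized libraries may evaluate floating-point operations in a different order, which can change the last bits of a result without making it wrong. The following sketch illustrates this with plain Python arithmetic; the helper results_agree and its default tolerance are assumptions for illustration (it applies the same rule as math.isclose).

```python
def results_agree(expected, actual, rel_tol=1e-9, abs_tol=0.0):
    """Element-wise closeness test, using the same rule as math.isclose."""
    if len(expected) != len(actual):
        return False
    return all(
        abs(e - a) <= max(rel_tol * max(abs(e), abs(a)), abs_tol)
        for e, a in zip(expected, actual)
    )


# The same three numbers summed in two different orders, as an optimized
# library might do internally: the results differ in the last bit.
forward = (0.1 + 0.2) + 0.3
backward = (0.3 + 0.2) + 0.1
print(forward == backward)                   # False
print(results_agree([forward], [backward]))  # True
```

The tolerance should be chosen to suit your own computation; long chains of operations can legitimately accumulate larger differences than this toy example.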

Concluding remarks

You may wonder why you have to run the additional “source activate idp” after “module load anaconda/5.0.0_py3”. While Intel's efforts are an important contribution to the Python community, there are reasons why the IDP is not the default environment.

First, it takes Intel engineers time to rebuild and test the most recent versions of popular libraries, so the Intel distribution tends to track important library releases with a delay. For example, scikit-learn 0.19.0 was released in July, but the Intel build of it appeared only in November. Meanwhile, 0.19.0 has already been superseded by another minor scikit-learn release.

Second, as the Intel distribution is quite new, minor compatibility issues may still be lurking in it. For example, the Intel build of the Python interpreter cannot use the GNU readline library because readline is GPL-licensed, and it also refuses to import it.

To ensure the best compatibility, the root Anaconda environment includes the default Python interpreter build (with readline) and the most recent versions of scipy, scikit-learn and pandas from the default channels. Where possible, the root environment still uses the Intel versions of libraries; for example, it uses the Intel numpy and MKL dependencies for maximum performance. The idp environments, in turn, give Intel builds first priority and fall back to the Anaconda defaults and conda-forge channels only where an Intel version is not available.
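Because the Intel interpreter refuses to import readline, a script or interactive start-up file that imports it unconditionally will fail in the idp environment. A guarded import avoids this. The sketch assumes you want tab completion where readline exists; the flag name HAVE_READLINE is illustrative.

```python
try:
    # The Intel build of the interpreter refuses to import readline (GPL-licensed).
    import readline
except ImportError:
    readline = None

HAVE_READLINE = readline is not None

if HAVE_READLINE:
    # Importing rlcompleter installs a completer for Python names in readline.
    import rlcompleter  # noqa: F401
    readline.parse_and_bind("tab: complete")
else:
    print("readline not available; continuing without line editing")
```

In the root environment, which uses the default interpreter build with readline, the first branch runs; in the idp environment the script simply continues without line editing.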

Finally, the package hieroglyph is no longer available for Python 3.6+ and is excluded from the anaconda/5.0.0_py3 module. It is still available in anaconda/5.0.0 (Python 2.7) and in the earlier anaconda/4.2.0 modules, but since the package is poorly maintained, its users may wish to look for alternatives.

Please write to cluster-support at pik-potsdam dot de if you require any further changes to the Anaconda packages and the corresponding IDP environments.