Using pNbody in parallel
*****************************

With **pNbody**, it is possible to run scripts in parallel, using the ``mpi``
library. You of course need to have ``mpi`` and ``mpi4py`` installed. To check
your installation, try::

    mpirun -np 2 pNbody_mpi

You should get::

    This is task 0 over 2
    This is task 1 over 2

but if you get::

    This is task 0 over 1
    This is task 0 over 1

something is not working correctly, and you should check your path or your
``mpi`` and ``mpi4py`` installation before reading further.

The previous script ``scripts/slice.py`` can be run in parallel directly. This
is simply obtained by calling the ``mpirun`` command::

    mpirun -np 2 scripts/slice.py gadget_z*0.dat

In this simple script, only the process of rank 0 (the master) opens the file.
The content of the file (the particles) is then distributed among all the other
processors: each processor receives a fraction of the particles. The selection
of gas particles and the slicing are then performed by every processor on its
local particles. Finally, the ``nb.write()`` command, run by the master, gathers
all particles and writes the output file.

Parallel output
========================================

With **pNbody**, it is possible to write files in parallel, i.e., each task
writes its own file. We can do this in the previous script simply by adding the
line ``nb.set_pio('yes')``. This tells **pNbody** to write files in parallel
when ``nb.write()`` is called. The content of the new script
``scripts/slice-p1.py`` is::

    #!/usr/bin/env python
    import sys
    from pNbody import *

    files = sys.argv[1:]

    for file in files:
        print("slicing", file)
        nb = Nbody(file, ftype='gadget')
        nb = nb.select('gas')
        nb = nb.selectc((fabs(nb.pos[:, 1]) < 1000))
        nb.rename(file + '.slice')
        nb.set_pio('yes')
        nb.write()

We can now run it::

    mpirun -np 2 scripts/slice-p1.py gadget_z00.dat

This creates two new files::

    gadget_z00.dat.slice.1
    gadget_z00.dat.slice.0

The files have the same name as the one given to ``Nbody()``, with an extension
``.i``, where ``i`` corresponds to the rank of the process. Each file contains
the particles attributed to the corresponding task.

Parallel input
========================================

Now, it is possible to start by reading these two files in parallel instead of
asking only the master to read one single file. In the new script
``scripts/slice-p2.py``, we add the optional argument ``pio='yes'`` when
creating the object with ``Nbody()``. Note also that we use ``nb.set_pio('no')``,
which forces the final file to be written by the master only.

.. code-block:: python

    #!/usr/bin/env python
    import sys
    from pNbody import *

    files = sys.argv[1:]

    for file in files:
        print("slicing", file)
        nb = Nbody(file, ftype='gadget', pio='yes')
        nb = nb.select('gas')
        nb = nb.selectc((fabs(nb.pos[:, 1]) < 1000))
        nb.rename(file + '.slice.new')
        nb.set_pio('no')
        nb.write()

When we launch it::

    mpirun -np 2 scripts/slice-p2.py gadget_z00.dat.slice

the two files ``gadget_z00.dat.slice.0`` and ``gadget_z00.dat.slice.1`` are each
read by one task and processed, but at the end only the master writes the final
output: ``gadget_z00.dat.slice.slice.new``.
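To see how the particles end up distributed after such a parallel read, a small
diagnostic can be useful. The following is only a minimal sketch (the script
name ``countpart.py`` is hypothetical, not part of the pNbody distribution); it
relies solely on ``mpi.ThisTask`` and on the size of the local ``nb.pos`` array,
which are both used in the scripts of the next section::

    #!/usr/bin/env python
    # countpart.py : hypothetical example, not one of the pNbody scripts.
    # Print how many particles each task holds after a parallel read.
    import sys
    from pNbody import *

    file = sys.argv[1]

    # with pio='yes', each task reads its own file <file>.<rank>
    nb = Nbody(file, ftype='gadget', pio='yes')

    # nb.pos contains only the particles local to this task
    print("proc %d holds %d particles" % (mpi.ThisTask, len(nb.pos)))

Run with ``mpirun -np 2`` on ``gadget_z00.dat.slice``, it should print one line
per task, the counts summing to the total number of particles stored in the two
per-task files.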
More on parallelism
========================================

Let's try two other scripts. The first one (``findmax.py``) finds the maximum
radial distance between the particles and the centre. It illustrates the
difference between using ``max()``, which gives the local maximum (the maximum
among the particles of the node), and ``mpi.mpi_max()``, which gives the global
maximum among all particles::

    #!/usr/bin/env python
    import sys
    from pNbody import *

    file = sys.argv[1]

    nb = Nbody(file, ftype='gadget', pio='yes')

    local_max  = max(nb.rxyz())
    global_max = mpi.mpi_max(nb.rxyz())

    print("proc %d local_max = %f global_max = %f" % (mpi.ThisTask, local_max, global_max))

When running it, you should get::

    mpirun -np 2 ./scripts/findmax.py gadget_z00.dat.slice
    proc 1 local_max = 8109.682129 global_max = 8109.682129
    proc 0 local_max = 7733.846680 global_max = 8109.682129

which clearly illustrates the point.

Finally, the second script, ``showmap.py``, shows that even graphical functions
support parallelism. It illustrates this point by computing a map of the
model::

    #!/usr/bin/env python
    import sys
    from pNbody import *

    file = sys.argv[1]

    nb = Nbody(file, ftype='gadget', pio='yes')
    nb.display(size=(10000, 10000), shape=(256, 256), palette='light')

When running::

    mpirun -np 2 ./scripts/showmap.py gadget_z00.dat.slice

you get an image of the model. The mapping has been performed independently by
the two processors.
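The two examples can of course be combined. As a final illustration, here is a
minimal sketch (the script name ``showmap-auto.py`` is hypothetical, and it is
assumed that ``size`` takes the physical extent of the map, as in
``showmap.py``) that reuses only calls shown above: the global maximum radius
obtained with ``mpi.mpi_max()`` sets the extent of the map drawn by
``nb.display()``::

    #!/usr/bin/env python
    # showmap-auto.py : hypothetical sketch combining findmax.py and showmap.py.
    # The map extent is taken from the global maximum radius over all tasks.
    import sys
    from pNbody import *

    file = sys.argv[1]

    nb = Nbody(file, ftype='gadget', pio='yes')

    # global maximum radial distance, identical on every task
    rmax = mpi.mpi_max(nb.rxyz())

    # every task maps its local particles, as in showmap.py
    nb.display(size=(rmax, rmax), shape=(256, 256), palette='light')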