Discussion:
[R-sig-hpc] Rmpi and mpirun
Vincent Boucher
2018-07-11 20:31:33 UTC
Hi,

I'm running into an issue using Rmpi with Open MPI on a Beowulf cluster.
The installation of the package went without any issue. I have done the
following:
'mpirun -n 1 --hostfile hostfile.txt R --interactive'
then, 'library(Rmpi)'.

When I do 'ns <- mpi.universe.size()' I get ns=12, which is what it is
supposed to be. However, 'mpi.spawn.Rslaves(nslaves=ns)' fails and I get
the "not enough slots available..." message.
It looks like when R opens, the nodes are already up and running (due to
the mpirun), so mpi.spawn fails... I've tried to launch R directly (without
mpirun), but then I only get 1 node...

Am I missing something?

many thanks

Vincent

Ei-ji Nakama
2018-07-13 05:17:46 UTC
Hello,

To debug the Open MPI process, first read the output of the following
command:
$ ompi_info --param btl base --level 9

As a first attempt, try the following command:
$ mpirun --mca btl_base_verbose 40 -np 1 R --interactive
----<write script>----

The debugging parameter can also be set in a parameter file, as follows:
$ mkdir -p ~/.openmpi
$ echo "btl_base_verbose = 40" > ~/.openmpi/mca-params.conf
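Note that the `>` redirection above replaces the whole parameter file. A minimal sketch of a variant that changes only the one parameter and keeps any other settings is given here. The helper name `set_mca_param` and the `OPENMPI_CONF_DIR` override are hypothetical and exist only for this illustration; Open MPI itself reads `$HOME/.openmpi/mca-params.conf`.

```shell
#!/bin/sh
# Sketch: set one MCA parameter in the per-user parameter file
# without clobbering the other entries.
# OPENMPI_CONF_DIR is a hypothetical override for trying this out in a
# scratch directory; Open MPI reads $HOME/.openmpi/mca-params.conf.
conf_dir="${OPENMPI_CONF_DIR:-$HOME/.openmpi}"
conf_file="$conf_dir/mca-params.conf"

set_mca_param() {
    # $1 = parameter name, $2 = value
    mkdir -p "$conf_dir"
    touch "$conf_file"
    # Drop any earlier setting of the same parameter (grep exits non-zero
    # when nothing is left to print, hence the "|| true").
    grep -v "^[[:space:]]*$1[[:space:]]*=" "$conf_file" > "$conf_file.tmp" || true
    mv "$conf_file.tmp" "$conf_file"
    # Append the new setting in the "name = value" form used above.
    printf '%s = %s\n' "$1" "$2" >> "$conf_file"
}

set_mca_param btl_base_verbose 40
cat "$conf_file"
```

Running it a second time leaves a single `btl_base_verbose` line in the file, so the setting can be adjusted repeatedly while debugging.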
Post by Vincent Boucher
Hi,
I'm running into an issue using Rmpi with Open MPI on a beowulf cluster.
The installation of the package went without any issue. I have done the
following:
'mpirun -n 1 --hostfile hostfile.txt R --interactive'
then, 'library(Rmpi)'
when I do 'ns <- mpi.universe.size()' I get ns=12, which is what it is
supposed to be. However, 'mpi.spawn.Rslaves(nslaves=ns)' fails and I get
Since you have already started the MPI master process, the number of slaves
that can be spawned is 'mpi.universe.size() - 1'.
Post by Vincent Boucher
the "not enough slots available..." message.
It looks like when R opens, the nodes are already up and running (due to
the mpirun) so mpi.spawn fails... I've tried to launch R directly (without
mpirun) but then, I only get 1 node...
See the orte_default_hostfile parameter in the output of:
$ ompi_info --param orte all --level 9

For example:
$ echo 'orte_default_hostfile = "~/hostfile.txt"' >> ~/.openmpi/mca-params.conf

For the host file format, refer to the following.
https://www.open-mpi.org/doc/v3.0/man7/orte_hosts.7.php
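To tie the hostfile to the "universe size minus one" remark earlier in this reply, here is a rough sketch that writes a sample hostfile, sums its slots, and derives how many slaves could be spawned. The host names (`node01` to `node03`), the slot counts, and the file location are made up for illustration. The sketch assumes every host line carries an explicit `slots=N` entry.

```shell
#!/bin/sh
# Sketch: sum the slots declared in a hostfile and derive the slave count.
# Host names, slot counts and the file location are hypothetical.
hostfile="${TMPDIR:-/tmp}/hostfile.example.txt"

cat > "$hostfile" <<'EOF'
# one line per machine; slots = processes that may run on it
node01 slots=4
node02 slots=4
node03 slots=4
EOF

# Assumes each host line has an explicit slots=N entry;
# comment lines and blank lines are skipped.
total=$(awk '
    /^[ \t]*#/ { next }
    NF == 0    { next }
    {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^slots=[0-9]+$/) {
                split($i, kv, "=")
                sum += kv[2]
            }
    }
    END { print sum + 0 }
' "$hostfile")

# One slot is already used by the R master started with "mpirun -n 1".
slaves=$((total - 1))
echo "total slots: $total, slaves that can be spawned: $slaves"
```

With the sample file this reports 12 slots and 11 slaves, which matches the `ns=12` case in the original question.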
Post by Vincent Boucher
Am I missing something?
many thanks
Vincent
_______________________________________________
R-sig-hpc mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
Best Regards,
--
Eiji NAKAMA <nakama (a) ki.rim.or.jp>
"中間栄治" <nakama (a) ki.rim.or.jp>
Vincent Boucher
2018-07-13 13:06:02 UTC
Hi,

Thanks for the suggestion. I could not find anything informative in the debug
output (or at least nothing that I could interpret). However, it made me think
of playing around with the options a bit, and I think I found the issue (i.e.
it works at the moment!). I'm not sure why it works, but I am posting it here
in case someone has a similar issue.

I didn't describe the setup: I have a Beowulf-type cluster (a bunch of old
computers linked through Ethernet cables). Open MPI first tries to find
InfiniBand connections (which, obviously, I don't have). Normally this is not
an issue (at least it isn't for my Fortran 90 codes), but it seems to
interfere with Rmpi. Adding the option '-mca btl ^openib', as in:
mpirun -n 1 -mca btl ^openib --hostfile hostfile.txt R --interactive
solves the problem.
It is odd, since neither R nor mpirun issues any related error...
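For anyone who prefers not to type the flag on every launch: Open MPI also reads MCA parameters from environment variables named `OMPI_MCA_<parameter>`, so the same exclusion can be set once per shell session. This is only a sketch of that alternative and has not been tried on the cluster described here.

```shell
#!/bin/sh
# Sketch: exclude the openib BTL through the environment instead of the
# command line. This corresponds to passing "-mca btl ^openib" to mpirun.
# The single quotes keep the caret literal in every shell.
export OMPI_MCA_btl='^openib'
echo "OMPI_MCA_btl=$OMPI_MCA_btl"

# Then launch as before, without the extra flag:
#   mpirun -n 1 --hostfile hostfile.txt R --interactive
```

The same setting can be made permanent with a `btl = ^openib` line in `~/.openmpi/mca-params.conf`. A value given on the mpirun command line takes precedence over both.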

anyway, many thanks !

Vincent