How to configure ANSYS RSM v18 with a PBS Pro cluster

ANSYS v18 has recently been released, bringing a couple of interesting new features. As a fervent user of ANSYS Fluent, it seems to me this new version brings a lot of improvements on that side, allowing a complete workflow from CAD to mesh to simulation through the same interface, with a much stronger Fluent Mesher (formerly known as Tgrid).

On the HPC side, the major change for the RSM configuration is the introduction of ARC, the ANSYS RSM Cluster, which brings the ability to create a real cluster for distributed jobs. Previously, the RSM service was split into two components, the RSM Manager and the RSM Server: jobs were sent to one of the RSM Servers through the RSM Manager, but it was not possible to spawn multiple servers at the same time, a capability that required a third-party scheduler (Windows HPC, PBS, LSF, …). ARC seems to be here to tackle this issue and remove the need for another scheduler. As you will have understood, ANSYS has decided to enter a new sector with its own job scheduler/workload manager.

Some of those changes include:

  • A single service

With the new v18, ANSYS removed the RSM Server services. The RSM Manager now allows a client to communicate directly with a workload manager/job scheduler, whether it is ANSYS ARC or a third party. A client either talks from its machine to a remote RSM Manager service that submits the job to the scheduler, or talks to a local RSM Manager that reaches a remote scheduler through other communication means (e.g. SSH).
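If you go the SSH route from a local RSM Manager to a remote Linux scheduler, RSM expects key-based (passwordless) authentication to already be working between the two machines. A minimal sketch of setting this up, assuming OpenSSH tools and a head node reachable as headnode (the account and host names are placeholders):

# Generate a key pair without a passphrase (RSM cannot answer interactive prompts)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Copy the public key to the cluster head node
ssh-copy-id user@headnode

# Check that no password prompt appears
ssh user@headnode hostname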

  • No more RSM Application :{

Reading the “Important changes for R17 Users” inside the RSM User’s Guide v18, my eyes stopped on this:

The RSM configuration application no longer provides a standalone job monitoring interface. Job monitoring is now embedded within Workbench and EKM.

The loss of a standalone RSM window/application should raise some concerns among ANSYS users! A couple of versions ago, ANSYS added a job window inside Workbench that basically retrieved the same information as the RSM app, but the info was still available in the standalone RSM application, allowing a user to check the status of a job without having to open the not-so-small Workbench application. In v18, the standalone application does not exist anymore. Why?!

Edit: it seems ANSYS corrected this mistake with version 18.1 and brought back the standalone RSM Job Monitoring app. The only issue is that it shows jobs for the current user and machine only!

  • The new RSM Configuration Wizard

A positive point is the addition of this new wizard to configure the RSM cluster:

  rsm.png

Figure 1. The new RSM Wizard

 

The little computer screen inside the main window seems to indicate that this transition to a new RSM is (maybe) laying the foundations for a new interface that should come in the next versions…? 

Something convenient with this new wizard is the ability to configure the RSM queues directly from the client machine instead of the machine where the RSM Manager is installed, allowing each user to have their own configuration stored on their own computer. Moreover, if you connect to a third-party scheduler, on the last step you will be delighted to see that all the scheduler queues are automatically passed on to RSM.

 

Even if that can be convenient, IT managers may also find it useful to have a centrally managed RSM configuration shared with all users instead of configuring each computer one by one. The following steps show how to configure RSM v18 with a remote cluster running PBS Pro under Linux and share the configuration with other users.

 

Step-by-step

On the server machine

  1. Install the RSM Manager service
  • Connect to your Linux/Windows machine and open a Terminal/Command Prompt
  • Navigate to:
    • Linux: /ansys_inc/v180/RSM/Config/tools/linux/
    • Windows: C:\Program Files\ANSYS Inc\v180\RSM\Config\tools\windows

(or your custom chosen folder for installation)

  • Type:

./rsmconfig -mgr

A service will be installed (an init script inside /etc/init.d on Linux, or a Windows service) and configured to start automatically at boot.
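To make sure the service was actually registered and is running, a quick sanity check with the usual OS tools does the job. On Linux, for example (the exact script name varies, so look it up first; <rsm_service_name> below is a placeholder):

# Find the init script the installer created
ls /etc/init.d/ | grep -i rsm

# Check its status, using the name found above
/etc/init.d/<rsm_service_name> status

# Or simply check that the manager process is running
ps -ef | grep -i rsm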

  2. Share your Staging/Scratch folder through a Samba/Windows share.

Notice that v18 no longer uses the former RSM_Mgr and RSM_Cs shares; you can now specify the name of your staging share yourself. In our case, we use the same folder for both staging and scratch to limit the number of file replications.
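On a Linux head node, exposing the staging directory usually comes down to a small Samba share. A minimal sketch of an /etc/samba/smb.conf entry, assuming the staging directory is /scratch and the share is simply called scratch (the group name and permission masks are examples to adapt to your own site policy):

[scratch]
    path = /scratch
    browseable = yes
    read only = no
    valid users = @clusterusers
    create mask = 0664
    directory mask = 2775

After editing the file, reload or restart the Samba service for the new share to be visible.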

 

On a client machine

1. Create a shared folder on the client machine or a centrally available path.

For example on a NAS or in a datacenter. Here we will use:

\\nas\RSM\
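If you create the folder on a Windows machine rather than on an existing NAS, it can be shared from an elevated command prompt. A quick sketch, assuming the folder lives in D:\RSM; the granted groups are placeholders (users who consume the configuration only need read access, whoever maintains it needs full control):

rem Create the folder that will hold the shared RSM configuration
mkdir D:\RSM

rem Share it as RSM on the network
net share RSM=D:\RSM /GRANT:Everyone,READ /GRANT:Administrators,FULL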

2. Point your configuration to this new folder

Open a command prompt and type:

"C:\Program Files\ANSYS Inc\v180\RSM\bin\rsm" appsettings set JobManagement ConfigurationDirectory \\nas\RSM\

You should now see a new folder named ANSYS inside your shared folder, with a full path leading to RSM\Ansys\v180\RSM\, containing a *.rsmcc file for each configured cluster.

 

3. Add a new cluster
  1. Start the RSM Configuration Wizard
  2. Enter the hostname of the remote cluster, the type of scheduler (PBS) and a name for this configuration:

 

name.png

Figure 2. Add a new cluster 

Once you click Apply, if everything works you should be able to go to “File Management”.

 

  3. Decide on the transfer type

In most cases, you will choose the “OS file transfer” method as the “Client-to-Cluster File Management”: files are transferred to the staging directory through a network share. More secure environments could set up an SCP transfer instead, but will lose some transfer speed.

  • As “Cluster staging network share” and “directory”, enter your staging directory, in our case /scratch

files.png

Figure 3. File management 

N.B.: for the network share, I would have expected a UNC path pointing to the staging share to work, but it leads to an error when starting the job:

“UNC path \\machinename\scratch\9dw5rij7.qvb passed to Linux machine.”

It seems the network share is passed as the working directory… For now, use the Linux path for both fields until this is resolved.

Edit: v18.1 corrected this issue and you can now use a UNC path as the network share. The only thing to note is that if you use a sub-folder inside the shared folder, you only have to provide it in the UNC path and not in the cluster directory path, e.g.:

Cluster staging network share: \\machinename\scratch\my-sub-scratch

Cluster staging directory: /scratch

 

  • In the “Cluster Side File Management” section, you decide whether you want the head node to copy the files into a local working directory on the node(s) selected to run your job. The right choice depends on the type of job you want to run:

> For structural/FEA jobs, a lot of files are written and I/O is crucial, so replicating them to a local scratch for direct access can be useful.

> For CFD jobs, usually only a few files are written and working directly inside the network share is sufficient.

If you have a solid network infrastructure, like InfiniBand or 10-Gigabit Ethernet, combining the staging directory and the scratch folder is beneficial, as it limits file replication.

Validate and go to queues.

 

  4. Detect queues

If your configuration is correct, click the “Refresh queues” button and RSM should import all the PBS queues, creating a corresponding RSM queue for each one.

queues.png

Figure 4. Imported queues
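To cross-check the imported list, you can ask PBS Pro directly on the head node which queues it reports; RSM should end up importing the same ones. For example (workq is a placeholder for one of your queue names):

# List all PBS Pro queues and their state
qstat -Q

# Show the full details of a single queue
qstat -Qf workq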

 

You can now decide which queues you want to import and the name you want them to have inside RSM. Try your configuration by entering your job submission credentials and submitting a test job.
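If the test job fails, it is worth checking that PBS Pro accepts jobs outside of RSM before blaming the RSM configuration. A minimal check, run on the head node as the same user (again, workq is a placeholder queue name):

# Submit a trivial job and note the returned job id
echo "sleep 30" | qsub -q workq

# Follow it until it runs and completes
qstat -u $USER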

 

If you handle files differently depending on the queue or the type of job, I recommend creating another cluster configuration with the same settings except for the file management options (for example a CFD configuration working inside the staging directory and an FEA configuration replicating files to a local scratch).

Once you close this window and have saved your configuration, the shared folder should be populated with the new config. To let other users use the same configuration, repeat step 2 (the appsettings command) on each of their client machines.
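To avoid walking to every desk, that command can be wrapped in a small batch file and pushed through a login script or your usual deployment tool. A sketch, assuming the default v180 installation path and the \\nas\RSM share used above:

@echo off
rem Point the local RSM configuration directory to the centrally managed share
"C:\Program Files\ANSYS Inc\v180\RSM\bin\rsm" appsettings set JobManagement ConfigurationDirectory \\nas\RSM\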

Let us know about your experience and your impressions of this new version, which brings a lot of (good? bad?) new features.

 

Good luck


Comments (2)

  • Quentin

    Hi Ritvij, 18.1 changed the default shell used, and you need to add the PBS binaries folder to the PATH. Modify the following file to add the PBS folder to the RSM PATH: /ansys_inc/v182/RSM/Config/tools/linux/rsm_env_profile and add: export PATH=$PATH:/opt/pbs/bin (or the correct PBS directory if yours differs).

    12/12/2017, 6:32:06 PM
  • Ritvij Vyas

    I have done the exact same steps for RSM 18.1, and it does not work. It seems it cannot find where the PBS executables are located; it is going to /tmp to find them. It is working perfectly in 18.0.

    8/7/2017, 12:47:26 PM
