Quick Start Quick Start Guide for ARCHIE-WeSt users This guide briefly describes how to login to ARCHIE-WeSt, transfer the data, use the modules, create a job script and how to submit a job. It also gives you information about best practice of using ARCHIE-WeSt and some additional information that might be useful in practice. Login to ARCHIE-WeSt ARCHIE-WeSt has four login nodes called archie-w, archie-e, archie-s and archie-t. To log into ARCHIE-WeSt you need to have account on ARCHIE-WeSt. For log in you should use your DS username and password, and you need to specify the particular login node: 1) Terminal Access (ssh) To login to ARCHIE-WeSt via ssh (e.g. from Linux/Mac), use any of the following login nodes: archie-w.hpc.strath.ac.uk (130.159.17.167) archie-e.hpc.strath.ac.uk (130.159.17.168) archie-s.hpc.strath.ac.uk (130.159.17.169) archie-t.hpc.strath.ac.uk (130.159.17.170) For example: ssh -X username@archie-s.hpc.strath.ac.uk -X is optional and will tunnel X windows back to your desktop. You will see a table summarizing your project usage and disk usage. Note that if you are assigned to the project but your usage is 0.0 the project will not be listed From Windows, download putty . Click on the images below for instructions on how to use Putty. [nggallery id=4] 2) Graphical Desktop Session A graphical desktop session can be obtained using the ThinLinc remote desktop client (Windows/Linux/Mac). View the images below to see the suggested configuration options (click on an image to exit). [nggallery id=9] Pressing F8 from within the desktop session will give you access to the ThinLinc client options. You can “suspend” the session by simply closing the window via the “X” on the top LH corner. You can of course resume suspended sessions. However, if you have no applications running, we recommend that you log out so as to release the license. 3) Visualization Servers Use the ThinLinc remote desktop client to connect to archie-viz.hpc.strath.ac.uk. Follow the instructions above for “2) Graphical Desktop Session”, but replacing archie-login.hpc.strath.ac.uk with archie-viz.hpc.strath.ac.uk. To get the best performance use prefix all GUI commands with vglrun to ensure that your applications use the server installed graphics card. For example to run vmd type: vglrun vmd instead of simply vmd. File Systems and Data Transfer 1) File Systems ARCHIE-WeSt file system are: /home: backed-up /lustre: not backed-up; because of its’ high performance it should be used to run jobs ARCHIE-WeSt operates using soft and hard quotas on the disk space: File System Soft Quota (in GB) Hard Quota (in GB) /home 50 100 /lustre 250 500 If the soft quota is exceeded the user has 7 days to go under the soft quota, otherwise the disk will be protected from writing. The second limitation is hard quota which can not be exceeded. In well-justified cases /lustre disk allocation might be increased. 2) Data Transfer From/to Windows desktop For data transfer between ARCHIE-WeSt and Windows desktop download WinSCP . Click on the images below for instructions on how to use WinSCP. [nggallery id=10] For big data transfer connect to dm1.hpc.strath.ac.uk rather than to particular login node. Data Mover 1 (dm1) network connection is 10Gb/s while other parts of ARCHIE-WeSt file system is 1Gb/s. From/to Linux/Mac desktop From Linux/Mac desktop user cwb08102 would do: scp -pr cwb08102@dm1.hpc.strath.ac.uk:/lustre/strath/phys/cwb08102/MY_DATA . -p - preserves file attributes and timestamps -r - option to transfer the entire directory Note that cwb08102 story her big data at /lustre folder. Transferring Files to H and I Drives (Strathclyde users only) To copy files to your I drive space on dm1, type: mount_idrive (you will be prompted for your DS password) I drive will be mounted at ~username/i_drive To copy files to your H drive space on dm1, type: mount_hdrive H drive will be mounted at ~username/h_drive You can then copy files to ~username/i_drive or ~username/h_drive . Once finished, type umount_idrive umount_hdrive (as appropriate). Environment Modules There are a variety of software packages and libraries installed each of which require different environment variables and paths to be set up. This is handled via “modules”. To view installed software, type: module list At any time a module can be loaded or unloaded and the envrionment will be automatically updated so as to be able to use the desired software and libraries. Module commands To list available modules type: module avail To list loaded modules, type: module list To load the Intel Compiler suite, for example, type: module load compilers/intel/2012.0.032 To remove a module: module rm compilers/intel/2012.0.032 These commands can be added to your .bashrc file so that they are loaded automatically when you log in. Note that the order of loading modules might be important at some cases. Create a Job Submission Script To run a calculation on the ARCHER compute nodes you need to write a job submission script that tells the Sun Grid Engine (SGE) system what compute nodes you need (normal, SMP, GPU) and what is your project ID. To run a parallel process you also need to specify the parallel environment (mpi-verbose or smp-verbose). For efficient usage of ARCHIE-WeSt is also advised to use the “back-filling” – it is efficient particulary for short parallel calculations and gives you the possibility of using the nodes reserved for bigger parallel job. All jobs should be submitted from the /lustre file system. This high-performance storage allows for faster writing the results, therefore your calculations will use less CPU and you get the result faster than using /home file system. 1) Sample serial job script: # Simple serial job submission script # Specifies that all environment variables active within the qsub # utility be exported to the context of the job. #$ -V # Execute the job from the current working directory. Standard output and # standard error files will be written to this directory #$ -cwd # Submit to the queue called serial.q #$ -q serial.q # Merges standard error stream with standard output #$ -j y # Specifies the name of the file containing the standard output #$ -o out.$JOB_ID #Indicate runtime #$ -l h_rt=01:00:00 #============================================================== ~/bin/hello-gcc-serial 2) Sample parallel job-script: # NAMD job-script export PROCS_ON_EACH_NODE=12 # ************* SGE qsub options **************** #Export env variables and keep current working directory #$ -V -cwd #Specify the project ID #$ -P project.prj #Select parallel environment and number of parallel queue slots (nodes) #$ -pe mpi-verbose 10 #Combine STDOUT/STDERR #$ -j y #Specify output file #$ -o out.$JOB_ID #Request resource reservation (reserve slots on each scheduler run until enough have been gathered to run the job #$ -R y #Indicate runtime #$ -l h_rt=06:00:00 # ************** END SGE qsub options ************ export NCORES=`expr $PROCS_ON_EACH_NODE \* $NSLOTS` export OMPI_MCA_btl=openib,self # Execute NAMD2 with configuration script with output to log file charmrun +p$NCORES -v namd2 namd.inp > namd.out Note: lines starting from # are comments, lines starting with #$ are SGE directives. Runtime is in format hh:mm:ss. If the job would exceed the running time it will be killed automatically. Before submitting a job ensure you have loaded all modules required (see above). For more sample job-scripts click here. Basic Job Submission and Monitoring 1) Types of queues: Serial – this queue uses normal compute nodes. On ARCHIE-WeSt there are 276 compute nodes, each has 2 Intel Xeon X5650 2.66 GHz CPU (6 cores each) and 48 GB RAM per node. Therefore there are 12 cores with 2 GB RAM on each core. For this queue up to 3312 cores might be available. Parallel – this queue uses normal compute nodes. On ARCHIE-WeSt there are 276 compute nodes, each has 2 Intel Xeon X5650 2.66 GHz CPU (6 cores each) and 48 GB RAM per node. Therefore there are 12 cores with 2 GB RAM on each core. For this queue up to 3312 cores might be available. Smp – this queue uses SMP nodes (“fat nodes”). On ARCHIE-WeSt there are 8 DELL R810 SMP nodes, each has 4 Intel Xeon E-7430 2.3 GHz CPU (8 cores on each) and 512 GB RAM per node. Therefore there are 32 cores with 16 GB RAM on each core. There are 256 cores in this queue available. The AU multiplier for smp queue usage is 1.5 GPU queue – this queue uses 8 Nvidia M2075 GPU nodes. The AU multiplier for smp queue usage is 12. Note that serial and parallel queue run on the same compute nodes (3312 in total). Part of them is allocated to serial queue and the remaining part to the parallel one. The division might be changed basing on the system load / users demand. The AU multiplier factor for serial and parallel queue is 1. For more details about ARCHIE-WeSt prices see http://www.archie-west.ac.uk/information/archie-fees. 2) Basic SGE Commands Jobs on the ARCHIE-WeSt machine are submitted and controlled using Sun Grid Engine (SGE). qstat – Lists your jobs (qw – waiting, r – running) qstat -u “*” – Lists all jobs in the queue(s) by all users qstat -g c – Provides summary overview of the system use qconf -sql – Lists all queues qsub start-job.sh – Launches job using the script start-job.sh qstat -j JOBID – Gives fuller detail on a job qacct -j JOBID – Gives details on a completed job qdel – Deletes job from queue 3) Submitting a job All job commands and SGE directives should be placed in a script (e.g. start-job.sh) and launched by typing: qsub start-job.sh Then you will see the comment: your job 9355 ("start-job.sh") has been submitted 4) Monitoring a job Progress can be monitored via the qstat command. qstat which returns: job-ID prior name user state submit/start queue slots ------------------------------------------------------------------------------------------------------------ 9355 0.50894 start-job.sh cwb08102 r 05/31/2012 09:41:35 parallel.q@node193.archie.clus 6 If the user does not have any running jobs qstat will not return any output. 5) Deleting a job To delete a job from the queue (the job can be in any state i.e. running or waiting): qdel 9355 6) Duration of jobs The maximum queuing time for one job is 61 days. The maximum wall-clock duration of one job is 14 days. Acknowledging ARCHIE-WeSt In all graphical presentations such as conference presentations, posters, lectures etc., the graphical logo of ARCHIE-WeSt should be used (click here to download the logo). In papers, reports etc., include this statement in the Acknowledgement paragraph: “Results were obtained using the EPSRC funded ARCHIE-WeSt High Performance Computer (www.archie-west.ac.uk). EPSRC grant no. EP/K000586/1.” Strathclyde users are obliged to update PURE and associate all papers, conference talks and posters as well as completed PhD thesis with UOSHPC (available under “equipment”). Practical Hints Do not launch the production job without knowing: A. How much data it will generate (disk quota limitation) B. How much time it will take to complete (runtime limit 14 days) Do not submit jobs from /home directory. All jobs should be submitted from /lustre For data transfer use dm1. It is particularly important for big data transfer There is no /lustre back-up, therefore copy the data to other, secure location (desktop computer, university storage) Keep important ARCHIE-West files at /home because this drive is backed-up For post-processing data you might use vizualization servers (archie-viz.hpc.strath.ac.uk). Due to limited license number please log out as soon as your work is finish to release the license. More reading Basic Linux presentation is available here. Full ARCHIE-WeSt guide is available here. HPC introductory presentation is available here. More job-scripts examples are available here.