ARCHIE-WeSt User Guide

1. Introduction

This is an introduction to the ARCHIE-WeSt facility, with information for both ARCHIE-WeSt users and Principal Investigators. It covers how to obtain time on the facility as well as how to use the facility in practice.

1.1 About ARCHIE-WeSt

High Performance Computing for the West of Scotland

ARCHIE-WeSt is a regional supercomputer centre at the University of Strathclyde dedicated to research excellence and wealth creation in the West of Scotland. Funded by EPSRC, we operate in partnership with the Universities of Glasgow, Glasgow Caledonian, Stirling and the West of Scotland. The centre was established in March 2012 by a £1.6M award from the EPSRC e-Infrastructure fund to create a regional centre of excellence in High Performance Computing. The aim of the centre is to provide High Performance Computing capability for Academia, Industry and Enterprise in the West of Scotland. ARCHIE comprises almost 3500 cores for distributed parallel computing providing almost 38 Teraflops peak performance, eight 512 GB RAM large-memory nodes, 8 GPU servers, 2 visualization servers and 150 TB of high-performance Lustre storage.

1.2 Hardware

ARCHIE-WeSt is a state-of-the-art High Performance Computer jointly funded by the University of Strathclyde and the Engineering and Physical Sciences Research Council (£1.6M). The primary aim of ARCHIE-WeSt is to support the Academic and Industrial communities in the West of Scotland, though access is available to all UK Academic Institutions and companies based in the UK.

Headline Specification
- 3408 cores for distributed parallel and serial computing
- eight 512 GB RAM large-memory nodes (32 cores per node)
- 8 GPU servers (12 CPU cores + 448 GPU cores per node)
- 150 TB of high-performance Lustre storage
- aggregate peak performance of almost 38 Teraflops

Detailed Specification

Standard Compute Nodes
- Dell C6100 servers
- Dual Intel Xeon X5650 2.66 GHz CPUs (6 cores each)
- 48 GB RAM
- 4x QDR InfiniBand interconnect

Large Memory Nodes
- Dell R810 servers
- 4 Intel Xeon E7-430 2.3 GHz CPUs (8 cores each)
- 512 GB RAM
- 4x QDR InfiniBand interconnect

GPU Nodes
- standard node + NVIDIA M2075 GPU card
- 448 CUDA cores
- 6 GB RAM

Visualization Nodes
- 2 Dell R5500 servers
- Dual Intel Xeon X5650 2.66 GHz CPUs (6 cores each)
- 48 GB RAM
- NVIDIA Quadro 6000 6 GB RAM graphics card

Lustre Storage
- 148 TB storage (formatted)
- 1 Metadata Server + 3 Object Storage Servers
- 4x QDR InfiniBand network
- 3.6 GB/s peak performance

1.3 Software

ARCHIE-WeSt promotes the use of open-source software. Various open-source packages for Computational Fluid Dynamics (CFD), Molecular Dynamics (MD), Quantum/Computational Chemistry and Data Analysis, to mention only a few, are already installed. The list of pre-installed compilers and libraries includes (but is not limited to): intel, impi, fftw, open64 and openmpi. The full list of installed open-source software and compilers is available here. Other software packages can be installed on user request. The list of licensed software installed on ARCHIE-WeSt is available here. It is also possible to install new packages on user request. Licensed software usage has to be discussed case by case because the licence terms vary from vendor to vendor. In general it should be possible to use your own licence on ARCHIE-WeSt.

1.4 ARCHIE-WeSt Allocation Unit and Fees

The basic Allocation Unit (AU) on ARCHIE-WeSt is a standard core hour, which corresponds to running on one ARCHIE CPU core for 1 hour.
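For illustration, a parallel job occupying two standard compute nodes (24 cores) for 10 hours consumes 24 × 10 = 240 Allocation Units at the standard-node multiplier of 1.0 (multipliers for the other node types are listed below).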
Different parts of the service are charged at different multiples of the Allocation Unit to reflect the differing resources available. These are:

Service Element      AU multiplier
Standard Node        1.0
High-Memory Node     1.5
GPU Node             12.0

The standard price per Allocation Unit is £0.04 + VAT. The price for Strathclyde Academics is £0.03, because the University underwrites the running costs. Data storage is free up to 250 GB; thereafter there is a monthly charge of £25 per 0.5 TB. Users exceeding their quota for longer than 7 days will be charged at £50 per 0.5 TB for every part month for which they remain over quota. VAT is charged as appropriate on all quoted fees.

1.5 How to Get Access to ARCHIE-WeSt

A. Academic Users

Academic Users from ARCHIE-WeSt Partner Institutions (University of Glasgow, Glasgow Caledonian University, University of Strathclyde, University of Stirling and University of the West of Scotland) are welcome to apply for:

- Testing Project (1000 CPU hours allocation), which runs at high priority at no cost. The application form is available here.
- High Priority Project (SL1). The standard price per Allocation Unit is £0.04 (+VAT); payment should be made up-front and no cash refunds are given. The price for Strathclyde Academics is £0.03, with no VAT required. The application form is available here.
- Standard Priority Project (SL2), available via an annual subscription fee of £500 per user per year (+VAT for non-Strathclyde Academics). The subscription year runs from 1st April to 31st March the following year. There is no restriction on the number of core hours that can be consumed via low-priority access. The application form is available here.

Academic Users from Non-Partner Institutions can access the facility via the priority route only. The standard Academic rate for Non-Partner Institutions is £0.06 (+VAT) per Allocation Unit. The application form is available here. It is also possible to extend an existing project by filling in the form available here.

B. Non-Academic Users

- Testing Project (1000 CPU hours allocation), which runs at high priority at no cost. The application form is available here.
- High Priority Project (SL1). The standard price per Allocation Unit is £0.10; VAT will be charged at the usual rates as appropriate. Charges for data storage and the AU multipliers remain the same as for Academic users (both Partner and Non-Partner Institutions). The application form is available here.

Once a project is accepted to run on ARCHIE-WeSt, a project ID will be set up (it has to be specified in all job scripts) and accounts for all users will be created. All users will also be invited to the Introductory Training: Basic Linux (intended for users new to Linux) and HPC Introduction, which is mandatory for all ARCHIE-WeSt users. Both are delivered free of charge.

1.6 Acknowledging ARCHIE-WeSt

In papers, reports etc., include this statement in the Acknowledgements: "Results were obtained using the EPSRC funded ARCHIE-WeSt High Performance Computer (www.archie-west.ac.uk). EPSRC grant no. EP/K000586/1." Strathclyde users are obliged to update PURE and associate all papers, conference talks and posters, as well as completed PhD theses, with UOSHPC (available under "equipment").

1.7 Terms and Conditions

Terms and conditions for academic users may be downloaded from here. Terms and conditions for industrial users may be downloaded from here.

1.8 Useful Terminology (Glossary)

The glossary is available here.
2. Connect to ARCHIE-WeSt

2.1 Login to ARCHIE-WeSt

ARCHIE-WeSt has four login nodes: archie-w, archie-e, archie-s and archie-t. To log into ARCHIE-WeSt you need an ARCHIE-WeSt account along with a DS username and password (Active Directory) issued by the University of Strathclyde.

2.1.1 Terminal Access (ssh)

To log in to ARCHIE-WeSt via ssh (e.g. from Linux/Mac), use any of the following login nodes:

archie-w.hpc.strath.ac.uk (130.159.17.167)
archie-e.hpc.strath.ac.uk (130.159.17.168)
archie-s.hpc.strath.ac.uk (130.159.17.169)
archie-t.hpc.strath.ac.uk (130.159.17.170)

For example:

ssh -X username@archie-s.hpc.strath.ac.uk

-X is optional and will tunnel X windows back to your desktop. On login you will see a table summarizing your project usage and disk usage; the table is updated daily at 3am. If you have exceeded your soft quota, the table will also show how much time you have left to return below the soft allocation. Core-hour usage is calculated from completed jobs. Note that if you are assigned to a project but your usage is 0.0, the project will not be listed. If logging on from a Windows desktop, download PuTTY. Click on the images below for instructions on how to use PuTTY.

2.1.2 Graphical Desktop Session

A graphical desktop session can be obtained using the ThinLinc remote desktop client (Windows/Linux/Mac). View the images below to see the suggested configuration options (click on an image to exit). Pressing F8 within the desktop session will give you access to the ThinLinc client options. You can "suspend" the session by simply closing the window via the "X" in the top left-hand corner, and you can of course resume suspended sessions. However, if you have no applications running, we recommend that you log out so as to release the licence. Idle sessions will automatically be terminated after 21 days.

2.1.3 Visualization Servers

Use the ThinLinc remote desktop client to connect to archie-viz.hpc.strath.ac.uk. Follow the instructions above for "2.1.2 Graphical Desktop Session", replacing archie-login.hpc.strath.ac.uk with archie-viz.hpc.strath.ac.uk. To get the best performance, prefix all GUI commands with vglrun to ensure that your applications use the graphics card installed in the server. For example, to run VMD type:

vglrun vmd

instead of simply vmd.

3. File Systems and Data Transfer

3.1 File Systems

The ARCHIE-WeSt file systems are:

/home : where the user's home directory resides; it should be used for the storage of essential files. This filesystem is backed up daily.
/lustre : a high-performance filesystem from which jobs should be run; it can be used for the storage of temporary run-time data. It is backed up weekly for disaster-recovery purposes only.

ARCHIE-WeSt makes use of soft and hard quotas on disk allocations:

File System   Soft Quota (GB)   Hard Quota (GB)
/home         50                100
/lustre       250               500

The soft quota can be temporarily exceeded up to the hard quota limit. If the soft quota is exceeded, the user has 7 days to return to below the soft quota, otherwise the disk will be protected from further writing. In well-justified cases the /lustre disk allocation (both hard and soft quotas) may be increased.
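As a practical check of your usage against these quotas, commands such as the following can usually be run from a login node (a minimal sketch only: the exact quota tooling enabled on ARCHIE-WeSt may differ, and the directory path is illustrative):

quota -s                              # standard Linux quota report for /home (if quotas are enabled)
lfs quota -u $USER /lustre            # Lustre's own per-user quota report for /lustre
du -sh /lustre/strath/phys/$USER      # measure the size of a particular directory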
3.2 Data Transfer

From/to a Windows desktop

For data transfer between ARCHIE-WeSt and a Windows desktop, download WinSCP. Click on the images below for instructions on how to use WinSCP. For large data transfers, connect to the "datamover" node dm1.hpc.strath.ac.uk rather than to a standard login node: dm1 has a fast 10 Gb/s connection to the campus network, unlike the login nodes, which have a standard 1 Gb/s connection.

Transferring files from ARCHIE to the user's desktop (Mac/Linux)

From a Linux or Mac terminal a typical command would be:

scp -pr cwb98199@dm1.hpc.strath.ac.uk:/lustre/strath/phys/cwb98199/MY_DATA .

-p : preserves file attributes and timestamps
-r : transfers the directory and its contents

Note that in the above example, the user is copying data from a folder on the Lustre filesystem back to their desktop.

Transferring files from the user's desktop (Mac/Linux) to ARCHIE

From a Linux or Mac terminal a typical command would be:

scp -pr MY_DATA cwb98199@dm1.hpc.strath.ac.uk:/lustre/strath/phys/cwb98199

-p : preserves file attributes and timestamps
-r : transfers the directory and its contents

Note that in this example, the user is copying the folder MY_DATA and its contents from their desktop to /lustre on ARCHIE.

Transferring files from ARCHIE (dm1) to the H and I drives (Strathclyde users only)

To mount your I drive space on dm1, type:

mount_idrive

Note: you will be prompted for your DS password. The I drive will be mounted at ~username/i_drive. To mount your H drive space on dm1, type:

mount_hdrive

The H drive will be mounted at ~username/h_drive. You can then copy files to ~username/i_drive or ~username/h_drive. Once finished, type (as appropriate):

umount_hdrive
umount_idrive

4. Environment Modules

There is a variety of software packages and libraries installed, each of which requires different environment variables and paths to be set up. This is handled via "modules". At any time a module can be loaded or unloaded, and the environment will be updated automatically so that the desired software and libraries can be used.

To list available (installed) modules, type: module avail
To list loaded modules, type: module list
To load the Intel compiler suite, for example, type: module load compilers/intel/15.0.3
To remove a module: module rm compilers/intel/15.0.3

These commands can be added to your .bashrc file so that they are executed automatically when you log in, or they can be inserted into a job script, which can be a more flexible method. Note that the order in which modules are loaded can be important in some cases.

5. Job Submission and Queue Selection

In most multi-user HPC systems, jobs are managed via a queueing system: the user submits their job to the "batch queue" and the job is dispatched to the appropriate nodes when they become available. The queueing system (sometimes referred to as a "scheduler") on ARCHIE is the Open Grid Engine (GE) scheduler. To run a calculation on the ARCHIE-WeSt compute nodes, a job submission script needs to be prepared containing directives for the Grid Engine system that specify:

- your project ID
- the desired queue (optional for parallel jobs, where a parallel environment has been specified)
- the desired parallel environment (not required for serial jobs)
- the required number of cores or nodes (depending on the desired queue)
- the run time of the job

For efficient usage of ARCHIE-WeSt it is also advisable to make use of "back-filling": it is particularly effective for short parallel calculations and gives your job the possibility of using nodes reserved for a bigger parallel job. For details see the section on parallel jobs below.
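As a minimal sketch of such a script (the project ID, queue, runtime and program name are placeholders; complete working examples are given in the Sample Job Scripts section below):

# Minimal job-script skeleton (illustrative only)
#$ -V -cwd                  # export environment variables and run in the current directory
#$ -P myproject.prj         # project ID (replace with the ID issued for your project)
#$ -q serial.q              # desired queue; for parallel jobs use a -pe line instead (see Parallel Jobs)
#$ -l h_rt=04:00:00         # expected run time, hh:mm:ss
./my_program                # the program to run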
5.1 Job Run Times

The maximum runtime on all queues is 14 days, and when a runtime has not been specified in a job script the queueing system assumes that the job will run for the full 14 days. It is therefore advisable to specify a runtime for shorter jobs, especially if the runtime is much less than 14 days. The scheduler has to reserve nodes for long, large jobs in order to ensure that they will eventually run, which means that at any given time a certain percentage of nodes sit idle and could be used for short and/or smaller jobs. By specifying a runtime, you enable the scheduler to assess whether it can efficiently use those idle nodes for jobs with shorter run times. This is known as "backfilling".

5.2 ARCHIE-WeSt Queues

ARCHIE-WeSt operates several queues (each of which has high- and low-priority variants):

- Serial: uses standard compute nodes (12 cores per node, 48 GB RAM, 4 GB RAM per core). Typically around 200 cores are available in this queue.
- Parallel: uses standard compute nodes (12 cores per node, 48 GB RAM, 4 GB RAM per core). Typically around 3200 cores are available in this queue.
- SMP: uses the SMP ("fat") nodes (32 cores per node, 512 GB RAM). There are 256 cores available in this queue. The AU multiplier for smp queue usage is 1.5.
- GPU: there are 8 NVIDIA M2075 GPU nodes in this queue, each with 12 standard CPU cores available. The AU multiplier for gpu queue usage is 12.

Note that the serial and parallel queues run on the same compute nodes (~3400 cores in total): part is allocated to the serial queue and the remainder to the parallel queue, and the division may be changed depending on system load and user demand. The AU multiplier for serial and parallel queue usage is 1. For more details about ARCHIE-WeSt prices see http://www.archie-west.ac.uk/information/archie-fees. Your project ID determines which queue level you have access to. The queue names are:

Resources                  High Priority (SL1)   Normal Priority (SL2)
Standard Compute Nodes     parallel.q            parallel-low.q
                           serial.q              serial-low.q
                           multiway.q            multiway-low.q
High Memory Nodes (SMP)    smp.q                 smp-low.q
GPU                        gpu.q                 gpu-low.q

Note that the multiway queue is a variation of the parallel queue (using the same nodes) which is required for some commercial software packages. The number of cores or nodes available in each queue can be obtained by typing:

qstat -g c

You will additionally see queues called phi.q and teaching.q: the teaching queue is dedicated to wee-archie users (Strathclyde undergraduate students only), while the phi queue gives access to the Intel Phi co-processor cards. The priority of a submitted job at any given time depends on what other jobs the user already has in the queue, either running or waiting: the priority of each subsequent job submitted by a user decreases. There is also an upper limit of 576 cores which can be used by a user at any given time. For example, if a user needs 240 cores per job and has two such jobs already running (480 cores in use), a third job will not start because the user would exceed the upper limit (three jobs would use 720 cores).

5.3 Sample Job Scripts

Every job must be submitted with an appropriate project ID. The project ID is given to each user (and PI) when their ARCHIE-WeSt account is opened. All jobs should be submitted from the /lustre file system, which provides better performance than running jobs from the /home file system.
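For example, a typical workflow is to create a run directory on /lustre, copy the input files into it and submit from there (a sketch only; the paths are illustrative):

mkdir -p /lustre/strath/phys/username/my_run    # replace with your own /lustre area
cd /lustre/strath/phys/username/my_run
cp ~/inputs/* .                                 # copy any input files over from /home
qsub start-job.sh                               # submit the job script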
5.3.1 Serial Jobs

Sample serial job script:

# Simple serial job submission script
#
# Export environment variables to the nodes and preserve the current working directory
#$ -V -cwd
# Specify Project ID
#$ -P project.prj
# Submit to the queue called serial.q
#$ -q serial.q
# Merge the standard error stream with standard output
#$ -j y
# Specify the name of the file containing the standard output
#$ -o out.$JOB_ID
# Indicate hard runtime
#$ -l h_rt=01:00:00
# ************** END SGE qsub options ************
~/bin/hello-gcc-serial

This job runs a program called "hello-gcc-serial" which resides in the user's "bin" directory. Note: lines starting with # are comments; lines starting with #$ are SGE directives. The runtime is in the format hh:mm:ss. If the job exceeds the runtime it will be terminated; it is therefore advisable to over-estimate by an appropriate amount. Before submitting a job ensure you have loaded all the modules required (see above), and remember to change the project ID and the run time.

5.3.2 Parallel Jobs

Each parallel queue has at least one associated "parallel environment" (PE) which configures the environment variables necessary for running parallel jobs. By specifying the parallel environment in the job script, the job will be dispatched to an appropriate parallel queue. For the standard nodes there are two PEs, mpi and mpi-verbose (these are essentially the same, but the verbose variant provides more output); for the SMP nodes there are smp and smp-verbose, while for Fluent the multiway PE should be used. The parallel environment is specified by adding a line such as:

#$ -pe mpi-verbose 1    (1 node: 12 cores will be allocated)

or

#$ -pe smp-verbose 1    (1 core: 1 core will be allocated)

to the job script. Note that on the standard compute nodes we allocate entire nodes (i.e. multiples of 12 cores), whereas on the SMP nodes we allocate cores, so any number of cores per job may be used. Moreover, for each parallel job a resource reservation is required: SGE needs to know that it should reserve nodes for parallel jobs. This is done with an additional option in the job script:

#$ -R y

Short parallel jobs may benefit from "back-filling". This means that a short parallel job might be able to use nodes that are already reserved for another job; if the job is successfully back-filled it will jump over the other queueing jobs and start earlier than the queue list suggests. To enable this, the run time should be specified as follows:

#$ -l h_rt=06:00:00

This means that the expected running time is no longer than 6 hours (wall-clock). The runtime is in the format hh:mm:ss. If the job exceeds the runtime it will be terminated; therefore, to use back-filling efficiently, the runtime should be over-estimated by an appropriate amount. The default runtime is 14 days, so it is advantageous to specify the runtime if it is much shorter, as this allows back-filling to take place. If back-filling is not possible, using this option does not harm the job in any way; it will run normally, just as it would without the option.
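To recap the slot-to-core mapping described above (the slot counts are illustrative):

#$ -pe mpi-verbose 4     # standard nodes: 4 nodes, i.e. 48 cores, will be allocated
#$ -pe smp-verbose 16    # SMP node: 16 cores will be allocated on a single fat node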
Sample parallel job script:

# NAMD job-script
export PROCS_ON_EACH_NODE=12
# ************* SGE qsub options ****************
# Export environment variables to the nodes and preserve the current working directory
#$ -V -cwd
# Specify the project ID
#$ -P project.prj
# Select parallel environment and number of parallel queue slots (nodes)
#$ -pe mpi-verbose 10
# Combine STDOUT/STDERR
#$ -j y
# Specify output file
#$ -o out.$JOB_ID
# Request resource reservation (reserve slots on each scheduler run until enough have been gathered to run the job)
#$ -R y
# Indicate runtime
#$ -l h_rt=06:00:00
# ************** END SGE qsub options ************
export NCORES=`expr $PROCS_ON_EACH_NODE \* $NSLOTS`
export OMPI_MCA_btl=openib,self
# Execute NAMD2 with the configuration script, writing output to a log file
charmrun +p$NCORES -v namd2 namd.inp > namd.out

This is a job script for running calculations with the NAMD software; the simulation parameters are stored in the file namd.inp. Note: lines starting with # are comments; lines starting with #$ are SGE directives. Before submitting a job ensure you have loaded all the modules required (see above), and remember to change the project ID, the run time and the number of nodes. To load the modules within the job script, add a line such as the following after the SGE options:

module load apps/gcc/namd/mpi/2.8

This loads the NAMD 2.8 module. Note that some modules depend on others, so they should be loaded in the correct order. It is best to check the order first by loading the modules from the terminal.

5.4 Basic Job Submission and Monitoring

5.4.1 Basic SGE Commands

qstat – Lists your jobs (qw – waiting, r – running)
qstat -u "*" – Lists all jobs in the queue(s) by all users
qstat -g c – Provides a summary overview of system use
qconf -sql – Lists all queues
qsub start-job.sh – Launches a job using the script start-job.sh
qstat -j JOBID – Gives fuller detail on a job
qacct -j JOBID – Gives details on a completed job
qdel JOBID – Deletes a job from the queue

5.4.2 Submitting a Job

All job commands and SGE directives should be placed in a script (e.g. start-job.sh) and launched by typing:

qsub start-job.sh

You will then see a message such as:

your job 9355 ("start-job.sh") has been submitted

5.4.3 Monitoring a Job

Progress can be monitored via the qstat command:

qstat

which returns:

job-ID prior   name         user     state submit/start        queue                          slots
------------------------------------------------------------------------------------------------------------
9355   0.50894 start-job.sh cwb08102 r     05/31/2012 09:41:35 parallel.q@node193.archie.clus 6

If the user does not have any running jobs, qstat will not return any output. CPU usage is calculated (and deducted from your project allocation) at 3am every day, based on completed jobs.

5.4.4 Deleting a Job

To delete a job from the queue (the job can be in any state, i.e. running or waiting):

qdel 9355

The maximum queueing time for one job is 61 days. The maximum wall-clock duration of one job is 14 days.

5.5 Interactive Jobs

Although it is not a very efficient way of using HPC, it is possible to run jobs interactively on ARCHIE-WeSt, e.g. using a software package's Graphical User Interface (GUI). This still needs to be done via SGE: the queueing system needs to know about such jobs in order to allocate the resources and to account for your CPU usage. To open an interactive session:

qrsh -P training.prj -V -pe smp 4 xterm

Be aware that you may need to wait in the queue for the resources; once they have been allocated, a new terminal window will open. From there you can launch your program (using 4 cores, in this example).
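Putting the above together, a typical submit-and-monitor session looks like the following (a sketch only; 9355 is the example job ID used above, and the module, path and script names are illustrative):

cd /lustre/strath/phys/username/my_run    # work from your /lustre area
module load apps/gcc/namd/mpi/2.8         # load any modules the job requires
qsub start-job.sh                         # submit the job script
qstat                                     # check its state (qw = waiting, r = running)
qstat -j 9355                             # detailed information while queued or running
qacct -j 9355                             # accounting information once it has completed
qdel 9355                                 # delete the job if it is no longer required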
6. Practical Hints and Limitations

6.1 Hints

- Do not launch a production job without knowing how much data it will generate (disk quota limitation) and how much time it will take to complete (runtime limit of 14 days).
- Do not submit jobs from the /home directory; all jobs should be submitted from /lustre.
- Use dm1 for data transfer. This is particularly important for large transfers.
- There is no user-recoverable /lustre back-up (the weekly back-up is for disaster recovery only), so copy your data to another, secure location (desktop computer, university storage).
- Keep important ARCHIE-WeSt files in /home, because this filesystem is backed up.
- For post-processing you may use the visualization servers (archie-viz.hpc.strath.ac.uk). Due to the limited number of licences, please log out as soon as your work is finished to release the licence.

6.2 Limitations

- Maximum job running time is 14 days.
- Maximum job queueing time is 61 days.
- The maximum number of cores that can be used by one user is 576.
- The soft quota (on both /home and /lustre) can be exceeded for up to 7 days and only up to the hard limit.
- Project allocation: if usage exceeds 90% of the project allocation, the PI will start receiving notification e-mails. This is a good moment to ask for a project extension if required. Project usage is calculated based on jobs completed before 3am every day. If the project allocation is exceeded, the project will be blocked, which means that SGE will not allow any more jobs to start; running jobs will nevertheless complete normally.
- For normal-priority users the ARCHIE-WeSt subscription fee must be paid every year. The subscription fee is £500 per user per year (+VAT) and is not transferable from one user to another. It covers the period from 1 April to 31 March of the following year and gives access to the queues. Note that to run a job a user needs both a paid subscription and a CPU allocation within the project to which they are assigned. If the subscription fee is not paid, the user is moved to the suspended users list.

6.3 Support and Help

The ARCHIE-WeSt team provides wide-ranging support for all ARCHIE-WeSt users, including help with project applications, data transfer, job submission and all other practical issues. Moreover, we install open-source software on user request and provide the most commonly used licensed software. In general it is possible to use a personal licence on ARCHIE-WeSt; however, because licence terms vary, this has to be sorted out on a case-by-case basis. All new ARCHIE-WeSt users are invited to the Introductory Training, which covers:

- Basics of Linux: intended for those who have never used Linux before, delivering the skills and information required to work on ARCHIE-WeSt.
- HPC Introduction: covers the basic rules and the ways to submit jobs on ARCHIE-WeSt. This is mandatory for all our users.

6.4 More Reading

- The Basic Linux presentation is available here.
- The HPC introductory presentation is available here.
- More job-script examples are available here.
- The glossary is available here.

7. Support

For support, e-mail support@archie-west.ac.uk.