{ "cells": [ { "cell_type": "markdown", "id": "bd5fefe3", "metadata": {}, "source": [ "## Running Cthulhu on a Cluster" ] }, { "cell_type": "markdown", "id": "018c4e32", "metadata": {}, "source": [ "For real-world applications, one typically needs a cross section computed across a grid of temperatures and pressures. Such computations can be handled much more efficiently on a computing cluster, where each (P,T) pair can be assigned to a distributed 'job'.\n", "\n", "This tutorial describes how to run `Cthulhu` on a cluster, focusing on clusters managed by the common Slurm scheduling system.\n", "\n", "
\n", "\n", " **Note:**\n", "\n", " Every computing cluster is special and has its own unique architecture. We strongly recommend reading the documentation for your local cluster before proceeding and adapting the code below accordingly.\n", "\n", "
" ] }, { "cell_type": "markdown", "id": "1fc2876e", "metadata": {}, "source": [ "Running on a cluster involves two separate files:\n", "\n", "1. A Python file calling `Cthulhu` (similar to those you've seen in the previous tutorials).\n", "2. A batch script or sbatch file to submit specific combinations of pressure-temperature points to different cores." ] }, { "cell_type": "markdown", "id": "45860f64", "metadata": {}, "source": [ "### Option 1: Python script in cluster mode (e.g. many atoms)\n", "\n", "Let's first create a Python file to calculate cross sections for $\\mathrm{Fe}$, $\\mathrm{Ti}$, $\\mathrm{Mg}$, and $\\mathrm{Fe^{+}}$ all at once. When a user places `Cthulhu` in cluster mode (via ``cluster_run = True``) the code will use a **single core** for each pressure and temperature pair. This is ideal for small line lists that are quick to compute, such as for atoms.\n", "\n", "In the example here, a core will first compute the $\\mathrm{Fe}$ cross section at a given (P, T), then continue to compute $\\mathrm{Ti}$ for the same (P, T) pair and so on. So each core will calculate 4 cross sections at a single (P, T) point, and we will use a Slurm shell script to request enough cores to cover all the (P, T) points.\n", "\n", "Copy the code below into a .py file... 
how about `many_atoms_on_my_powerful_cluster.py`" ] }, { "cell_type": "code", "execution_count": 3, "id": "2a887ec5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "skipping # <---- This is just for the documentation to skip this cell, please remove before running\n" ] } ], "source": [ "%%script echo skipping # <---- REMOVE THIS LINE (it is just for the documentation to skip running this cell)\n", "\n", "#***** Example script to batch-run Cthulhu on a cluster *****#\n", "\n", "from Cthulhu.core import compute_cross_section\n", "\n", "species_neutral = ['Fe', 'Ti', 'Mg'] # Fe, Ti, and Mg (neutral atoms)\n", "species_ions = ['Fe'] # Fe + (the ionization state is entered below)\n", "\n", "database = 'VALD'\n", "\n", "input_directory = '/PATH_TO_YOUR_LINE_LISTS/input/' # Change this to point to your line list input folder\n", "\n", "P = [1.0e-6, 1.0e-5, 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0e0, 1.0e1, 1.0e2] # Pressure (bar)\n", "T = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0, # Temperature (K)\n", " 900.0, 1000.0, 1200.0, 1400.0, 1600.0, 1800.0, 2000.0,\n", " 2500.0, 3000.0, 3500.0]\n", "\n", "# Create cross sections for the atoms\n", "for i in range(len(species_neutral)):\n", " compute_cross_section(database = database, species = species_neutral[i], \n", " pressure = P, temperature = T, S_cut = 1.0e-100,\n", " input_dir = input_directory, ionization_state = 1,\n", " nu_out_min = 200, nu_out_max = 40000, dnu_out = 0.01,\n", " verbose = False, N_cores = 1, cluster_run = True) # The last argument must be True for a cluster run!\n", "\n", "# Create cross sections for the ions\n", "for i in range(len(species_ions)):\n", " compute_cross_section(database = database, species = species_ions[i], \n", " pressure = P, temperature = T, S_cut = 1.0e-100,\n", " input_dir = input_directory, ionization_state = 2,\n", " nu_out_min = 200, nu_out_max = 40000, dnu_out = 0.01,\n", " verbose = False, N_cores = 1, cluster_run = True) # The 
last argument must be True for a cluster run!" ] }, { "cell_type": "markdown", "id": "c8e5fe68", "metadata": {}, "source": [ "Now we need to create a shell script to assign a core to each (P, T) pair.\n", "\n", "From looking at the Python script above, we have 9 pressures and 18 temperatures, for a total of 162 (P, T) pairs. So we will create a shell script to submit 162 jobs, one for each (P, T) point, with a single core being assigned to each job. Each job will run `Cthulhu` from the terminal automatically via the following commands:" ] }, { "cell_type": "markdown", "id": "0e472005", "metadata": {}, "source": [ " ```\n", " python -u many_atoms_on_my_powerful_cluster.py 0\n", " python -u many_atoms_on_my_powerful_cluster.py 1\n", " python -u many_atoms_on_my_powerful_cluster.py 2\n", " .\n", " .\n", " .\n", " python -u many_atoms_on_my_powerful_cluster.py 161\n", " ```" ] }, { "cell_type": "markdown", "id": "7dd563a1", "metadata": {}, "source": [ "Where '0' here denotes the first (P, T) pair (P[0], T[0]) and '161' denotes the final (P, T) pair (P[8], T[17]).\n", "\n", "To accomplish this, copy the code below into a shell script (.sh). We'll call it `my_ultimate_shell_script.sh`. \n", "\n", "You should also make a folder called `logs` in the same folder to store the terminal output from each job." 
] }, { "cell_type": "code", "execution_count": 4, "id": "43b6ee0b", "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "skipping # <---- This is just for the documentation to skip this cell, please remove before running\n" ] } ], "source": [ "%%script echo skipping # <---- REMOVE THIS LINE (it is just for the documentation to skip running this cell)\n", "\n", "# Job name for the group\n", "JOB_NAME=\"atoms\"\n", "\n", "# Notes on the srun options used below (a comment cannot follow a line-continuation backslash, so they are listed here):\n", "#   --account        Your user account on your cluster (or which account will be charged)\n", "#   --partition      Your cluster may have a different name for the default partition\n", "#   --cpus-per-task  We only need one core per job, since this is a cluster run\n", "#   --mem            Reserving 100 GB of RAM (less is probably fine)\n", "#   --time           Max runtime of 1 hour (atoms will take seconds)\n", "#   --output         In a directory called 'logs' (make one!), write the terminal output\n", "#   --error          Write any error messages into a separate file\n", "# The final line of the srun command is the path to the Python file above, followed by the job index\n", "\n", "for i in {0..161}; do # Loops over the 162 jobs\n", "    srun \\\n", "        --account=YOUR_USER_ACCOUNT \\\n", "        --partition=standard \\\n", "        --nodes=1 \\\n", "        --cpus-per-task=1 \\\n", "        --tasks-per-node=1 \\\n", "        --mem=100G \\\n", "        --time=0-01:00:00 \\\n", "        --output=./logs/${JOB_NAME}.$i.out \\\n", "        --error=./logs/${JOB_NAME}.$i.err \\\n", "        --job-name=$JOB_NAME \\\n", "        python -u /PATH/TO/YOUR/CODE/many_atoms_on_my_powerful_cluster.py $i &\n", "done" ] }, { "cell_type": "markdown", "id": "847f25d5", "metadata": {}, "source": [ "You run this shell script simply by:" ] }, { "cell_type": "markdown", "id": "6fff99bd", "metadata": {}, "source": [ " ```\n", " ./my_ultimate_shell_script.sh\n", " ```" ] }, { "cell_type": "markdown", "id": "779d787d", "metadata": {}, "source": [ "Congratulations, you have just unlocked the power of calculating cross sections in parallel on 162 cores! 🎉" ] }, { "cell_type": "markdown", "id": "0023232d", "metadata": {}, "source": [ "### Option 2: Multiple nodes in parallel (e.g. 
molecules with large line lists)\n", "\n", "For very large molecular line lists, to minimise runtime you'll usually want to throw many cores at each (P, T) point while also computing several (P, T) points in parallel. \n", "\n", "Here we'll look at an example of how to efficiently calculate a cross section for the ExoMol $\\mathrm{SO_2}$ line list (1.3 billion transitions). We'll break the (P, T) range into 6 segments and throw 36 cores at each segment.\n", "\n", "Copy the code below into a .py file, e.g. `SO2_cluster.py`" ] }, { "cell_type": "code", "execution_count": null, "id": "8a94d7f4", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from Cthulhu.core import compute_cross_section\n", "\n", "species = 'SO2'\n", "\n", "database = 'ExoMol'\n", "linelist = 'ExoAmes'\n", "\n", "broadening = 'H2-He'\n", "\n", "input_directory = '/PATH_TO_YOUR_LINE_LISTS/input/' # Change this to point to your line list input folder\n", "\n", "#***** Each node will work on all the pressures for a subset of temperatures *****#\n", "\n", "P = np.array([1.0e-6, 1.0e-5, 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0e0, 1.0e1, 1.0e2])\n", "\n", "#***** Selectively uncomment the temperature range you want this job to work on, then submit the job (see below) *****#\n", "\n", "T = np.array([100.0, 200.0, 300.0, 400.0]) # Only have one of these lines uncommented at a time\n", "#T = np.array([500.0, 600.0, 700.0, 800.0])\n", "#T = np.array([900.0, 1000.0, 1200.0])\n", "#T = np.array([1400.0, 1600.0, 1800.0])\n", "#T = np.array([2000.0, 2500.0])\n", "#T = np.array([3000.0, 3500.0])\n", "\n", "# Create cross section\n", "compute_cross_section(database = database, species = species, linelist = linelist,\n", " pressure = P, temperature = T, S_cut = 0.0,\n", " input_dir = input_directory, broad_type = broadening,\n", " nu_out_min = 200, nu_out_max = 25000, dnu_out = 0.01,\n", " verbose = True, N_cores = 36) # 36 cores in parallel" ] }, { "cell_type": "markdown", "id": "a93ad7f1", 
"metadata": {}, "source": [ "Now let's create the Slurm batch script to submit this job. Copy the code below into a file called `SO2.sbatch`." ] }, { "cell_type": "code", "execution_count": null, "id": "e28a4568", "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "%%script echo skipping # <---- REMOVE THIS LINE (it is just for the documentation to skip running this cell)\n", "#!/bin/bash\n", "# ^ sbatch requires the first line of the file to be an interpreter line like this one\n", "\n", "#SBATCH --job-name=SO2\n", "#SBATCH --account=YOUR_USER_ACCOUNT\n", "#SBATCH --partition=standard\n", "#SBATCH --nodes=1\n", "#SBATCH --cpus-per-task=36\n", "#SBATCH --tasks-per-node=1\n", "#SBATCH --mem=100G\n", "#SBATCH --time=1-00:00:00\n", "#SBATCH --mail-type=ALL\n", "#SBATCH --mail-user=YOUREMAIL@YOURUNIVERSITY.edu\n", "#SBATCH -o ./logs/SO2.%j.out\n", "#SBATCH -e ./logs/SO2.%j.err\n", "#SBATCH --get-user-env\n", "\n", "declare -xr WDIR=\"/home/YOUR_PATH_HERE/Cthulhu/REST_OF_YOUR_PATH/\"\n", "\n", "declare PATH=${PATH}:${WDIR}\n", "\n", "# Record the start time (time, then day_month_year)\n", "time_start=`date '+%T%t%d_%h_%y'`\n", " \n", "echo ------------------------------------------------------\n", "echo SBATCH: job name is $SLURM_JOB_NAME\n", "echo SBATCH: job identifier is $SLURM_JOBID\n", "echo SBATCH: sbatch is running on $SLURM_SUBMIT_HOST\n", "echo SBATCH: executing queue is $SLURM_JOB_PARTITION\n", "echo SBATCH: working directory is $SLURM_SUBMIT_DIR\n", "echo SBATCH: node file is $SLURM_JOB_NODELIST\n", "echo ------------------------------------------------------\n", "\n", "cd ${WDIR}\n", "\n", "python -u SO2_cluster.py\n", "\n", "time_end=`date '+%T%t%d_%h_%y'`\n", "echo Started at: $time_start\n", "echo Ended at: $time_end\n", "echo ------------------------------------------------------\n", "echo Job ends\n" ] }, { "cell_type": "markdown", "id": "8f38151c", "metadata": {}, "source": [ "On your Slurm-managed cluster, you can submit this job in a terminal via the command\n", "\n", "`sbatch SO2.sbatch`\n", "\n", "Check that the job has started running with\n", "\n", "`squeue -u $USER`\n", "\n", "(some clusters also provide the shorthand `sq`).\n", "\n", "Once the first 
job is running, you can see the terminal output in `./logs/SO2.JOB_ID.out` (where JOB_ID will be a number assigned by your cluster). If you need to cancel the job, use\n", "\n", "`scancel JOB_ID` (replace JOB_ID with the specific number shown in the queue listing).\n", "\n", "Then you can edit the Python file to select the next range of temperatures you want to compute (each job covers all the pressures), save the .py file, submit the next job, and so on. \n", "\n", "Enjoy the power of Cthulhu! 🐙" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }