
Direct SSH (Terminal / Batch Submission)

The browser portal is the supported path for actually getting work done. Direct SSH exists for terminal-driven batch operations only — writing and submitting sbatch scripts, checking the queue (squeue, sacct), looking at log files on lab storage. All compute work must go through Slurm (sbatch for batch, srun --pty for an interactive allocation). A bare python train.py or matlab in your SSH shell bypasses the scheduler and will get killed without warning.

For VS Code / Cursor: use the OOD tile, not Remote-SSH

Do not point VS Code Remote-SSH at hpc.eng.umd.edu. That host is the Slurm controller and OOD portal; Remote-SSH would start a Node.js server, language servers, file watchers, and indexing processes on it — degrading the scheduler for the entire cluster.

The supported path is the VS Code (code-server) OOD app:

  1. Open https://hpc.eng.umd.edu → Interactive Apps → VS Code (code-server).
  2. Pick the lab profile (your lab's node), resources, and optionally a workdir to open. Submit.
  3. The session card gives you a Connect to VS Code link — full VS Code in your browser, running on your lab's GPU node under a Slurm allocation, with /opt/sw modules and your lab share available.

Prerequisites

  • A UMD Directory ID and AD password.
  • Membership in the lab's AD group (your PI / lab admin manages this).
  • AD password or Kerberos ticket only. SSH keys are not accepted on these hosts. Don't paste your public key into ~/.ssh/authorized_keys and expect it to work.

The one SSH target you should use

ssh <directoryID>@hpc.eng.umd.edu

That host is the Slurm submit host — use it only for the small, cheap operations a submit host is meant for: writing/editing sbatch scripts (a few minutes of vim), submitting them, and checking on them with squeue / sacct. To do real work on a GPU, use srun --pty bash -l from this host — Slurm will drop you into an interactive shell inside an allocation, on hardware that can actually run your job.

Things to keep off the submit host:

  • Anything Node-shaped or IDE-shaped (VS Code Remote-SSH, language servers, npm install, big builds).
  • Sustained compilation, training, simulation, data crunching.
  • Long-running tmux sessions doing real work.

The submit host doesn't have GPUs, doesn't have your lab share's compute capacity, and is shared by every lab — it's a thin shell for talking to Slurm, not a workstation.

SSH — password auth

ssh <directoryID>@hpc.eng.umd.edu

You'll be prompted for your AD password. That's it. (SSH keys are not accepted — see "Prerequisites" above.)

Submitting a batch job from hpc.eng.umd.edu

Once connected to the submit host:

cd /mnt/lab-research/jobs                     # cifscreds first if needed (below)
vi run.sh                                     # write your sbatch script
sbatch run.sh                                 # → Submitted batch job 123
squeue -u $USER                               # check status
sacct -j 123 --format=JobID,State,Elapsed     # after it finishes

The submit host doesn't run your job — Slurm dispatches it to a compute node. You can log out; the job keeps running. See slurm-cli.md for sbatch directives and patterns.
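
The steps above can be sketched end to end. The script below is a minimal illustration, not a lab-specific template: the partition name, resources, and train.py are placeholders to substitute with your lab's actual values.

```shell
# Write a minimal sbatch script. Partition, resource, and script names are
# placeholders — use your lab's actual values.
cat > run.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=<your-lab>
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --gres=gpu:1
#SBATCH --output=slurm-%j.out

srun python train.py
EOF
```

sbatch reads the #SBATCH lines as resource directives; everything after them runs on the compute node Slurm picks, not on the submit host.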

Interactive shell on a GPU node

From the submit host, ask Slurm for an interactive allocation:

srun --partition=<your-lab> --time=01:00:00 \
     --cpus-per-task=4 --mem=16G --gres=gpu:1 \
     --pty bash -l

Slurm picks a node in your lab's set, allocates the resources to you, and drops you into a shell on it. Exit to release the allocation.
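
Once the shell opens, you can confirm you are inside an allocation: Slurm exports environment variables describing the job, so a quick sanity check looks like this (run outside any allocation, the fallback text prints instead):

```shell
# Inside an srun shell, Slurm exports variables describing the allocation.
# Run outside Slurm (e.g. on a laptop), the fallbacks print instead.
echo "job:  ${SLURM_JOB_ID:-no allocation}"
echo "node: ${SLURMD_NODENAME:-n/a}"
echo "cpus: ${SLURM_CPUS_PER_TASK:-n/a}"
echo "gpus: ${CUDA_VISIBLE_DEVICES:-none visible}"
```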

SSH with Kerberos (no password prompt, optional)

If you SSH frequently and prefer not to type a password each time, get a Kerberos ticket against AD.UMD.EDU:

macOS / Linux

# Get a Kerberos ticket
kinit <directoryID>@AD.UMD.EDU

# Verify
klist

# Configure SSH to use GSSAPI
cat >> ~/.ssh/config <<'EOF'
Host *.ad.umd.edu *.eng.umd.edu
  GSSAPIAuthentication yes
  GSSAPIDelegateCredentials yes
EOF

kinit is built in on macOS; on Ubuntu install with apt install krb5-user.
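
Tickets expire after a site-configured lifetime, so a connection can suddenly start prompting for a password again. One optional convenience is a small wrapper that renews the ticket before connecting. This is a sketch: kssh is a hypothetical helper name, and the principal is a placeholder for your Directory ID.

```shell
# Sketch of a wrapper that renews the Kerberos ticket only when needed.
# "kssh" is a hypothetical name; replace the principal with your own.
kssh() {
  # klist -s is silent and exits non-zero if no valid ticket is cached
  if command -v klist >/dev/null 2>&1 && ! klist -s 2>/dev/null; then
    kinit "<directoryID>@AD.UMD.EDU"
  fi
  ssh "$@"
}
```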

Windows

PowerShell with OpenSSH for Windows works for password auth out of the box. For Kerberos, MIT Kerberos for Windows is the closest parallel; WSL2 Ubuntu is the easier path — do the Linux setup inside WSL.

Mount your lab storage

When you SSH to the submit host, lab shares are mounted there but your user's credentials aren't in the kernel keyring yet. Once per SSH session:

read -sp "AD password: " P; echo
echo "$P" | cifscreds add -u $(whoami) -d AD.UMD.EDU <nas-fqdn>

To find your lab's NAS hostname, run mount | grep cifs inside an OOD desktop session (the share's UNC path is the server), or ask eit-help@umd.edu.

If your lab has multiple NAS servers, repeat cifscreds add for each.

Clean up when done:

cifscreds clearall

Running Slurm from SSH

sbatch, squeue, sacct are all available from SSH — that's what this SSH path is for. See slurm-cli.md for how to submit jobs.

When SSH is the right tool

  • Submitting and managing batch jobs from your laptop's terminal without launching a desktop.
  • Quick log inspection on lab storage after a job finishes.
  • Editing small sbatch scripts in vim/nano.
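
The quick log inspection in the list above usually amounts to two commands. In this sketch the log file is a stand-in created on the spot; a real job writes its stdout to the file named by --output (slurm-<jobid>.out by default).

```shell
# Stand-in log file, created here purely for illustration. A real job's stdout
# lands in the file named by --output (slurm-<jobid>.out by default).
printf 'loading data\nepoch 1: loss 0.41\nepoch 2: loss 0.33\n' > slurm-123.out

tail -n 5 slurm-123.out                                  # last lines of the run
grep -inE 'error|fail' slurm-123.out || echo "no errors logged"
```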

Prefer the portal for everything else — graphical apps, 3D viz, JupyterLab notebooks, VS Code, interactive GPU work, new users. The portal handles storage credentials, GPU rendering, lab-specific software menus, and Slurm allocation automatically.

If you specifically want VS Code or Cursor, use the VS Code (code-server) OOD app — full VS Code in your browser, running on your lab's GPU node through Slurm. See the warning at the top of this page for why Remote-SSH against hpc.eng.umd.edu is harmful.

Limits

  • Your home on the submit host is separate from the home directories inside OOD sessions — files you have in ~ over SSH are not the same files you see in ~ inside a desktop session. Keep work on lab storage and use it from either path.
  • If you run a long-lived process in an interactive allocation (srun --pty bash -l), start tmux inside the allocation so a dropped network connection doesn't kill it.
  • The submit host is not where your job actually runs. Anything that would peg a CPU or use real RAM belongs in sbatch / srun, not in your SSH shell on hpc.eng.umd.edu. This includes IDE language servers, watch processes, big builds, and conda solves.

Troubleshooting

| Symptom | Cause / fix |
| --- | --- |
| "Permission denied (publickey,gssapi-keyex,password)" | Wrong password, or your AD account isn't in the lab's AD group. Confirm your password works for the OOD portal first; then ask your PI / lab admin about group membership. |
| "Permission denied" after pasting an SSH key | SSH keys aren't accepted on these hosts. Use your AD password or Kerberos. |
| "Permission denied (gssapi-keyex, ...)" only | Kerberos ticket expired; run kinit <user>@AD.UMD.EDU again. (Password auth still works.) |
| "Host key verification failed" | First time connecting, or the node was reinstalled. Remove the old line from ~/.ssh/known_hosts and retry. |
| Lab shares are empty after SSH | You haven't run cifscreds add yet, or the AD password was wrong. |
| ssh hangs on connect | Probably a VPN / network issue. Confirm you're on a UMD network or the UMD VPN. |