Getting Started with Remote Servers

Everyone has their own preferences for setting up environments and workflows on remote machines. These are the practices we have found significantly speed things up and make things easier, but you are of course welcome to use what works best for you. If you have limited experience, we recommend following these guidelines to get started. Once you get going, check out our other guide for handy Unix commands as well as our guide to our favorite IDEs and editors.

Choose Your Shell

Before you log into our machines, make sure you set your default shell. The default is csh, but you can choose among bash, sh, tcsh, csh, and zsh.

We recommend bash, but you are welcome to choose whatever you feel most comfortable with. To change your default shell on the remote machines, go to the account config page (you need to be on the VPN to access it), log in with your NetID, and choose your desired shell under “Account Maintenance” -> “Shell Management”.

Set up SSH Keys

SSH keys are convenient because they let you log into remote machines without typing your password. To set them up, you first need to generate an SSH key pair on your local machine: the private key stays on your machine and verifies your identity to any server that holds a copy of the matching public key.

To create SSH keys on your local machine, open a terminal and run:

$ ssh-keygen

Tip

You will be prompted to enter a passphrase for your keys. Please enter one, but make it different from your Rice password. You can add your SSH key to ssh-agent so you do not have to type your passphrase every time. If you have already generated an SSH key without a passphrase, you can add one with $ ssh-keygen -p. The same command also lets you change an existing passphrase (after prompting you for the old one).

Note

If you already have an SSH key, you can skip the key generation step (you will receive a warning like the one below if you already have an ssh key):

  /Users/username/.ssh/id_rsa already exists.
  Overwrite (y/n)?

If you overwrite your old key, you will have to add your new public key to any servers that accepted the previous one.

To manage your private keys (and not have to enter your passphrase every time you SSH), first, start ssh-agent in the background:

$ eval "$(ssh-agent -s)"

Then, add your private key (you will be prompted for your passphrase):

$ ssh-add

For extra security, you can also give the key a lifetime: $ ssh-add -t 3600 keeps it loaded for 3600 seconds (one hour), after which the agent forgets it.

On most machines, ssh-agent starts automatically, so when you start a completely new session (e.g., after rebooting your computer), all you should need to do is run ssh-add again. If ssh-agent is not running, you will need to start it in the background again as well.
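If you would rather not check by hand whether ssh-agent is running, a small function in your local ~/.bashrc can do it for you. The sketch below is our own illustration (ensure_agent is a hypothetical name, not a standard command); it relies on ssh-add -l exiting with status 2 when it cannot reach an agent.

```shell
# Hypothetical helper for your local ~/.bashrc: start ssh-agent only when
# no agent is reachable, so repeated calls do not spawn extra agents.
ensure_agent() {
  local status=0
  # `ssh-add -l` exits 0 (keys loaded), 1 (agent has no keys) or 2 (no agent reachable).
  ssh-add -l >/dev/null 2>&1 || status=$?
  if [ "$status" -eq 2 ]; then
    # ssh-agent -s prints shell commands that export SSH_AUTH_SOCK and SSH_AGENT_PID.
    eval "$(ssh-agent -s)" >/dev/null
  fi
}
```

At the start of a session you could then run ensure_agent && ssh-add -t 3600.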

To log in to our machines (e.g., risotto) without entering your password every time, you need to copy your public key to each remote machine. You can do this with the following command, replacing username with your Rice NetID:

$ ssh-copy-id username@risotto.cs.rice.edu

If this succeeds, ssh-copy-id prints instructions for logging in:

Now try logging into the machine, with:   "ssh 'username@risotto.cs.rice.edu'"
and check to make sure that only the key(s) you wanted were added.

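If ssh-copy-id is not installed on your machine (e.g., on Windows), you can use the following command to copy your public key to the remote server instead:

$ cat ~/.ssh/id_rsa.pub | ssh username@risotto.cs.rice.edu "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"

The chmod calls matter because the SSH server may ignore authorized_keys if the file or the ~/.ssh directory is writable by other users.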

Repeat the process for any other remote machines (e.g., mochi).
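To repeat the copy for several servers at once, a loop works well. This is a minimal sketch: copy_key_to is a hypothetical helper name, and it assumes every server lives under cs.rice.edu (as risotto does) and that ssh-copy-id is installed locally.

```shell
# Hypothetical helper: copy your public key to several servers in one go.
# Usage: copy_key_to NETID HOST...   (e.g. copy_key_to your_netid risotto mochi)
# Assumption: every HOST is reachable as HOST.cs.rice.edu.
copy_key_to() {
  local netid="$1"
  shift
  local host
  for host in "$@"; do
    # Stop at the first failure so you notice which server needs attention.
    ssh-copy-id "${netid}@${host}.cs.rice.edu" || return 1
  done
}
```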

Set up SSH Config File

Writing an SSH config file can further streamline SSH connections and saves you from repeatedly adding flags when connecting to machines. If you do not have one yet, create it at ~/.ssh/config on your local machine.

We first recommend enabling multiplexing, which makes new SSH connections faster by reusing one existing connection to the same host instead of opening and authenticating a new one each time. This also means that if you close the "master" SSH connection to the host (the first one connected), all other connections that share it will be terminated.

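Add support for multiplexing by including the following lines at the start of your config file (settings that appear before any host entry apply to all hosts):

ControlMaster auto
ControlPath ~/.ssh/tmp/%r@%h:%p

SSH does not create the ControlPath directory for you, so make sure it exists:

$ mkdir -p ~/.ssh/tmp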

To use shorter SSH names (e.g., accessing risotto by typing ssh risotto instead of ssh username@risotto.cs.rice.edu), add a few lines per host to your SSH config:

Host risotto
    User your_netid
    HostName risotto.cs.rice.edu

With this in place, you should now be able to SSH into risotto without specifying a username or the full host name:

$ ssh risotto
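If you manage several hosts, you can script the host entries instead of editing the file by hand. The sketch below uses a hypothetical add_ssh_host function that appends a block in the same format as above and skips aliases that are already defined (it only recognizes Host lines whose first name is that alias).

```shell
# Hypothetical helper: append a Host block to an SSH config file unless the
# alias is already defined. Usage: add_ssh_host ALIAS HOSTNAME USER [CONFIG]
add_ssh_host() {
  local name="$1" hostname="$2" user="$3" config="${4:-$HOME/.ssh/config}"
  mkdir -p "$(dirname "$config")"
  touch "$config"
  chmod 600 "$config"   # ssh refuses config files that others can write to
  # Case-insensitive match on a line such as "Host risotto" or "host risotto".
  if grep -qiE "^[[:space:]]*host[[:space:]]+${name}([[:space:]]|$)" "$config"; then
    echo "Host $name already defined in $config" >&2
    return 0
  fi
  printf '\nHost %s\n    User %s\n    HostName %s\n' "$name" "$user" "$hostname" >> "$config"
}
```

For example, add_ssh_host mochi mochi.cs.rice.edu your_netid (assuming mochi follows the same host-name pattern as risotto).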

Note

If you would like to use X11 to display GUIs from the remote server (an example use case is the simple Atom editor installed on the servers), you will need to install an X server on your local machine (XQuartz on Macs). Then add the following line under your host definition:

ForwardX11 yes

Mounting Remote Directories Locally

Mounting remote directories on your local machine can reduce the friction between navigating the remote servers and working locally. You can then work with the files as if they were local (without taking up any of your local storage space), as long as you have a decent internet connection. To do this, we suggest using SSHFS.

Installing SSHFS

On Macs, we recommend using Homebrew for the installation:

$ brew install --cask macfuse
$ brew install sshfs

On Ubuntu, you can install using apt:

$ sudo apt update
$ sudo apt install sshfs

On Windows, you need to install the following two packages (install the latest non-beta version of each):

WinFsp
SSHFS-Win

SSHFS Commands

Mounting on Macs and Linux machines

To mount a directory using SSHFS, you need a local "mountpoint" (i.e., an empty directory that serves as a placeholder for the remote directory). We recommend creating a mounts directory to house all of your mountpoints so that it's clear what they are. These directories can be located anywhere; for the purposes of this documentation, we'll use /home/Desktop/mounts in the examples.

SSHFS commands have the following syntax:

$ sshfs [netid@]remote_server:[path_to_remote_directory] mountpoint [options]

Your user directory on grain is accessible from both risotto and mochi. For this example, we'll mount it through risotto and assume you have created a mountpoint called grain in your local mounts directory.

Note

If you have set up your SSH config, you do not need to fill in the [netid@] part of the command (the short host name risotto in the example below also relies on that config; without it, use risotto.cs.rice.edu). Also, if you have SSH keys set up, you will not be prompted for your password after entering the command; otherwise, you can log in with your Rice password (but we highly recommend using SSH keys).

$ sshfs netid@risotto:/grain/netid /home/Desktop/mounts/grain -o volname=grain

The -o volname option sets the name under which the mounted volume is displayed (e.g., in Finder), so that you can easily tell what it is. We recommend using the same name you used for the mountpoint to minimize confusion. This option is specific to macFUSE; leave it out on Linux.

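To unmount the remote directory, use the following command:

$ umount /home/Desktop/mounts/grain

On Linux, if umount refuses because you are not root, use $ fusermount -u /home/Desktop/mounts/grain instead.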

Depending on your use case, some of the directories you might consider setting up include:

your user grain directory (/grain/netid) that we used for the example
the shared resource directory (/grain/resources)
your local risotto directory (/local/netid), where the remote_server is risotto
your local mochi directory (/local/netid), where the remote_server is mochi

If you mount both the local directories from risotto as well as mochi, it is best if your local mountpoint has a descriptive name (e.g., risotto_local) so that you can easily keep track of which server's local directory you're looking at.

To make things convenient, once everything is working, you can set up aliases in your local ~/.bashrc (see below for more information) to mount and unmount directories; for example, for the above commands:

alias mount_grain='sshfs netid@risotto:/grain/netid /home/Desktop/mounts/grain -o volname=grain'
alias umount_grain='umount /home/Desktop/mounts/grain'
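Aliases cannot take arguments, so if you mount several directories, shell functions may scale better. Below is a minimal sketch with hypothetical function names: it reuses the example mounts directory, creates the mountpoint if needed, and only passes volname on macOS, where macFUSE supports it.

```shell
# Hypothetical helpers for your local ~/.bashrc (the names are our own).
# MOUNT_ROOT defaults to the example mounts directory used above.
MOUNT_ROOT="${MOUNT_ROOT:-/home/Desktop/mounts}"

# mount_remote SERVER REMOTE_DIR NAME: mount SERVER:REMOTE_DIR at $MOUNT_ROOT/NAME
mount_remote() {
  local server="$1" remote_dir="$2" name="$3"
  local mountpoint="$MOUNT_ROOT/$name"
  mkdir -p "$mountpoint"   # sshfs needs an existing, empty mountpoint
  if [ "$(uname)" = Darwin ]; then
    # volname is a macFUSE option; Linux FUSE does not accept it.
    sshfs "$server:$remote_dir" "$mountpoint" -o volname="$name"
  else
    sshfs "$server:$remote_dir" "$mountpoint"
  fi
}

# umount_remote NAME: unmount $MOUNT_ROOT/NAME
umount_remote() {
  umount "$MOUNT_ROOT/$1"
}
```

With these, mount_remote netid@risotto /grain/netid grain and umount_remote grain replace the two aliases above.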

Mounting on Windows

Windows Explorer can be used to mount remote directories as network drives.

To do this, right-click "This PC" in Windows Explorer and select "Map network drive". Choose a drive letter for the mount, then in the "Folder" field, enter the remote user, server, and path as follows:

\\sshfs\netid@remote_server[\path]

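Paths given after \\sshfs\ are relative to your remote home directory. Your grain directory is an absolute path, so use the \\sshfs.r\ prefix instead, which makes the path relative to the remote root. For example, to mount your grain directory through risotto:

\\sshfs.r\netid@risotto.cs.rice.edu\grain\netid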

The remote directory should now be accessible at whichever drive you chose earlier.

Port Forwarding

Sometimes you will need to connect to a specific port on a remote machine, typically to reach a service running on the remote server. One common use case is connecting to RStudio Server so you can open it in a browser on your local machine. Instead of including the port forwarding flags every time you open an SSH connection, you can add them to the config file under the corresponding host definition with one additional line:

LocalForward port_on_local_computer localhost:port_on_server

For example, adding the following line under the risotto host definition forwards local port 8585 on your machine to port 8686 on risotto:

LocalForward 8585 localhost:8686

Then http://localhost:8585 will connect to a server running on port 8686 on risotto.
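For a one-off connection, you can pass the same forward on the command line instead of editing your config. The sketch below wraps ssh -L in a hypothetical tunnel function. The -N flag tells ssh not to run a remote command, so the session only holds the forward open until you press Ctrl-C.

```shell
# Hypothetical helper: open a one-off tunnel without editing ~/.ssh/config.
# Usage: tunnel LOCAL_PORT REMOTE_PORT HOST   (e.g. tunnel 8585 8686 risotto)
tunnel() {
  local local_port="$1" remote_port="$2" host="$3"
  # -L: forward localhost:LOCAL_PORT here to localhost:REMOTE_PORT as seen from HOST
  # -N: run no remote command; just keep the forward open
  ssh -N -L "${local_port}:localhost:${remote_port}" "$host"
}
```

Running tunnel 8585 8686 risotto and then opening http://localhost:8585 is equivalent to the LocalForward line above for that one session.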

Note

To spin up new server instances on our remote machines, you will mostly be using Podman, which containerizes applications (more details in the corresponding section). For a server to be accessible on your local machine, there are thus "2 hops" of port forwarding: one from the Podman container to a port on the remote server it runs on (configured with a Podman command), and a separate one (specified in your SSH config file) from that server port to your local machine. The local port you choose (8585 in the example above) is flexible. What matters is that the remote port (8686 in this example) is the same port you publish via Podman. (The port inside the container is usually fixed to a default that depends on the specific server, e.g., 8888 for Jupyter.)

Port forwarding is also handy to connect a local IDE to modify files on the server. See the section on rmate for more on this.

Bash Profiles

Editing shell profiles can save you time and effort: you can configure aliases or shortcuts for various commands, change the colors of different file and directory types, and add other convenient functionality.

Since we primarily use bash, here are some configurations we have found useful and usually set up right away.

These should be set in your ~/.bashrc file (you may have to create it if it is missing).

Aliases allow you to set shortcuts for frequently used commands. These are some good ones to configure.

alias rm='rm -i' # ask for confirmation before deleting; override with -f
alias mv='mv -i' # ask before overwriting an existing file
alias ls='ls --color=auto' # colored output with GNU ls (Linux); on macOS use ls -G
alias ll='ls -lth' # long listing, newest first, human-readable sizes
alias l.='ls -d .*' # shows hidden files only
alias mkdir='mkdir -pv' # auto makes parent directories
alias wget='wget -c' # default behavior to continue downloading if stopped

# path shortcuts
alias grain='cd /grain/your_net_id'
alias res='cd /grain/resources/'
alias loc='cd /local/your_net_id' # using local for alias results in bash warnings

Other helpful settings:

# prevent accidental overwriting of existing files with > (use >| to force it)
set -o noclobber

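# keep an unlimited, timestamped bash history
export HISTCONTROL=ignoredups # skip consecutive duplicate commands
export HISTFILESIZE= # empty value: never truncate the history file
export HISTSIZE= # empty value: no limit on in-memory history
export HISTTIMEFORMAT="[%F %T] " # record a timestamp for each command
export HISTFILE=~/.bash_history_unlimited # separate file that shells without these settings will not truncate
shopt -s histappend # append to the history file instead of overwriting it
PROMPT_COMMAND="history -a; $PROMPT_COMMAND" # write each command to the file right away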

# enable colors
export TERM=xterm-256color

To view your history with timestamps, just use the history command (you can pipe it like any other bash command, e.g., history | less or history | grep ssh). If you open the file at $HISTFILE directly, the formatting may look a bit nonsensical, because each command is preceded by a comment line holding its timestamp.
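Since history | grep is such a common pattern, you may want a short function for it in ~/.bashrc. A minimal sketch; hgrep is a hypothetical name, and the search is case-insensitive:

```shell
# Hypothetical helper for ~/.bashrc: search your (timestamped) history.
# Usage: hgrep PATTERN   e.g. hgrep podman
hgrep() {
  # -- stops grep from treating a pattern that starts with "-" as an option
  history | grep -i -- "$1"
}
```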

Conda

Managing the vast space of Python packages, with all their different versions and dependencies, can get very chaotic without a package manager. A popular system for managing environments (also used for languages other than Python) is Conda. Within Conda, you can choose to install Anaconda or Miniconda; Anaconda essentially ships with many packages by default. We recommend starting with Miniconda, since it is much less bloated (Anaconda includes many packages you will likely never use), and you can easily install any packages you need.

To install Miniconda, download the latest Linux installer and run it:

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh

The installer will ask you some fairly self-explanatory questions (agreeing to the license terms, etc.). You can simply install it in your home directory: by default it suggests a directory along the lines of ~/miniconda3, which you can rename to something shorter if you like (e.g., ~/conda). Say yes when it asks whether you would like to initialize it; Miniconda will then update your .bashrc to load conda automatically. Do not forget to open a new terminal window or run $ source ~/.bashrc for the new bash settings to take effect.
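If you prefer to skip the interactive prompts (for example, when setting up on several servers), the installer also has a batch mode. This sketch assumes the installer's -b (batch mode, which accepts the license terms for you) and -p (install prefix) options. install_miniconda is a hypothetical helper, and because batch mode does not edit ~/.bashrc, it runs conda init bash itself.

```shell
# Hypothetical helper: non-interactive Miniconda install.
# Usage: install_miniconda INSTALLER [PREFIX]   (PREFIX defaults to ~/conda)
install_miniconda() {
  local installer="$1" prefix="${2:-$HOME/conda}"
  # -b: batch mode (no prompts; implies you accept the license terms)
  # -p: installation prefix
  bash "$installer" -b -p "$prefix" || return 1
  # Batch mode skips the initialization step, so add the shell hook explicitly.
  "$prefix/bin/conda" init bash
}
```

For example: install_miniconda Miniconda3-latest-Linux-x86_64.sh ~/conda && source ~/.bashrc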

Note

Once conda is initialized, your prompt will always show which conda environment is active, e.g., (base) [user@risotto]$. To remove the (base) prefix but still show other environments, you can run this piece of code (simply changing the conda_dir line to the location of your own conda installation).

RStudio Server

There is an RStudio Server instance running on risotto, listening on port 8787. This means that if your SSH config forwards a port on your local machine to port 8787 on risotto, you can go to http://localhost:your_local_port in your browser to access RStudio Server. Log in with your NetID and Rice password.

Note that by default, your working directory will be your home directory when you log in. To keep the home directory from filling up, make sure your new projects are in your /grain directory instead!

Podman

Podman is what we use to manage containers; it is similar to its much more famous counterpart, Docker. If you are familiar with Docker, the commands are so similar that the servers have an automatic alias (alias docker=podman), so you can run Podman commands by invoking docker. At their core, both Podman and Docker let users create lightweight 'containers': partially isolated environments that package user- and system-level libraries and dependencies. The other term you will hear often is 'image', the packaged configuration of an environment (e.g., RStudio on Debian or Jupyter on Ubuntu). Containers are specific instances of images, so you can have several containers based on the same image.

Images can be configured using a Dockerfile (Podman sometimes calls it Containerfile), which is essentially a text file that has all the commands to assemble an image. However, before writing your own, you should know that there are repositories for container images, the main one being DockerHub. By default, when you use Podman on the servers, it will already search DockerHub.

Info

If you are interested in learning more about how to write a Dockerfile, there are many resources online. Here are two options:

And if you want to gain a better understanding of Docker (there are many more resources for Docker, and as mentioned above, most of the commands are the same in Podman), this is a good starting point.

Tip

In addition to DockerHub, Podman will by default search Red Hat registries for container images and print a warning message if you do not have Red Hat login credentials. Leaving the Red Hat registries in the search list does not affect Podman's ability to pull images from DockerHub, but to limit Podman to searching only DockerHub, you can create a registries.conf file at ~/.config/containers/registries.conf with the following lines:

[registries.search]
registries = ['docker.io']

Your local file overrides the global registries.conf file (/etc/containers/registries.conf), which contains this line instead:

registries = ['registry.access.redhat.com', 'registry.redhat.io', 'docker.io']

so Podman will no longer search the Red Hat registries.
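The [registries.search] table above is the older version 1 syntax of registries.conf. Newer Podman releases use the version 2 syntax, where the equivalent setting is unqualified-search-registries, and a single file should not mix the two formats. The sketch below (write_registries_conf is a hypothetical helper) writes the version 2 form, assuming your Podman version reads it; it refuses to overwrite an existing file.

```shell
# Hypothetical helper: write a per-user registries.conf (version 2 syntax)
# that limits short-name image searches to Docker Hub.
# Usage: write_registries_conf [PATH]
write_registries_conf() {
  local conf="${1:-$HOME/.config/containers/registries.conf}"
  if [ -e "$conf" ]; then
    echo "$conf already exists; edit it by hand instead" >&2
    return 1
  fi
  mkdir -p "$(dirname "$conf")"
  # unqualified-search-registries replaces [registries.search] in version 2
  printf '%s\n' 'unqualified-search-registries = ["docker.io"]' > "$conf"
}
```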

The other thing to be aware of is that containers are designed to be isolated environments, so any files on the server's filesystem that you want to use inside a container must be shared with it (mounted as a volume with podman run -v). Files created in the container's own storage are deleted along with the container when it is removed, so write anything you want to keep, or access outside of the container, to a mounted volume.

Warning

You should not mount your $HOME directory or any shared directories into your containers (any directories above your user directory in grain or local, including the resources directory), because file ownership / management can get very messy.

One way to go about organizing your projects is to simply mount your grain and local user subdirectories in podman, then use the project management tools in, for example, Jupyter notebook, to organize different projects. Below are example commands for that organization scheme, though you could also potentially create individual Jupyter instances per specific project and mount the corresponding project directories instead. This would allow you to spin up individual projects independently of others.

Tip

Images and containers do take up space in your home directory, so it is good practice to clean up any that you are completely done with using podman rm to remove containers and podman rmi to remove images.
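To see how much space Podman is using and clear out leftovers in one step, you could use something like the sketch below (podman_cleanup is a hypothetical name). Note that container prune removes every stopped container, not only the ones you are done with, while image prune without -a only removes dangling (untagged) images.

```shell
# Hypothetical helper: report Podman disk usage, then remove stopped
# containers and dangling images without prompting.
podman_cleanup() {
  podman system df                 # space used by images, containers, volumes
  podman container prune --force   # removes ALL stopped containers
  podman image prune --force       # removes dangling (untagged) images only
}
```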

Warning

When you spin up servers, you will have to choose which ports on the corresponding server to use. It is very important to double check first that the port is not already taken by somebody else! You can quickly check which ports are in use by running netstat -tulpn | grep LISTEN. If the port number you intend to use is not shown (look at the number after the last colon of each local address), then the port is not currently being used. Remember that ports below 1024 are privileged, so you won't be able to use those either.
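The check above can be wrapped in a function. This is a minimal sketch with a hypothetical name: it rejects privileged ports, then looks for any TCP listener on the port in the output of netstat -tln (it parses netstat's column layout, so it assumes netstat is installed).

```shell
# Hypothetical helper: succeed (exit 0) if PORT is unprivileged and no TCP
# listener is using it. Usage: port_is_free PORT
port_is_free() {
  local port="$1"
  # Ports below 1024 are privileged and unavailable to regular users.
  if [ "$port" -lt 1024 ]; then
    return 1
  fi
  # After two header lines, column 4 is the local address,
  # e.g. 0.0.0.0:8787 or :::8787; match the number after the last colon.
  if netstat -tln 2>/dev/null | awk 'NR > 2 {print $4}' | grep -q ":${port}\$"; then
    return 1
  fi
  return 0
}
```

For example, port_is_free 8686 && echo "8686 looks free".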

Info

If you are curious why we use Podman rather than Docker, the big advantage is that Podman lets users spin up containers without root access. Podman does this by creating a mapping between host user IDs and container user IDs, by default mapping your host user ID to root inside the container. The --uidmap flag is one way to set custom mappings.

Jupyter Notebooks

To set up a Jupyter notebook instance, all you have to do is pick a port, your_port, that no one else is currently using (see the warning above for how to check this), and use it in the command below:

Info

Note that the parts of the command that start with $ (such as $USER) are bash environment variables. They let us avoid hard-coding values that change depending on who runs the command. If you type echo $USER or echo $UID in the shell, you can see the values of these variables.

$ podman run -d -p your_port:8888 \
    -v /local/$USER:/home/jovyan/local -v /grain/$USER:/home/jovyan/grain \
    --userns=keep-id \
    --restart=always --name=jupyter jupyter/minimal-notebook

-d: runs the container in detached mode
-p: maps a port on the server (your_port) to the container's port 8888, which Jupyter uses by default. Remember to read the warning above about choosing a port for your_port, and make sure it is not used by somebody else. This is also the port you forward to from your local machine.
-v: mounts grain and local into your "home directory" in the Jupyter notebook instance. Even though your username appears as jovyan within the notebook (jovyan is the default user in the Jupyter Docker Stacks images), your UID is kept in sync (see --userns below), so your file permissions will all be fine. There will also be a work directory when you start the notebook. Don't save things there; save your work only in the mounted directories (i.e., grain and local) if you want it to be accessible outside of the container. As in the RStudio case, you can instead mount a specific project directory (e.g., -v /grain/$USER/my_project:/home/jovyan/my_project).
--userns=keep-id: Podman flag to keep the external and internal user IDs the same
--restart=always: tells the container to always restart, unless it is explicitly stopped (via podman stop), killed (podman kill), or the system is rebooted (this also means that if you have this flag, be sure the container is working, or it will just live in a limbo of restarting and dying)
--name=jupyter: give your container some intuitive / simple name to help keep track of it more easily (if this is left out, Podman will automatically assign some whimsical name to your container)
jupyter/minimal-notebook: the container image to use. As with RStudio, you can choose from multiple options on DockerHub and read about them in the corresponding documentation. Again, remember that as you inevitably install more packages, it is good practice to keep track of them in your own Dockerfile external to the system and/or commit your updated container image from time to time so that you can easily replicate it.
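To follow the per-project scheme mentioned earlier, you could wrap the command in a function that mounts a single project directory. This sketch is hypothetical (the function name and the jupyter_PROJECT container naming are our own); it takes the project name and a port you have already checked.

```shell
# Hypothetical helper: one Jupyter container per project.
# Usage: start_project_jupyter PROJECT PORT
# Mounts only /grain/$USER/PROJECT and names the container jupyter_PROJECT.
start_project_jupyter() {
  local project="$1" port="$2"
  podman run -d -p "${port}:8888" \
    -v "/grain/$USER/$project:/home/jovyan/$project" \
    --userns=keep-id \
    --restart=always --name="jupyter_$project" jupyter/minimal-notebook
}
```

Because each project gets its own container name, podman logs jupyter_my_project (and podman stop, podman rm) can target one project without touching the others.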

Jupyter notebooks enforce authentication with a token, so even if you already have port forwarding set up on your local machine to the port you chose for the notebooks, you cannot access it directly at http://localhost:your_local_port. To get the token right after you start the container, you can simply run the following command:

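$ podman logs -l 2>&1 | grep token

(-l is a helpful shortcut flag that simply means "latest container"; 2>&1 sends the container's error output through grep as well, since Jupyter writes its log messages, including the token, to stderr.)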

If your last container was something else, you can also always call the container by its ID (in the table when you run podman ps) or the simple name you gave it, e.g.,

$ podman logs jupyter 2>&1 | grep token

In either case, the logs will show URLs containing ?token= followed by a long string of alphanumeric characters. These URLs use the container's port (8888), so to open the notebook from your machine you need the full URL with your own forwarded port: http://localhost:your_local_port/?token=long_string_of_characters. Your browser can cache the token, but if you need it again, you can always find it using podman logs, or you can store it somewhere else for easy access.
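Assembling the URL by hand gets tedious, so here is a minimal sketch of a hypothetical jupyter_url helper that pulls the first token out of the logs and prints a URL with your local forwarded port. It assumes the token is hexadecimal, which is what Jupyter generates by default.

```shell
# Hypothetical helper: print a ready-to-open URL for a Jupyter container.
# Usage: jupyter_url LOCAL_PORT [CONTAINER]   (CONTAINER defaults to "jupyter")
jupyter_url() {
  local local_port="$1" container="${2:-jupyter}" token
  # Jupyter logs to stderr, so merge it into the pipe; keep the first match.
  token="$(podman logs "$container" 2>&1 | grep -o 'token=[0-9a-f]*' | head -n 1)"
  if [ -z "$token" ]; then
    echo "no token found in the logs of $container" >&2
    return 1
  fi
  echo "http://localhost:${local_port}/?${token}"
}
```

For example, jupyter_url 8585 prints a URL you can paste into your browser while port forwarding to the server is active.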

Last update: 2021-11-10