An Introduction to Bash Scripting

blank_computer_screen
Source: http://robertmuth.blogspot.com/2012/08/better-bash-scripting-in-15-minutes.html

What is Bash Scripting?

Bash (an abbreviation for "Bourne Again Shell") is the shell scripting language and interpreter on most Linux systems. If you’ve used the command line in a Linux environment, you’ve probably already written some commands in Bash. This tutorial explains how to package these shell commands neatly into a script, which can be useful for anyone wanting to streamline commands for a routine task. Bash scripting is also a vital skill for anyone conducting research in a supercomputer environment.

Creating a Bash Script

When you create a bash script in a Unix environment, it is customary to give it a .sh extension. Technically it isn’t necessary, because Unix will still treat it as a bash script regardless, but it’s a good idea. Think about it: have you ever inherited someone’s old project and been faced with the task of figuring out what each of their files is for? Imagine doing that without extensions. So, give your file a .sh extension.

To open a new script in emacs:

C-x C-f my_script.sh

Or, if you’re like me and use vim:

vi my_script.sh

Then, add the following header to the top of your file. All other code will go below this. This header is called a shebang and is used to ensure that Bash (and no other interpreter) will be used to run the script.

#!/bin/bash

Running a Bash Script

Source: https://www.guru99.com/sql-interview-questions-answers.html

You do not need to run a compiler or specify an interpreter to run a Bash script (the shebang already takes care of that for you). To execute a script, simply navigate to the directory where it is saved and run:

./my_script.sh
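
Note that the script must have execute permission for this to work. If you see a "Permission denied" error, make the script executable first (a one-time step):

chmod +x my_script.sh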

If your Bash script takes command line arguments (more on how to do this later), execute your script as in the following example, where "my_file.csv" and 0.4 are the parameters to the script.

./my_script.sh "my_file.csv" 0.4

Some Bash scripts accept named parameters (more on this later). To run a Bash script with named parameters, execute it in the following way. In this example, -f references a file parameter and -t a threshold parameter.

./my_script.sh -f "my_file.csv" -t 0.4

Declaring and Using Variables

Source: https://www.researchgate.net/figure/Global-and-local-variables-in-procedural-programming_fig1_283347855

In bash scripting, the type of a variable is automatically inferred, so you don’t need to declare it. You also don’t have a lot of special data types. Basically, there are numbers, character strings, and lists, and under the hood everything is stored as a string. Values are only treated as numeric during arithmetic operations, and Bash’s built-in arithmetic handles integers only (floating point math requires an external tool; see the Arithmetic section below). If you need anything more complicated than that, my advice is to write your code in R, Python, Java, or Perl and call it from your bash script rather than coding everything up in bash.

Single Values

Again, you don’t need to declare the type of your variable. All you need to do is this:

my_character_variable="hello world"
my_float_variable=4.0
my_integer_variable=1

One thing to note about bash is that it’s finicky about whitespace. No matter how much you want to, don’t put spaces on either side of the equals sign.

When you want to use these values, put a dollar sign in front of them. It is a good idea to enclose them in curly braces as well, although it isn’t always required. In some scenarios (like concatenating two strings with an underscore), curly braces are important for distinguishing variable names from other text. For example, the code below will store “hello world_1” in the variable.

full_string=${my_character_variable}_${my_integer_variable}

Arrays and Lists

The syntax for lists is also quite simple, and bash arrays can contain both strings and numeric values, e.g.

my_array=(1 2 3 4 5 6 7 8 9 10 4.0 "hello" "world")
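
As a quick sketch of how to use an array once it is declared: individual elements are accessed by index (starting at 0), and special syntax expands the whole array or its length.

#This prints the first element, 1.
echo ${my_array[0]}
#This prints every element.
echo ${my_array[@]}
#This prints the number of elements, 13.
echo ${#my_array[@]}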

Variable Scoping

If you’ve done a lot of coding, you know that variables are local to the block of code in which they’re declared, right? Right. Except in bash. By default, every variable declared in a bash script is global. If you want a local variable, you need to declare it with the local keyword, which only works inside a function:

local my_local_character_variable="hello neighborhood"
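
Here is a minimal sketch showing the difference (functions are covered in more detail later in this tutorial):

my_function() {
   local my_local_character_variable="hello neighborhood"
   echo "Inside: ${my_local_character_variable}"
}
my_function
#Outside the function, the local variable no longer exists, so this prints an empty value.
echo "Outside: ${my_local_character_variable}"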

Arithmetic

Arithmetic on variables is straightforward. Use the same operators that you would use in typical arithmetic. However, to ensure that your variables are treated as numeric and not string variables, you must use double parentheses around the expression or the let keyword in front of the variable in which you want to store your result. Some examples are shown below. Remember, be careful about whitespace.

x=5
y=2
modulus=$((${x}%${y}))
let addition=${x}+${y}
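
One caveat: Bash’s built-in arithmetic only handles integers. If you need floating point math, a common workaround (sketched below, assuming the bc utility is installed) is to pipe an expression to bc:

x=5
y=2
#scale sets the number of decimal places; this stores 2.50.
division=$(echo "scale=2; ${x}/${y}" | bc)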

String Concatenation

String concatenation is also very simple in Bash. Simply type the variables, one after the other, with any additional text filled in exactly where you want it! It doesn’t get easier than that. Here’s an example.

#This code stores "Why hello there, world!" in a new variable.
word1="hello"
word2="world"
combined="Why ${word1} there, ${word2}!"

This also works if you’re dealing with numeric values, without any need for parsing.

#This code stores the string "We're number 1" in a new variable.
val=1
combined="We're number ${val}"

Running External Programs

If you’ve ever used the command line on a Unix system, then you already know how to execute external programs in a bash script. Simply call the program as you would do from the command line. Here are a few examples.

R Scripts

Many people like to run R interactively. Unfortunately, you can’t do this within a Bash script, because scripts are not interactive by virtue of being scripts. You will need to save your code in a .r file and run it using Rscript. If you have run Rscript on the command line before, this should look very familiar.

param_1=2
param_2="/root/my_file.csv"
param_3="/root/my_output.csv"
Rscript my_r_script.r $param_1 $param_2 $param_3

Utilities

You can run utilities from within your bash script just as you would run them on the command line. Here is an example using bedtools, a utility common in bioinformatics. Called in this way, it will print the resulting file to the console rather than saving it. To avoid this behavior, see the section on Output.

bedtools sort -i my_bed_file.bed

Unix Tools

There are a few Unix tools that can be handy to use in Bash scripts. Each of these tools is really a topic on its own, but here is a brief introduction to them.

grep

The grep utility is primarily used for searching text. It is often used with pipes (discussed later) and can be called directly within the Bash script. Learn more about the capabilities of grep here. The following command will return every line in my_file.txt containing my_word.

grep "my_word" my_file.txt

cut

cut is used for selecting fields or character ranges from each line of a file. Here is an example of how it can be used to select only the first two columns in a tab-delimited file. You can see more examples of how to use cut here.

cut -d $'\t' -f 1,2 my_file.txt

shuf

If you are doing any work that involves permuting data, shuf is a convenient tool. It can be used either to shuffle an entire file or to select a random set of lines from a file. Read more about shuf here. The example below shows how to shuffle an entire file’s lines using shuf.

shuf my_file.txt
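
To draw a random subset instead of shuffling the whole file, use the -n flag. The sketch below selects 10 random lines:

shuf -n 10 my_file.txt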

awk

The awk utility is convenient for selecting and modifying lines that meet specified criteria. It is more powerful than cut, but it can also be more complicated to use. To really use awk well, you should understand regular expressions. The following example shows how to select columns 1 and 2 from a file (similar to the cut example). You can see more examples of awk here.

awk '{print $1, $2}' my_file.txt
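
Because awk can also apply a condition to each line, you can filter rows while selecting columns. The sketch below (using a hypothetical cutoff on column 3) prints columns 1 and 2 only for lines where column 3 is greater than 0.5:

awk '$3 > 0.5 {print $1, $2}' my_file.txt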

Branching

Source: https://www.classes.cs.uchicago.edu/archive/2019/winter/15200-1/lecs/notes/Lec4ComplexCondNotes.html

Branching in Bash uses the following syntax. Within the double square brackets, you can construct tests using the comparison operators available in Bash, and you can chain tests together using Bash logical operators. Single square brackets are also supported, but double square brackets have some added features. Note that the whitespace around the brackets and the test is important!

if [[ $num -eq 42 ]]
   then
      Rscript my_r_script.r "file_42.csv"
   else
      Rscript my_r_script.r "file_not_42.csv"
fi
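
As mentioned above, you can chain tests together with logical operators inside the double brackets. A minimal sketch (assuming $num holds an integer):

if [[ $num -ge 0 && $num -lt 100 ]]
   then
      echo "num is between 0 and 99"
fi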

Tests in an if statement can include more than just arithmetic. The following code checks whether a directory exists, and creates it if it doesn’t.

dir="../my_dir"
if [[ ! -e $dir ]]; then
   mkdir $dir
fi

Looping

Source: http://www.functionx.com/java/Lesson08.htm

You can use for loops or while loops in Bash. For loops are used for looping over lists. The example below shows looping over a list of numbers.

for f in 0 1 2 3 4 5 6 7; 
   do 
      Rscript my_r_script.r $f 
   done

This could also be done using a range.

for f in {0..7};
   do
      Rscript my_r_script.r $f
   done

Finally, you could loop over a pre-defined array.

for f in "${my_array[@]}";
   do
      Rscript my_r_script.r $f
   done
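
For loops also work well with filename patterns. The sketch below (assuming the current directory contains .csv files) runs the R script once per file:

for f in *.csv;
   do
      Rscript my_r_script.r $f
   done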

While loop test conditions have similar syntax to if statement test conditions. The following while loop does the same thing as the first two for loops above.

i=0
while [[ $i -le 7 ]]
   do
     Rscript my_r_script.r $i
     i=$((i+1))
   done

Functions

Source: http://www.desy.de/gna/html/cc/Tutorial/node3.htm

You can define and call functions in Bash scripts, but note that you need to define your function before you call it. This is notable because many programming languages do not have this restriction. Another thing that is different about functions in Bash scripting is the way parameters are passed. When calling the function, you simply pass the parameter directly after the function call like a command-line argument. Inside your function definition, your first parameter will be referred to as $1, your second as $2, and so on. What about returning values from a function? Bash doesn’t really allow this: a function can only return an exit status (an integer between 0 and 255), not an arbitrary value. So strictly speaking, Bash functions are not really functions at all but procedures.

my_function() {
   local c=$1
   Rscript my_r_script.r $c
}
for f in 0 1 2 3 4 5 6 7;
   do 
      my_function $f
   done
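
Although a Bash function cannot return an arbitrary value, a common workaround is to echo the result and capture it with command substitution. A minimal sketch (add_two is a hypothetical helper):

add_two() {
   echo $(($1 + $2))
}
#result now contains 7.
result=$(add_two 3 4)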

Input

Input in Bash scripting can take two forms. You can pass command-line arguments when calling your script, or you can store your input as a file.

Command Line Arguments

Basic command line arguments work similarly to parameters in Bash functions: $1 refers to argument 1, $2 to argument 2, and so on.

However, if you want to make your script more user-friendly and allow for named parameters, that is also possible. The code below allows for three named parameters: -n for a name, -f for a file name, and -t for a threshold. All arguments are optional. The realpath utility returns the full path of the file name given if the file exists.

while getopts n:f:t: option; 
   do
      case "${option}" in
         n) name=$OPTARG;;
         f) filename=$(realpath $OPTARG);;
         t) threshold=$OPTARG;;
      esac
   done
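
Once the loop finishes, the parsed values behave like any other variables. A short sketch that falls back to a default threshold when -t is not supplied:

if [[ -z $threshold ]]
   then
      threshold=0.4
fi
echo "Processing ${filename} with threshold ${threshold}"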

File Input

Of course, you can also simply hard code file names into your Bash script and use them as your input. If you want to input a list of values rather than a single value, storing them in a file is probably the best way to do this. There are several ways to load your data from the file into a list.

The first option captures the output of the cat utility with command substitution. The file’s contents are stored in my_list as a single string; when you later expand $my_list without quotes (for example, in a for loop), Bash splits it into a list of words. Note that the parentheses here are different from the double parentheses described in the arithmetic section. Double parentheses (( )) run arithmetic operations, while $( ) runs a command in a subshell (essentially a child process) and substitutes its output.

my_list=$(cat my_file.txt)

Another option uses the shell redirection operator to read each line of the file in a loop.

my_list=()
while read infile;
do
    my_list+=($infile)
done < my_file.txt

The IFS Variable

Note that the code above will only work as expected if there is no whitespace (spaces, tabs, etc) within each line. If your lines have spaces or tabs, Bash will automatically split on each space or tab. You can change this by setting a special variable called IFS to split on new lines only.

For example, say your input file is formatted like this.

Hi, I'm a file.
You should input me into your Bash script.
But it needs to be done line-by-line.

If you want each of these lines to be read as a single list element, you could do

IFS=$'\n'
my_list=()
while read infile;
do
    my_list+=($infile)
done < my_file.txt

You can also use IFS to split on other characters. See this page for more information on IFS.
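
If you are using Bash version 4 or later, the mapfile builtin (also known as readarray) reads a file into an array with one element per line, without any need to change IFS:

mapfile -t my_list < my_file.txt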

Output

In Bash, you can print output to a file or direct it to stdout or stderr (by default, stdout is usually the main console).

Shell Redirection Operator

The shell redirection operator allows you to redirect output to a file. For instance, the following line redirects the output of the bedtools sort utility to the file my_sorted_bed_file.bed. Normally, this output would print to stdout.

bedtools sort -i my_bed_file.bed > my_sorted_bed_file.bed

It is also possible to append to the file, like so:

bedtools sort -i my_bed_file.bed >> my_sorted_bed_file.bed

Finally, if your line of code prints to stderr, you can redirect both streams as follows:

bedtools sort -i my_bed_file.bed > my_sorted_bed_file.bed 2> my_errors.log
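
If you would rather capture stdout and stderr in a single file, redirect stderr to stdout with 2>&1 (the log file name below is just an example):

bedtools sort -i my_bed_file.bed > my_combined_output.log 2>&1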

echo Utility

The main way to output to the console (stdout) is to use the Unix echo utility. The following examples show how echo can be used.

#This command prints "Hello World" to stdout.
echo "Hello World"

#This command prints every element of my_array to stdout.
my_array=(1 2 3 4 5 6 7 8 9 10 4.0 "hello" "world")
echo ${my_array[@]}

#This command prints the contents of the file my_file.txt to stdout.
echo $(cat my_file.txt)

Piping

Source: https://bash.cyberciti.biz/guide/Pipes

When you pipe a command, you are redirecting its output to another command. This is done using the | operator. Pipes are used in many scenarios, but here are some examples.

#The following code lists only the files in a directory whose names end in ".png".
ls -l | grep "\.png$"

#The following command sorts the first 1000 lines of a file.
head -n 1000 my_file.txt | sort -V -k1,1 -k2,2
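
Pipes can also be chained to combine the utilities introduced earlier. The sketch below selects the first two tab-delimited columns, keeps only the lines containing my_word, and counts how many times each unique pair appears:

cut -f 1,2 my_file.txt | grep "my_word" | sort | uniq -c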

Quotes

Three types of quotes are used in Bash: double quotes, single quotes, and backtick quotes. They are all used for different purposes.

Double Quotes

Double quotes are used around text. If variables are included in the double quotes, they are expanded. Here is an example. The code below prints “Why hello there, world!”

word1="hello"
word2="world"
echo "Why ${word1} there, ${word2}!"

Note that if you want to include quotes within the string, you need to use an escape character. The code below prints “The script name is “my_script.r””

echo "The script name is \"my_script.r\""

Single Quotes

Single quotes are also used around text, but the difference is that they do not expand variables. Looking at a similar example (below), the variable ends up containing the literal text “Why ${word1} there, ${word2}”

#This code stores the literal string "Why ${word1} there, ${word2}" in a new variable.
word1="hello"
word2="world"
combined='Why ${word1} there, ${word2}'

Backtick Quotes

These are usually just called “backticks”; however, many people consider them a type of quote or mistake them for single quotes. Backticks have an entirely different function from other quotes, which is to return the output of a command. In this way, they function the same as $(). For instance, in the File Input section, we could have also written the command like so.

my_list=`cat my_file.txt`

Background Processes

background_process
Source: https://turbofuture.com/computers/Run-process-in-background-linux-terminal

Sometimes, you may want to run part of your script in the background so that it doesn’t block the commands that come after it. To do this, you launch pieces of your code as background processes. One easy way to do it is to put all the code you wish to run in the background into its own function. Then, call that function in a for loop, followed by an ampersand. The ampersand runs each call as a background process.

Note the pids variable and the wait statement. These are important if you want to make sure no other code executes until all of the background processes have completed. The code below tells the script to track all of the process IDs and wait until they have finished before running the next line of code.

pids=""
for f in $CHROMS;
   do
      my_function $f &
      pids="$pids $!"
   done
wait $pids

Ohio Supercomputer Center Tutorial

This tutorial is intended for users who have access to the Ohio Supercomputer Center (OSC) for research purposes but who are not familiar with conducting research in a supercomputing environment. It gives an overview of the resources available from OSC and the structure of the OSC computing environment, methods for connecting to OSC, job submission, software installation, and help desk requests.

An Overview of OSC

The term “supercomputer” is somewhat vague, but what it typically means in today’s context is actually a supercomputing cluster, or a large collection of high-powered servers connected via a local network. Clusters allow users to run computationally intensive, parallelizable tasks in a single environment. To illustrate the concept, I have provided an image of one of OSC’s retired clusters, Oakley, from the OSC website.

2015_1222-cluster-graphic_oak_all
Source: https://www.osc.edu/sites/osc.edu/files/2015_1222-Cluster-Graphic_Oak_all.png

In a supercomputing center like OSC, many users share these resources at once, and resources (nodes, memory, and time) are allocated to users as users request them. You do this by submitting jobs (discussed later). When you submit a job, you request the resources you need. When those resources are available on the cluster you are using, your job will run.

Computing Clusters

OSC has three clusters: Owens, Ruby, and Pitzer. To decide which of these clusters best meets your needs, check the specifications of each from the OSC website. Depending on your research, you may want to think about the type of network connections used or how many GPUs are available. You may also want to browse the software list to determine which clusters contain the software you need.

Note: While not all clusters have the same resources, user files are mirrored across all clusters. This means that, if you are using Owens and Owens is unavailable due to periodic system updates, you can use another cluster until Owens is available again.

Login Nodes and Compute Nodes

Most of the nodes in any OSC cluster are compute nodes, with a few login nodes. The difference between the two is mainly this:

  • Login nodes are used only for logging in and for very basic tasks, like moving or deleting files. There are only a few of them because the tasks performed on them are not intensive, so a large number of users can share a small number of nodes.
  • Compute nodes are used for intensive computational tasks. All of your work should be done on compute nodes, not on login nodes.

When you connect to OSC (no matter which method you use), you are connecting to a login node. The only way to access a compute node is by submitting a job. Any work you do on OSC must be done within a job – otherwise, you risk putting a heavy workload on the login nodes. If you do this, any script you run will be killed almost immediately, and you will receive an e-mail from the OSC administrators reminding you to never do that again.

File Systems

All OSC clusters have the same file systems, described below.

  • /fs/home is the file system you are directed to automatically when you log in, and it is organized by user ID. You can store data and output files there, but your collaborators won’t be able to access them.
  • /fs/project is a file system organized by research project ID. To get to your project folder, you will need to navigate to /fs/project/"project-id". Storing your files here makes them available to your collaborators.
  • /fs/scratch is a large file system for temporary storage and is available to everyone. Use it if you need to share files with someone who does not have access to your project folder or if your files are too large to fit in your project folder.

Connecting to OSC

Personally, I prefer to use the command line for most of the work I do, but many users like the look and feel of a GUI. If you prefer using a GUI, access OSC through the OnDemand Web Portal. If you prefer working from a command line, use PuTTY (for Windows users) or SSH (for Mac or Unix users).

Using the OnDemand Web Portal

OnDemand is a web-based service run by OSC administrators for accessing OSC using a GUI (no installation required). OSC has a tutorial for using OnDemand. Note that, instead of using CILogon, you will want to choose “Log in with your OSC account”. You will be asked to register your username and password the first time you log in.

Using PuTTY (Windows)

To connect using Windows, use PuTTY. If you haven’t used PuTTY before, just download it and it’s ready to use! Once you have PuTTY installed, configure PuTTY to connect to OSC as shown below (the example below is for the Owens cluster). Other than what is highlighted, no changes need to be made to the settings.

putty1

Note: I have saved this connection under the name “Owens” by entering “Owens” under Saved Sessions and clicking Save. If you do this, it saves all information for future logins. You can select Owens and click Load rather than entering everything again.

Using SSH (Mac / Unix)

From your terminal, simply use one of the following commands, depending on which cluster you wish to use.

ssh "user-id"@pitzer.osc.edu
ssh "user-id"@owens.osc.edu
ssh "user-id"@ruby.osc.edu

Submitting Jobs

Here we are…the most important part of the tutorial! Once you have logged into your cluster, you will need to submit your job so that your code can run on the compute nodes. When you submit a job, you need to think about the following things:

  • How many nodes you will need. This is going to depend on how you’ve parallelized your code. To run on multiple nodes, you will need to use a parallelization package that supports MPI or OpenMP in your code. Examples of this are Rmpi and snow in R, and mpi4py in Python.
  • How much memory (or how many processors) you will need. If your program uses threading, you will need to request processors accordingly. Note that, in addition, the amount of memory allocated to you is proportional to the number of processors you request. For instance, standard Owens nodes have 28 processors and 64 GB of memory, so requesting 4 processors on Owens will allow you to use about 9 GB of memory. If you need to use more than 64 GB, you can request a large memory node by requesting as many processors as the large memory node contains (e.g. 48 on Owens). Note: If you are using multiple nodes, you must request all processors for each node.
  • How much time you will need. It is best to overestimate on your first run. Then, you can see how much time the job actually takes interactively or using a batch job by checking the log files.

There are two types of jobs that you can run: interactive jobs and batch jobs. You can create these yourself or (if you are using OnDemand) you can use the Job Composer templates provided by OnDemand as shown in the image below. The next two sections describe the types of jobs and how to create them without using templates.

job_composer

Interactive Jobs

An interactive job allows you to connect to the compute node and interactively run tasks. This approach has the following advantages over running in batch mode:

  • You can catch bugs as soon as they occur.
  • You can view output in real time.
  • If you prefer running code line-by-line (such as with an R or Python interpreter), you can do that.

However, note that an interactive job does not keep running in the background when you close your connection to the cluster. For time-intensive tasks, it is not a good choice. Sometimes, it can also be inconvenient when a job is queued, because you should periodically check whether it has been dequeued so that you can run your tasks.

To run an interactive job, issue the following command.

qsub -I -l nodes=4:ppn=28 -l walltime=3:00:00 -A PAS0001

Here, the request is for 4 nodes with 28 processors each (To run on large memory nodes, we could have requested 48 processors each). The interactive job will run for 3 hours, and resources will be charged to project PAS0001.

Batch Jobs

In a batch job, all of your tasks are run from a shell script, which you must create. Batch jobs have the following advantages:

  • They will continue running even after you close your connection to the cluster, making them a good choice for time-intensive tasks.
  • You do not need to check whether a job has been dequeued before running anything. Since your commands are in a script, they will run as soon as the job is dequeued.
  • Putting your commands in a script can help you to better organize your code.

For a batch job, you should create a shell script, e.g. myscript.sh, with the following format.

#!/bin/bash
#PBS -l nodes=4:ppn=28
#PBS -l walltime=3:00:00
for i in 0 1 2 3 4;
do
   echo $i
   Rscript some_r_script.r $i
done

Here, we are again requesting 4 nodes with 28 processors each and 3 hours of runtime. This is specified in the #PBS directives. The rest of the code is an example: it loops through the numbers 0 to 4, prints them out, and calls an R script. This is meant to illustrate how you can use a shell script for simple looping and branching, input and output, and running code.

Now that you have created your script, you will need to submit it. To do this, run:

qsub -A PAS0001 my_script.sh

After your script completes, it will save your console output to a file called my_script.sh.o."some-number" and any errors to my_script.sh.e."the-same-number".

Checking Job Status

To check the status of a job, run the following command:

qstat -u "user-id"

If you have a job running, your output will look something like this:

                                         Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
4723639.owens-batch.te serial STDIN 86768 1 28 -- 03:00:00 R 00:00:06

This tells us the job ID, that the job is running from STDIN (i.e. it is an interactive job), that it is running on one node with 28 cores, and that it has been running for 6 seconds. Instead of R, jobs that are queued will show Q as their status. Jobs that have recently completed will show C.

Deleting a Job

To delete a job, you must first obtain its ID, which can be done by checking the job status as shown above. Once you have the ID, run:

qdel "job-id"

In this case, the job ID is 4723639.

Transferring Files

To/From a Local Machine

Using OnDemand

To use OnDemand for file transfers, please see this tutorial from OSC.

Using SFTP or SCP (Mac / Unix)

SCP is generally faster than SFTP, but SFTP allows for file management on the remote server, such as file deletion and creation of directories. Learn more about the differences here.

To use SFTP for file transfers, connect to the OSC SFTP server as shown:

sftp "user-id"@sftp.osc.edu

To use SCP for file transfers, use the OSC SCP server as shown below. The first command is for transferring from the SCP server to your local directory, and the second is for transferring from your local directory to the SCP server.

scp "user-id"@scp.osu.edu:"your-file.txt" "your-local-directory"

scp "your-file.txt" "user-id"@scp.osu.edu:"your-remote-directory"

Using FileZilla or WinSCP (Windows)

To connect via FileZilla or WinSCP, you will need the same connection information as you used for the PuTTY connection, but you will use the SFTP server instead of the cluster name for FileZilla and the SCP server for WinSCP. Here is an example using FileZilla.

filezilla

Downloading from Online

One option for retrieving files from online is to download them to your local system and transfer the files using one of the methods above. But this isn’t very efficient, especially if you are downloading large data files.

My favorite method for downloading data from online is to use wget. For example:

wget https://www.encodeproject.org/files/ENCFF001CUR/@@download/ENCFF001CUR.fastq.gz

This downloads a 1.41 GB file from the ENCODE Consortium. For me, it downloads at a speed of 29.7 MB/s.

If you want to obtain data from GitHub and are familiar with Git commands, you can also clone a repository from the OSC command line using git clone. Git is already installed on the OSC clusters, so you don’t need to worry about installing it.

It is a good idea to check whether your download was successful. If the file you are downloading has an md5 checksum, you can use that to verify your download. For instructions on using md5 checksums, see this tutorial.
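
For example, you can compute the checksum of the ENCODE file downloaded above with md5sum and compare the result to the checksum published by the data provider:

md5sum ENCFF001CUR.fastq.gz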

Using Software

Available Software

Each cluster on OSC has software pre-installed that you can use. So before trying to install new software yourself, check whether it is already available using:

module spider "name-of-program"

An example of the output for R would be:

---------------------------------------------------------------------------------------------------------------------------------------------------------
  R:
---------------------------------------------------------------------------------------------------------------------------------------------------------
     Versions:
        R/3.3.1
        R/3.3.2
        R/3.4.0
        R/3.4.2
        R/3.5.0
        R/3.5.2
     Other possible modules matches:
        amber  arm-ddt  arm-map  arm-pr  blender  darshan  espresso  express  freesurfer  gromacs  hdf5-serial  homer  hyperworks  libjpeg-turbo  ...

---------------------------------------------------------------------------------------------------------------------------------------------------------
  To find other possible module matches execute:

      $ module -r spider '.*R.*'

---------------------------------------------------------------------------------------------------------------------------------------------------------
  For detailed information about a specific "R" module (including how to load the modules) use the module's full name.
  For example:

     $ module spider R/3.5.2
---------------------------------------------------------------------------------------------------------------------------------------------------------

Note that this also tells you which versions are available. Sometimes, the version of software you want to use is available on OSC, but there is another version loaded by default. To load the version you want, just use module load. For example, let’s say that you want to use version 3.5.2 of R, but 3.5.2 is not the default.

Then, you can run:

module load R/3.5.2

Configuring Paths

Setting your $PATH environment variable can be useful in a Unix environment. It allows you to simply type the name of the software or package you wish to use, without specifying the full path. On OSC, you do this by modifying two files: .bashrc and .bash_profile. These can be found in your /fs/home directory. Below is an example of .bashrc file content:

PATH=$PATH:$HOME/.bds
PATH=${PATH}:$HOME/gosr/bin
PATH=${PATH}:$HOME/Samtools/bin
PATH=${PATH}:$HOME/FastQC/
PERL5LIB=$HOME/bioperl-1.2.3
PERL5LIB=${PERL5LIB}:$HOME/ensembl/modules
PERL5LIB=${PERL5LIB}:$HOME/ensembl-compara/modules
PYTHONPATH=$HOME/pythonmodules/pysam-0.7.5/lib/python2.7/site-packages
PYTHONPATH=$PYTHONPATH:$HOME/usr/local/anaconda/anaconda2/pkgs/
PYTHONPATH=$PYTHONPATH:$HOME/pythonmodules/gosr/lib/python2.7/site-packages

Here, I have added system paths, Perl-specific paths, and Python-specific paths. Simply add a similar line to both the .bashrc and .bash_profile files to include any additional path. You will need to close and reopen your connection to the cluster before these changes take effect. Note: It is important to modify both files. One is used for the login node, and the other is used for the compute node. Ideally, you want them to be consistent.

Installing Software and Packages

Installing software on OSC is not as simple as on a Unix system on which you are the root user. You do not have sudo access, so you are limited to user installations. For example, I often use the following for python modules:

pip install --user "package-name"

If there is software or a package you wish to install that requires root access, you will need to contact the OSC Help Desk.

Support

To request support, contact oschelp@osc.edu with a description of the problem you are facing, and include your user ID. The help desk usually responds quickly.