Running and managing software
Running command line tools
The command line tools available on the server can be classified as either standard GNU/Linux utilities or specialised bioinformatics tools.
Standard GNU/Linux utilities
The server provides access to a wide range of standard GNU/Linux utilities, which are installed in /usr/bin.
- A general introduction to the use of these utilities can be found here.
- A complete list of the available utilities can be obtained by running ls /usr/bin on the command line.
- More information on an individual utility can be accessed by running man [NAME OF UTILITY].
- A detailed manual covering many of the available utilities can be accessed by running the info command on the command line.
Specialised bioinformatics tools
Specialised bioinformatics tools are implemented on the server as singularity containers. The main benefit of this approach is that it allows multiple versions of the same tool to be installed in parallel on the server. This lets us make the latest versions of software tools available without breaking existing user scripts and pipelines that rely on older versions. Each singularity container is accompanied by a wrapper script that enables users to call the corresponding tool directly.
There are also two helper scripts: tools, which lists the tools available, and versions, which lists the available versions of those tools. You can run a tool by specifying its name as listed in the output of the tools command, or run a specific version of a tool by using the versioned name shown in the output of the versions command (for example, bedtools-2.29.2).
We recommend that you specify a particular version of any tool that is included in a script. Calling a tool by name only generally runs the latest version of the tool, which may change over time as new versions are installed. This could cause problems for scripts that were written to use older versions of the tool.
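As a minimal sketch of this recommendation, a script can record the pinned wrapper name in a single variable and check that it is available before doing any work. The version used here (bedtools-2.30.0) is only an example, and the helper name require_tool is hypothetical rather than something provided on bifx-core.

```shell
#!/usr/bin/env bash
# Pin the versioned wrapper name once, near the top of the script.
# "bedtools-2.30.0" is an example; use a name listed by `versions bedtools`.
BEDTOOLS="bedtools-2.30.0"

# require_tool (hypothetical helper): succeed only if the named command
# can be found on PATH; otherwise print an error and return non-zero.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: required tool '$1' not found on PATH" >&2
        return 1
    fi
}
```

Later in the script, one call such as require_tool "$BEDTOOLS" || exit 1 stops the script early if the pinned version is missing. Invoking the tool as "$BEDTOOLS" rather than bedtools means that moving to a newer version is a one-line change.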
The man and info commands do not provide help for specialised bioinformatics tools on bifx-core. As an alternative, many tools provide help when run with the --help or -h flags, and provide version information when run with the --version or sometimes -version flags.
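Because the version flag differs between tools, it can be convenient to try both spellings in turn. The following is a sketch only: show_version is a hypothetical helper, not a script installed on bifx-core, and it assumes that a tool exits with a non-zero status when it does not recognise a flag, which not every tool does.

```shell
# show_version (hypothetical helper): try the common version flags in turn
# and print the output of the first one that the tool accepts.
show_version() {
    local tool="$1" flag output
    for flag in --version -version; do
        # Capture stderr as well, since some tools print their version there.
        if output=$("$tool" "$flag" 2>&1); then
            printf '%s\n' "$output"
            return 0
        fi
    done
    echo "error: '$tool' did not accept --version or -version" >&2
    return 1
}
```

For example, show_version bedtools-2.30.0 runs bedtools-2.30.0 --version first and only falls back to -version if that fails.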
Specialised bioinformatics tools, or specific versions of tools, that are not currently installed on bifx-core can be requested by contacting the DRP-HCB bioinformatics core.
Case study: bedtools
In this example, we show how to list and run different versions of bedtools on bifx-core:
hyweldd@bifx-core3:~$ tools
alignmentSieve
bamCompare
bamCoverage
bamPEFragmentSize
bedtools
...
hyweldd@bifx-core3:~$ bedtools --version
bedtools v2.30.0
hyweldd@bifx-core3:~$ bedtools --help
bedtools is a powerful toolset for genome arithmetic.
Version: v2.30.0
About: developed in the quinlanlab.org and by many contributors worldwide.
...
hyweldd@bifx-core3:~$ versions bedtools
bedtools-2.29.2
bedtools-2.30.0
hyweldd@bifx-core3:~$ bedtools-2.29.2 --version
bedtools v2.29.2
hyweldd@bifx-core3:~$ bedtools-2.30.0 --version
bedtools v2.30.0
hyweldd@bifx-core3:~$
Managing your own software tools and packages with conda
In the preceding section, we have shown how to run command line tools that are installed on bifx-core. In addition to using these tools, it is also possible to manage your own software tools and environments using conda (see https://docs.conda.io/projects/conda/en/latest/ for more details). This is currently our recommended method for managing python packages and versions, which may be necessary if you are developing your own python scripts. While conda does provide the ability to manage R packages, we do not currently recommend using conda for this purpose. Our best practices for working with R packages are discussed later in this guide.
Installing conda
If you would like to use conda, we recommend that you install it in your home directory using the miniconda installer. You can do this using the following steps:
- Download the miniconda installer into your home directory by navigating to your home directory and running
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
- Calculate the md5 checksum of the downloaded installer by running the command
md5sum Miniconda3-latest-Linux-x86_64.sh
and check that it is the same as the value shown at https://repo.anaconda.com/miniconda/ for the Miniconda3-latest-Linux-x86_64.sh script
- If the checksums match, install conda by running
sh Miniconda3-latest-Linux-x86_64.sh
and following the steps
- When asked about the license terms, review and accept them (type yes)
- Accept the offer to install conda to the default location /homes/[YOUR USERNAME]/miniconda3 by pressing ENTER
- Accept the offer to initialise miniconda3 by using conda init by typing yes
- When the installer has finished, log out from bifx-core3, then log back in and run conda --version to verify that conda has been installed
- Run conda update conda to ensure that conda is up to date
- Run conda config --set auto_activate_base false to prevent the base environment from being activated automatically on login
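The checksum comparison in the steps above can be done by eye, or with a small helper such as the one sketched here. The function name verify_md5 is hypothetical, and the expected value must be copied by hand from the download page. If that page lists a different hash type, the corresponding tool (for example sha256sum) can be substituted for md5sum.

```shell
# verify_md5 (hypothetical helper): compare the md5 checksum of a file
# with an expected value copied from the download page.
# Usage: verify_md5 FILE EXPECTED_MD5
verify_md5() {
    local file="$1" expected="$2" actual
    if [ ! -f "$file" ]; then
        echo "error: file not found: $file" >&2
        return 1
    fi
    # md5sum prints "<checksum>  <filename>"; keep only the checksum field.
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: checksum matches for $file"
    else
        echo "MISMATCH: expected $expected but got $actual" >&2
        return 1
    fi
}
```

With this defined, verify_md5 Miniconda3-latest-Linux-x86_64.sh [CHECKSUM FROM THE DOWNLOAD PAGE] returns success only when the two values are identical, so it can be chained with && in front of the install command.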
Setting up your conda channels
In conda, software tools and packages are downloaded from remote repositories known as channels. conda provides its own default channels, but we recommend that users add the following two community-maintained channels as well:
- bioconda, which is dedicated to the distribution of bioinformatics software, and contains a far wider selection of bioinformatics tools than the default conda channels
- conda-forge, which is a large community-led channel that provides a wide range of software, including tools that are required as dependencies by some bioinformatics tools distributed via the bioconda channel
We recommend that you add the bioconda channel with high priority, and the conda-forge channel with low priority, using the following steps:
1. Run conda config --add channels bioconda to add bioconda with high priority
2. Run conda config --append channels conda-forge to add conda-forge with low priority
3. Run conda config --show channels. You should see the following:
channels:
- bioconda
- defaults
- conda-forge
Note: If the channels do not appear in the correct order, you can run conda config --remove channels bioconda, then conda config --remove channels conda-forge, then try again.
Managing conda environments
By default, conda installs software tools into the directory in which it was installed, which is known as the base environment. You can check the location of this directory by running conda info. If you followed the steps above to install conda, this is the default location /homes/[YOUR USERNAME]/miniconda3.
One of the benefits of using conda to manage bioinformatics software is that it allows you to create different environments and easily switch between them. This is particularly useful if you work on multiple projects that use different bioinformatics software. Each user-created environment has a dedicated directory stored in the envs subdirectory of the base environment directory.
We recommend that you always create your own conda environment in which to install software, and leave the base environment as it is. This applies even if you do not need multiple environments, as user environments are easier to modify and update than the base environment.
The following commands can be used to create and manage conda environments:
- conda create -n [ENVIRONMENT NAME] [LIST OF PACKAGES TO INSTALL IN THE ENVIRONMENT (optional)] to create a new conda environment
- conda activate [ENVIRONMENT NAME] to activate a conda environment
- conda deactivate to deactivate a conda environment
- conda env list to list the conda environments, showing which is active
- conda remove -n [ENVIRONMENT NAME] --all to remove an environment
- conda env export > [FILE NAME].yml to export the active environment to a yaml file
- conda search [PACKAGE NAME] to search for a software package
- conda install [PACKAGE NAME] to install a software package in the active environment
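The commands above are typically used together. The sketch below strings them into one sequence, wrapped in a function so that nothing runs until it is called. The function name demo_env_lifecycle, the environment name myproject, and the packages python=3.9 and samtools are illustrative assumptions, not recommendations from this guide.

```shell
# demo_env_lifecycle (hypothetical): walk through creating, using and
# exporting a conda environment. Requires conda to be installed.
demo_env_lifecycle() {
    # Make `conda activate` available in non-interactive shells such as scripts.
    . "$(conda info --base)/etc/profile.d/conda.sh"

    conda create -y -n myproject python=3.9   # create a new environment
    conda activate myproject                  # switch to it
    conda install -y samtools                 # install into the active environment
    conda env export > myproject.yml          # record the installed packages
    conda deactivate                          # leave the environment
    conda env list                            # confirm that myproject is listed
}
```

The exported myproject.yml file can later be passed to conda env create -f to rebuild the environment. When the environment is no longer needed, conda remove -n myproject --all deletes it.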
A useful guide to conda commands can be found here:
https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html
Using Python on bifx-core servers
We recommend using conda to maintain your Python packages and environments. JupyterHub can be accessed from within the University of Edinburgh network at the following url:
https://bifx-core3.bio.ed.ac.uk:8888
If you would like to add a Conda environment to JupyterHub you will need to install nb_conda_kernels then add a pre-existing environment. You can run the following commands on the command line:
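## Install nb_conda_kernels in your conda base environment
conda install -n base nb_conda_kernels
## Example to add the existing conda environment py3.9 as a Jupyter kernel.
## Run this with py3.9 active; ipykernel must be installed in that environment.
conda activate py3.9
python -m ipykernel install --user --name py3.9 --display-name py3.9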
Using R on bifx-core servers
We recommend that you use RStudio to perform R analyses on the bifx-core servers. RStudio server is installed on bifx-core3, and can be accessed from within the University of Edinburgh network at the following url:
https://bifx-rstudio.bio.ed.ac.uk
Managing R packages with Renv
Renv is a tool that is similar to conda, in that it makes it possible to create multiple environments, but is specialised for use with R, is implemented as an R package, and can be managed from the R console. You can learn more about Renv here:
Using perl on bifx-core servers
A number of different options are available for running perl scripts on bifx-core. We provide system wide installations that should be sufficient for most applications, and users also have the option of managing their own perl installations if they need specific perl versions or modules.
System-wide perl installations on bifx-core servers
The system perl implementation on bifx-core, /usr/bin/perl, does not come with any extra packages, such as BioPerl packages, installed system wide. In order to run BioPerl scripts, we provide a custom containerised installation of perl, /library/software/bin/bioperl, that does include many of the BioPerl packages. This is our recommended version to use for most applications.
Managing your own perl environment
While we expect that /library/software/bin/bioperl should be sufficient for most bioinformatics applications, it may be necessary for some users to manage their own perl environments. We recommend the use of either conda or perlbrew for this purpose, as both allow users to manage local installations of perl without requiring administrator permissions.
Both conda and perlbrew have strengths and weaknesses relative to each other, so the choice of which to use depends on the user’s particular requirements.
conda provides a quick and easy way of managing perl installations and environments, and can also be used to install other tools, as described earlier in this guide. However, when using conda to manage a perl installation, packages are installed using conda recipes rather than from CPAN. This may cause problems, as not all perl packages have conda recipes available, and the conda recipe corresponding to a particular package may be difficult to find even when it exists.
By contrast, perlbrew provides access to all CPAN packages through the cpan command, giving access to a wider range of packages. The main drawback of perlbrew when compared to conda is that it builds everything from source rather than providing pre-built packages. This makes the process of installing packages more time consuming, and also potentially more difficult if a particular CPAN package fails to build due to missing dependencies or failing tests. A further limitation of perlbrew when compared to conda is that it does not allow you to maintain different environments using the same version of perl.
We describe how to set up a local installation of perl using conda and perlbrew in the following sections.
Managing your own perl installation with conda
We provide a conda environment file, /library/software/conda_environments/BioPerl-1.7.2/BioPerl-1.7.2.yml, that describes the same packages as the environment used by /library/software/bin/bioperl. This can be used to create your own copy of that environment, to which additional packages can then be added, as in the following example (note that this example assumes that conda has already been installed, as described earlier in this guide).
$ conda env create -f /library/software/conda_environments/BioPerl-1.7.2/BioPerl-1.7.2.yml
Collecting package metadata (repodata.json): done
Solving environment: done
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate bioperl-1.7.2
#
# To deactivate an active environment, use
#
# $ conda deactivate
$ conda activate bioperl-1.7.2
$ which perl
~/miniconda3/envs/bioperl-1.7.2/bin/perl
$ perl -E 'use DateTime; say "Success!"'
Can't locate DateTime.pm in @INC (you may need to install the DateTime module) (@INC contains: /homes/hyweldd/miniconda3/envs/bioperl-1.7.2/lib/site_perl/5.26.2/x86_64-linux-thread-multi /homes/hyweldd/miniconda3/envs/bioperl-1.7.2/lib/site_perl/5.26.2 /homes/hyweldd/miniconda3/envs/bioperl-1.7.2/lib/5.26.2/x86_64-linux-thread-multi /homes/hyweldd/miniconda3/envs/bioperl-1.7.2/lib/5.26.2 .) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.
$ conda search perl-datetime
Loading channels: done
# Name Version Build Channel
perl-datetime 1.42 pl5.22.0_0 bioconda
perl-datetime 1.42 pl526h2d50403_2 bioconda
$ conda install perl-datetime
Collecting package metadata (current_repodata.json): done
Solving environment: done
...
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
$ perl -E 'use DateTime; say "Success!"'
Success!
$
Managing your own perl installation with perlbrew
To use perlbrew, you must first install it into your home directory. Once this is done, you can use the perlbrew command to install perl versions and download packages from CPAN for each version. The following example walks through the process of installing perlbrew, using it to install perl 5.32.1, and installing the DateTime package from CPAN using the cpan command. Further information on perlbrew can be found at https://perlbrew.pl.
$ curl -L https://install.perlbrew.pl | bash
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 170 100 170 0 0 168 0 0:00:01 0:00:01 --:--:-- 168
100 1574 100 1574 0 0 1263 0 0:00:01 0:00:01 --:--:-- 1263
## Download the latest perlbrew
## Installing perlbrew
Using Perl </usr/bin/perl>
perlbrew is installed: ~/perl5/perlbrew/bin/perlbrew
perlbrew root (~/perl5/perlbrew) is initialized.
Append the following piece of code to the end of your ~/.bash_profile and start a
new shell, perlbrew should be up and fully functional from there:
source ~/perl5/perlbrew/etc/bashrc
Simply run `perlbrew` for usage details.
Happy brewing!
## Installing patchperl
## Done.
$ echo 'source ~/perl5/perlbrew/etc/bashrc' >> ~/.bash_profile
$ source ~/.bash_profile
$ which perlbrew
~/perl5/perlbrew/bin/perlbrew
$ perlbrew --notest install 5.32.1
Installing /homes/hyweldd/perl5/perlbrew/build/perl-5.32.1/perl-5.32.1 into ~/perl5/perlbrew/perls/perl-5.32.1
This could take a while. You can run the following command on another shell to track the status:
tail -f ~/perl5/perlbrew/build.perl-5.32.1.log
perl-5.32.1 is successfully installed.
$ perlbrew use 5.32.1
$ cpan DateTime
Loading internal logger. Log::Log4perl recommended for better logging
Reading '/homes/hyweldd/.cpan/Metadata'
Database was generated on Fri, 04 Jun 2021 20:55:40 GMT
Running install for module 'DateTime'
Fetching with HTTP::Tiny:
http://www.cpan.org/authors/id/D/DR/DROLSKY/DateTime-1.54.tar.gz
...
DROLSKY/DateTime-1.54.tar.gz
/usr/bin/make install -- OK
$ perlbrew list-modules
...
DateTime
...
$ which perl
~/perl5/perlbrew/perls/perl-5.32.1/bin/perl
$ perl -v | grep version
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux
$ perl -E 'use DateTime; say "Success!"'
Success!
$
Note: Due to the way the network is set up, installing BioPerl on bifx-core requires the NO_NETWORK_TESTING environment variable to be set, so the correct cpan command for installing BioPerl would be:
$ NO_NETWORK_TESTING=1 cpan BioPerl
For further information, see https://github.com/libwww-perl/libwww-perl/issues/370.