Content

When developing with the L7|ESP SDK, all content is defined within the SDK's content/ folder. That folder can be organized however you like, but a directory structure similar to the following is recommended:

content/
├── admin/
│   ├── Users.yml
├── workflows/
│   ├── QC-Workflow.yml
├── protocols/
│   ├── QC-Quantification.yml
│   └── QC-Report.yml
├── pipelines/
│   └── QC-Report-Pipeline.yml
├── tasks/
│   └── generate_qc_report.py
└── inventory/
    ├── Sample-Types.yml
    └── Container-Types.yml

In the example above, note that directories are used to separate the different types of content files. Generally, most of the content added to the content directory will be in YAML format that the L7|ESP Python client can import into the application. For more information about the various types of YAML config files that can be imported, please see the L7|ESP Python client documentation.
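As a quick sanity check during development, the layout above can be enumerated with a short script. This is an illustrative sketch only (it is not part of the SDK); the directory and file names come from the example tree above:

```python
from pathlib import Path


def list_content(root="content"):
    """Group content files by their category subdirectory
    (e.g. workflows/, protocols/, pipelines/)."""
    layout = {}
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():
            layout[sub.name] = sorted(
                p.name for p in sub.iterdir() if p.is_file()
            )
    return layout
```

Run from the SDK root, `list_content("content")` would return a mapping like `{"workflows": ["QC-Workflow.yml"], "protocols": ["QC-Quantification.yml", "QC-Report.yml"], ...}` for the example tree shown above.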

Referencing Resources

In addition to configuration files, resources referenced in the application (e.g., scripts or artifacts used during pipeline execution) should be referenced using a special path identifier that works in both development and production. Below are useful paths for referencing custom or stock content:

  • $LAB7DATA - The path to the application data folder, which contains the application database, pipeline artifacts, and all custom content defined in the repository.

  • $LAB7DATA/content - Path to the content directory from the L7|ESP SDK. Any files you create in the content directory will be automatically mapped to this production directory on install/update.

  • $LAB7DATA/common - Path to common (not project-specific) content resources. Any stock content not part of a specific project repository may be referenced via this path.

In addition to these special paths, there are several default ParamGroups you can use when running Pipelines:

  • param('apps', 'python') - A reference to the version of Python installed with the application. This version of Python contains any requirements you specify in the requirements.txt file within the L7|ESP SDK, the L7|ESP Integrations Python module, and the L7|ESP Python client. It’s generally recommended to use this param group instead of a bare python command when calling scripts.

  • param('apps', 'integration') - A reference to the L7|ESP Integrations Python entrypoint, which is installed alongside the application for instrument integrations and other types of support.

  • param('apps', 'client') - A reference to the L7|ESP Python client entrypoint, which can be useful for some types of activities (e.g., ingests).

For example, if we have a script in our content directory that references another file in a separate directory, we can define our Pipeline to reference those two files, as follows:

# content folder with script and resource
content/
├── pipelines/
│   └── my-custom-pipeline.yml
└── tasks/
    ├── my_custom_pipeline.py
    └── pipeline_config.json
# contents of content/pipelines/my-custom-pipeline.yml
My Pipeline:

  tasks:
    - My Pipeline Script:
        cmd: "param('apps', 'python') $LAB7DATA/content/tasks/my_custom_pipeline.py -c $LAB7DATA/content/tasks/pipeline_config.json"

    - My Integration Script:
        cmd: "param('apps', 'integration') instrumentsupport illumina checkindexes --worksheet '{{sample_sheet_uuid}}'"

    - My Ingest Script:
        cmd: "param('apps', 'client') ingest my_ingest {{infile}}"
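The contents of my_custom_pipeline.py are not shown in this example. A minimal sketch of what such a task script might look like, accepting the -c config flag used in the cmd definition above, follows; the script body and the config key are hypothetical:

```python
"""Hypothetical sketch of content/tasks/my_custom_pipeline.py."""
import argparse
import json


def load_config(path):
    """Load pipeline settings from a JSON config file."""
    with open(path) as fh:
        return json.load(fh)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Example pipeline task")
    parser.add_argument(
        "-c", "--config", required=True,
        help="Path to a JSON config file, e.g. "
             "$LAB7DATA/content/tasks/pipeline_config.json",
    )
    args = parser.parse_args(argv)
    config = load_config(args.config)
    # 'threshold' is a made-up setting; real keys depend on your pipeline.
    print("Running with threshold={}".format(config.get("threshold", 0.95)))
```

When the Pipeline runs, the cmd line invokes this script with the application's Python and passes the config path via -c, so the script resolves its resource through $LAB7DATA in both development and production.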

Loading Content

After developing and testing all content that needs to be in your production instance, you need to set up configuration to specify which content is loaded into L7|ESP by default. There are two components to this:

  1. Create seed files that define which configuration to import and what types of objects they represent in the system. A seed file is a standard-format file used to import configuration via the L7|ESP Python client. For information about the structure of these files, see the L7|ESP Python client documentation.

  2. Update your container playbook to import those seed files (roles/container.yml).

For a simple example, let’s consider a content/ folder with a Workflow, two Protocols, and a Pipeline that we want to promote to production. We can define a seed file at roles/qc.yml (the file name is arbitrary), like so:

# Contents of: roles/qc.yml

- model: Workflow
  data: $LAB7DATA/content/workflows/QC-Workflow.yml

- model: Protocol
  data: $LAB7DATA/content/protocols/QC-Quantification.yml

- model: Protocol
  data: $LAB7DATA/content/protocols/QC-Report.yml

- model: Pipeline
  data: $LAB7DATA/content/pipelines/QC-Report-Pipeline.yml

In the example above, $LAB7DATA is an environment variable referencing the application data folder. Because the SDK’s content directory is mapped into $LAB7DATA/content on install/update, the same paths can be used in both development and production deployments.

Once we’ve created our seed file, we can update our container playbook, roles/container.yml, to reference that seed file for the content that L7|ESP will load into the system by default:

---
seed:
  - '{{ sdk }}/roles/qc.yml'

In the file above, {{ sdk }} is an Ansible variable referencing the L7|ESP SDK root directory. It can be used both in development and production deployments to reference the same location.

Once you’ve completed all of this configuration, you can test it out using:

~$ make import

This command uses Ansible to import all of the content, following the same process used during install/update in production.

BioBuilds

BioBuilds is a curated collection of open-source bioinformatics tools, pre-built for Linux on both Intel x86_64 and IBM POWER8 systems as well as Mac OS X.

Here we add samtools to the list of BioBuilds tools that will be included on install:

# contents of roles/container.yml
---
biobuilds:
  - bwa
  - samtools