Installing and Using a Sequence-based Naming Format for Entities, Experiments, and other models

What is a Sequence?

A sequence is a concept in databases and applications that allows software to calculate a new unique value in sequential order, generally used as a unique identifier. The most common sequence is an integer-based sequence.

Why is a Sequence useful?

A challenging aspect of managing large datasets, in the lab or otherwise, is the need to uniquely identify a record. Whether that record is a patient, extracted DNA, or a lab container, it is generally useful to be able to uniquely identify the record of interest in a software application. One challenge in paper-based and Excel processes is that generating unique names reliably can be hard. If the name is composed from properties of the record, rather than from a sequence number, a user is likely to calculate a duplicate name at some point. For example, if you have extracted DNA samples and use the date and time to make the sample name unique (DNA_07042024_1130), the precision of the timestamp might lead to duplicate names. If the timestamp in the name is only specified to the minute, creating more than one sample per minute results in records with identical names that represent different physical samples in the lab. Encoding record properties into the name raises other challenges as well (Which time zone will an embedded timestamp use? What about Daylight Saving Time?), which make sequences a better option for naming in software.
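The collision described above can be reproduced in a few lines of Python. This is only an illustrative sketch: the timestamp_name helper and the DNA prefix are hypothetical names for this example and are not part of L7|ESP.

```python
from datetime import datetime, timedelta
from itertools import count


def timestamp_name(created_at: datetime) -> str:
    """Name a sample from its creation time, to the minute (MMDDYYYY_HHMM)."""
    return created_at.strftime("DNA_%m%d%Y_%H%M")


# Three samples created 20 seconds apart, all within the same minute.
start = datetime(2024, 7, 4, 11, 30, 0)
created = [start + timedelta(seconds=20 * i) for i in range(3)]

# Every sample receives the same name because the timestamp lacks precision.
timestamp_names = [timestamp_name(t) for t in created]

# A sequence hands out the next integer regardless of when it is requested.
sequence = count(1)
sequence_names = [f"DNA{next(sequence):06}" for _ in created]

print(len(set(timestamp_names)), "unique timestamp-based name(s)")
print(len(set(sequence_names)), "unique sequence-based name(s)")
```

Three physical samples end up sharing one timestamp-based name, while the sequence-based names stay distinct.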

What other identifiers are available in L7|ESP?

The primary ways to identify a record in ESP are as follows:

Resource name
Resource UUID
Resource Barcode
Database Primary Key for table of interest

The name for any record in L7|ESP is stored in the resource record for that model in the database (resource table, name column). The resource name is generally where sequence-based naming formats are applied, although most resources have other options for naming like user entered free-text or context based naming formats for things like samples where the child number relative to the parent sample can be calculated and incorporated into the name. It is important to keep in mind the tradeoffs of not using a sequence based name. Carefully consider the possibility of calculating a duplicate name and the impact that might have on the users.

Alongside the name record in the resource table, a uniquely incrementing integer is maintained called the resource_id. This is the database resource table's Primary Key. This is a common database technique to efficiently join relationships between various tables in applications. The resource_id is primarily intended for developer use and is not generally displayed in the browser user interface.

L7|ESP also applies a universally unique identifier (UUID) to resource records. This UUID is used in L7|ESP's URL paths to link to records in the system. The UUID is a 36-character string (32 hexadecimal digits + 4 dashes, e.g. “20c9b919-03ce-455b-8839-f72f184416da”). This identifier format is random enough that it should be unique across many software systems, including other L7|ESP projects (and external software systems). While it has great guarantees on uniqueness, it is difficult for software end users to work with directly due to its length and complexity. Other identifiers like resource_id, name, and barcode will generally not be guaranteed unique across different L7|ESP installations, as these identifiers will not be coordinated between different systems.
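The shape of such an identifier can be checked with Python's standard uuid module. This is a sketch of the general UUID format only; it assumes nothing about how L7|ESP generates its UUIDs internally, and uuid4 is used here simply as a common way to produce a random UUID.

```python
import uuid

# Generate a random (version 4) UUID and render it in canonical text form.
record_uuid = uuid.uuid4()
uuid_text = str(record_uuid)

# Canonical form: 32 hexadecimal digits in 8-4-4-4-12 groups, joined by 4 dashes.
group_lengths = [len(group) for group in uuid_text.split("-")]
print(len(uuid_text), group_lengths)

# The example value quoted in the text parses as a well-formed UUID.
example = uuid.UUID("20c9b919-03ce-455b-8839-f72f184416da")
print(example.version)
```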

L7|ESP also maintains a barcode property on the resource table. For many models (samples, experiments, sheets, etc.), the barcode property will be initialized to match the resource UUID. Projects can implement custom features to apply business logic and overwrite the barcode to meet project requirements. The Barcode column type in LIMS, for example, can be used to update a workflowable resource's barcode within the context of a workflow. Users can also visit the record's data page to manually update the barcode via user input text. Using the default UUID-based barcode can be useful, as it guarantees uniqueness when scanning a record's label (e.g. the label for a sample or for a container). Applying custom barcodes increases the risk of calculating a duplicate barcode. Keeping barcodes unique also ensures L7|ESP can determine which page to navigate to when scanning a record label when that shortcut is available in the user interface.

Note: Some features of L7|ESP use the resource Barcode field to provide a static label for a record in ESP. This concept is called “Fixed ID”. This concept is mostly applied to L7|Master definition models like Workflows, Protocols, and the Protocol & Entity Columns (Resource Variables / resource_vars). Having a static label in L7|ESP for these definitions allows developers to update the “Display Name” for something like a Protocol Column without breaking existing code or being forced to update existing code for things like expressions.

How is a new sequence installed and used in a naming format?

To install and use a sequence in a naming format, one will have to install the sequence into the database via configuration, and enable the naming format for use for each applicable resource type in Builders. Generally sequence based naming is available as a standard feature for Entities and Experiments. Sequence based naming might be useful when creating large quantities of other resources like inventory Items and location Containers, but these use cases might require some custom development to enable.

In the Configuration application, add a new sequence name and starting number to the sequences block of the esp Configuration. Saving adds the sequence to the esp_sequences table in the database. The key for the sequence entry is the name of the sequence you are adding, and the value is the number with which you initialize it. This number will be the first number used in the naming format, and it will increment each time the sequence is used.
1. Note that in earlier versions of the L7|ESP platform, this step may have required a restart of the system to migrate the new sequence into the database. The sequence is installed once it is present in the esp_sequences table.
2. When working with multiple tenants on the same server, sequences are segregated by tenant.
In the Configuration application editing the same esp Configuration, use the sequence in a new naming format. The naming formats are added to the id_sequences block of the esp Configuration. Item formats are added to the item list, Entity formats are added to the entity list, and container formats are added to the container list. These formats are typically used to add a short prefix to the sequence number (often related to the entity type or other resource type it will be used with). Left padding of zeroes is often added to the sequence number to make the name more easily sortable. The left padding of 0s should account for the maximum number of resources of that type during the lifetime of the software, without making the name too long. (A common starting point is padding up to 6 digits for the sequence number, like "ESP000001") Save the updated configuration.
With the naming format created and using the sequence, the naming format must be enabled for resource types that will use it in the L7|Master application.
1. For example, an Entity sequence can be added by editing the Entity Type definition for the Entity Type that will use the sequence. Click on the Entity ID Sequences tab, and select your new sequence naming format by checking its checkbox. Finally, save the definition.
2. For Experiments, the naming format must be allowed for the Workflow Definition in the L7|Master application.
To use the sequence, new records can be created in the user interface and the naming format will now be available for selection during the record creation process.
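The effect of left padding on sortability, and the limit a given padding width imposes, can be sketched as follows. The example assumes the naming format behaves like a Python format string (the placeholder syntax, such as {sequence_number:06}, suggests this); render_name and padding_capacity are hypothetical helpers for illustration.

```python
NAMING_FORMAT = "ESP{sequence_number:06}"  # short prefix + integer padded to 6 digits


def render_name(naming_format: str, sequence_number: int) -> str:
    """Fill the sequence_number placeholder of a naming format."""
    return naming_format.format(sequence_number=sequence_number)


def padding_capacity(width: int) -> int:
    """Largest sequence number that still fits in `width` digits."""
    return 10 ** width - 1


# Without padding, alphabetical order differs from numeric order.
unpadded = [f"ESP{n}" for n in (1, 2, 10)]
print(sorted(unpadded))

# With padding, names sort correctly as long as the capacity is not exceeded.
padded = [render_name(NAMING_FORMAT, n) for n in (1, 2, 10, padding_capacity(6))]
print(sorted(padded) == padded)

# One past the capacity, the name grows by a digit and no longer sorts last.
overflow = render_name(NAMING_FORMAT, padding_capacity(6) + 1)
print(overflow, sorted(padded + [overflow])[-1])
```

A lab expecting more than 999,999 records of one type over the lifetime of the system would want a wider padding from the start.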

Developer Tips

Developers can write their own functions to implement naming strategies. Preference should be given to sequence-based naming to ensure record uniqueness. Helpful identifying properties (date of creation, user who created the record, measurable properties of the record) are better exposed as other UI elements, as keeping the name in sync with changes to these underlying properties of the record often proves difficult. It is best to keep a single source of truth for any record's data, and embedding record data into a name can lead to inconsistencies.

Some ways to assist users with better understanding the identity of the sample, without embedding data into the name:

Include read-only columns in LIMS, that are useful for more thoroughly identifying the entity or workflowable resource and its properties. Add only columns which are useful for the protocol and workflow being executed.
Include additional metadata on physical sample labels like creation time, expiration date, lot number, to supplement the name rather than embedding metadata into the name
- L7|ESP Barcodes can also be added to the physical label, and when scanned in the L7|ESP user interface - will bring the user to the data page for the record. L7|ESP listens for barcode scans (effectively very fast keyboard presses) and will attempt to navigate to the record page of the scanned barcode.
Discuss the tradeoffs of using a non-sequence based name and the risk of name duplication

Naming Format Variables

Naming format creation is defined in Section 5.1 of the User Interface Documentation.

// ... snippet of "esp" Configuration
  "id_sequences": {
    "item": [
      {
        "name": "ITEM",
        "format": "ITEM{sequence_number:06}",
        "sequence": "item_sequence"
      }
    ],
    "entity": [
      {
        "name": "PLASMA",
        "format": "PLSM{sequence_number:06}",
        "sequence": "plasma_sequence"
      }
    ],
    "experiment": [
      {
        "name": "EXPERIMENT",
        "format": "EXP{sequence_number:06}",
        "sequence": "experiment_sequence"
      }
    ]
  },
  "sequences": {
    "experiment_sequence": 1,
    "plasma_sequence": 1,
    "item_sequence": 1
  },
// ...
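Because each naming format refers to its sequence by name, a misspelled sequence name is an easy mistake to make. The following sketch shows one way to cross-check a configuration dictionary before saving it. find_undefined_sequences is a hypothetical helper, not an L7|ESP function, and the misspelled plasma_sequnce entry is deliberate.

```python
def find_undefined_sequences(esp_config: dict) -> list:
    """Return (resource_type, format_name, sequence) for every naming format
    whose sequence is not declared in the "sequences" block."""
    declared = set(esp_config.get("sequences", {}))
    problems = []
    for resource_type, formats in esp_config.get("id_sequences", {}).items():
        for naming_format in formats:
            if naming_format["sequence"] not in declared:
                problems.append(
                    (resource_type, naming_format["name"], naming_format["sequence"])
                )
    return problems


# Example configuration fragment with one deliberately misspelled sequence name.
config = {
    "id_sequences": {
        "item": [
            {"name": "ITEM", "format": "ITEM{sequence_number:06}", "sequence": "item_sequence"}
        ],
        "entity": [
            {"name": "PLASMA", "format": "PLSM{sequence_number:06}", "sequence": "plasma_sequnce"}
        ],
    },
    "sequences": {"item_sequence": 1, "plasma_sequence": 1},
}

problems = find_undefined_sequences(config)
for resource_type, format_name, sequence in problems:
    print(f"{resource_type} format '{format_name}' uses undeclared sequence '{sequence}'")
```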

Sequence Naming with `esp` and `espclient` modules in Pipelines

New Entity records can often be created in bulk with espclient, using the count argument. These bulk created entities will use the default naming format for the Entity Type. It is quite common for an Entity Type to only have 1 allowable naming format with a dedicated prefix to better differentiate different Entity Types.

# espclient Pipeline script

from esp.models import Sample, SampleType

# Look up an existing Sample Type model called 'Saliva'
TARGET_SAMPLE_TYPE = 'Saliva'
sample_type = SampleType(TARGET_SAMPLE_TYPE)

# Create 10 Saliva samples, which may have names like SAL000001..SAL000010
samples = Sample.create(count=10, sample_type=sample_type)

Other models in the esp and espclient Pipeline modules do not have standard sequence-based naming features, but a sequence or name request function could easily be set up to calculate the next names for a sequence. The standard REST API might also be used directly for more functionality (rather than going through the client's wrappers around the REST API).

Experiments (Workflow Instances in the ORM models) can use sequence based naming formats via the REST API, in a manner similar to the user interface functionality.

# espclient Pipeline script

from esp.models import Project, Workflow, Sample
import esp.base as ebase

PROJECT_NAME = 'My Project'
WORKFLOW_NAME = 'Add Instrument Maintenance'

# Workflow Experiment creation setup
project = ebase.SESSION.get(f'/api/projects?name={PROJECT_NAME}&params=["name","uuid"]').json()  # Check for existence of project
if not project:
    project = Project.create(PROJECT_NAME)  # Create the project if it does not exist
else:
    project = Project.from_data(project[0])
workflow = Workflow(WORKFLOW_NAME) # Queries and returns the Workflow, if existing
samples = Sample.create(count=10) # Creates 10 Generic Samples for this example

# Create the Experiment, using a Python requests package session
# client modules use the requests package, wrapping around the L7|ESP REST API
experiment = ebase.SESSION.post('/api/workflow_instances', json={
               "name": "", # A naming format must be specified in the Workflow definition
               "workflow": workflow.uuid, # if a name is set to None, the REST API will check for a default naming format for the workflow
               "project": project.uuid,
               "submit": True,
               "samples": [x.uuid for x in samples]
       })

Sequence Naming with `lab7` server-side extensions like Python invokables

Server-side extensions use the lab7 modules, which run in a Python interpreter separate from the Pipeline interpreter and its client modules, esp and espclient. Decorating a Python function with the invokable decorator (from lab7.extensions import invokable) creates a custom endpoint at the path /api/invoke/my_function_name. This is a possible entry point for custom naming logic, including automated naming and bulk creation of other models, particularly with sequence-based naming.

The main function of interest for using a sequence is from lab7.sample.api import next_sequence_value. The next_sequence_value function will consume and return the next integer in the sequence. This action is immediately persisted to the database.

This could be used to create an endpoint to create a container with the next name in a sequence. This could be as simple as creating one's own calculation of a name string using the next_sequence_value or a more Configuration driven example like below:

from lab7.extensions import invokable
from sqlalchemy.orm import Session
from typing import Union


@invokable(session="session")
def create_container_autoname(
    agent: str,
    session: Session,
    def_name: str,
    lab7_id_seq: Union[str, None] = None,
    return_dict: bool = True,
    **values
):
    """
    Uses the next sequential name to create a new Container.  Requires an update to the "esp" Configuration
    to add a new block to the id_sequences section. "experiment", "entity", and "item" are already part of
    the standard configuration.  Add a new "container" block with the naming formats as you would for
    other model types (see example).  As this is a custom backend feature, there is no way in L7|Master
    to constrain the name format to a subset of Container Types (which is normally required for standard
    autonaming) in this implementation.  Standard UI creation of containers in the Locations app will also
    
    Arguments
    agent: The current User object.
    session: The currently bound SQLAlchemy ESPSession.
    def_name: the name of the container type targeted for creation.  The most recent definition will be used.
    lab7_id_seq: name of naming format to use, otherwise uses default for container type
    **values: keyword argument capture used in general model creation. The kwargs are filtered
    based on keyword match in the function.  Currently accepted keywords include name, desc,
    tags, items, barcode, barcode_type, resource_vals, view_template, augment, status. See REST API for container
    for more details
    return_dict: if set to True, will return the dictionary format of the container representation.  Otherwise
    it will return the class instance of the Container ORM model which can be more efficient in more complex
    functionality (i.e. sub-function of a larger invokable)
    
    Returns
    Dictionary or model representation of lab7.container.models.Container - a SQL Alchemy ORM model for a L7|ESP Container, 
    a subclass of Resource.
    
    Configuration
    Container name format configuration block for "esp" Configuration:
    "container": [
      {
        "name": "96-Well Sequence",
        "format": "96W{sequence_number:06}",
        "sequence": "96w_sequence"
      }
    ]
    Note that the 96w_sequence integer should be initialized (usually to 1) in the sequences configuration block 
    for the above example.  Left padding shown here to 6 digits, but could be increased depending on the lab's
    throughput.
    """
    from lab7.container.api import query_container_types, create_container
    from lab7.sample.api import next_sequence_value, ListFormatter
    from lab7.utils import configs

    name_context = {}

    # Searches by the Container Type fixed ID and display name for convenience
    # Ultimately we need the Container Type Definition
    # Query existing by fixed_id
    container_types = None
    container_types = query_container_types(
        filters={"barcode": def_name},
        return_dict=False,
        ignore_deleted=True, # default, assume archived models are not intended for use
        session=session,
        agent=agent,
    )
    # If no containers were found by 'fixed_id', try then by name
    if not container_types:
        container_types = query_container_types(
            filters={"name": def_name},
            return_dict=False,
            ignore_deleted=True,
            session=session,
            agent=agent,
        )
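
    # Fail with a clear message instead of an IndexError when nothing matched
    if not container_types:
        raise ValueError(f"No Container Type found with fixed ID or name '{def_name}'")

    # Use the most recent Container Type Definition for the matched Container Type
    container_type = container_types[0]
    current_container_def = container_type.head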

    # Use Configuration, similar to Entities, Experiments, and Items to drive naming
    if lab7_id_seq is None:
        lab7_id_seq = current_container_def.meta["lab7_id_sequence"][0]
    tenant_config = configs.get()["esp"]["config"]
    container_id_sequences = tenant_config["id_sequences"].get("container", [])

    seq_name = None
    id_fmt = None
    if lab7_id_seq is not None:
        for container_id_seq in container_id_sequences:
            if container_id_seq["name"] == lab7_id_seq:
                seq_name = container_id_seq["sequence"]
                id_fmt = container_id_seq["format"]
                break
    else:
        raise ValueError("Cannot create container without an ID sequence")

    if not (seq_name and id_fmt):
        raise ValueError(
            "There are no configured ID sequence formats available for this Container. Please assign a Naming Format or provide lab7_id_seq argument with the format name."
        )

    # This function immediately obtains the next integer in a naming sequence
    sequence_value = next_sequence_value(seq_name, session)

    # This block converts the formatting string into a name, by plugging in the text placeholders
    name_context.update(
        dict(
            date=session.info["utc_timestamp"].strftime("%Y-%m-%d"),
            time=session.info["utc_timestamp"].strftime("%H%M%S"),
            datetime=session.info["utc_timestamp"],
            item_type=current_container_def.name,
            sequence_number=sequence_value,
        )
    )

    formatter = ListFormatter()
    try:
        name = formatter.format(id_fmt, **name_context)
    except KeyError:
        raise ValueError(
            "Container could not be created due to an invalid name format, possibly due to an invalid "
            "naming variable.  Please update the naming format in the esp Configuration and try again. "
            f"'{lab7_id_seq}' Format:  {id_fmt}, Name Context Variables: {name_context}"
        )
    

    other_opts = {
        arg: values.get(arg)
        for arg in [
            "name",
            "desc",
            "tags",
            "items",
            "barcode",
            "barcode_type",
            "resource_vals",
            "view_template",
            "augment",
            "status",
        ]
        if values.get(arg, None) is not None
    }

    return create_container(
        current_container_def, session=session, agent=agent, name=name, **other_opts
    )
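
The name-formatting step in the function above, including its KeyError handling, can be tried outside the server with plain Python string formatting. This sketch assumes that ListFormatter fills simple placeholders the same way str.format does. That is an assumption, since its implementation is not shown here. render_name is a hypothetical stand-in.

```python
def render_name(id_fmt: str, name_context: dict) -> str:
    """Fill a naming format from the available naming variables."""
    try:
        return id_fmt.format(**name_context)
    except KeyError as err:
        # The format refers to a variable that the name context does not provide.
        raise ValueError(
            f"Unknown naming variable {err} in format '{id_fmt}'"
        ) from err


name_context = {
    "sequence_number": 7,
    "date": "2024-07-04",
    "item_type": "96-Well Plate",
}

good = render_name("96W{sequence_number:06}", name_context)
dated = render_name("{date}_96W{sequence_number:04}", name_context)
print(good, dated)

# A misspelled placeholder is reported as an invalid naming variable.
try:
    render_name("96W{seq_num:06}", name_context)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```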

A custom function might also be used to consume the next N names from a sequence. This should be slightly more efficient, as it requires fewer round trips between the application and the database.

def next_sequence_value_bulk(name, count=1, session=None, agent=None):
    """
    Get the next N count integers in the sequence.  Persists sequence usage immediately, even in case of failure.
    Returns a single integer when count is 1, otherwise a list of N consecutive integers.
    """
    from sqlalchemy.sql import text
    from lab7.utils import configs

    # Only sequences declared in the "esp" Configuration can be consumed
    if name not in configs.get()["esp"]["config"]["sequences"]:
        raise ValueError("Unknown sequence: {}".format(name))

    # Advance the counter by N in one statement; RETURNING yields the first value of the reserved block
    query = """
        UPDATE esp_sequences SET seq_number = seq_number + :count
        WHERE seq_name = :seq_name
        RETURNING seq_number - :count as the_seq
    """
    bind_data = {"seq_name": name, "count": count}
    rows = session.execute(text(query), bind_data)
    row = rows.first()
    first_value = row["the_seq"]
    return first_value if count == 1 else [first_value + i for i in range(count)]
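
The block-reservation idea can be tried in isolation with an in-memory SQLite database. This is a sketch under stated assumptions, not the L7|ESP implementation: the two-column table layout is inferred from the UPDATE statement above, reserve_block is a hypothetical helper, and a separate SELECT inside the same transaction replaces the RETURNING clause so that the example also runs on older SQLite versions.

```python
import sqlite3


def reserve_block(conn: sqlite3.Connection, seq_name: str, count: int = 1) -> list:
    """Reserve `count` consecutive sequence values and return them as a list."""
    if count < 1:
        raise ValueError("count must be at least 1")
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "UPDATE esp_sequences SET seq_number = seq_number + ? WHERE seq_name = ?",
            (count, seq_name),
        )
        if cursor.rowcount != 1:
            raise ValueError(f"Unknown sequence: {seq_name}")
        (next_free,) = conn.execute(
            "SELECT seq_number FROM esp_sequences WHERE seq_name = ?", (seq_name,)
        ).fetchone()
    # The counter now points past the block; the block starts `count` earlier.
    return list(range(next_free - count, next_free))


# Stand-in for the esp_sequences table, with one sequence initialized to 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE esp_sequences (seq_name TEXT PRIMARY KEY, seq_number INTEGER NOT NULL)"
)
conn.execute("INSERT INTO esp_sequences VALUES ('96w_sequence', 1)")
conn.commit()

first_block = reserve_block(conn, "96w_sequence", count=3)
second_block = reserve_block(conn, "96w_sequence", count=2)
names = [f"96W{n:06}" for n in first_block + second_block]
print(names)
```

Each call costs one update and one read no matter how many names are reserved, and consecutive calls never hand out overlapping values.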
