System Tuning
This section outlines recommendations for tuning L7|ESP application servers.
Operating System Tuning
As L7|ESP is cross-compiled and runs on many operating systems, we don’t
currently have any specific requirements for tuning at the operating
system level (apart from the hardware sizing recommendations, which are
covered in another document). Depending on the operating system, some of
the default values may not be suitable for a server environment. One
such example is the default per-user/application ulimit for open file
descriptors, which is often set to a very low value such as 1024, while
a more suitable value for a server environment may be much higher
(e.g. 24000), depending on the amount of traffic the server is handling.
Our recommendation is to increase these limits if you find you are
bumping against them; the exact values chosen may vary widely depending
on scale.
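As a hedged illustration, the current limit can be checked with ulimit -n and raised for the application user via /etc/security/limits.conf (or your distribution’s equivalent); the user name and values below are illustrative:

# Check the current soft limit for open file descriptors
l7esp@l7espapp:~$ ulimit -n
1024

# Illustrative /etc/security/limits.conf entries raising the limit for the application user
l7esp  soft  nofile  24000
l7esp  hard  nofile  24000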
Application Tuning
L7 recommends adjusting the following settings in a production L7|ESP installation, contingent on the server running the DB having adequate resources to do so.
PostgreSQL Configuration
The postgresql configuration (in Lab7_ESP/Data/conf/postgresql.conf) should be modified as follows:
1. max_connections: should be a minimum of 6x the total number of L7|ESP processes. For instance, if l7 status shows:

$ l7 status
l7-esp.broker                RUNNING   pid 95176, uptime 4 days, 4:44:44
l7-esp.concierge             RUNNING   pid 95184, uptime 4 days, 4:44:44
l7-esp.database              RUNNING   pid 95113, uptime 4 days, 4:44:46
l7-esp.executor              RUNNING   pid 95185, uptime 4 days, 4:44:44
l7-esp.haproxy               RUNNING   pid 95178, uptime 4 days, 4:44:44
l7-esp.http:l7-esp.http.0    RUNNING   pid 95182, uptime 4 days, 4:44:44
l7-esp.http:l7-esp.http.1    RUNNING   pid 95181, uptime 4 days, 4:44:44
l7-esp.http:l7-esp.http.2    RUNNING   pid 95180, uptime 4 days, 4:44:44
l7-esp.logger                RUNNING   pid 95179, uptime 4 days, 4:44:44
l7-esp.notification          RUNNING   pid 95175, uptime 4 days, 4:44:44
l7-esp.pipeline              RUNNING   pid 95183, uptime 4 days, 4:44:44
l7-esp.scheduler             RUNNING   pid 95177, uptime 4 days, 4:44:44

then you should set a minimum of 72 connections (12 processes x 6). For 3-worker configurations, L7 routinely sets this value to 100, and for systems with more web workers (the “http” processes), L7 routinely sets the value to 200.
2. shared_buffers: should be set to ~25% of the available system RAM*, but not more than 8GB. (*assumes you have a dedicated DB server OR enough RAM to handle the load of the DB + the app servers. See below for app server requirements).
3. temp_buffers: Not less than 64MB. L7 routinely sets this to 128MB on dedicated DB servers in production.
4. work_mem: Not less than 256MB. L7 routinely sets this to 1GB on dedicated DB servers in production.
5. maintenance_work_mem: Not less than 128MB. L7 routinely sets this to 256MB on dedicated DB servers in production.
6. effective_cache_size: On a dedicated DB server - 75% of available RAM. On other servers, 2x the shared_buffer size.
7. timezone: This value should always be set to 'UTC'.
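Taken together, a minimal postgresql.conf sketch for a dedicated DB server with 32GB of RAM might look as follows; the values are illustrative and should be adjusted to your own hardware and process count:

# Lab7_ESP/Data/conf/postgresql.conf (excerpt, illustrative values)
max_connections = 100          # at least 6x the number of L7|ESP processes
shared_buffers = 8GB           # ~25% of system RAM, capped at 8GB
temp_buffers = 128MB
work_mem = 1GB
maintenance_work_mem = 256MB
effective_cache_size = 24GB    # ~75% of RAM on a dedicated DB server
timezone = 'UTC'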
L7 also recommends indexing the following columns of the “resource_val” table (note: these indexes will be applied by default starting in L7|ESP 2.4):
- The bound_resource_id column
- The step_instance_sample_id column
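For installations prior to L7|ESP 2.4, these indexes can be created manually; a minimal sketch, using hypothetical index names against the documented table and columns:

-- Index names are hypothetical; table and column names are as documented above
CREATE INDEX IF NOT EXISTS resource_val_bound_resource_id_idx
    ON resource_val (bound_resource_id);
CREATE INDEX IF NOT EXISTS resource_val_step_instance_sample_id_idx
    ON resource_val (step_instance_sample_id);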
Note
When running Postgres inside a container, it’s important to
configure the shared memory size correctly. Docker provides
containers with only 64MB of shared memory by default, which might
be insufficient for Postgres to operate efficiently, especially
during the execution of parallel queries. To increase the shared
memory size available to Postgres, add the shm_size option to the Docker
Compose file. Here’s an example:
services:
  server:
    image: l7esp/server
    shm_size: 1gb
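If the container is started directly with docker run rather than Docker Compose, the equivalent setting is the --shm-size flag; a minimal sketch using the image name from the example above:

docker run --shm-size=1g l7esp/server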
The shm_size value in Docker corresponds to the shared memory segment in
/dev/shm used by Postgres for parallel queries. The amount of shared
memory a query uses is controlled by work_mem. Each query can use up to
work_mem for each node in the EXPLAIN plan, for each process. This can
significantly increase if you have many parallel worker processes and/or
many tables/partitions being sorted or hashed in your query.
To calculate a reasonable shm_size, consider the total RAM available on
the server, the amount allocated to Postgres, and the value of work_mem.
Also, keep in mind these PostgreSQL settings that affect shared memory
usage:
- max_connections: Each connection to the database server requires additional shared memory, typically a few kilobytes per connection.
- max_locks_per_transaction: More locks lead to more shared memory usage, with each additional lock using just a few bytes.
- max_pred_locks_per_transaction: This setting, used for predicate locking in Serializable transactions, also consumes additional shared memory, similar to max_locks_per_transaction.
- max_prepared_transactions: Each prepared transaction consumes additional shared memory, generally in the kilobytes range.
- wal_buffers: The memory used to store transaction logs, typically set to a few megabytes.
- timezone: The timezone set for the PostgreSQL server instance.
On a memory-constrained system, you may want to consider disabling
parallel queries. While parallel queries can make individual queries
faster, they use more resources per query, which might not improve the
overall throughput. To disable parallel queries, set
max_parallel_workers to 0.
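For example, the following postgresql.conf line disables parallel query execution:

# Disable parallel query workers on memory-constrained systems
max_parallel_workers = 0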
Remember to adjust the shm_size value based on actual usage and
requirements, while also considering available resources on the
server and potential performance issues. Allocating too much memory
to shared memory may leave insufficient memory for other operations
within the container.
L7|ESP Worker Configuration
L7|ESP’s web workers currently handle, for most requests, one request per worker at a time*. Thus, the number of workers should be carefully considered when examining concurrent traffic load. (*Some routes, such as adding/removing samples to/from LIMS worksheets, allow concurrent requests to the same worker; the set of routes that support this will expand in L7|ESP 2.5.)
At rest, an L7|ESP worker consumes ~250-500MB of RAM, depending on a number of implementation-specific factors. Under large workflow loads (large batch sizes and/or large workflows), the worker memory can spike to ~2GB of RAM. If you observe memory spikes in excess of 2-3GB per worker, please notify L7.
This means a baseline system with L7|ESP running 3 web workers will require ~6-8 GB of RAM for routine operations, excluding the DB needs, to properly service ~5 users.
Another consideration is automated processes (pipelines). In a 3 web worker configuration, all API requests from pipelines are sent to a single web worker, and user requests may also be sent to this web worker. For production configurations, L7|ESP recommends a minimum of 4 workers. With 4 workers and a single “executor” thread for pipeline tasks, a maximum of one pipeline task will be executed at a time and all pipeline API requests will be sent to the fourth worker; all UI API requests will be routed to the first three workers. For a configuration supporting 10 users with a standalone DB server and an application server with 16GB of RAM, L7 recommends a minimum of 6 web workers, configured as follows:
- Set the num_workers key in the executor block of the L7|ESP configuration to 2 (this can be done via the Config app in the user interface).
- Set the num_workers key in the web_server block of the L7|ESP configuration to 6 (this can be done via the Config app in the user interface).
- Set the numprocs value of the [program:l7-esp.http] stanza of the supervisord.conf file to 6 (see the sketch after this list).
- Set the -n argument to 6 in the command key of the [program:l7-esp.haproxy] stanza.
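For the last two items, a minimal supervisord.conf sketch follows; only the keys mentioned above are shown, the remaining keys in each stanza are assumed to stay as shipped, and the haproxy command line is a placeholder for whatever command your installation already uses (only the -n value changes):

[program:l7-esp.http]
; run six web worker processes
numprocs = 6

[program:l7-esp.haproxy]
; keep your existing command; only the -n argument changes to match the web worker count
command = ... -n 6 ...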
For systems where the DB server is co-located with the application server, the system should have a minimum of 32GB of RAM for production use in environments where large worksheets are anticipated. (For this document, a large worksheet is any worksheet processing >96 samples through a workflow with a combined total number of columns >32 across all protocols.) For instance, a server anticipating large loads should run with 32GB of RAM, an effective_cache_size (postgresql.conf) of 16GB, shared_buffers of 8GB, and the number of web workers set to 3-6 and the number of executors to 1-2, depending on anticipated concurrent user use.
HA Cluster Setup
L7|ESP application servers don’t store files in the database, instead
writing these to disk and storing a reference to the file path in the
database. This is the case for any pipeline scripts that are generated
and executed, the stdout/stderr logs we persist for the runs of such
pipelines, and files that are generated by these pipelines and/or
user-uploaded to the system via the Data application or related API
endpoints.
L7|ESP writes all such state to a directory in the server root named
“data”. Depending on the installation, this may also contain logs
generated by the running L7|ESP services and the PostgreSQL data
directory if using the included local database. This works well for
Dockerized use-cases where you may wish to make the filesystem read-only
and capture all persistent state in a volume, though usually logging
would be reconfigured to redirect to stdout
in this case.
This can also be leveraged to create highly available architectures,
where two or more L7|ESP application servers are placed behind a load
balancer and are all accepting traffic, or you wish to have a hot
standby that isn’t accepting traffic in the event that the primary
application server experiences a failure. In such a setup, you will want
to mount this directory to a network storage location (such as AWS EFS),
and careful consideration should be given to the supervisord
configuration, particularly to where log files and PID files are written
and, if they are written to shared storage, whether the names of such
files will collide and/or cause multiple processes to write to them.
Your options here vary depending on whether you wish these services to
log to disk, to syslog, to stdout (e.g. Docker) and, if logging to disk,
if you wish to log to the shared storage or not (e.g. for backup
purposes). The simplest recommendation in an HA setup is to adjust the
LAB7LOGDIR and LAB7VARRUN environment variables to some directory
outside of the Data directory, though it’s also possible to tailor
logging in supervisord.conf to your exact requirements, such as
including the hostname environment variable as part of the log filenames
using the special %(ENV_HOSTNAME)s
syntax.
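As a hedged illustration of the latter approach, a supervisord.conf stanza might embed the hostname in its log file names so that two application servers sharing storage do not collide; the log directory below is a placeholder, and %(ENV_HOSTNAME)s requires the HOSTNAME variable to be present in supervisord’s environment:

[program:l7-esp.http]
; write per-host log files on the shared volume to avoid collisions between servers
stdout_logfile = /shared/log/%(ENV_HOSTNAME)s.l7-esp.http.%(process_num)d.log
stderr_logfile = /shared/log/%(ENV_HOSTNAME)s.l7-esp.http.%(process_num)d.err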
In the event that the data volume should be mounted to a shared location but a split occurs (e.g. the mount doesn’t appear on one or more L7|ESP application servers for some reason and they begin writing these files to local disk), recovery should be as simple as merging the directories and files in these folders back together, as all automatically generated file and directory names contain either the database UUID they reference or a timestamp in their path or filename. Files that are not dynamically generated, such as custom Python scripts that pipelines may execute, should always be the same on each application server, as these are usually placed there at install time.
Database User and Grants
The L7|ESP application doesn’t need any permissions outside of its own PostgreSQL database, so it is recommended to create a single database schema for the application, as well as a dedicated role with a matching name, and grant that role all permissions on that schema.
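A minimal sketch of such a setup, assuming the role and database are both named l7esp (the names and password are placeholders; adapt the schema handling to your environment):

-- Placeholder names and password; run as a PostgreSQL superuser
CREATE ROLE l7esp LOGIN PASSWORD 'change-me';
CREATE DATABASE l7esp OWNER l7esp;
-- If using a dedicated schema rather than the owner's default:
-- CREATE SCHEMA l7esp AUTHORIZATION l7esp;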
Currently, we don’t separate “read-only”, “read/write”, “read/write/DDL
modification permissions” users at the application level. The same user
is used in l7 init
(DDL migrations) as in normal application execution.
The credentials stored in the database config file
$LAB7DATA/conf/database.json
are used in both instances.
However, it is possible to switch these credentials with a more
privileged account, run l7 init
to perform any database migrations and
then switch back to the less privileged account credentials for
day-to-day application execution. Note that database migrations will
only occur if you are upgrading the underlying L7|ESP version (e.g.
L7|ESP 2.3.3 to L7|ESP 2.4) but won’t take place during normal upgrades
that only seed new content.
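A hedged sketch of that flow; the editing steps are illustrative, while the database.json path and the l7 init command are as documented above:

# Swap in the privileged credentials before running migrations
l7esp@l7espapp:~$ vi $LAB7DATA/conf/database.json   # set "user"/"pass" to the privileged account
l7esp@l7espapp:~$ l7 init                           # performs any pending DDL migrations
# Restore the day-to-day credentials afterwards
l7esp@l7espapp:~$ vi $LAB7DATA/conf/database.json   # set "user"/"pass" back to the application account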
The database configuration file at $LAB7DATA/conf/database.json
should
take the following format when using an external hosted PostgreSQL
solution such as AWS RDS:
{
"host": "postgres",
"port": 5432,
"user": "l7esp",
"pass": "password",
"name": "l7esp",
"start_service": false
}
Regarding encryption in transit, the PostgreSQL driver will automatically negotiate an appropriate connection based on the SSL mode defined on the database server.
Unix Application User
L7|ESP should be installed in Linux user-space and by default will only
listen on user/registered ports, rather than system/well-known ports
below 1024
that require root privileges.
The installation requires some basic tools, such as make
, wget
, bzip2
,
and rsync
. This is enough for the installer to use Miniconda to
bootstrap a Python environment where Ansible will be installed and
automate the complete L7|ESP installation.
As root
, you should install the basic requirements, create a
non-privileged Linux user/group for the application, and switch to that
user:
root@l7espapp:~# yum install make wget bzip2 rsync
root@l7espapp:~# groupadd l7esp
root@l7espapp:~# useradd --groups l7esp l7esp
root@l7espapp:~# su - l7esp
As the L7|ESP user, you may fetch the deployment bundle, extract it and perform the installation:
l7esp@l7espapp:~$ wget "${ESP_DEPLOYMENT_BUNDLE_URL}"
l7esp@l7espapp:~$ tar xf "${ESP_DEPLOYMENT_BUNDLE_FILENAME}"
l7esp@l7espapp:~$ cd "${CUSTOMER_NAME}"
l7esp@l7espapp:~/${CUSTOMER_NAME}$ make install
To aid in system administration tasks, you may additionally wish to add the following lines to this Linux user’s Bash profile so that common utilities will be made available via the $PATH environment variable, as well as some other useful environment variables:
l7esp@l7espapp:~$ cat ~/.profile
source /data/ESP/Lab7_ESP/current/bin/env.sh
PATH="/data/ESP/client/bin:$PATH"
l7esp@l7espapp:~$ which l7
/data/ESP/Lab7_ESP/current/bin/l7
l7esp@l7espapp:~$ which esp
/data/ESP/client/bin/esp
l7esp@l7espapp:~$ env | grep LAB7
LAB7LOGDIR=/data/ESP/Lab7_ESP/Data/log
l7esp@l7espapp:~$ tail -n 0 -f $LAB7LOGDIR/*
==> /data/ESP/Lab7_ESP/Data/log/l7-esp.http.0.access.log <==
If for some reason, you need to migrate an existing L7|ESP instance to a different Linux account on the same machine (or any other machine for that matter), the steps would be as follows:
1. Stop the running L7|ESP instance with the l7 stop command.
2. If the L7|ESP server is installed in a user’s home directory, move the entire server root directory to the target user’s home directory.
3. Change file ownership on the entire server root directory to the new Linux user/group (e.g. chown -R l7esp:l7esp /data/ESP).
4. Start the L7|ESP instance back up again with the l7 start command.
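A hedged sketch of those steps, with illustrative paths and the l7esp account used in the examples above:

# 1. Stop the instance (as the current application user)
l7 stop
# 2. If installed under a home directory, move the server root to the target user's home (illustrative paths)
mv /home/olduser/ESP /home/l7esp/ESP
# 3. Re-own the server root (as root)
chown -R l7esp:l7esp /home/l7esp/ESP
# 4. Start the instance back up (as the new user)
l7 start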
Application Startup on Boot
The deployment bundle has the ability to install L7|ESP as a systemd
service; however, this requires that the application user be able to
escalate to sudo privileges and that the Ansible variable service: True
is set in the roles/container.yml file in your L7|ESP deployment bundle.
You may also find the template used to create the service in the file
roles/esp/templates/service, and the related Ansible tasks can be found
in the roles/esp/tasks/run.yml file. If you wish to make changes to any
of this, please let us know so that we may include and test your desired
defaults in future L7|ESP deployment bundle revisions you receive from us.
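As a sketch, enabling the service installation amounts to setting the documented variable in roles/container.yml; the surrounding contents of that file are an assumption and may differ in your bundle:

# roles/container.yml (excerpt; other keys omitted)
service: True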
L7|ESP Revision History
Simply put, all previous versions of workflows, protocols, and entities (samples) are stored in the database. In terms of old/new data values for a given field, L7|ESP captures the full history of changes in the resource_action table.
The resource_action database table provides an audit log for each resource in the system, which can typically be viewed under the History tab in the UI for a given resource.
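As a purely illustrative sketch of inspecting this audit trail directly, the query below assumes hypothetical column names (resource_id, action, updated_at) that may differ from the actual resource_action schema:

-- Hypothetical column names; consult the actual resource_action schema before use
SELECT action, updated_at
FROM resource_action
WHERE resource_id = '00000000-0000-0000-0000-000000000000'
ORDER BY updated_at;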
When it comes to configuration items, such as workflow and protocol versions: a given version of a protocol, for example, is immutable. Each time you save a protocol or workflow, you are actually creating a new version of it under the hood; the old version remains unmodified.
Currently, experiments use the most recent version of a workflow, in addition to the most recent version of the protocols nested in this workflow. Once submitted, the samples included in the experiment are “locked” into these versions - subsequent changes to the workflow or protocol definitions will not impact any “in-flight” samples.