DataLakeHouse uses Linux shell scripts to provision most of the virtual machines and install the application components that make up the platform.
In each script, right after the #!/bin/bash first line, we use a Linux set command, typically either set -x or set -ex. If you did a web search, you probably even had to spell the options out as set dash x, set dash ex, or something similar.
So let’s look at how to use set -ex in Linux. Below we’ll call attention to set -e and set -x in turn:
set -e (set dash e)
TL;DR: set -e exits the entire script at the point of the first failure.
Bash/shell scripting has fairly poor native error handling. If an error occurs anywhere in your logic, you most likely want the subsequent lines not to execute. If your logic has dependencies and those dependencies fail, but the calls that require them continue anyway, you end up with false positives. This typically wastes development time on troubleshooting.
A typical example is library and package installation using the package manager. As an example,
yum install -y gcc
yum install -y bzip2
yum install -y wgets
yum install -y curl
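If in the example above the call to install the misspelled ‘wgets’ package fails, the script carries on regardless. When we later attempt to download an application somewhere in our 500 lines of code, that step fails because the machine doesn’t have the ‘wget’ application installed. With set -e at the top, the script would instead stop at the failed install.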
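To make the difference concrete, here is a minimal sketch that runs the same three steps with and without set -e. It is an illustration under stated assumptions, not our installation script: the step messages are made up, and the false command stands in for the failing ‘wgets’ install so the demo runs on any machine with bash.

```shell
#!/bin/bash
# Three hypothetical steps; "false" is a stand-in for the failed wgets install.
steps='
echo "step 1: install gcc"
false
echo "step 3: download with wget"
'

# Without set -e the child script carries on past the failure.
without_e=$(bash -c "$steps")

# With set -e the child script stops at the failing command.
with_e=$(bash -c "set -e; $steps")
with_e_status=$?

echo "--- without set -e ---"
echo "$without_e"
echo "--- with set -e (exit status $with_e_status) ---"
echo "$with_e"
```

In the second run, step 3 is never printed and the exit status is non-zero, which is exactly the signal a provisioning tool needs to mark the start-up script as failed.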
Trying to solve this problem with the shell AND operator (i.e., ‘&&’) may work for a small set of commands, but it doesn’t make sense to stitch together 500 lines of code with ‘&&’. Chaining statements this way is not scalable.
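For comparison, here are the two styles side by side in a small sketch. The three install_ functions are hypothetical stand-ins for real package installs (install_wgets fails on purpose); both styles stop at the same place, but only the set -e version keeps each step on a plain line of its own.

```shell
#!/bin/bash
# Hypothetical stand-ins for real install steps; install_wgets fails on purpose.
install_gcc()   { echo "installing gcc"; }
install_wgets() { echo "installing wgets"; return 1; }
install_curl()  { echo "installing curl"; }

# Style 1: chain with &&. Stops at the first failure, but every step needs the operator.
chained=$(install_gcc && install_wgets && install_curl)
chained_status=$?

# Style 2: a single set -e up front, then plain lines.
with_set_e=$(
  set -e
  install_gcc
  install_wgets
  install_curl
)
with_set_e_status=$?

echo "chained (exit status $chained_status):"
echo "$chained"
echo "set -e (exit status $with_set_e_status):"
echo "$with_set_e"
```

Neither run reaches install_curl. The behaviour is the same; the difference is how much ceremony each additional line costs.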
set -x (set dash x)
set -x does something really cool: it prints each command as it executes, with a ‘+’ symbol at the front of the line. Note that this trace is written to standard error, not standard out.
If your script looks like this:
#!/bin/bash
set -x
# Create the main application user (script runs at start up and runs as root)
useradd --create-home --comment "Account for running this application" --shell /bin/bash dremio
groupadd dremio
usermod -a -G dremio dremio
# create additional folder for artifacts
# Make a dir that may store universally reachable dags and other dremio artifacts (potential future usage)
mkdir -p /home/dremio/artifacts
mkdir -p /opt/dremio/artifacts
Then when executed, the output will look similar to the following (comments are not commands, so they are not echoed):
+ useradd --create-home --comment 'Account for running this application' --shell /bin/bash dremio
+ groupadd dremio
+ usermod -a -G dremio dremio
+ mkdir -p /home/dremio/artifacts
+ mkdir -p /opt/dremio/artifacts
...
What we like about this is that when we use Terraform to deploy DataLakeHouse with this as the installation script, we can clearly see every command the start-up script executed, even if the boot process for the VM on the cloud provider was still running parts of its own work. This gives us a clear start-up log to review and helps with troubleshooting.
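If you want to see the trace in isolation, the following sketch may help. It runs a tiny traced script in a child bash and captures standard out and the trace separately. The variable names and the temporary file are ours for illustration, and the sketch assumes the default trace prefix (PS4) of ‘+ ’.

```shell
#!/bin/bash
# A tiny script to trace; the greeting variable is purely illustrative.
inner='set -x
# this comment does not show up in the trace
greeting="hello"
echo "$greeting"'

# The trace goes to standard error, so redirect that to a temporary file.
trace_file=$(mktemp)
stdout=$(bash -c "$inner" 2>"$trace_file")
trace=$(cat "$trace_file")
rm -f "$trace_file"

echo "stdout: $stdout"
echo "trace:"
echo "$trace"
```

The trace shows commands after variable expansion (echo hello rather than echo "$greeting"), which is often the most useful part when reading a start-up log.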
Summary
At the end of the day we need simplicity and the ability to troubleshoot quickly and without frustration.
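Putting set -e and set -x together as set -ex gives Linux shell scripting the bit of extra it needs: the script stops at the first failure, and the log shows exactly which command it stopped on. We use it as a best practice.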
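As a closing sketch, here is the combined form in action. The two echo messages are hypothetical, and false once again stands in for a command that fails; the child script’s standard out and trace are merged so they can be read as one log.

```shell
#!/bin/bash
# Child script using the combined option; "false" is a stand-in for a failing command.
script='set -ex
echo "create user"
false
echo "never reached"'

# Merge the trace (standard error) into standard out to read it as a single log.
output=$(bash -c "$script" 2>&1)
status=$?

echo "$output"
echo "exit status: $status"
```

The last traced command in the log is the one that failed, and the non-zero exit status tells whatever launched the script that something went wrong.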
Comment below to let us know what other commands you’ve used to help with your Bash/shell scripts on Linux.