Introduction

This guide is currently still a work in progress. It grows a little every day. You are invited to make additions or modifications so long as you can keep them accurate (and linguistically correct).

All the information here is presented without any warranty or guarantee of accuracy. Use it at your own risk. When in doubt, please consult the man pages or the GNU info pages as the authoritative references.

BASH is a BourneShell compatible shell, which adds many new features to its ancestor. Most of them are available in the 'KornShell', too. (Bourne Again - you will be ;-) )


About This Guide

This guide aims to become a point of reference for people interested in learning to work with BASH. It aspires to teach its readers good practice techniques in developing scripts for the BASH interpreter and educate them about the internal operation of BASH.

This guide is targeted at beginning users. It assumes no basic knowledge, but rather expects you to have enough common sense to put two and two together. If something is unclear to you, you should report this so that it may be clarified in this document for future readers.

You are invited to contribute to the development of this document by extending it or correcting invalid or incomplete information.

The maintainer(s) of this document:


A Definition

BASH is an acronym for Bourne Again Shell. It is based on the Bourne shell and is mostly compatible with its features.

Shells are applications that provide users with the ability to give commands to their operating system interactively, or to allow them to execute batch processes quickly. In no way are they required for execution of processes; they are merely a layer between system function calls and the user.



  • Shell: A (possibly interactive) layer between the user and the system.
    BASH: The Bourne Again Shell, a Bourne compatible shell.


Using Bash

Most users that think of BASH think of it as a prompt and a command line. That is BASH in interactive mode. BASH can also run in non-interactive mode through scripts. We can use scripts to automate certain logic. Scripts are basically lists of commands that you can type on the command line. When such a script is executed, all these commands are executed sequentially, one after another.

We'll start with the basics in an interactive shell. Once you're familiar with those, you can put them together in scripts.

Important!
You should make yourself familiar with the man and apropos commands on the shell. They will be vital to your self-tutoring.

    $ man man
    $ man apropos



  • Interactive mode: A mode of operation where a prompt asks you for one command at a time.
    Script: A file that contains a sequence of commands to execute one after the other.


Scripts

A script is basically a sequence of commands that BASH processes in order. It only moves on to the next command when the current one has ended, unless the current one has been executed asynchronously (in the background). Don't worry too much about the latter case yet -- you'll learn about how that works later on.

Virtually any example that you see in this guide can be used in a script just as well as on the command line.

Making a script is easy. You just make a new file, and put this in it at the top:

    #!/bin/bash

This header makes sure that whenever your script is executed, BASH will be used as its interpreter. Please do not be fooled by examples on the Internet that use /bin/sh as interpreter. sh is not bash. Even though sh's syntax and bash's look very much alike and even though most bash scripts will run in sh, a lot of the examples in this guide only apply to bash and will just break or cause unexpected behaviour in sh. Also, please refrain from giving your scripts that stupid .sh extension. It serves no purpose, and it's completely misleading (since it's going to be a bash script, not an sh script).

And by the way, it's perfectly fine if you use Windows to write your scripts, but if at all possible, avoid using Notepad for writing scripts. Microsoft Notepad creates files with DOS-style line endings. That means that each line you make in Notepad will be ended by two characters: a Carriage Return and a Newline character. BASH reads lines as terminated by Newline characters only. As a result, the Carriage Return character will cause you incredible headaches if you don't know it's there (very weird error messages). If at all possible, use a decent editor like Vim, Emacs, kate, GEdit, GVIM or xemacs. If you don't, then you will need to remove the carriage returns from your scripts before running them.
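If you suspect that a script has picked up DOS-style line endings, a sketch along these lines can confirm and repair it. The two files are throwaway temporary files created only for the demonstration, and tr is just one of several tools that can strip carriage returns.

```shell
#!/bin/bash
# Sketch: detect and remove carriage returns from a script.
# Both files are temporary files used only for this demonstration.
dos_script=$(mktemp)
clean_script=$(mktemp)

# Simulate a script that was saved with DOS-style (CR+LF) line endings.
printf 'echo hello\r\necho world\r\n' > "$dos_script"

# grep -c counts the lines that contain a carriage return.
cr_lines=$(grep -c $'\r' "$dos_script")
echo "lines with a carriage return: $cr_lines"

# tr -d deletes every carriage return; the newlines stay in place.
tr -d '\r' < "$dos_script" > "$clean_script"

# The cleaned copy now runs as intended.
output=$(bash "$clean_script")
echo "$output"

rm -f "$dos_script" "$clean_script"
```

Writing the cleaned text to a second file, rather than redirecting back into the original, avoids truncating the script before tr has read it.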

Once your script file has been made, you can run it like this:

    $ bash myscript

In this example, we execute BASH and tell it to run our script. Alternatively, you can give your script executable permissions. When you do this, you can actually execute the script instead of executing BASH with it:

    $ chmod +x myscript
    $ ./myscript

Some people like to keep their scripts in a personal directory. Others like to keep their scripts somewhere in the PATH variable. Most like to do both at once. Here's what I suggest you do:

    $ mkdir -p "$HOME/bin"
    $ echo 'PATH="$HOME/bin:$PATH"' >> "$HOME/.bashrc"

The first command will make a directory called bin in your home directory. The second command will add a line to your .bashrc file which adds the directory we just made to the beginning of the PATH variable. Every new instance of BASH will now check for executable scripts in your bin directory.

To apply the changes we added to .bashrc we obviously need to actually process .bashrc first. You can do that by closing your existing terminal and opening a new one. BASH will then initialize itself again by reading .bashrc among others. Alternatively you can just execute that line of code on the command line (PATH="$HOME/bin:$PATH") or manually process your .bashrc file in the running shell by running source "$HOME/.bashrc".

As a result, we can now put our script in our bin directory and execute it as a normal command (we no longer need to prepend our script's name with its path, which was the ./ part in the previous examples):

    $ mv myscript "$HOME/bin"
    $ myscript
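The effect of putting a directory in PATH can be tried out without touching your real .bashrc. The following is a minimal sketch in which demo_bin is a hypothetical temporary directory standing in for "$HOME/bin"; the PATH change only lasts for the shell that runs it.

```shell
#!/bin/bash
# Sketch: how a directory in PATH makes a script callable by name.
# demo_bin is a hypothetical stand-in for "$HOME/bin".
demo_bin=$(mktemp -d)

# Create a tiny executable script in that directory.
printf '%s\n' '#!/bin/bash' 'echo "hello from myscript"' > "$demo_bin/myscript"
chmod +x "$demo_bin/myscript"

# Prepend the directory to PATH for this shell only.
PATH="$demo_bin:$PATH"

# BASH now finds the script by name, without the ./ prefix.
result=$(myscript)
echo "$result"

# type -P prints the pathname that the PATH search resolved to.
found_at=$(type -P myscript)
echo "$found_at"

rm -rf "$demo_bin"
```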


  • Tip:
    While you're defining the interpreter in your header, you might want to take the time to explain your script's function and expected arguments a little too:

    #! /bin/bash
    #
    #   scriptname argument [argument] ...
    #
    # A short explanation of your script's purpose.
    #
    # Copyright [date], [name]


  • Tip:
    You can use this header to specify up to one word of optional arguments that you want to pass to the interpreter. For example, the following arguments will turn on some verbose debugging:

    #! /bin/bash -xv


  • Header: The header of a script determines the application that will function as its interpreter (e.g. bash, sh, perl, ...).


The Basics

The Parser

It is very important that, before you start experimenting, you understand how Bash takes your commands and turns them into something that can be executed.

If not the most practical, this chapter is probably the most important chapter to understand in the entire guide; so pay attention.

  • Step 1: Read data to execute.

    • Bash always reads your script or commands on the bash command prompt line by line. If your line ends with a backslash character, bash reads another line before processing the command and joins it to the current one; the backslash and the newline that follows it are removed. A newline inside quotes, on the other hand, is kept literally, and bash keeps reading until the quote is closed.

      (I will from here on refer to the chunk of data Bash read in as the line of data; even though it is technically possible that this line contains one or more newlines.)

      • Step Input:
        echo "What's your name?"
        read name; echo "$name"

        Step Output:
        echo "What's your name?"
        and
        read name; echo "$name"

  • Step 2: Process quotes.

    • Once Bash has read in your line of data, it looks through it in search of quotes. The first quote it finds triggers a quoted state for all characters that follow, up until the next quote of the same type. If the quoted state was triggered by a double quote ("..."), all characters except for $, `, " and \ lose any special meaning they might have. That includes single quotes, spaces, newlines, etc. If the quoted state was triggered by a single quote ('...'), all characters except for ' lose their special meaning. Yes, also $ and \. Therefore, the following command will produce literal output:

          $ echo 'Back\Slash $dollar "Quote"'
          Back\Slash $dollar "Quote"

      The fact that the backslash loses its ability to cancel out the meaning of the next character means that this will not work:

          $ echo 'Don\'t do this'
          >

      Bash will ask you for the next line of input because, contrary to what we intended, the second quote (the one we tried to escape with the backslash) actually closed our quoted state, meaning that t do this was not quoted. The last quote on the line then opened a quoted state again, and bash asks for more input until it is closed (it is still trying to finish step 1: it reads data until it finds an unescaped newline, and the open single quote is escaping our newline).

      Now that bash knows which of the characters in the line of data are escaped (stripped of their ability to mean anything special to Bash) and which are not, Bash removes the quotes that were used to determine this from the data and proceeds to the next step.

      • Step Input:
        echo "What's your name?"

        Step Output:
        echo What's your name?

        • (Note: Every character originally between the double quotes has been marked as escaped. I will mark escaped characters in these examples by making them italic.)


  • Step 3: Split the read data into commands.

    • Our line is now split up into separate commands using ; as a command separator. Remember from the previous step that any ; characters that were quoted or escaped do not have their special meaning anymore and will not be used for command splitting. They will just appear in the resulting command line literally:

          $ echo "What a lovely day; and sunny, too!"
         What a lovely day; and sunny, too!
         
      • Step Input:
        read name; echo $name

        Step Output:
        read name
        and
        echo $name

The following steps are executed for each command that resulted from splitting up the line of data:

  • Step 4: Parse special operators.

    • Look through the command to see whether there are any special operators such as {..}, <(..), < ..., <<< .., .. | .., etc. These are all processed in a specific order. Redirection operators are removed from the command line; other operators are replaced by their resulting expression (e.g. {a..c} is replaced by a b c).

      • Step Input:
        diff <(foo) <(bar)

        Step Output:
        diff /dev/fd/63 /dev/fd/62

        • (Note: The <(..) operator starts a background process to execute the command foo (and one for bar, too) and sends the output to a file. It then replaces itself with the pathname of that file.)


  • Step 5: Perform Expansions.

    • Bash has many operators that involve expansion. The simplest of these is $parameter. The dollar sign followed by the name of a parameter, which optionally might be surrounded by braces, is called Parameter Expansion. What Bash does here is basically just replace the Parameter Expansion operator with the contents of that parameter. As such, the command echo $USER will in this step be converted to echo lhunath on my system.

      Other expansions include Pathname Expansion (echo *.txt), Command Substitution (rm "$(which nano)"), etc.

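      • Step Input:
        echo $PWD has these files that match *.txt : *.txt

        Step Output:
        echo /home/lhunath/docs has these files that match *.txt : bar.txt foo.txt

        (Note: In this example, everything from $PWD up to and including the colon was originally inside double quotes and is therefore escaped. That is why $PWD is expanded but the first *.txt stays literal, while the unquoted second *.txt is expanded into filenames.)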

  • Step 6: Split the command into a command name and arguments.

    • The name of the command Bash has to execute is always the first word in the line. The rest of the command data is split into words which make up the arguments. This process is called Word Splitting. Bash basically cuts the command line into pieces wherever it sees whitespace. This whitespace is completely removed and the pieces are called words. Whitespace in this context means: any spaces, tabs or newlines that are not escaped. (Escaped spaces, such as spaces inside quotes, lose their special meaning of whitespace and are not used for splitting up the command line. They appear literally in the resulting arguments.) As such, if the name of the command that you want to execute or one of the arguments you want to pass contains spaces that you don't want bash to use for cutting the command line into words, you can use quotes or the backslash character:

         My Command /foo/bar   ## This will execute the command named 'My' because it is the first word.
         "My Command" /foo/bar ## This will execute the command named 'My Command' because the space inside the quotes has lost the special meaning that allowed it to split words.
         
      • Step Input:
        echo /home/lhunath/docs has these files that match *.txt : bar.txt foo.txt

        Step Output:
        Command Name: 'echo'
        Argument 1: '/home/lhunath/docs has these files that match *.txt :'
        Argument 2: 'bar.txt'
        Argument 3: 'foo.txt'

  • Step 7: Execute the command.

    • Now that the command has been parsed into a command name and a set of arguments, Bash executes the command and sets the command's arguments to the list of words it has generated in the previous step. If the command type is a function or builtin, the command is executed by the same Bash process that just went through all these steps. Otherwise, Bash will first fork off (create a new bash process), initialize the new bash process with the settings that were parsed out of this command (redirections, arguments, etc.) and execute the command in the forked off bash process (child process). The parent (the Bash that did these steps) waits for the child to complete the command.
      • Step Input:
        sleep 5

        Causes:
        ├┬· 33321 lhunath -bash
        │├──· 46931 lhunath sleep 5

After these steps, the next command, or next line is processed. Once the end of the file is reached (end of the script or the interactive bash session is closed) bash stops and returns the exit code of the last command it has executed.
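The steps above can be observed from inside the shell with a small helper that reports exactly which arguments it received. This is only a sketch: show_args is a hypothetical function invented for the illustration, not something BASH provides.

```shell
#!/bin/bash
# show_args is a hypothetical helper: it prints the number of arguments it
# received, followed by each argument wrapped in <...>.
show_args() {
    printf '%d:' "$#"
    printf ' <%s>' "$@"
    printf '\n'
}

# Step 1: a trailing backslash joins two physical lines into one line of data.
show_args one \
    two                                   # 2: <one> <two>

# Step 2: single quotes keep every character literal, then are removed.
show_args 'Back\Slash $dollar "Quote"'    # 1: <Back\Slash $dollar "Quote">

# Step 3: an unquoted ; separates commands, a quoted one is plain text.
show_args "lovely day; and sunny"; show_args second command

# Step 4: Brace Expansion produces several words.
show_args {a..c}                          # 3: <a> <b> <c>

# Steps 5 and 6: the expansion happens first, the splitting afterwards.
name='B. Foo'
show_args $name                           # 2: <B.> <Foo>
show_args "$name"                         # 1: <B. Foo>
```

The last two calls show the point that matters most in practice: an unquoted expansion is cut into words after it has been expanded, while a quoted one stays in one piece.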

These steps might seem like common sense after looking at them closely, but they can often seem counterintuitive in certain specific cases. As an example, let me enumerate a few cases where people have often made mistakes because bash did not interpret their command the way they thought it would:

  • start=1; end=5; for number in {$start..$end}: Brace Expansion happens in step 4 while Parameter Expansion happens in step 5. Brace Expansion tries to expand {$start..$end} but can't. It sees the $start and $end as strings, not Parameter Expansions and gives up:

    • Step 4 Results:
      start=1
      end=5
      for number in {$start..$end}

      Step 5 Results:
      start=1
      end=5
      for number in {1..5}

      And number will now become {1..5} instead of 1. No Brace Expansion has been performed.

  • [ $name = B. Foo ]: Word Splitting will break this example. The test program ([) looks for four arguments in this case. A left hand side, an operator, a right hand side, and a closing ]. To find out what's wrong with this command, do as Bash does: Chop the command up into arguments. Assuming name contains B. Foo:

    • [
    • B.
    • Foo
    • =
    • B.
    • Foo
    • ]

    A whole lot more than four. You need to use Quotes to prevent the space between B. and Foo from causing Word Splitting. Quote the B. Foo AND the $name so that when $name is expanded, the whitespace in B. Foo is treated the same as on the right hand side.

    It is important to remember that step 5 (Perform Expansions) comes before step 6 (Split the command into a command name and arguments). That means that $name is not safe from having its result cut up, because the cutting up happens after $name is replaced by the value within name.
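Both mistakes have straightforward fixes. The sketch below shows one possible fix for each: an arithmetic for loop in place of the Brace Expansion (other approaches exist), and quotes around both sides of the comparison. The variable names that hold the results are made up for the illustration.

```shell
#!/bin/bash
start=1
end=5

# The broken form: the braces are left alone, so the loop runs once
# with the literal string {1..5}.
brace_result=$(for number in {$start..$end}; do echo "$number"; done)
echo "$brace_result"

# One possible fix: an arithmetic for loop, which evaluates the variables.
numbers=()
for (( number = start; number <= end; number++ )); do
    numbers+=("$number")
done
echo "${numbers[*]}"

# The comparison, with both the expansion and the literal string quoted,
# so that [ receives exactly four arguments.
name='B. Foo'
if [ "$name" = "B. Foo" ]; then
    comparison="match"
else
    comparison="no match"
fi
echo "$comparison"
```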

Commands And Arguments

BASH reads commands from its input (which is either a terminal or a file). These commands can be aliases, functions, builtins, keywords, or executables.

  • Aliases: Aliases are a way of shortening commands. They are only used in interactive shells, not in scripts. An alias is a name that is mapped to a certain string. Whenever that name is used as a command name, it is replaced by the string before executing the command.

    • So, instead of executing:
           $ nmap -P0 -A --osscan_limit 192.168.0.1
      You could use an alias like this:
           $ alias nmapp='nmap -P0 -A --osscan_limit'
           $ nmapp 192.168.0.1
  • Functions: Functions in BASH are somewhat like aliases, but more powerful. Unlike aliases they can be used in scripts. A function contains shell commands, very much like a small script. When a function is called, the commands in it are executed.

  • Builtins: BASH has some basic commands built into it, such as cd (change directory), echo (print text), and so on. You can think of them as functions that are provided already.

  • Keywords: Keywords are quite like builtins, but the main difference is that special parsing rules apply to them. For example, [ is a bash builtin, while [[ is a bash keyword. They are both used for testing stuff, but since [[ is a keyword rather than a builtin, it benefits from a few special parsing rules which make it a lot better:

           $ [ a < b ]
           -bash: b: No such file or directory
           $ [[ a < b ]]

      The first example returns an error because bash tries to redirect the file b to the command [ a ] (See File Redirection). The second example actually does what you expect it to. The special character < no longer has its special meaning of File Redirection operator.

  • Executables: The last kind of command that can be executed in bash is the executable. The command name of an executable is always the pathname of the executable to execute. If the executable is in the current directory, use ./myprogram. If it's in the /usr/local/bin directory, use /usr/local/bin/myprogram.

    • To make life a little easier for you, though, BASH uses a variable that tells it where to find applications in case you just use the name of the application but not its full pathname. This variable is called PATH, and it is a set of directory names separated by colons -- for example, /bin:/usr/bin. When a command is specified in BASH without a pathname (e.g. myprogram, or ls), and it isn't an alias, function, builtin or keyword, BASH searches through the directories in PATH, in order from left to right, to see whether they contain an executable by the name of the command name you typed.
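The five kinds of commands can be told apart with type -t, which prints a single word describing how BASH would interpret a name. In this sketch the alias ll and the function greet are made up for the illustration, and alias expansion is switched on explicitly because the sketch runs as a script, where aliases are normally off.

```shell
#!/bin/bash
# Aliases are normally disabled in scripts; enable them for this demonstration.
shopt -s expand_aliases

# ll and greet are hypothetical names defined only for this sketch.
alias ll='ls -l'
greet() { echo "Hello, $1"; }

# Ask BASH how it would interpret each name.
for name in ll greet cd '[[' ls; do
    printf '%s is a(n) %s\n' "$name" "$(type -t "$name")"
done
```

On a typical system this reports alias, function, builtin, keyword and file, in that order. The last one depends on ls being found as an executable in PATH.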

Each command can be followed by arguments. Arguments are words you specify after the command name. Arguments are separated from the command name and from each other by white space. This is important to remember. For example, the following is wrong:

    $ [-f file]

You want the [ command name to be separated from the arguments -f, file and ]. If you do not separate [ and -f from each other with whitespace, bash will think you are trying to execute the command name [-f and look in PATH for a program named [-f. Additionally, the arguments file and ] also need to be separated by spaces. The [ command expects the last argument to be ]. The correct command separates all arguments with spaces:

    $ [ -f file ]
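A small sketch of the same point, runnable as a script. sample_file is a temporary file created only for this demonstration, and the two result variables are names chosen for the illustration.

```shell
#!/bin/bash
# Sketch: [ is a command name, so it needs whitespace around it like any other.
# sample_file is a temporary file created only for this demonstration.
sample_file=$(mktemp)

# Correct: [ , -f , the filename and ] are four separate words.
if [ -f "$sample_file" ]; then
    spaced_result="regular file"
else
    spaced_result="not a regular file"
fi
echo "$spaced_result"

# Without the space, bash would look for a command literally named "[-f".
# type reports whether such a command exists (normally it does not).
if type '[-f' > /dev/null 2>&1; then
    glued_result="found"
else
    glued_result="not found"
fi
echo "a command named [-f: $glued_result"

rm -f "$sample_file"
```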

NOTE:
It is very important that you understand how this works exactly. If you don't grasp these concepts well, the quality of your code will degrade significantly and you will introduce very dangerous bugs. Read Argument Splitting very carefully.

    $ ls
    a  b  c

ls is a command that lists files in the current directory. It's intended to be used only for producing human-readable results. Please don't try to parse, pipe, grep, capture, read, or loop over the output of ls in a script. It's dangerous and there's always a better way. While an invaluable tool on the interactive shell, ls should therefore never be used in scripts. You will understand why as you go through this guide.

    $ mkdir d
    $ cd d
    $ ls

mkdir is a command that creates a new directory. We specified the argument d to that command. This way, the application mkdir is instructed to create a directory called d. After that, we use the builtin command cd to change the shell's current directory to d. ls shows us that the current directory (which is now d) is empty, since it doesn't display any filenames.

In BASH scripts, arguments that were passed to the script are saved in 'Positional Parameters'. You can read these by using $1, $2, and so on for the respective argument. You can also use $@ and $* but more about this later on.
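As a quick illustration of Positional Parameters, the sketch below uses a function rather than a separate script file, since functions receive their own positional parameters in the same way. The name show_params is hypothetical.

```shell
#!/bin/bash
# show_params is a hypothetical function; inside it, $1 and $2 refer to the
# arguments passed to the function, just as they would inside a script.
show_params() {
    echo "count: $#"
    echo "first: $1"
    echo "second: $2"
}

# The quotes make "B. Foo" a single argument.
show_params "B. Foo" bar
```

Calling it as shown prints a count of 2, with B. Foo as the first argument and bar as the second.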


  • Tip:
    You can use the type command to figure out the type of a command.
    For example:

    $ type rm
    rm is hashed (/bin/rm)
    $ type cd
    cd is a shell builtin



  • Alias: A name that is mapped to a string. Whenever that name is used as a command, it is replaced by the string it is mapped to.
    Function: A name that is mapped to a script. Whenever that name is used as a command, the script is called with the arguments provided to the function's name on the command line.
    Builtin: Certain features have been built into BASH. These are handled internally whenever they are executed on the command line (and often do not create a new process).
    Application: A binary that can be executed by referring to it (/bin/ls) or if its location is in your PATH variable, you can execute it simply by using its name (ls).


Argument Splitting

Commands in BASH can take multiple arguments. These arguments are used to tell the command exactly what it's supposed to do. In BASH, you separate these arguments by whitespace (spaces and tabs).

Assume you're in an empty directory. (If you want to try this code out, you can create and go into an empty directory called test by running: mkdir test; cd test.)

    $ ls                # List files in the current directory (no output: no files).
    $ touch a b c       # Create files 'a', 'b' and 'c'.
    $ ls                # List all files again; this time the output shows 'a', 'b' and 'c'.
    a  b  c

touch is an application that changes the 'Last Modified'-time of a certain file to the current time. If the filename that it's given does not exist yet, it simply creates that file, as a new and empty file. In this example, we passed three arguments. touch creates a file for each argument. ls shows us that three files have been created.

    $ rm *              # Remove all files in the current directory.
    $ ls                # List files in the current directory (no output: no files).
    $ touch a   b c     # Create files 'a', 'b' and 'c'.
    $ ls                # List all files again; this time the output shows 'a', 'b' and 'c'.
    a  b  c

rm is an application that removes all the files that it was given. * is a glob. It basically means all files in the current directory. You will read more about this later on.

Now, did you notice that there are several spaces between a and b, and only one between b and c? Also, notice that the files that were created by touch are no different than the first time. You now know that the amount of whitespace between arguments does not matter. This is important to know. For example:

    $ echo This is a test.
    This is a test.
    $ echo This    is    a    test.
    This is a test.

In this case, we provide the echo command with four arguments: 'This', 'is', 'a' and 'test.'. echo takes these arguments, and prints them out one by one with a space in between. In the second case, the exact same thing happens. The extra spaces make no difference. If we actually want the extra whitespace, we need to pass the sentence as one single argument. We can do this by using quotes:

    $ echo "This    is    a    test."
    This    is    a    test.

Quotes group everything inside them into a single argument. This argument is 'This    is    a    test.', properly spaced. echo prints this single argument out just like it always does.

Be very careful to avoid the following:

    $ ls                                          # List files in the current directory.
    The secret voice in your head.mp3  secret
    $ rm The secret voice in your head.mp3        # Executes rm with 6 arguments; not 1!
    rm: cannot remove `The': No such file or directory
    rm: cannot remove `voice': No such file or directory
    rm: cannot remove `in': No such file or directory
    rm: cannot remove `your': No such file or directory
    rm: cannot remove `head.mp3': No such file or directory
    $ ls                                          # List files in the current directory: It is still there.
    The secret voice in your head.mp3

You need to make sure you quote filenames properly. If you don't you'll end up deleting the wrong things! rm takes filenames as arguments. If you do not quote filenames with spaces, rm thinks that each argument is a separate file. Since BASH splits your arguments at the spaces, rm will try to remove each word. The above example tried to delete files for each word in the filename of the song, instead of the filename of the song. That caused our file secret to be deleted, and our song to remain behind!
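The safe version of that experiment can be tried in a scratch directory. This is a minimal sketch; workdir is a temporary directory created only for the demonstration, and the two state variables are names chosen for the illustration.

```shell
#!/bin/bash
# Sketch: removing a file whose name contains spaces, without side effects.
# workdir is a temporary scratch directory used only for this demonstration.
workdir=$(mktemp -d)

(
    cd "$workdir" || exit 1
    touch 'The secret voice in your head.mp3' secret

    # The quotes turn the whole filename into one single argument for rm.
    rm "The secret voice in your head.mp3"
)

# Check what is left: the song is gone, the file called secret survived.
if [ -e "$workdir/The secret voice in your head.mp3" ]; then
    song_state="kept"
else
    song_state="deleted"
fi
if [ -e "$workdir/secret" ]; then
    secret_state="kept"
else
    secret_state="deleted"
fi
echo "song: $song_state, secret: $secret_state"

rm -rf "$workdir"
```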

Please have a good look at http://bash-hackers.org/wiki/doku.php?id=syntax:words if all this isn't very clear to you yet.


  • Good Practice:
    You should always quote sentences or strings that belong together, even if it's not absolutely necessary. This will keep you alert and reduce the risk of human error in your scripts.
    For example, you should always quote arguments to the echo command.



  • Arguments: These are the optional additional words you can specify when running commands. They are appended to the command's name ('ls -l foo' executes ls with two arguments).
    Quotes: The two forms of quotes (' and ") are used to protect certain special characters inside them from being interpreted as special by BASH. The difference between ' and " will be discussed later.


Special Characters

There are several special characters in BASH that have a non-literal meaning. When we use these characters, BASH evaluates these characters and their meaning, but usually does not pass them on to the underlying commands.

Here are a few of those special characters, and what they do:

  • [whitespace]: Whitespace (spaces, tabs and newlines). BASH uses whitespace to determine where words begin and end. The first word of each command is used as the command name; any additional words become arguments to that command.

  • "text": Double quotes. Double quotes protect the text inside from being split into multiple words or arguments. They also prevent the special meaning of single quotes inside. However, a few special characters, such as $, ` and \, retain their special meanings.

  • 'text': Single quotes. Single quotes protect the text inside from any kind of expansion by the shell and keep it from being split into multiple words or arguments. They also prevent the special meaning of all special characters inside.

  • # text: Comment character. Any word beginning with # begins a comment that extends to the next newline. Comments are not processed by the shell.

  • ;: Command separator. The semicolon is used to separate multiple commands from each other if the user chooses to keep them on the same line. It's basically the same thing as a newline.

  • \: Escape character. The backslash protects the next character from being used in any special sort of way.

  • > or <: Redirection character. These characters are used to modify (redirect) the input and/or output of a command.

  • [[ expression ]]: Test expression. This evaluates the conditional expression.

  • { commands; }: Command Group. This executes the commands inside the braces as though they were only one command. It is convenient for places where BASH syntax requires only one command to be present.

  • `command`, $(command): Command substitution (The latter form is highly preferred). Command substitution executes the command inside the substitution form first, and replaces itself by that command's output.

  • (command): Subshell Execution. This executes the command in a new bash shell, instead of in the current.

  • ((expression)): Arithmetic Evaluation. Inside the parentheses, operators such as +, -, * and / are seen as mathematical operators.

  • $((expression)): Arithmetic Expansion. Comparable to the above, but this expression is replaced with the result of its arithmetic evaluation.

  • $: Expansion character. This character is used for any form of parameter expansion. More about this later.

Some examples:

    $ echo "I am $LOGNAME"
    I am lhunath
    $ echo 'I am $LOGNAME'
    I am $LOGNAME
    $ # boo
    $ echo An open\ \ \ space
    An open   space
    $ echo "My computer is $(hostname)"
    My computer is Lyndir
    $ echo boo > file
    $ echo $(( 5 + 5 ))
    10
    $ (( 5 > 0 )) && echo "Five is bigger than zero."
    Five is bigger than zero.
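The difference between a Command Group and Subshell Execution is easiest to see with a variable assignment. In this sketch, value and after_subshell are names made up for the illustration.

```shell
#!/bin/bash
value="outer"

# Parentheses run the commands in a new shell, so the assignment is lost
# as soon as that subshell ends.
( value="set in subshell" )
after_subshell=$value
echo "$after_subshell"      # outer

# Braces run the commands in the current shell, so the assignment sticks.
{ value="set in group"; }
echo "$value"               # set in group
```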



  • Special Characters: Characters that have a special meaning to BASH. Usually their meaning is interpreted and afterwards they are removed from the command before executing it.


Parameters

Parameters should be seen as a sort of named space in memory where you can store your data. Generally speaking, they will store string data, but can also be used to store integers or arrays.


  • Parameters: Parameters store data that can be retrieved through a symbol or a name.


Special Parameters and Variables

Let's get our vocabulary straight before we get into the real deal. There are Parameters and Variables. Variables are actually just a kind of parameter: parameters that are denoted by a name. Parameters that aren't variables are called Special Parameters. I'm sure you'll understand things better with a few examples:

    $ # Some parameters that aren't variables:
    $ echo My shell is $0, and was started with these options: $-
    My shell is -bash, and was started with these options: himB
    $ # Some parameters that ARE variables:
    $ echo I am $LOGNAME, and I live at $HOME.
    I am lhunath, and I live at /home/lhunath.

Please note: Unlike in PHP, Perl and the like, parameters do NOT start with a $-sign. The $-sign you see in the examples merely causes the parameter that follows it to be expanded. Expansion basically means that the shell replaces it by its content. As such, LOGNAME is the parameter (variable) that contains your username. $LOGNAME will be replaced with its content, which in my case is lhunath.

I think you've got the drift now. Here's a summary of most Special Parameters:

  • 0: Contains the name of the script. (This is not always reliable.)

  • Positional Parameters: 1, 2, ...; They contain the arguments that were passed to the current script.

  • *: Expands to all the words of all the positional parameters. If double quoted, it expands to a single string containing all the positional parameters separated by the first character of the IFS variable (which will be discussed later).

  • @: Expands to all the words of all the positional parameters. If double quoted, it expands to a list of all the positional parameters as individual words.

  • #: Expands to the number of positional parameters that are currently set.

  • ?: Expands to the exit code of the most recently completed foreground command.

  • $: Expands to the PID (process ID number) of the current shell.

  • !: Expands to the PID of the application most recently executed in the background.

  • _: Expands to the last argument of the last command that was executed.
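A few of these Special Parameters can be seen at work in the following sketch. The function show_special and the variables status and background_pid are hypothetical names; a function is used so that the example has positional parameters to show.

```shell
#!/bin/bash
# show_special is a hypothetical function used to inspect its own arguments.
show_special() {
    echo "number of arguments: $#"
    # "$@" yields one word per positional parameter...
    printf 'from "$@": <%s>\n' "$@"
    # ...while "$*" joins them all into one single string.
    printf 'from "$*": <%s>\n' "$*"
}
show_special "B. Foo" bar

# ? holds the exit code of the most recently completed foreground command.
false || status=$?
echo "exit code of false: $status"

# ! holds the PID of the most recent background job, $ that of the shell.
sleep 0 &
background_pid=$!
wait "$background_pid"
echo "shell PID: $$, background PID: $background_pid"
```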

And here are some examples of Variables that the shell initializes for you:

  • BASH_VERSION: Contains a string describing the version of BASH.

  • HOSTNAME: Contains the hostname of your computer, I swear.

  • PPID: Contains the PID of the parent process of this shell.

  • PWD: Contains the current directory.

  • RANDOM: Each time you expand this variable, a random number between 0 and 32767 is generated.

  • UID: The ID number of the current user.

  • COLUMNS: The number of characters that fit on one line in your terminal. (The width of your terminal in characters.)

  • LINES: The number of lines that fit in your terminal. (The height of your terminal in lines.)

  • HOME: The current user's home directory.

  • PATH: A colon-separated list of paths that will be searched to find the executable for a command that is executed, if it is not an alias, function or builtin command (or absolutely referenced).

  • PS1: Contains a string that describes the format of your shell prompt.

  • TMPDIR: Contains the directory that is used to store temporary files (by the shell).

Of course, you aren't restricted to only these variables. Feel free to define your own:

    $ country=Canada
    $ echo "I am $LOGNAME and I currently live in $country."
    I am lhunath and I currently live in Canada.

Notice what we did to assign the value Canada to the variable country. Remember that you are NOT allowed to have any spaces before or after that equals sign!

    $ language = PHP
    -bash: language: command not found
    $ language=PHP
    $ echo "I'm far too used to $language."
    I'm far too used to PHP.

Remember that BASH is not Perl or PHP. You need to be very well aware of how expansion works to avoid big trouble. If you don't, you'll end up creating very dangerous situations in your scripts, especially when making this mistake with rm:

    $ ls
    no secret  secret
    $ file='no secret'
    $ rm $file
    rm: cannot remove `no': No such file or directory

Imagine we have two files, no secret and secret. The first contains nothing useful, but the second contains the secret that will save the world from impending doom. Unthoughtful as you are, you forgot to quote your parameter expansion of file. BASH expands the parameter and the result is rm no secret. BASH splits the arguments up by their whitespace as it normally does, and rm is passed two arguments; 'no' and 'secret'. As a result, it fails to find the file no and it deletes the file secret. The secret is lost!


  • Good Practice:
    You should always keep parameter expansions well quoted. This prevents the whitespace or the possible globs inside of them from giving you gray hair or unexpectedly wiping stuff off your computer. The only good PE, is a quoted PE.
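
Here is a small sketch of the quoted version of the rm example. It assumes that mktemp -d is available (it is on most systems) and works in a scratch directory so that nothing important gets touched:

```shell
# Create and enter a scratch directory.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

touch 'no secret' secret
file='no secret'

rm "$file"            # quoted: rm receives the single argument 'no secret'

remaining=( * )       # the glob expands to whatever files are left
echo "Still here: ${remaining[*]}"

# Clean up the scratch directory again.
cd "$OLDPWD" && rm -r "$workdir"
```

This time the file named secret survives, because the quotes stop BASH from splitting the value of file into two words.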




  • Variable: A variable is a kind of parameter that you can create and modify directly. It is denoted by a name, which must begin with a letter or an underscore (_), and must consist only of letters, digits, and underscores. Variable names are case-sensitive.
    Expansion: Expansion happens when a parameter is prefixed by a dollar sign. BASH takes the parameter's value and replaces the parameter's expansion by its value before executing the command.


Variable Types

Although BASH is not a typed language, it does have a few different types of variables. These types define the kind of content they have. They are stored in the variable's attributes.

Attributes are settings for a variable. They define the way the variable will behave. Here are the attributes you can assign to a variable:

  • Array: (declare -a [variable]): The variable is an array of strings.

  • Integer: (declare -i [variable]): The variable holds an integer. Assigning values to this variable automatically triggers Arithmetic Evaluation.

  • Read Only: (declare -r [variable]): The variable can no longer be modified or unset.

  • Export: (declare -x [variable]): The variable is marked for export, which means it will be inherited by any child process started from this shell.
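
The following sketch tries out three of these attributes. All variable names are made up for illustration, and it assumes that a bash executable can be found in your PATH to act as the child process:

```shell
declare -i total=0          # integer: assignments are evaluated arithmetically
total=total+5*2             # no extra arithmetic syntax needed

declare -r limit=100        # read-only: later assignments fail with an error

declare -x GREETING=hello   # export: copied into the environment of child processes
plain=unexported            # an ordinary variable, not exported

# Start a separate bash process and let it report what it can see.
child_output=$(bash -c 'echo "GREETING=[$GREETING] plain=[$plain]"')

echo "total=$total limit=$limit"
echo "Child process saw: $child_output"
```

The child process sees a value for GREETING but an empty value for plain, because only variables marked for export are passed on to it.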

Arrays are basically lists of strings. They are very convenient for their ability to store different elements without relying on a delimiter. That way, you don't need to worry about the fact that this delimiter could possibly end up being part of an element's content and thus split that element up:

    $ files='one:two:three:four'

Here we try to use a string to contain a list of files. To do that, we need to rely on a delimiter to keep the files apart. We choose ':'. As a result, we cannot add any files to the list that have a ':' in their filename. That's why arrays are so convenient:

    $ files=( 'one' 'two' 'three' 'four' '5: five' )

As shown above, you can assign arrays using (...). In this case, elements are separated by whitespace, but you can protect an element's whitespace with quotes. If you want to use some form of expansion to assign values to an array, rather than literal values, be aware that BASH needs to perform word splitting to figure out which parts of your expansion go into which elements of the array:

    $ files='one:two:three:four'
    $ IFS=:
    $ files=( $files )

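For this word splitting, BASH looks at the IFS variable again. Every character in IFS acts as a delimiter for splitting the result of the expansion up in elements (only the joining done by "$*" is limited to the first character). Keep in mind that IFS stays set to ':' until you change it back.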
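
An alternative that avoids changing IFS for the whole shell is to let the read builtin do the splitting. This is only a sketch with a made-up list, not the one true way to do it:

```shell
list='one:two:three:5 five'

# IFS is changed only for the read command, so the rest of the shell
# keeps its normal word splitting. -r keeps backslashes literal and
# -a stores the resulting fields in the array named files.
IFS=: read -r -a files <<< "$list"

echo "Number of elements: ${#files[@]}"
for f in "${files[@]}"; do     # quoted [@] keeps each element intact
    echo "Element: <$f>"
done
```

The last element keeps its space, because ':' is the only delimiter that read uses here.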

Defining variables as integers has the advantage that you can leave out some syntax when trying to assign or modify them:

    $ a=5; a+=2; echo $a; unset a
    52
    $ a=5; let a+=2; echo $a; unset a
    7
    $ declare -i a=5; a+=2; echo $a; unset a
    7
    $ a=5+2; echo $a; unset a
    5+2
    $ declare -i a=5+2; echo $a; unset a
    7


  • String: A string is a sequence of characters.
    Array: An array is a list of strings that does not use a delimiter to separate them.
    Integer: An integer is a whole number: a sequence of digits, optionally preceded by a sign.
    Read Only: Parameters that are read-only cannot be modified or unset.
    Export: Variables that are marked for export will be inherited by any child process.


Parameter Expansion

Parameter Expansion is the term that refers to any operation that causes a parameter to be expanded (replaced by content). In its most basic appearance, the parameter expansion of a parameter is achieved by prefixing that parameter with a $ sign. In certain situations, additional curly brackets around the parameter's name are required:

    $ echo "'$USER', '$USERs', '${USER}s'"
    'lhunath', '', 'lhunaths'

This example illustrates what basic parameter expansions look like. The second PE results in an empty string. That's because the parameter USERs is empty. We did not intend to have the s be part of the parameter name. Since there's no way for BASH to determine whether you want the s appended to the name of the parameter or its value you need to use curly brackets to mark the beginning and end of the parameter name. That's what we do in the third PE in our example above.

Parameter Expansion can also be used to modify the string that will be expanded. These operations are terribly convenient:

    $ for file in *.JPG *.jpeg
    > do mv "$file" "${file%.*}.jpg"
    > done

The code above can be used to rename all JPEG files with a JPG or a jpeg extension to have a normal jpg extension. The PE (${file%.*}) removes the shortest match of the pattern .* from the end of the value, which is the last period (.) and everything after it. Then, in the same quotes, a new extension is appended to the expansion result.

Here's a summary of most PEs that are available:

  • ${parameter:-word}: Use Default Values. If 'parameter' is unset or null, the expansion of 'word' is substituted. Otherwise, the value of 'parameter' is substituted.

  • ${parameter:=word}: Assign Default Values. If 'parameter' is unset or null, the expansion of 'word' is assigned to 'parameter'. The value of 'parameter' is then substituted.

  • ${parameter:?word}: Display Error if Null or Unset. If 'parameter' is null or unset, the expansion of 'word' (or a message to that effect if 'word' is not present) is written to the standard error and the shell, if it is not interactive, exits. Otherwise, the value of 'parameter' is substituted.

  • ${parameter:+word}: Use Alternate Value. If 'parameter' is null or unset, nothing is substituted, otherwise the expansion of 'word' is substituted.

  • ${parameter:offset:length}: Substring Expansion. Expands to up to 'length' characters of 'parameter' starting at the character specified by 'offset' (counting from 0). If 'length' is omitted, expands to the substring of 'parameter' starting at the character specified by 'offset' and running to the end.

  • ${#parameter}: The length in characters of the value of 'parameter' is substituted.

  • ${parameter#pattern}: The 'pattern' is anchored to the beginning of 'parameter'. The result of the expansion is the expanded value of 'parameter' with the shortest match deleted.

  • ${parameter##pattern}: The 'pattern' is anchored to the beginning of 'parameter'. The result of the expansion is the expanded value of 'parameter' with the longest match deleted.

  • ${parameter%pattern}: The 'pattern' is anchored to the end of 'parameter'. The result of the expansion is the expanded value of 'parameter' with the shortest match deleted.

  • ${parameter%%pattern}: The 'pattern' is anchored to the end of 'parameter'. The result of the expansion is the expanded value of 'parameter' with the longest match deleted.

  • ${parameter/pattern/string}: The 'pattern' is not anchored but evaluated from left to right in the value of 'parameter'. The result of the expansion is the expanded value of 'parameter' with the first match of 'pattern' replaced by 'string'.

  • ${parameter//pattern/string}: As above, but every match of 'pattern' is replaced.

You will learn them all through experience. They will come in handy far more often than you think they might. Here are a few examples to get you started:

    $ file="$HOME/.secrets/007"; \
    > echo "File location: $file"; \
    > echo "Filename: ${file##*/}"; \
    > echo "Directory of file: ${file%/*}"; \
    > echo "Non-secret file: ${file/secrets/not_secret}"; \
    > echo; \
    > echo "Other file location: ${other:-There is no other file}"; \
    > echo "Using file if there is no other file: ${other:=$file}"; \
    > echo "Other filename: ${other##*/}"; \
    > echo "Other file location length: ${#other}"
    File location: /home/lhunath/.secrets/007
    Filename: 007
    Directory of file: /home/lhunath/.secrets
    Non-secret file: /home/lhunath/.not_secret/007
    Other file location: There is no other file
    Using file if there is no other file: /home/lhunath/.secrets/007
    Other filename: 007
    Other file location length: 26

Remember the difference between ${v#p} and ${v##p}. Doubling the # character makes the match greedy: the longest possible match of the pattern is removed instead of the shortest. The same goes for %:

    $ version=1.5.9; echo "MAJOR: ${version%%.*}, MINOR: ${version#*.}."
    MAJOR: 1, MINOR: 5.9.
    $ echo "Dash: ${version/./-}, Dashes: ${version//./-}."
    Dash: 1-5.9, Dashes: 1-5-9.
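
The substring and alternate-value forms from the summary have not appeared in an example yet. Here is a short sketch; the timestamp and the flag names are made up for illustration:

```shell
timestamp=20240131            # illustrative value in YYYYMMDD form
year=${timestamp:0:4}         # 4 characters starting at offset 0
month=${timestamp:4:2}        # 2 characters starting at offset 4
day=${timestamp:6}            # everything from offset 6 to the end
echo "$year-$month-$day"

verbose=yes
unset quiet
vflag=${verbose:+-v}          # verbose is set and non-empty: expands to -v
qflag=${quiet:+-q}            # quiet is unset: expands to nothing
echo "verbose flag: '$vflag', quiet flag: '$qflag'"
```

The :+ form is handy for building optional arguments: the flag only appears when the variable that controls it has a value.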

Note: You cannot nest PEs together. If you need to execute multiple PEs on a parameter, you will need to use multiple statements:

    $ file=$HOME/image.jpg; file=${file##*/}; echo "${file%.*}"
    image


  • Good Practice:
    You may be tempted to use external applications such as sed, awk, cut, perl or others to modify your strings. Be aware that all of these require an extra process to be started, which in some cases can cause slowdowns. Parameter Expansions are the perfect alternative.




  • Parameter Expansion: Any expansion (see earlier definition) of a parameter. Certain operations are possible during this expansion that are performed on the value that will be expanded.


Patterns

Patterns are strings that are used to match a whole range of strings. They have a special format depending on the pattern dialect which describes the kinds of strings that they match. Regular Expression patterns can even be used to grab certain pieces out of the strings they match.

On the command line you will mostly use Glob Patterns. They are a fairly straight-forward form of patterns that can easily be used to match a range of files.

Since version 3.0, BASH also supports Regular Expression patterns. These will be useful mainly in scripts to test user input or parse data.


  • Pattern: A pattern is a string with a special format designed to be a sort of key that matches several other strings of a kind.


Glob Patterns

Globs are a very important concept in BASH, if only for their incredible convenience. Properly understanding globs will benefit you in many ways. Globs are basically patterns that can be used to match filenames or other strings.

Globs are composed of normal characters and meta characters. Meta characters are characters that have a special meaning. These are the basic meta characters:

  • *: Matches any string, including the null string.

  • ?: Matches any single character.

  • [...]: Matches any one of the enclosed characters.

Here's an example of how we can use glob patterns to expand to filenames:

    $ ls
    a  abc  b  c
    $ echo *
    a abc b c
    $ echo a*
    a abc

BASH sees the glob, for example a*. It expands this glob, by looking in the current directory and matching it against all files there. Any filenames that match the glob, are enumerated and used in place of the glob. As a result, the statement echo a* is replaced by the statement echo a abc, and is then executed.
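
The ? and [...] meta characters work the same way. The sketch below recreates the four files from the example in a scratch directory (assuming mktemp -d is available) and stores each expansion in an array so that you can inspect it:

```shell
# Create and enter a scratch directory with the example files.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch a abc b c

single=( ? )        # exactly one character: a b c
a_or_b=( [ab]* )    # starts with a or b: a abc b
three=( ??? )       # exactly three characters: abc

echo "?      -> ${single[*]}"
echo "[ab]*  -> ${a_or_b[*]}"
echo "???    -> ${three[*]}"

# Clean up the scratch directory again.
cd "$OLDPWD" && rm -r "$workdir"
```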

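BASH expands globs after word splitting has already been done, so a filename produced by a glob is never split again. It is always passed on as a single argument, whatever whitespace or special characters it contains. For example: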

    $ touch "a b.txt"
    $ ls
    a b.txt
    $ rm *
    $ ls

Here, rm * behaves as if you had typed rm a\ b.txt. The string a b.txt is passed as a single argument to rm, since it represents a single file. It is important to understand that using globs to enumerate files is nearly always a better idea than using ls for that purpose. Here's an example with some more complex syntax which we will cover later on, but it will illustrate the problem very well:

    $ ls
    a b.txt
    $ for file in `ls`; do rm "$file"; done
    rm: cannot remove `a': No such file or directory
    rm: cannot remove `b.txt': No such file or directory
    $ for file in *; do rm "$file"; done
    $ ls

Here we use the for command to go through the output of the ls command. The ls command prints the string a b.txt. Because the command substitution is not quoted, BASH splits that string into words at the whitespace, and for iterates over a and b.txt. Naturally, this is not what we want. The glob, however, expands in the proper form: its result is not split, so for receives a b.txt as a single argument.
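
You can see the difference for yourself by counting the words that each approach produces. This is a sketch that assumes mktemp -d is available; set -- simply stores the words as positional parameters so that $# can count them:

```shell
# Create and enter a scratch directory with one awkward filename.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch 'a b.txt'

set -- *            # positional parameters from the glob
from_glob=$#        # the filename stays one word

set -- $(ls)        # positional parameters from unquoted ls output
from_ls=$#          # word splitting breaks the name at the space

echo "glob: $from_glob word(s), ls: $from_ls word(s)"

# Clean up the scratch directory again.
cd "$OLDPWD" && rm -r "$workdir"
```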

BASH also supports a feature called Extended Globs. These globs are more powerful in nature. This feature is turned off by default, but can be turned on with the shopt command, which is used to toggle shell options:

    $ shopt -s extglob

  • ?(list): Matches zero or one occurrence of the given patterns.

  • *(list): Matches zero or more occurrences of the given patterns.

  • +(list): Matches one or more occurrences of the given patterns.

  • @(list): Matches one of the given patterns.

  • !(list): Matches anything except one of the given patterns.

The list inside the parentheses is a list of globs separated by the | character. Here's an example:

    $ ls
    california.bmp  names.txt  tokyo.jpg
    $ echo !(*jpg|*bmp)
    names.txt

Our glob now expands to anything that does not match the *jpg or the *bmp pattern. Only the text file passes for that, so it is expanded.
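
Extended globs can match plain strings as well as filenames. The sketch below uses the [[ keyword, which is covered later in this guide, and two made-up helper functions. The patterns are kept in variables so that the extended syntax is only interpreted when the match runs, after extglob has been turned on:

```shell
shopt -s extglob

digits='+([0-9])'             # one or more digits
image='*.@(jpg|bmp)'          # anything ending in .jpg or .bmp

# Hypothetical helpers: succeed when the argument matches the pattern.
# The right-hand side is unquoted so that it is treated as a pattern.
is_number() { [[ $1 == $digits ]]; }
is_image()  { [[ $1 == $image ]]; }

is_number 2024      && echo "2024 is a number"
is_number 20x4      || echo "20x4 is not a number"
is_image  tokyo.jpg && echo "tokyo.jpg is an image"
is_image  names.txt || echo "names.txt is not an image"
```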

Then, there is Brace Expansion. Brace Expansion technically does not fit in the category of Globs, but it is similar. Globs only expand to actual filenames, whereas brace expansion expands to every combination the pattern describes, whether or not a file of that name exists. Here's how it works:

    $ echo th{e,a}n
    then than
    $ echo {/home/*,/root}/.*profile
    /home/axxo/.bash_profile /home/lhunath/.profile /root/.bash_profile /root/.profile
    $ echo {1..9}
    1 2 3 4 5 6 7 8 9
    $ echo {0,1}{0..9}
    00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19
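
Two everyday uses of brace expansion are sketched below, with made-up file and directory names. The results are stored in arrays and only printed, so nothing is actually copied or created. The last part shows a common surprise: braces are expanded before variables, so a range cannot be built from a variable.

```shell
# Brace expansion works on text alone, so none of these names need to exist.
backup=( report.txt{,.bak} )          # report.txt report.txt.bak
dirs=( project/{src,docs,tests} )     # three paths under project/

echo "cp ${backup[*]}"
echo "mkdir -p ${dirs[*]}"

# Braces are expanded before variables, so this is not a valid range
# at the time brace expansion looks at it.
n=3
literal=( {1..$n} )                   # stays as the single word {1..3}
echo "Elements: ${#literal[@]}, first: ${literal[0]}"
```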


  • Good Practice:
    You should always use globs instead of ls (or similar) to enumerate files. Globs will always expand safely and minimize the risk of bugs.
    You can sometimes end up with some very weird filenames. Generally speaking, scripts aren't always tested against all the odd cases that they may end up being used with.




  • Glob: A glob is a string composed of glob meta characters that can match certain strings or filenames.


Regular Expressions

Regular Expressions (regex) are similar to Glob Patterns but cannot be used for filename matching in BASH. Since version 3.0, BASH supports the =~ operator of the [[ keyword. This operator matches the string that comes before it against the regex pattern that follows it. When the string matches the pattern, [[ returns with an exit code of 0 ("true"). If the string does not match the pattern, an exit code of 1 ("false") is returned. In case the pattern's syntax is invalid, [[ will abort the operation and return an exit code of 2.

BASH uses the Extended Regular Expression (ERE) dialect. I will not teach you about regex in this guide, but if you are interested in this concept, please read up on Extended Regular Expressions or Google for a tutorial.

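Regular Expression patterns that use capturing groups will have their captured strings assigned to the BASH_REMATCH array variable for later retrieval. Element 0 holds the text matched by the whole pattern; elements 1, 2, ... hold the text matched by each capturing group, in order.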

Let's illustrate how regex can be used in BASH:

    $ if [[ $LANG =~ (..)_(..) ]]
    > then echo "You live in ${BASH_REMATCH[2]} and speak ${BASH_REMATCH[1]}."
    > else echo "Your locale was not recognised"
    > fi

Be aware that regex parsing in BASH has changed between releases 3.1 and 3.2. Before 3.2 it was safe to wrap your regex pattern in quotes, but this changed in 3.2: quoted parts of the pattern are now matched as literal strings. Since then, a regex should always be unquoted. You should protect any special characters by escaping them with a backslash.

    $ [[ "My sentence" =~ My\ sentence ]]

Be careful to escape any characters that the shell could misinterpret, such as whitespace, dollar signs followed by text, braces, etc.


  • Good Practice:
    Since the way regex is used in 3.2 is also valid in 3.1 we highly recommend you just never quote your regex. Remember to keep special characters properly escaped!

  • For cross-compatibility (and to avoid having to escape parentheses, pipes and so on), store your regex in a variable, e.g. re='^\*( >| *Applying |.*\.diff|.*\.patch)'; [[ $var =~ $re ]]. This is much easier to maintain, since you write only ERE syntax and need no shell escaping, and it is compatible with all 3.x versions of BASH.
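
Here is a sketch of that approach combined with capturing groups. The regex and the version string are made up for illustration:

```shell
# The regex lives in a variable, so it is written as plain ERE
# without any shell escaping.
re='^([0-9]+)\.([0-9]+)\.([0-9]+)$'
version=1.5.9                  # illustrative input

if [[ $version =~ $re ]]; then
    whole=${BASH_REMATCH[0]}   # the entire matched text
    major=${BASH_REMATCH[1]}   # first capturing group
    minor=${BASH_REMATCH[2]}   # second capturing group
    patch=${BASH_REMATCH[3]}   # third capturing group
    echo "major=$major minor=$minor patch=$patch"
else
    echo "Not a three-part version: $version"
fi
```

Note that $re is left unquoted on the right-hand side of =~, so that it is treated as a regex rather than as a literal string.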




  • Regular Expression: A regular expression is a more complex pattern that can be used to match specific strings (but unlike globs cannot expand to filenames).


Tests and Conditionals

Sequential execution of applications is one thing, but to achieve a sort of logic in your scripts or your command line one-liners, you'll need variables