4-12: PHP programming language basics – Writing and using your own functions

Why functions are a convenient way to structure code

We have seen how PHP offers a wide variety of built-in (predefined) functions to deal with strings, variables, arrays, numbers, regular expressions and more. There are times, however, when no built-in function performs a specific task easily, and you have to write your own code. Wrapping code that performs a well-defined task in a function has several advantages. Just to name a few:

  • Code in functions is re-usable
  • It makes your main code more readable
  • Your main code will be shorter
  • Your main code will be easier to maintain
  • You can modify the function code or add functionality in a single central place

Storing functions in a dedicated file and then importing it into scripts

Typically, although not necessarily, your functions will be placed in a dedicated file, separate from the main script or scripts, called for example functions.php and contained in a directory called “include”.

Then, this code is imported into your scripts by using an include statement, often placed at the top of the scripts. When the functions file, or any other PHP code, is included in a script with an include statement, the included code is executed at that point. If a function is defined in the functions file, its name will then be available to use within the script.
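As a minimal sketch of this mechanism, the example below writes a tiny functions file to a temporary directory and then imports it with include. The temporary location and the function greet() are assumptions made only to keep the example self-contained; in a real project you would create include/functions.php by hand and include that path.

```php
<?php
// Build a throw-away directory so the example runs on its own.
// In a real project the functions file would already exist.
$dir = sys_get_temp_dir() . "/include_demo_" . uniqid();
mkdir($dir);
$functions_file = $dir . "/functions.php";

// Hypothetical content of functions.php: a single function definition.
file_put_contents(
    $functions_file,
    '<?php function greet($name) { return "Hello, " . $name . "!"; }'
);

// Import the file: its code is executed here and greet() gets defined.
include $functions_file;

// The function defined in the included file is now available.
echo greet("world") . "\n";

// Clean up the temporary file and directory.
unlink($functions_file);
rmdir($dir);
```

If the same file might be included from several places, include_once can be used instead to avoid defining the same function twice.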

Writing a function in PHP, basic syntax

For now, let’s place the code for our functions in the same file as the rest of the script code (which is perfectly fine if we define the functions before using them in the flow of the script) and concentrate on how to write a function.

In order to define a function, we need 3 parts: the function name, round brackets to specify the arguments and curly brackets for the actual function code.

Let’s start by writing an exceedingly simple and useless function, that takes no arguments, just to highlight a couple of points:
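A minimal sketch of such a function, consistent with the description that follows, might look like this:

```php
<?php
// A function with no arguments: name, empty round brackets, curly brackets.
function meaning_of_life() {
    return 42;
}

// Call the function by writing its name followed by round brackets.
echo meaning_of_life() . "\n";
```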

We have defined a function with name “meaning_of_life”.

This function, on call, will just return a number (42). This is accomplished through a return statement. Within functions, it is common to include a return statement. If no return statement is used, the function returns NULL; PHP does not implicitly return the last computed value or declared variable. We therefore encourage you, both for correctness and for the sake of code readability, to always include a return statement in functions that are meant to produce a value.

  • In order to call the function, you will write the function name followed by round brackets.
  • If the function requires arguments, arguments must be provided.
  • Some functions may require no arguments, one argument, two arguments or more. For a successful function call, all mandatory arguments must be passed to the function; if one is missing, PHP will raise an error and the script will not execute correctly.
  • Some arguments may be optional. Optional arguments are passed to the function after the mandatory arguments.

Here’s a slightly more useful function (the truth? not quite, really), multiply_and_add(), that performs a simple mathematical operation. Mind that the following example is meant more to be read than executed.
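A sketch of what multiply_and_add() could look like is given below. The names $a and $c and the value 3 for $x follow the text of this section; the name $b, the other values and the exact formula ($a * $b + $c) are assumptions.

```php
<?php
// Multiply the first two arguments, then add the third.
function multiply_and_add($a, $b, $c) {
    $result = $a * $b + $c;
    return $result;
}

$x = 3;
$y = 4;
$z = 5;

// The values of $x, $y and $z are copied into $a, $b and $c.
echo multiply_and_add($x, $y, $z) . "\n"; // 3 * 4 + 5 = 17
```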

Understanding name spaces

A fundamental concept in programming is that within code you may have several “name spaces” that are totally segregated from each other (the PHP manual calls this concept variable scope, and reserves the word “namespace” for a different language feature). For example, when defining a function you give names to the argument variables, and these names only exist within the name space of the function itself.

If you use some variable names within a function, for example, $i, $j or any other variable name, you do not have to take care to avoid those variable names within a script code that uses the function, as the name space of the script is segregated from the name space of the function itself.

Similarly, a variable name used in a script is not available within a function. In order to use the value of a variable defined in a script within a function, the recommended way is to pass this value, or the variable, as an argument to the function. Once passed as an argument, this value may well get a different name inside the function’s name space. In the sample code above, the value of a variable called $x in the script (3) was assigned to a variable called $a inside the function’s name space, because $x was passed as the first argument to the function and, at function definition time, the first argument was called $a. It is crucial that these concepts are well understood in order to write working, error-free code.
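The separation described above can be checked with a short experiment. The function show_scope() and the variable names used here are hypothetical and serve only as an illustration.

```php
<?php
$x = 3; // defined in the script
$i = 1; // also defined in the script

function show_scope($a) {
    // $x from the script does not exist in here, so isset() reports false.
    $x_visible = isset($x);
    // This $i is local to the function and unrelated to the script's $i.
    $i = 100;
    return array($a, $x_visible);
}

$result = show_scope($x);

// First element: the value of $x, known as $a inside the function.
// Second element: whether the script's $x was visible inside (it is not).
var_dump($result);

// The script's $i was not modified by the function's own $i.
var_dump($i);
```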

Making arguments optional by assigning defaults

Some arguments can be made optional for the function call by assigning defaults to them at the time of function definition. We now rewrite the multiply_and_add() function by setting 10 as the default value for the number to be added (third argument, $c).

Once an argument is optional:

– if we do not provide the argument at function call, the default assigned in the function definition will be used
– if we do provide the argument, the value given will be used instead of the default one (it will override the default)

Therefore, with the third argument optional, we will be able to call multiply_and_add() with either 2 or 3 arguments. Both calls will be correct and issue no errors.
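A sketch of the rewritten function, with 10 as the default for $c, could look like the following. As before, the name $b and the formula are assumptions.

```php
<?php
// The third argument is optional: it defaults to 10 when omitted.
function multiply_and_add($a, $b, $c = 10) {
    return $a * $b + $c;
}

echo multiply_and_add(3, 4) . "\n";    // default used: 3 * 4 + 10 = 22
echo multiply_and_add(3, 4, 5) . "\n"; // default overridden: 3 * 4 + 5 = 17
```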

Writing a reverse-complement function

In section 4-7 we wrote a snippet of code that allows us to obtain the reverse-complement of a DNA sequence. Let us transform this code into a function. We will write it so that we can reverse, complement or reverse-complement an input sequence based on an optional argument passed at call time.

This was the original code:

And this could be our reverse complement function revcomp():
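A minimal sketch of such a function is given here. It assumes the optional argument is called $mode and takes the values "revcomp" (default), "comp" or "rev"; these labels, and the use of strtr() and strrev(), are illustrative choices rather than the only possible implementation.

```php
<?php
// $mode: "revcomp" (default), "comp" or "rev". These labels are assumptions.
function revcomp($sequence, $mode = "revcomp") {
    $sequence = strtoupper(trim($sequence));
    // Complement: swap A with T and G with C, character by character.
    if ($mode == "comp" || $mode == "revcomp") {
        $sequence = strtr($sequence, "ATGC", "TACG");
    }
    // Reverse: read the string from the last character to the first.
    if ($mode == "rev" || $mode == "revcomp") {
        $sequence = strrev($sequence);
    }
    return $sequence;
}

$seq = "ATGGTGAAGCAGATCGA";
echo "Input Sequence\n" . $seq . "\n\n";
echo "Reverse Complement\n" . revcomp($seq) . "\n\n";
echo "Complement\n" . revcomp($seq, "comp") . "\n\n";
echo "Reverse\n" . revcomp($seq, "rev") . "\n";
```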

Here’s the output of this code:

Input Sequence
ATGGTGAAGCAGATCGA

Reverse Complement
TCGATCTGCTTCACCAT

Complement
TACCACTTCGTCTAGCT

Reverse
AGCTAGACGAAGTGGTA

Writing a “sequence breaker” seqbreak() function

Long uninterrupted strings given in output in a web page have the potential to break the page layout by creating a very wide page which requires horizontal scrolling to be viewed fully. This can easily happen with biological sequences, for example. Let us write a handy function able to introduce break tags in a string every 80 characters. This is of obvious use as an aid to format a sequence according to the FASTA format. To make the function a little more widely usable, let us set the default to 80 but also leave the opportunity to change that at call time with an optional argument. Let’s also make it so that the break tag can be replaced with something else if we so wish.
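One possible sketch of seqbreak() is shown below. The argument names $width and $break, the "<br>" default and the use of str_split() with implode() are assumptions; this variant places the break string between chunks only, so no break is appended after the last chunk.

```php
<?php
function seqbreak($sequence, $width = 80, $break = "<br>") {
    // str_split() cuts the string into chunks of at most $width characters.
    $chunks = str_split($sequence, $width);
    // Join the chunks with the break string placed between them.
    return implode($break, $chunks);
}

$seq = str_repeat("ATGC", 50); // a 200-character test sequence

// Default behaviour: a break tag every 80 characters.
echo seqbreak($seq) . "\n";

// Custom width and a newline instead of a break tag.
echo seqbreak($seq, 60, "\n") . "\n";
```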

You can run this code live here.

Writing a function to handle FASTA sequences

In the previous section we wrote a piece of code to process a FASTA sequence so that we could have the FASTA header line and the sequence stored in 2 different variables.

Here is the relevant part of the code:

What about turning this quite useful little piece of code, that we may indeed want to re-use often in web applications for bioinformatics, into a process_fasta() function? Let’s do it!

To make it a bit more interesting, let us add here as well, as we did for the revcomp() function above, a $mode argument that will allow us to tune what the function returns. We will make it so that if $mode is “all”, an array of two elements will be returned, with the header line as first element and the sequence as second element. If $mode is “seq”, the function will only return the sequence. If $mode is “header”, the function will only return the header. The default will be “all”. This flexibility will make the function slightly more adaptable to different requirements of the script in which we call it. Maybe at times we will just need to get out the header, while we may just need the sequence for other implementations.
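A sketch of process_fasta() along these lines is given below. The internal variable names, the line-splitting pattern and the short sample record are assumptions; the sample is a made-up placeholder, so its var_dump() output is much shorter than the output of a real protein record.

```php
<?php
// $mode: "all" (default), "seq" or "header".
function process_fasta($fasta, $mode = "all") {
    $header = "";
    $sequence = "";
    // Split the input into lines, whatever the line-ending convention.
    $lines = preg_split('/\r\n|\r|\n/', trim($fasta));
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === "") {
            continue; // skip empty lines
        }
        if (preg_match("/^>/", $line)) {
            $header = $line;     // the line starting with ">" is the header
        } else {
            $sequence .= $line;  // every other line is part of the sequence
        }
    }
    if ($mode == "header") {
        return $header;
    }
    if ($mode == "seq") {
        return $sequence;
    }
    return array($header, $sequence);
}

// Hypothetical two-line sequence record used only for illustration.
$fasta = ">example_1 made-up test record\nATGGTGAAG\nCAGATCGA\n";

var_dump(process_fasta($fasta));
var_dump(process_fasta($fasta, "header"));
var_dump(process_fasta($fasta, "seq"));
```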

Here’s the output of this script:

Output of process_fasta() with mode “all”

array(2) {
[0]=>
string(124) “>gi|28373620|pdb|1MA9|A Chain A, Crystal Structure Of The Complex Of Human Vitamin D Binding Protein And Rabbit Muscle Actin”
[1]=>
string(458) “LERGRDYEKNKVCKEFSHLGKEDFTSLSLVLYSRKFPSGTFEQVSQLVKEVVSLTEACCAEGADPDCYDTRTSALSAKSCESNSPFPVHPGTAECCTKEGLERKLCMAALKHQPQEFPTYVEPTNDEICEAFRKDPKEYANQFMWEYSTNYGQAPLSLLVSYTKSYLSMVGSCCTSASPTVCFLKERLQLKHLSLLTTLSNRVCSQYAAYGEKKSRLSNLIKLAQKVPTADLEDVLPLAEDITNILSKCCESASEDCMAKELPEHTVKLCDNLSTKNSKFEDCCQEKTAMDVFVCTYFMPAAQLPELPDVELPTNKDVCDPGNTKVMDKYTFELSRRTHLPEVFLSKVLEPTLKSLGECCDVEDSTTCFNAKGPLLKKELSSFIDKGQELCADYSENTFTEYKKKLAERLKAKLPEATPTELAKLVNKRSDFASNCCSINSPPLYCDSEIDAELKNIL”
}

Output of process_fasta() with mode “header”

string(124) “>gi|28373620|pdb|1MA9|A Chain A, Crystal Structure Of The Complex Of Human Vitamin D Binding Protein And Rabbit Muscle Actin”

Output of process_fasta() with mode “seq”

string(458) “LERGRDYEKNKVCKEFSHLGKEDFTSLSLVLYSRKFPSGTFEQVSQLVKEVVSLTEACCAEGADPDCYDTRTSALSAKSCESNSPFPVHPGTAECCTKEGLERKLCMAALKHQPQEFPTYVEPTNDEICEAFRKDPKEYANQFMWEYSTNYGQAPLSLLVSYTKSYLSMVGSCCTSASPTVCFLKERLQLKHLSLLTTLSNRVCSQYAAYGEKKSRLSNLIKLAQKVPTADLEDVLPLAEDITNILSKCCESASEDCMAKELPEHTVKLCDNLSTKNSKFEDCCQEKTAMDVFVCTYFMPAAQLPELPDVELPTNKDVCDPGNTKVMDKYTFELSRRTHLPEVFLSKVLEPTLKSLGECCDVEDSTTCFNAKGPLLKKELSSFIDKGQELCADYSENTFTEYKKKLAERLKAKLPEATPTELAKLVNKRSDFASNCCSINSPPLYCDSEIDAELKNIL”

Everything is working out nicely: on calling the function with just one argument, in the default “all” mode, we get an array with the header as first element and the sequence as second element, while in the other 2 modes we get just a string, the header or the sequence, depending on whether we use the “header” or “seq” mode.

Let us move revcomp() and process_fasta() to a functions.php file and use them together in a script.

We will create a directory for this exercise called test_fasta.

test_fasta contains one file, index.php (the script file) and a sub-directory called include. The include directory contains the functions.php file.

test_fasta
    index.php
    include
        functions.php

The functions.php file:

The index.php file:

You can run this code live here.

Two functions to handle FASTA sequences in batch

As a final exercise for this section, let’s write a version of process_fasta() able to process several FASTA sequences in batch, rather than just one single sequence.

We will generate two versions of this function.

The first one, fasta_file_to_array(), will work on a text file with FASTA sequences, taking a file path as argument. This is a good opportunity to learn how to open and read a file, line by line with the fopen() and fgets() functions. For a change, we will use a “while” cycle instead of the usual “foreach” cycle in this one.

The second, fasta_sequences_to_array(), will accept the FASTA sequences argument as a variable.

fasta_file_to_array()

In this function, we open the FASTA file whose path is accepted as argument by the function, by using the fopen() function. fopen() (file open) will be called with two arguments. The first is the path of the file to be opened and the second is the opening mode, in this case “read” (r), as all we want to do here is to read the file. fopen() will return a handle to the file, which we can then use to read the file lines by passing it to the fgets() function. fgets() takes the file handle as argument and returns one line of the file. Each time we call fgets() it will return the next line with respect to the previous call. The first time we call fgets() it will return the first line of the file. The second time, the second line, and so on. Therefore, by using fgets() in a cycle we can read the whole file, line by line.

In order to cycle we use, instead of the usual “foreach” cycle, a “while” cycle. As the condition of the cycle we use the feof() function, which tests whether the end of the file has been reached on a file handle.

The syntax will be as follows:
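A minimal, self-contained sketch of this pattern is shown here. The temporary file and its contents are assumptions made so that the cycle has something to read, and the extra check on the return value of fgets() is a defensive addition for the case in which there is nothing left to read.

```php
<?php
// Create a small temporary file so the example can run on its own.
$path = tempnam(sys_get_temp_dir(), "demo");
file_put_contents($path, "first line\nsecond line\nthird line\n");

$lines = array();
$file_handle = fopen($path, "r"); // "r" opens the file for reading only

while (!feof($file_handle)) {
    $line = fgets($file_handle);  // returns the next line at each call
    if ($line === false) {
        break;                    // nothing more to read
    }
    $lines[] = trim($line);       // remove the trailing newline
}

fclose($file_handle);             // release the handle when done
unlink($path);                    // remove the temporary file

print_r($lines);
```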

Which can be read as: as long as (while) the end of the file has not been reached (!feof($file_handle), where ! negates what follows), keep executing the code inside the curly brackets.

The output of the fasta_file_to_array() function will be an array (as the name of the function implies). Each element of this output array will be itself an array with two elements: a header and a sequence. We could represent the structure of the output array of fasta_file_to_array() as follows:

[(seq1 header, seq1 sequence),(seq2 header, seq2 sequence),(seq3 header, seq3 sequence), etc…]

By cycling on this output array we will be able to access each FASTA sequence as an array whose first element is the header line and the second element is the corresponding sequence.

In the function code, inside the while cycle, we make sure we clean up each line for possible trailing spaces or other unwanted characters with the PHP trim() function.

Let us follow what happens in the function to better understand the code below. Before starting to go through the file lines in the while cycle, we declare two empty variables, $sequence and $header_line, and a counter $i set to 0. Then we start cycling. The first line of the file will most likely be the header line of the first sequence. preg_match(“/^>/”,$line), will be true and the value of $i will be 0. We store this line in the $header_line variable.

Then we start to accumulate the following lines (sequence lines) in the $sequence variable until the next header line is reached. Each header line, excluding the first one, is used as a signal that we have finished reading (parsing) the previous sequence. When we find one of those header lines (we know a header line is not the first one because $i will be different from 0), the bit of code that stores the ($header_line, $sequence) array (with data from the previous sequence) inside the final output array will be executed, and the $sequence variable will be reset to empty:

Since inside the while cycle each ($header_line, $sequence) array for a sequence is stored in the output array only when the header line for the next sequence is encountered, for the very last sequence of the file we have to do this storage operation outside the while cycle. This is what we do three lines before the end of the function code:

Here you go with the code!
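What follows is a sketch of fasta_file_to_array() that follows the logic described above. Some details are assumptions: here $i counts the header lines met so far, empty lines are skipped, an empty array is returned if the file cannot be opened, and the temporary FASTA file with two made-up records exists only to make the example runnable.

```php
<?php
function fasta_file_to_array($path) {
    $output = array();
    $header_line = "";
    $sequence = "";
    $i = 0; // number of header lines met so far (assumption)

    $file_handle = fopen($path, "r");
    if ($file_handle === false) {
        return $output; // the file could not be opened
    }
    while (!feof($file_handle)) {
        $line = fgets($file_handle);
        if ($line === false) {
            break;
        }
        $line = trim($line);
        if ($line === "") {
            continue; // skip empty lines
        }
        if (preg_match("/^>/", $line)) {
            // A header that is not the first one closes the previous record.
            if ($i != 0) {
                $output[] = array($header_line, $sequence);
            }
            $header_line = $line;
            $sequence = "";
            $i++;
        } else {
            $sequence .= $line;
        }
    }
    fclose($file_handle);

    // The last record is stored outside the cycle.
    if ($i != 0) {
        $output[] = array($header_line, $sequence);
    }
    return $output;
}

// Hypothetical test file with two made-up records.
$path = tempnam(sys_get_temp_dir(), "fasta");
file_put_contents($path, ">seq1 first\nATGG\nTGAA\n>seq2 second\nGCAG\n");
$records = fasta_file_to_array($path);
unlink($path);

foreach ($records as $record) {
    echo $record[0] . "\n" . $record[1] . "\n";
}
```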

fasta_sequences_to_array()

In this second version we accept a variable directly containing FASTA sequences as argument, instead of a file path. The flow of the function is very similar to the one of fasta_file_to_array(). The difference is that we split the input sequences into lines as we did in process_fasta() with a preg_split() call:

and then iterate through the lines with a foreach cycle, again as we did in the original process_fasta() function. The logic of the cycle is the same as in fasta_file_to_array(): within the cycle, we store the results for one sequence when we come across the header for the next sequence, and we store the results for the very last sequence with a dedicated line of code outside the cycle. As in fasta_file_to_array(), the output is an array with this structure:

[(seq1 header, seq1 sequence),(seq2 header, seq2 sequence),(seq3 header, seq3 sequence), etc…]

Here is the code:
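A sketch of fasta_sequences_to_array() under the same assumptions as before ($i counts header lines, empty lines are skipped, and the sample input is made up) could be written as:

```php
<?php
function fasta_sequences_to_array($fasta_sequences) {
    $output = array();
    $header_line = "";
    $sequence = "";
    $i = 0; // number of header lines met so far (assumption)

    // Split the input into lines, whatever the line-ending convention.
    $lines = preg_split('/\r\n|\r|\n/', $fasta_sequences);
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === "") {
            continue; // skip empty lines
        }
        if (preg_match("/^>/", $line)) {
            // A header that is not the first one closes the previous record.
            if ($i != 0) {
                $output[] = array($header_line, $sequence);
            }
            $header_line = $line;
            $sequence = "";
            $i++;
        } else {
            $sequence .= $line;
        }
    }

    // The last record is stored outside the cycle.
    if ($i != 0) {
        $output[] = array($header_line, $sequence);
    }
    return $output;
}

// Hypothetical input with two made-up records.
$input = ">seq1 first\nATGG\nTGAA\n>seq2 second\nGCAG\n";
$records = fasta_sequences_to_array($input);

foreach ($records as $record) {
    echo $record[0] . "\n" . $record[1] . "\n";
}
```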

We leave it to you as an exercise to use these two functions in the context of a script, on your own server.

With this final section of the chapter we have covered most of the basics of PHP programming. If you followed closely, understood the code samples and practiced a little bit, you should now be a proficient PHP programmer. Feel the power! In the next chapter we will learn how to gather input on the web from users by using web forms, a crucial step toward the building of full-fledged web applications for bioinformatics.
