Why functions are a convenient way to structure code
We have seen that PHP offers a wide variety of built-in (predefined) functions to deal with strings, variables, arrays, numbers, regular expressions and more. At times, however, no built-in function will easily perform a more specific task, and you have to write your own code. Wrapping code that performs a well-defined task in a function has several advantages. Just to name a few:
- Code in functions is re-usable
- It makes your main code more readable
- Your main code will be shorter
- Your main code will be easier to maintain
- You can modify the function code or add functionality in a single central place
Storing functions in a dedicated file and then importing it into scripts
Typically, although not necessarily, your functions will be placed in a dedicated file, separate from the main script or scripts, called for example functions.php and stored in a directory called “include”.
This code is then imported into your scripts with an include statement, often placed at the top of the script. When a script includes the functions file, or any other PHP file, the included code is executed at the point where the include statement appears. If a function is defined in the functions file, its name then becomes available for use within the script.
<?php

// The include statement takes an absolute or relative path as an argument
// Included PHP code is executed when the include statement is reached
// In this example the directory that contains this script also contains a directory called "include"
// The functions.php file is located inside the "include" directory
// so its path relative to the main script is "include/functions.php"

include("include/functions.php");

?>
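If the same functions file may end up being included more than once (for example by two scripts that include each other), PHP's include_once statement can be used instead of include: it skips files that were already included. The sketch below is only an illustration. It writes a tiny functions file to the system temporary directory so that it can run on its own, and the file name and the greet() function are made up for this example.

```php
<?php

// Hypothetical setup: create a small functions file in the temporary directory
// In a real project this file would already exist, e.g. as include/functions.php
$functions_file = sys_get_temp_dir() . "/demo_functions_" . uniqid() . ".php";
file_put_contents($functions_file, "<?php\nfunction greet(\$name){\n    return \"Hello, \$name!\";\n}\n");

// include_once executes the file the first time only
// The second statement is skipped, which avoids a "cannot redeclare function" error
include_once($functions_file);
include_once($functions_file);

// greet() was defined in the included file and is now available in this script
$greeting = greet("world");
echo $greeting . "\n";

unlink($functions_file); // Remove the temporary file
```

With a plain include statement in place of include_once, the second inclusion would try to define greet() again and PHP would stop with a fatal error.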
Writing a function in PHP, basic syntax
For now, let’s place the code for our functions in the same file as the rest of the script code (which is perfectly fine if we define the functions before using them in the flow of the script) and concentrate on how to write a function.
In order to define a function, we need 3 parts: the function name, round brackets to specify the arguments and curly brackets for the actual function code.
<?php

function function_name($argument1, $argument2 /* , more arguments if needed */){
    // Actual code executed on function call
}

?>
Let’s start by writing an exceedingly simple and useless function, that takes no arguments, just to highlight a couple of points:
<?php

// The function definition
function meaning_of_life(){
    return 42;
}

// We now compute the meaning of life
$mol = meaning_of_life();
// Value of $mol is now 42

echo "<p>The meaning of life is $mol!</p>";

?>
We have defined a function with name “meaning_of_life”.
This function, when called, just returns a number (42). This is accomplished through a return statement. It is common for functions to include a return statement. If no return statement is used, a PHP function returns NULL: it does not return the last computed value or the last declared variable, as happens in some other languages. We therefore encourage you, also for the sake of code readability, to always include an explicit return statement in functions that are meant to produce a value.
- In order to call the function, you will write the function name followed by round brackets.
- If the function requires arguments, arguments must be provided.
- Some functions require no arguments, others one, two or more. For a successful function call all the mandatory arguments must be passed to the function: if some are missing, PHP will report an error and the script will not execute correctly.
- Some arguments may be optional. Optional arguments are passed to the function after the mandatory arguments.
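The behaviour of return values and of missing arguments can be checked with a short sketch. The function names no_return() and add_two() are made up for this illustration, and the ArgumentCountError shown here assumes PHP 7.1 or later (older versions only emit a warning).

```php
<?php

// Hypothetical function with no return statement
function no_return(){
    $sum = 1 + 1; // Computed, but never returned
}

// Hypothetical function with two mandatory arguments
function add_two($a, $b){
    return $a + $b;
}

$nothing = no_return();
var_dump($nothing); // A function without a return statement gives back NULL

$message = "no error";
try{
    // Only one argument is passed where two are mandatory
    // Assumption: PHP 7.1 or later, where this throws an ArgumentCountError
    add_two(1);
}
catch(ArgumentCountError $e){
    $message = "Too few arguments passed to add_two()";
}

echo $message . "\n";
```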
Here’s a slightly more useful function (though, truth be told, not by much), multiply_and_add(), that performs a simple mathematical operation. Note that the following example is meant to be read rather than executed.
<?php

// The function definition
function multiply_and_add($a, $b, $c){
    // The names used for the argument variables $a, $b and $c
    // only exist and have a meaning INSIDE the function space
    // that is inside the curly brackets where this very comment is located
    $out = ($a * $b) + $c;
    return $out;
}

// Arguments can be passed directly as values (depending on the function, strings, numbers, booleans, arrays etc...)
$var1 = multiply_and_add(20, 2, 3);
// $var1 value is now (20 * 2) + 3 = 43

$x = 3;
$y = 4;
$z = 10;

// Arguments can be passed as variables
// within the function's round brackets, the variables will be replaced by their values
// If arguments are passed as variables, the variables have to be defined before the function call
$var2 = multiply_and_add($x, $y, $z);
// $var2 value is (3 * 4) + 10 = 22

echo $a;
// will raise an "undefined variable" warning (a notice in older PHP versions) as $a is not defined at this time.
// Even if the first argument of the function was called $a at function definition time,
// $a exists ONLY within the name space of the function itself
// and not in the name space of the script

// Therefore on calling:
$var2 = multiply_and_add($x, $y, $z);
// the value of $x (3 in this case) will be assigned to the $a variable INSIDE the function's name space, as
// at the time of function definition the first argument was called $a
// whatever we pass as first argument to the function will be assigned to a variable called $a inside it
// once the function has been called and executed, there is no $a variable around anymore

$a = 66; // We now define an $a variable in the name space of the script, not the function

$var3 = multiply_and_add(4, 6, 1);
// Value of $var3 is now (4 * 6) + 1 = 25

echo $a;
// the value of $a is still 66. The fact that the function itself contains,
// in the function's name space, a variable called $a does not have any impact
// on what happens in the script's name space, as the name space of the function is
// segregated from the name space of the script

?>
Understanding name spaces
A fundamental concept in programming is that within code you may have several “name spaces” that are totally segregated from each other (the PHP documentation calls this concept variable scope, and uses the word “namespace” for a different language feature). For example, when defining a function you give names to the argument variables, and these names only exist within the name space of the function itself.
If you use some variable names within a function, for example $i, $j or any other name, you do not need to avoid those names in a script that uses the function, as the name space of the script is segregated from the name space of the function itself.
Similarly, a variable defined in a script is not available within a function. To use the value of a script variable inside a function, the standard way is to pass this value, or the variable, as an argument to the function. Once passed as an argument, this value may well get a different name inside the function’s name space. In the sample code above, the value of a variable called $x in the script (3) was assigned to a variable called $a inside the function’s name space, because $x was passed as the first argument and, at function definition time, the first argument was called $a. It is crucial that these concepts are well understood in order to write working, error-free code.
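The segregation between the script and a function can be verified with a minimal sketch. The two helper functions, x_is_visible_inside() and double_it(), are made up for this illustration.

```php
<?php

$x = 3; // Defined in the script

// Hypothetical helper: reports whether a variable called $x exists INSIDE the function
function x_is_visible_inside(){
    return isset($x); // The $x of the script is not visible here
}

// Hypothetical helper: modifies its own local copy of the argument
function double_it($a){
    $a = $a * 2;
    return $a;
}

$visible = x_is_visible_inside();
$doubled = double_it($x);

var_dump($visible);       // bool(false)
echo "$doubled and $x\n"; // 6 and 3
```

The value of $x in the script is still 3 after the call: double_it() only changed the variable $a that lives inside the function.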
Making arguments optional by assigning defaults
Some arguments can be made optional for the function call by assigning defaults to them at the time of function definition. We now rewrite the multiply_and_add() function, setting 10 as the default value for the number to be added (third argument, $c).
Once an argument is optional:
- if we do not provide the argument at function call, the default assigned in the function definition will be used
- if we do provide the argument, the value given will be used instead of the default one (it will override the default)
Therefore, with the third argument optional, we can call multiply_and_add() with either 2 or 3 arguments: both calls are correct and raise no errors.
<?php

// The function definition, third argument now optional because we provide a default for it
function multiply_and_add($a, $b, $c=10){
    $out = ($a * $b) + $c;
    return $out;
}

$var1 = multiply_and_add(2,4);
// The value of $var1 is now (2 * 4) + 10 = 18

$var2 = multiply_and_add(2,4,5);
// The value of $var2 is now (2 * 4) + 5 = 13

?>
Writing a reverse-complement function
In section 4-7 we wrote a snippet of code that computes the reverse complement of a DNA sequence. Let us turn this code into a function. We will write it so that we can reverse, complement or reverse-complement an input sequence, based on an optional argument passed at call time.
This was the original code:
<?php

$complement_dict = array(
    "A" => "T",
    "T" => "A",
    "G" => "C",
    "C" => "G"
);

$sequence = "ATGGTGAAGCAGATCGA";

$nucleotides = str_split($sequence,1);

$complement_sequence = "";

foreach($nucleotides as $nucleotide){
    $complement_sequence = $complement_sequence.$complement_dict[$nucleotide];
}

$revcomp_sequence = strrev($complement_sequence);

echo "<p>\n<strong>Input Sequence</strong><br>\n<span style=\"font-family:courier;\">$sequence</span>\n</p>\n<p>\n<strong>Reverse Complement</strong><br>\n<span style=\"font-family:courier;\">$revcomp_sequence</span>\n</p>";

?>
And this could be our reverse complement function revcomp():
<?php

// Defining the revcomp() function
function revcomp($sequence, $mode="revcomp"){
    $complement_dict = array(
        "A" => "T",
        "T" => "A",
        "G" => "C",
        "C" => "G"
    );

    $nucleotides = str_split($sequence,1);

    // Let's compute the complement sequence first
    $complement_sequence = "";
    foreach($nucleotides as $nucleotide){
        $complement_sequence = $complement_sequence.$complement_dict[$nucleotide];
    }
    // The complement sequence is now stored in the $complement_sequence variable

    $revcomp_sequence = strrev($complement_sequence); // This is the reverse complement sequence
    $reverse_sequence = strrev($sequence); // This is the reverse sequence

    // We return different things depending on the $mode (second optional argument of this function)
    // if we call the function with just one argument, the value of $mode will be the default, "revcomp"
    // additional supported values for the $mode argument are "comp" and "rev", see below
    // Note that when a function returns, it also exits, no more code inside the function is executed
    if($mode == "revcomp"){
        return $revcomp_sequence;
    }
    elseif($mode == "comp"){
        return $complement_sequence;
    }
    elseif($mode == "rev"){
        return $reverse_sequence;
    }
    else{
        // This part may help us in debugging code in which the function is used
        return "WARNING: revcomp mode not supported";
    }
}

// We now have the function defined, let's test it on a sequence
$sequence = "ATGGTGAAGCAGATCGA";

// Function call, the revcomp() function in action
$reverse_complement = revcomp($sequence); // If we call the function with a single argument, the value of $mode will be the default (revcomp)
$reverse = revcomp($sequence, "rev");
$complement = revcomp($sequence, "comp");

echo "<p>\n<strong>Input Sequence</strong><br>\n<span style=\"font-family:courier;\">$sequence</span>\n</p>\n";
echo "<p>\n<strong>Reverse Complement</strong><br>\n<span style=\"font-family:courier;\">$reverse_complement</span>\n</p>";
echo "<p>\n<strong>Complement</strong><br>\n<span style=\"font-family:courier;\">$complement</span>\n</p>";
echo "<p>\n<strong>Reverse</strong><br>\n<span style=\"font-family:courier;\">$reverse</span>\n</p>";

?>
Here’s the output of this code:
Input Sequence
ATGGTGAAGCAGATCGA
Reverse Complement
TCGATCTGCTTCACCAT
Complement
TACCACTTCGTCTAGCT
Reverse
AGCTAGACGAAGTGGTA
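As a more compact variant of the same idea, the complement can also be computed with PHP's built-in strtr() function, which translates characters one for one, instead of looping over an array of nucleotides. This is a sketch of an alternative, not the function developed above: the name revcomp_short() is made up for this example, and it assumes an uppercase sequence (characters other than A, T, G and C are left unchanged).

```php
<?php

// Hypothetical compact variant of revcomp(), based on strtr()
function revcomp_short($sequence, $mode="revcomp"){
    // Each character of "ATGC" is replaced by the character at the same position in "TACG"
    $complement_sequence = strtr($sequence, "ATGC", "TACG");

    if($mode == "revcomp"){
        return strrev($complement_sequence);
    }
    elseif($mode == "comp"){
        return $complement_sequence;
    }
    elseif($mode == "rev"){
        return strrev($sequence);
    }
    else{
        return "WARNING: revcomp mode not supported";
    }
}

$sequence = "ATGGTGAAGCAGATCGA";

echo revcomp_short($sequence) . "\n";         // TCGATCTGCTTCACCAT
echo revcomp_short($sequence, "comp") . "\n"; // TACCACTTCGTCTAGCT
echo revcomp_short($sequence, "rev") . "\n";  // AGCTAGACGAAGTGGTA
```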
Writing a “sequence breaker” seqbreak() function
Long uninterrupted strings printed in a web page can break the page layout by creating a very wide page that requires horizontal scrolling to be viewed fully. This can easily happen with biological sequences, for example. Let us write a handy function that inserts a break tag in a string every 80 characters. This is of obvious use when formatting a sequence according to the FASTA format. To make the function more widely usable, let us set the default to 80 but allow a different value at call time through an optional argument. Let’s also make it possible to replace the break tag with something else if we so wish.
<?php

function seqbreak($sequence, $brlen=80, $brel="<br>"){ // $brel => breaking element
    $chars = str_split($sequence, 1);
    $i = 1;
    $out = "";
    foreach($chars as $char){
        // $i/$brlen is an integer only when $i is an exact multiple of $brlen
        // (otherwise the division returns a float)
        if(is_int($i/$brlen)){
            $out = $out.$char.$brel;
        }
        else{
            $out = $out.$char;
        }
        $i++;
    }
    return $out;
}

// Let's test it
$seq = "LERGRDYEKNKVCKEFSHLGKEDFTSLSLVLYSRKFPSGTFEQVSQLVKEVVSLTEACCAEGADPDCYDTRTSALSAKSCESNSPFPVHPGTAECCTKEGLERKLCMAALKHQPQEFPTYVEPTNDEICEAFRKDPKEYANQFMWEYSTNYGQAPLSLLVSYTKSYLSMVGSCCTSASPTVCFLKERLQLKHLSLLTTLSNRVCSQYAAYGEKKSRLSNLIKLAQKVPTADLEDVLPLAEDITNILSKCCESASEDCMAKELPEHTVKLCDNLSTKNSKFEDCCQEKTAMDVFVCTYFMPAAQLPELPDVELPTNKDVCDPGNTKVMDKYTFELSRRTHLPEVFLSKVLEPTLKSLGECCDVEDSTTCFNAKGPLLKKELSSFIDKGQELCADYSENTFTEYKKKLAERLKAKLPEATPTELAKLVNKRSDFASNCCSINSPPLYCDSEIDAELKNIL";

$formatted = seqbreak($seq);

echo "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n<title>Test seqbreak function</title>\n</head>\n<body>";
echo "<p><strong>Non formatted</strong><br><span style=\"font-family:courier;\">$seq</span></p>";
echo "<p><strong>Formatted with seqbreak</strong><br><span style=\"font-family:courier;\">$formatted</span></p>";
echo "</body>\n</html>";

?>
You can run this code live here.
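A similar result can be obtained without an explicit cycle, by combining two built-in functions: str_split(), which can cut a string into chunks of a given length, and implode(), which joins the chunks with a separator. The function name seqbreak_chunks() is made up for this sketch. Note one difference from seqbreak(): no breaking element is added after the last chunk, even when the sequence length is an exact multiple of $brlen.

```php
<?php

// Hypothetical variant of seqbreak(), based on str_split() and implode()
function seqbreak_chunks($sequence, $brlen=80, $brel="<br>"){
    $chunks = str_split($sequence, $brlen); // Pieces of at most $brlen characters
    return implode($brel, $chunks);         // Pieces joined by the breaking element
}

// Short test with a break every 5 characters and a newline as breaking element
$formatted = seqbreak_chunks("ATGGTGAAGCAGATCGA", 5, "\n");
echo $formatted . "\n";
```

With these arguments the 17-character test sequence is printed on four lines: three lines of 5 characters and a last line of 2.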
Writing a function to handle FASTA sequences
In the previous section we wrote a piece of code to process a FASTA sequence so that the FASTA header line and the sequence are stored in two different variables.
Here is the relevant part of the code:
<?php

$fasta_sequence = file_get_contents("http://www.cellbiol.com/bioinformatics_web_development/uploads/AL591983.1_1-50000.fasta");

// Storing the FASTA header and sequence into different variables
$fasta_lines = explode("\n", $fasta_sequence);

$header = "> Generic";
// We will store the header line here during the next foreach cycle.
// If we have no header line (input sequence not in FASTA format), we still give it a value of "Generic"

$sequence = ""; // We will store the sequence here during the next foreach cycle

foreach($fasta_lines as $line){
    // We strip possible whitespace (or other characters such as \t\n\r\0\x0B)
    // from the beginning and end of the line
    $line = trim($line);
    if(preg_match("/^>/", $line)){ // If the line starts with a > it's the header line
        $header = $line;
    }
    elseif($line != ""){
        $sequence = $sequence.$line; // We concatenate each new sequence line in the $sequence variable
    }
}

// At this point we should have the FASTA header in the $header variable
// and the whole sequence in the $sequence variable

?>
What about turning this quite useful little piece of code, that we may indeed want to re-use often in web applications for bioinformatics, into a process_fasta() function? Let’s do it!
To make it a bit more interesting, let us add here as well, as we did for the revcomp() function above, a $mode argument that will allow us to tune what the function returns. We will make it so that if $mode is “all”, an array of two elements will be returned, with the header line as first element and the sequence as second element. If $mode is “seq”, the function will only return the sequence. If $mode is “header”, the function will only return the header. The default will be “all”. This flexibility will make the function slightly more adaptable to different requirements of the script in which we call it. Maybe at times we will just need to get out the header, while we may just need the sequence for other implementations.
<?php

function process_fasta($fasta_sequence, $mode="all"){
    $fasta_lines = explode("\n", $fasta_sequence);
    $header = "> Generic"; // We will store the header line here during the next foreach cycle
    $sequence = ""; // We will store the sequence here during the next foreach cycle

    foreach($fasta_lines as $line){
        // We strip possible whitespace (or other characters) from the beginning and end of the line
        $line = trim($line);
        if(preg_match("/^>/", $line)){ // If the line starts with a > it's the header line
            $header = $line;
        }
        elseif($line != ""){
            $sequence = $sequence.$line; // We concatenate each new sequence line in the $sequence variable
        }
    }
    // At this point we should have the FASTA header in the $header variable
    // and the whole sequence in the $sequence variable

    // And now the return part, that depends on value of $mode
    if($mode == "all"){
        return array($header, $sequence);
    }
    elseif($mode == "seq"){
        return $sequence;
    }
    elseif($mode == "header"){
        return $header;
    }
    else{
        return "WARNING: process_fasta mode not supported";
    }
}
// The function process_fasta() is now defined

// Let us call the function and check the outputs with some var_dumps on a test sequence
$seq1 = file_get_contents("http://www.cellbiol.com/bioinformatics_web_development/uploads/gi-28373620.fasta");

$output_all = process_fasta($seq1);
$output_header = process_fasta($seq1, "header");
$output_seq = process_fasta($seq1, "seq");

echo "<p><strong>Output of process_fasta() with mode \"all\"</strong></p>\n<p>";
var_dump($output_all);
echo "</p>\n<p><strong>Output of process_fasta() with mode \"header\"</strong></p>\n<p>";
var_dump($output_header);
echo "</p>\n<p><strong>Output of process_fasta() with mode \"seq\"</strong></p>\n<p>";
var_dump($output_seq);
echo "\n</p>";

?>
Here’s the output of this script:
Output of process_fasta() with mode "all"
array(2) {
  [0]=>
  string(124) ">gi|28373620|pdb|1MA9|A Chain A, Crystal Structure Of The Complex Of Human Vitamin D Binding Protein And Rabbit Muscle Actin"
  [1]=>
  string(458) "LERGRDYEKNKVCKEFSHLGKEDFTSLSLVLYSRKFPSGTFEQVSQLVKEVVSLTEACCAEGADPDCYDTRTSALSAKSCESNSPFPVHPGTAECCTKEGLERKLCMAALKHQPQEFPTYVEPTNDEICEAFRKDPKEYANQFMWEYSTNYGQAPLSLLVSYTKSYLSMVGSCCTSASPTVCFLKERLQLKHLSLLTTLSNRVCSQYAAYGEKKSRLSNLIKLAQKVPTADLEDVLPLAEDITNILSKCCESASEDCMAKELPEHTVKLCDNLSTKNSKFEDCCQEKTAMDVFVCTYFMPAAQLPELPDVELPTNKDVCDPGNTKVMDKYTFELSRRTHLPEVFLSKVLEPTLKSLGECCDVEDSTTCFNAKGPLLKKELSSFIDKGQELCADYSENTFTEYKKKLAERLKAKLPEATPTELAKLVNKRSDFASNCCSINSPPLYCDSEIDAELKNIL"
}
Output of process_fasta() with mode "header"
string(124) ">gi|28373620|pdb|1MA9|A Chain A, Crystal Structure Of The Complex Of Human Vitamin D Binding Protein And Rabbit Muscle Actin"
Output of process_fasta() with mode "seq"
string(458) "LERGRDYEKNKVCKEFSHLGKEDFTSLSLVLYSRKFPSGTFEQVSQLVKEVVSLTEACCAEGADPDCYDTRTSALSAKSCESNSPFPVHPGTAECCTKEGLERKLCMAALKHQPQEFPTYVEPTNDEICEAFRKDPKEYANQFMWEYSTNYGQAPLSLLVSYTKSYLSMVGSCCTSASPTVCFLKERLQLKHLSLLTTLSNRVCSQYAAYGEKKSRLSNLIKLAQKVPTADLEDVLPLAEDITNILSKCCESASEDCMAKELPEHTVKLCDNLSTKNSKFEDCCQEKTAMDVFVCTYFMPAAQLPELPDVELPTNKDVCDPGNTKVMDKYTFELSRRTHLPEVFLSKVLEPTLKSLGECCDVEDSTTCFNAKGPLLKKELSSFIDKGQELCADYSENTFTEYKKKLAERLKAKLPEATPTELAKLVNKRSDFASNCCSINSPPLYCDSEIDAELKNIL"
Everything is working out nicely: on calling the function with just one argument, in the default “all” mode, we get an array with the header as first element and the sequence as second element, while in the other two modes we get just a string, the header or the sequence, depending on whether we use the “header” or the “seq” mode.
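When the array returned in “all” mode is not needed as such, its two elements can be assigned to two separate variables in one step with PHP's list() construct. The sketch below repeats process_fasta() in condensed form so that it can run on its own, and uses a short made-up FASTA record instead of a file downloaded from the web.

```php
<?php

// process_fasta() as defined above, repeated here in condensed form
function process_fasta($fasta_sequence, $mode="all"){
    $fasta_lines = explode("\n", $fasta_sequence);
    $header = "> Generic";
    $sequence = "";
    foreach($fasta_lines as $line){
        $line = trim($line);
        if(preg_match("/^>/", $line)){
            $header = $line;
        }
        elseif($line != ""){
            $sequence = $sequence.$line;
        }
    }
    if($mode == "all"){
        return array($header, $sequence);
    }
    elseif($mode == "seq"){
        return $sequence;
    }
    elseif($mode == "header"){
        return $header;
    }
    else{
        return "WARNING: process_fasta mode not supported";
    }
}

// Made-up FASTA record, used only for this illustration
$fasta = ">test_seq example record\nATGGTGAAG\nCAGATCGA\n";

// list() assigns the first element of the returned array to $header and the second to $sequence
list($header, $sequence) = process_fasta($fasta);

echo "$header\n";   // >test_seq example record
echo "$sequence\n"; // ATGGTGAAGCAGATCGA
```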
Let us move the revcomp(), process_fasta() and seqbreak() functions to a functions.php file and use them together in a script.
We will create a directory for this exercise called test_fasta.
test_fasta contains one file, index.php (the script file) and a sub-directory called include. The include directory contains the functions.php file.
test_fasta
    index.php
    include
        functions.php
The functions.php file:
<?php

function revcomp($sequence, $mode="revcomp"){
    $complement_dict = array(
        "A" => "T",
        "T" => "A",
        "G" => "C",
        "C" => "G"
    );

    $nucleotides = str_split($sequence,1);

    // Let's compute the complement sequence first
    $complement_sequence = "";
    foreach($nucleotides as $nucleotide){
        $complement_sequence = $complement_sequence.$complement_dict[$nucleotide];
    }
    // The complement sequence is now stored in the $complement_sequence variable

    $revcomp_sequence = strrev($complement_sequence); // This is the reverse complement sequence
    $reverse_sequence = strrev($sequence); // This is the reverse sequence

    // We return different things depending on the $mode (second optional argument of this function)
    // if we call the function with just one argument, the value of $mode will be the default, "revcomp"
    // additional supported values for the $mode argument are "comp" and "rev", see below
    // Note that when a function returns, it also exits, no more code inside the function is executed
    if($mode == "revcomp"){
        return $revcomp_sequence;
    }
    elseif($mode == "comp"){
        return $complement_sequence;
    }
    elseif($mode == "rev"){
        return $reverse_sequence;
    }
    else{
        // This part may help us in debugging code in which the function is used
        return "WARNING: revcomp mode not supported";
    }
}

function process_fasta($fasta_sequence, $mode="all"){
    $fasta_lines = explode("\n", $fasta_sequence);
    $header = "> Generic"; // We will store the header line here during the next foreach cycle
    $sequence = ""; // We will store the sequence here during the next foreach cycle

    foreach($fasta_lines as $line){
        // We strip possible whitespace (or other characters) from the beginning and end of the line
        $line = trim($line);
        if(preg_match("/^>/", $line)){ // If the line starts with a > it's the header line
            $header = $line;
        }
        elseif($line != ""){
            $sequence = $sequence.$line; // We concatenate each new sequence line in the $sequence variable
        }
    }
    // At this point we should have the FASTA header in the $header variable
    // and the whole sequence in the $sequence variable

    // And now the return part, that depends on value of $mode
    if($mode == "all"){
        return array($header, $sequence);
    }
    elseif($mode == "seq"){
        return $sequence;
    }
    elseif($mode == "header"){
        return $header;
    }
    else{
        return "WARNING: process_fasta mode not supported";
    }
}

function seqbreak($sequence, $brlen=80, $brel="<br>"){
    $chars = str_split($sequence, 1);
    $i = 1;
    $out = "";
    foreach($chars as $char){
        if(is_int($i/$brlen)){
            $out = $out.$char.$brel;
        }
        else{
            $out = $out.$char;
        }
        $i++;
    }
    return $out;
}

?>
The index.php file:
<?php

include("include/functions.php");

echo "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n<title>Test FASTA</title>\n</head>\n<body>";

$fasta_seq = file_get_contents("http://www.cellbiol.com/bioinformatics_web_development/uploads/21407461.fasta");
$seq_only = process_fasta($fasta_seq, "seq");
$header = process_fasta($fasta_seq, "header");
$revcomp = revcomp($seq_only);
$revcomp_formatted = seqbreak($revcomp);

echo "<p>The reverse complement of sequence</p>\n$header<p>is the following:</p>\n<p style=\"font-family:courier;\">$revcomp_formatted</p>\n";
echo "</body>\n</html>";

?>
You can run this code live here.
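One detail worth considering when printing a FASTA header in a web page: the header starts with a > character, which has a special meaning in HTML. A possible refinement, not part of the script above, is to pass the header through PHP's built-in htmlspecialchars() function before echoing it. The header string used below is a shortened example.

```php
<?php

// Shortened header line, used only for this illustration
$header = ">gi|28373620|pdb|1MA9|A Chain A";

// htmlspecialchars() replaces characters that are special in HTML with the corresponding entities
$safe_header = htmlspecialchars($header);

echo "<p>$safe_header</p>\n"; // <p>&gt;gi|28373620|pdb|1MA9|A Chain A</p>
```

The browser displays the &gt; entity as a regular > character, so the header looks the same to the user while the HTML stays valid.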
Two functions to handle FASTA sequences in batch
As a final exercise for this section, let’s write a version of process_fasta() able to process several FASTA sequences in batch, rather than just one single sequence.
We will generate two versions of this function.
The first one, fasta_file_to_array(), will work on a text file with FASTA sequences, taking a file path as argument. This is a good opportunity to learn how to open and read a file, line by line with the fopen() and fgets() functions. For a change, we will use a “while” cycle instead of the usual “foreach” cycle in this one.
The second, fasta_sequences_to_array(), will accept the FASTA sequences directly, as a string variable passed as argument.
fasta_file_to_array()
In this function, we open the FASTA file whose path is accepted as argument by the function, by using the fopen() function. fopen() (file open) will be called with two arguments. The first is the path of the file to be opened and the second is the opening mode, in this case “read” (r), as all we want to do here is to read the file. fopen() will return a handle to the file, that we can then use to read the file lines by passing it to the fgets() function. fgets() takes the file handle as argument and returns one line of the file. Each time we call fgets() it will return the line that follows the one returned by the previous call. The first time we call fgets() it will return the first line of the file. The second time, the second line, and so on. Therefore, by using fgets() in a cycle we can read the whole file, line by line.
In order to cycle we use, instead of the usual “foreach” cycle, a “while” cycle. As the condition, we check whether we have reached the end of the file with the feof() function, which tests for the end of the file on a file handle.
The syntax will be as follows:
while(!feof($file_handle)){
    // We do stuff here
}
This can be read as: as long as (while) the end of the file has not been reached (the ! negates what follows, so the condition is !feof($file_handle)), keep executing the code inside the curly brackets.
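The following minimal sketch puts fopen(), feof() and fgets() together on a small file. So that it can run on its own, it first writes a temporary file with made-up content; in a real script the path would point to an existing file. A guard on the return value of fgets() is included because fgets() returns false when there is nothing left to read, which may happen on a final pass through the cycle.

```php
<?php

// Hypothetical setup: write a small temporary file so that the sketch runs on its own
$path = tempnam(sys_get_temp_dir(), "demo");
file_put_contents($path, "first line\nsecond line\nthird line\n");

$file_handle = fopen($path, "r"); // Open the file in read mode
$lines = array();

while(!feof($file_handle)){
    $line = fgets($file_handle); // The next line, or false if nothing is left to read
    if($line === false){
        break; // Guard: stop if fgets() had nothing more to return
    }
    $lines[] = trim($line); // Store the line without the trailing newline
}

fclose($file_handle); // Release the file handle when done
unlink($path);        // Remove the temporary file

print_r($lines);
```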
The output of the fasta_file_to_array() function will be an array (as the name of the function implies). Each element of this output array will be itself an array with two elements: a header and a sequence. We could represent the structure of the output array of fasta_file_to_array() as follows:
[(seq1 header, seq1 sequence),(seq2 header, seq2 sequence),(seq3 header, seq3 sequence), etc…]
By cycling on this output array we will be able to access each FASTA sequence as an array whose first element is the header line and the second element is the corresponding sequence.
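To make this structure concrete, here is a small sketch that builds such an array by hand, with two made-up sequences, and cycles over it with foreach.

```php
<?php

// Made-up data with the same structure as the output array described above
$seqs_array = array(
    array(">seq1 first test sequence", "ATGGTGAAG"),
    array(">seq2 second test sequence", "CAGATCGA")
);

$report = "";
foreach($seqs_array as $seq){
    $header = $seq[0];   // First element: the header line
    $sequence = $seq[1]; // Second element: the sequence
    $report .= $header . " (" . strlen($sequence) . " characters)\n";
}

echo $report;
```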
In the function code, inside the while cycle, we make sure we clean up each line for possible trailing spaces or other unwanted characters with the PHP trim() function.
Let us follow what happens in the function to better understand the code below. Before starting to go through the file lines in the while cycle, we declare two empty variables, $sequence and $header_line, and a counter $i set to 0. Then we start cycling. The first line of the file will most likely be the header line of the first sequence: preg_match("/^>/",$line) will be true and the value of $i will be 0. We store this line in the $header_line variable.
Then we start to accumulate the following lines (sequence lines) in the $sequence variable until the next header line is reached. Each header line, excluding the first one, is used as a signal that we have finished reading (parsing) the previous sequence. When we find one of those header lines (we know a header line is not the first because $i will be different from 0), the bit of code that stores the ($header_line, $sequence) array (with data from the previous sequence) inside the final output array is executed, and the $sequence variable is reset to empty:
if($i != 0){
    $seqs_array[] = array($header_line,$sequence);
    $sequence = '';
}
Since inside the while cycle each ($header_line, $sequence) array for a sequence is stored in the output array only when the header line for the next sequence is encountered, for the very last sequence of the file we have to do this storage operation outside the while cycle. This is what we do right after the while cycle, shortly before the function returns:
$seqs_array[] = array($header_line,$sequence); // Putting the last sequence information in the output array
Here you go with the code!
<?php

function fasta_file_to_array($fasta_file){ // Takes a FASTA file path as input
    $fasta_handle = fopen($fasta_file, "r"); // Getting a handle to the file with fopen()
    $seqs_array = array(); // We initialise the output array as an empty array
    $sequence = '';
    $header_line = '';
    $i = 0;

    // The while cycle to read the file line by line and fill in the output array
    while(!feof($fasta_handle)){ // Until the end of the file is reached
        $line = fgets($fasta_handle); // Get the next line of the file
        if(preg_match("/^>/",$line)){ // If it is a header line
            if($i != 0){ // If this header line is not the first header line of the file
                // we store the results for the previous sequence in the output array $seqs_array
                $seqs_array[] = array($header_line,$sequence);
                $sequence = ''; // and we reset the $sequence to an empty variable
            }
            $header_line = trim($line);
            $i++; // From the second header line of the file $i becomes different from 0
        }
        elseif($line != ''){ // if the line is not a header line and it is not empty
            $sequence .= strtoupper(trim($line)); // we accumulate it in the $sequence variable after cleaning it with trim()
        }
    }
    fclose($fasta_handle); // We are done reading, so we close the file handle
    $seqs_array[] = array($header_line,$sequence); // Putting the last sequence information in the output array
    return $seqs_array;
}

?>
fasta_sequences_to_array()
In this second version we accept a variable directly containing FASTA sequences as argument, instead of a file path. The flow of the function is very similar to the one of fasta_file_to_array(). The difference is that we split the input sequences into lines, much as we did in process_fasta(), where we used explode(); this time we use a preg_split() call:
$lines = preg_split("/\n/", $fasta_sequences);
and then iterate through the lines with a foreach cycle, again as we did in the original process_fasta() function. The logic of the cycle is the same as in fasta_file_to_array(): within the cycle, we store the results for one sequence when we come across the header of the next sequence, and we then store the results for the very last sequence with a dedicated line of code outside the cycle. As in fasta_file_to_array(), the output is an array with this structure:
[(seq1 header, seq1 sequence),(seq2 header, seq2 sequence),(seq3 header, seq3 sequence), etc…]
Here is the code:
<?php

function fasta_sequences_to_array($fasta_sequences){ // Takes a variable with FASTA sequences as input
    $lines = preg_split("/\n/", $fasta_sequences); // Individual lines to an array
    $seqs_array = array();
    $sequence = '';
    $header_line = '';
    $i = 0;
    foreach($lines as $line){
        if(preg_match("/^>/",$line)){
            if($i != 0){
                $seqs_array[] = array($header_line,$sequence);
                $sequence = '';
            }
            $header_line = trim($line);
            $i++;
        }
        elseif($line != ''){
            $sequence .= strtoupper(trim($line));
        }
    }
    $seqs_array[] = array($header_line,$sequence);
    return $seqs_array;
}

?>
We leave it to you as an exercise to use these two functions in the context of a script, on your own server.
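As a possible starting point for the exercise, here is a minimal sketch that calls fasta_sequences_to_array() on two short, made-up FASTA records and prints one line per sequence. The function is repeated here so that the example can run on its own; in your own script you would rather include it from functions.php.

```php
<?php

// fasta_sequences_to_array() as defined above, repeated here so that the sketch is self-contained
function fasta_sequences_to_array($fasta_sequences){
    $lines = preg_split("/\n/", $fasta_sequences);
    $seqs_array = array();
    $sequence = '';
    $header_line = '';
    $i = 0;
    foreach($lines as $line){
        if(preg_match("/^>/",$line)){
            if($i != 0){
                $seqs_array[] = array($header_line,$sequence);
                $sequence = '';
            }
            $header_line = trim($line);
            $i++;
        }
        elseif($line != ''){
            $sequence .= strtoupper(trim($line));
        }
    }
    $seqs_array[] = array($header_line,$sequence);
    return $seqs_array;
}

// Two made-up FASTA records, the first one in lowercase
$fasta_sequences = ">seq1\natgg\ntgaa\n>seq2\nCAGA\nTCGA\n";

$seqs = fasta_sequences_to_array($fasta_sequences);

foreach($seqs as $seq){
    list($header, $sequence) = $seq; // First element: header, second element: sequence
    echo "$header: $sequence\n";
}
// Prints:
// >seq1: ATGGTGAA
// >seq2: CAGATCGA
```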
With this final section of the chapter we have covered most of the basics of PHP programming. If you followed closely, understood the code samples and practiced a little bit, you should now be a proficient PHP programmer. Feel the power! In the next chapter we will learn how to gather input on the web from users by using web forms, a crucial step toward building full-fledged web applications for bioinformatics.