Writing Your Own Form Handling Scripts, Part I
If you've been wanting to write form handling scripts, this
article is for you.
For demonstration, we'll use a form with a hidden field,
text field inputs, a textarea, radio buttons, a drop-down
list box, and checkboxes.
This tutorial is in four parts:
- How to put information from the form into the
script.
- The example form.
- Beginning the script.
- Getting the info into the script.
- Verifying email addresses are formatted
correctly, displaying error messages, and
an exit subroutine.
- Some security issues and how to handle them:
- Referrer check.
- GET versus POST.
- Form content length.
- Embedded HTML.
- The script up to this point.
- How to store the form information in a database
file on your server -- in any plain text format,
including tab- and comma-delimited formats that
can be imported into Excel and other spreadsheet
and database programs.
- How to send the form information to yourself in
an email -- formatted however you please,
including HTML.
- How to personalize the "thank you"/confirmation
page for your form user.
Okay, let's get right to it.
I. How to put information from the form into the script.
a. The example form.
Below is the example form we'll use for this tutorial. (It
is recommended, although not strictly necessary, that you
know how to make forms before pursuing this tutorial. See
"HTML Form Tutorial"
-- it can be read online or downloaded as a PDF ebook.)
<form method="POST" action="myscript.cgi">
<input type="hidden" name="Subject" value="testing...">
Name:
<input type="text" name="username">
<br><br>
Email:
<input type="text" name="email">
<br><br>
Gender:
<input type="radio" name="gender" value="female">Female
<input type="radio" name="gender" value="male">Male
<br><br>
Favorite Color (select one):
<select name="favorite color">
<option value="white">White</option>
<option value="red">Red</option>
<option value="yellow">Yellow</option>
<option value="blue">Blue</option>
<option value="purple">Purple</option>
<option value="orange">Orange</option>
<option value="green">Green</option>
<option value="black">Black</option>
</select>
<br><br>
Favorite Color Runner-ups (select any):<br>
<input type="checkbox" name="c2" value="white">White<br>
<input type="checkbox" name="c2" value="red">Red<br>
<input type="checkbox" name="c2" value="yellow">Yellow<br>
<input type="checkbox" name="c2" value="blue">Blue<br>
<input type="checkbox" name="c2" value="purple">Purple<br>
<input type="checkbox" name="c2" value="orange">Orange<br>
<input type="checkbox" name="c2" value="green">Green<br>
<input type="checkbox" name="c2" value="black">Black<br>
<br>
Your Message:<br>
<textarea name="message" cols="22" rows="5"></textarea>
<br><br>
<input type="submit" value="Send Form Info">
</form>
b. Beginning the script.
It is highly recommended that you have a Perl reference
available. This tutorial will not attempt to explain basic
Perl concepts except within the context of the subject at
hand.
If you do not have a programming background, I recommend
"Perl for Dummies (with CD ROM)" by Paul Hoffman. The CD
contains a Perl interpreter to install on your computer.
(A similar title by the same author, "Perl 5 for Dummies,"
is also on bookshelves. I've never read that one, so can't
make a recommendation.)
For learning Perl, I also recommend "Learning Perl" by
Randal L. Schwartz & Tom Christiansen.
As a reference, "Programming Perl" by Larry Wall, Tom
Christiansen & Randal L. Schwartz can't be beat, at least
in my opinion. My copy is dog-eared and the covers are
curling. I use it every day.
Begin your form handling script with these four lines and
save it as "myscript.cgi":
#!/usr/bin/perl
# By [your name here]
use strict;
my %In = ();
The first line is the location of Perl on your server and
the second line is a comment where you put your name as
author, date, version number, whatever information you want
recorded with the script. You may use additional comment
lines as needed.
The third line specifies that only lexical variables
are used in the script (unless specifically named with
"use vars"). Lexical variables are preceded with the
word "my" when declared or the first time they are used.
"use strict" is a great typo catcher; if you mistype a
variable name, it will generate an error message.
The fourth line declares the variable %In and ensures the
variable is empty. This is the variable that will be used
to hold the information submitted by the form.
Use "use strict" and "my" variables. It will save you a lot
of grief from chasing elusive bugs.
c. Getting the info into the script.
When the form is submitted to your script, the variable %In
will contain the form information. (%In could be named
%FORM or any other name beginning with "%" -- I use %In
because it represents information brought in.)
Here is a subroutine that will put form information into
the variable %In
sub Parse
{
my $buffer;
if ($ENV{REQUEST_METHOD} eq 'GET')
{ $buffer = $ENV{QUERY_STRING}; }
else { read(STDIN,$buffer,$ENV{CONTENT_LENGTH}); }
my @p = split(/&/,$buffer);
foreach(@p)
{
$_ =~ tr/+/ /;
my ($n,$v) = split(/=/,$_,2);
$n =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
$v =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
$v =~ s/(\<.*?)(embed|object|script|applet)(.*?\>)/$1$3/gis;
if($In{$n}) { $In{$n} .= "\t$v"; }
else { $In{$n} = $v; }
}
} # sub Parse
To call the subroutine, use:
Parse();
The working program, complete with all of today's code, is
printed near the end of this article.
d. Verifying email addresses are formatted correctly,
displaying error messages, and an exit subroutine.
The subroutine ValidEmail returns 1 if the email address it
was given is correctly formatted. Otherwise, it returns 0.
The subroutine ErrorHTML creates a web page with the error
message(s) it was given, the messages in an unordered list.
The subroutine Exit merely calls Perl's built-in exit
function. Exit is a placeholder in case your system requires
a different exit routine (if your system is mod_perl, for
example). Whenever your script needs to exit, call this Exit
subroutine instead of the Perl exit function. That way, if
you need a different exit routine sometime in the future,
you do not have to search the entire script for exit calls;
you merely modify this one Exit subroutine. (Note that the
ErrorHTML subroutine calls the Exit subroutine.)
Here are the three subroutines:
sub ValidEmail
{
if($_[0]=~/([\.\-\_]{2,})|(@[\.\-\_])|([\.\-\_]@)|(\A\.)/)
{ return 0; }
if($_[0]=~/^[\w\.\-\_]+\@\[?[\w\.\-\_]+\.([\w\.\-\_]{2,3}|[0-9])\]?$/)
{ return 1; }
return 0;
} # sub ValidEmail
sub ErrorHTML
{
my $s = join("\n<li>",@_);
print "Content-type: text/html\n\n";
print <<HTML;
<html><body bgcolor="white">
<blockquote><blockquote>
<h4>Message:</h4>
<ul>
<li>$s
</ul>
</blockquote></blockquote>
</body></html>
HTML
Exit();
} # sub ErrorHTML
sub Exit { exit; }
Here are examples of use:
unless($In{email})
{ ErrorHTML('An email address is required.\ }
unless(ValidEmail($In{email}))
{ ErrorHTML('Sorry, invalid email address format.\ }
Note: $In{email} contains the email address provided on the form.
e. Some security issues and how to handle them.
i. Referrer check.
You'll probably want to restrict your form's use to your
own domain. The following code will present an error
message unless the form being used is on your server with
your domain's URL:
my $AuthorizedDomain = 'mydomain.com';
my $FormDomain = lc $ENV{HTTP_REFERER};
$FormDomain =~ s!^https?://(?:www\.)?(.*?)(?:/.*)$!$1!;
unless($FormDomain eq lc $AuthorizedDomain)
{ ErrorHTML('Unauthorized access.\ }
Replace mydomain.com, above, with your own domain name.
ii. GET versus POST.
Method GET creates URLs with a ? followed by data, like
http://domain.com/script.cgi?one=this&two=that
The Parse subroutine, above, accepts information with
method GET as well as method POST (method POST transmits
the information in a way not visible in the URL).
In some situations, GET is preferred, such as when you want
the information visible within the URL, in which case you
would change your example form's FORM tag from method="POST"
to method="GET". However, if you won't need method GET, then
replace the above Parse subroutine with the following
ParsePost subroutine. ParsePost will process only method
POST information. It will return a 1 if method POST
information is available and return 0 otherwise.
sub ParsePost
{
return 0 unless $ENV{REQUEST_METHOD} =~ /POST/i;
my $buffer;
read(STDIN,$buffer,$ENV{CONTENT_LENGTH});
my @p = split(/&/,$buffer);
foreach(@p)
{
$_ =~ tr/+/ /;
my ($n,$v) = split(/=/,$_,2);
$n =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
$v =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
$v =~ s/(\<.*?)(embed|object|script|applet)(.*?\>)/$1$3/gis;
if($In{$n}) { $In{$n} .= "\t$v"; }
else { $In{$n} = $v; }
}
return 1;
} # sub ParsePost
You can call the above subroutine with:
unless(ParsePost())
{ ErrorHTML('Unauthorized access.\ }
iii. Form content length.
Most browsers have a maximum content size of 32k. However,
there may still be situations where you want to limit the
length of the content. The following will keep the first
250 characters of the form textarea name="message" whenever
more than 250 characters are entered:
if(length($In{message}) > 250)
{ $In{message} = substr($In{message},0,250); }
iv. Embedded HTML.
In the Parse and ParsePost subroutines, you'll find the line:
$v =~ s/(\<.*?)(embed|object|script|applet)(.*?\>)/$1$3/gis;
That line deletes the HTML tag names "embed," "object,"
"script," or "applet" whenever found between angle brackets.
If you prefer to delete all HTML tags (everything between
angle brackets, including the angle brackets themselves),
use this line instead:
$v =~ s/(\<.*?\>)//gs;
f. The script up to this point.
Here is the form handling script up to this point:
#!/usr/bin/perl
# By [your name here]
use strict;
my $AuthorizedDomain = 'mydomain.com';
my %In = ();
my $FormDomain = lc $ENV{HTTP_REFERER};
$FormDomain =~ s!^https?://(?:www\.)?(.*?)(?:/.*)$!$1!;
unless($FormDomain eq lc $AuthorizedDomain)
{ ErrorHTML('Unauthorized access.\ }
unless(ParsePost())
{ ErrorHTML('Unauthorized access.\ }
unless($In{email})
{ ErrorHTML('An email address is required.\ }
unless(ValidEmail($In{email}))
{ ErrorHTML('Sorry, invalid email address format.\ }
if(length($In{message}) > 250)
{ $In{message} = substr($In{message},0,250); }
ErrorHTML('Script paused here.\ # temporary line
Exit();
sub ParsePost
{
return 0 unless $ENV{REQUEST_METHOD} =~ /POST/i;
my $buffer;
read(STDIN,$buffer,$ENV{CONTENT_LENGTH});
my @p = split(/&/,$buffer);
foreach(@p)
{
$_ =~ tr/+/ /;
my ($n,$v) = split(/=/,$_,2);
$n =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
$v =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
$v =~ s/(\<.*?)(embed|object|script|applet)(.*?\>)/$1$3/gis;
if($In{$n}) { $In{$n} .= "\t$v"; }
else { $In{$n} = $v; }
}
return 1;
} # sub ParsePost
sub ValidEmail
{
if($_[0]=~/([\.\-\_]{2,})|(@[\.\-\_])|([\.\-\_]@)|(\A\.)/)
{ return 0; }
if($_[0]=~/^[\w\.\-\_]+\@\[?[\w\.\-\_]+\.([\w\.\-\_]{2,3}|[0-9])\]?$/)
{ return 1; }
return 0;
} # sub ValidEmail
sub ErrorHTML
{
my $s = join("\n<li>",@_);
print "Content-type: text/html\n\n";
print <<HTML;
<html><body bgcolor="white">
<blockquote><blockquote>
<h4>Message:</h4>
<ul>
<li>$s
</ul>
</blockquote></blockquote>
</body></html>
HTML
Exit();
} # sub ErrorHTML
sub Exit { exit; }
The "Script paused here." message in the above code will be
removed when the script is completed.
Part II of this series shows you how to take the information
in the %In variable and email it to yourself (or to anybody
else).
See you then :)
Will Bontrager
©2001 Bontrager Connection, LLC
Please note:
Articles on this website are presented "as is". However -
If you have a question about a CGI script, HTML, CSS, PHP, or JavaScript
Ask one of our Experts and you'll have your answer!
Click here for details.