V-TEK Weblog about webdevelopment and linux

13 Oct 2007

8 best defensive programming practices to prevent breaking your sites

I found this interesting article at phpclasses.org. It contains 8 points that you really should read if you build websites frequently.

- 1. Handle unexpected conditions

Are you handling all the possible conditions under which your programs will run?

For instance, do you always have a "default" case in your "switch" statements?

[php]

switch($some_value)
{
    case 1:
        $another_value = 1;
        break;
    case 2:
        $another_value = 4;
        break;
}
return(1 / $another_value);

[/php]

What if $some_value is neither 1 nor 2?

Notice: Undefined variable: another_value

Warning: Division by zero

What about "if" conditions? Do you have an "else" code section for all important "if" statements?

If your program does not expect certain conditions, but those conditions are not impossible, simple calls to error_log can help you become aware of problems when unexpected situations occur.

[php]

switch($some_value)
{
    case 1:
        $another_value = 1;
        break;
    case 2:
        $another_value = 4;
        break;
    default:
        error_log('unexpected some_value '.$some_value.' found ');
        exit;
}

[/php]

See below about monitoring errors.
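A minimal sketch, using the hypothetical function names divisor_for() and safe_ratio(), of the same switch moved into a function so that both the default case and the matching "else" branch have an explicit answer. Returning null instead of calling exit is a design choice here: it lets the caller decide how to report the failure.

[php]

<?php
// Hypothetical helper: every possible $some_value gets an explicit outcome.
function divisor_for($some_value)
{
    switch ($some_value) {
        case 1:
            return 1;
        case 2:
            return 4;
        default:
            // Unexpected input: log it instead of continuing with an undefined variable.
            error_log('unexpected some_value ' . var_export($some_value, true) . ' found');
            return null;
    }
}

// Hypothetical caller: the "else" branch of the important check is written out.
function safe_ratio($some_value)
{
    $divisor = divisor_for($some_value);
    if ($divisor !== null) {
        return 1 / $divisor;
    } else {
        return null; // let the caller decide how to report the failure
    }
}

[/php]

With this version, safe_ratio(3) writes a line to the error log and returns null instead of triggering a notice and a division by zero.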

- 2. Process external systems data properly

Are you processing data from external systems with proper care?

External systems, often called actors, include everything that interacts with your software. It may be a user, a remote server, or even a database with information that was not produced by your application.

Most application security problems arise from missing or inappropriate handling of data obtained from external systems.

Not every problem caused by the lack of defensive practices leads to security bugs. However, security bugs are more problematic because they may lead to system abuse by people who are aware of the security holes.

There are two main defensive practices that are widely recommended:

a) Validate your input

What do you do with the data entered by the users? What about data read from files or retrieved from remote sites? Are you verifying whether it comes in the format that your application expects?

I recommend always validating external data. This means that you should always check whether data comes in the expected format, and not proceed if the data is not valid.
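As a small illustration using PHP's own filter extension (not the forms class mentioned next), here is a hypothetical validator for a page number taken from a request. The parameter and the 1 to 10000 range are assumptions.

[php]

<?php
// Hypothetical validator for a "page" request parameter.
// Returns the page number as an integer, or null when the value is invalid.
function validate_page($raw)
{
    if (!is_string($raw) && !is_int($raw)) {
        return null; // arrays, null, objects: reject outright
    }
    $page = filter_var($raw, FILTER_VALIDATE_INT, array(
        'options' => array('min_range' => 1, 'max_range' => 10000),
    ));
    return $page === false ? null : $page;
}

// Usage: do not proceed when validation fails.
// $page = validate_page(isset($_GET['page']) ? $_GET['page'] : null);
// if ($page === null) { /* reject the request */ }

[/php]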

On the PHPClasses site and in other Web projects I use this popular forms generation and validation class. It can be used to perform most common types of validation checks.

http://www.phpclasses.org/formsgeneration

Besides that, it can also discard invalid values and restore safe defaults when validation conditions are not met. This is an important detail, for instance to minimize the damage that could be caused not by real users submitting invalid form values, but by robots spoofing data submitted via hidden inputs.
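The idea of discarding an invalid value and restoring a safe default can be sketched in plain PHP. This is not the API of the forms generation class; allowed_or_default() and the hidden "sort" field are hypothetical.

[php]

<?php
// Hypothetical helper: keep a submitted value only if it is one of the
// expected values, otherwise fall back to a safe default.
function allowed_or_default($value, array $allowed, $default)
{
    return (is_string($value) && in_array($value, $allowed, true)) ? $value : $default;
}

// A hidden "sort" input should only ever carry one of these values.
$sort = allowed_or_default(
    isset($_POST['sort']) ? $_POST['sort'] : null,
    array('date', 'title', 'author'),
    'date'
);

[/php]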

Another interesting feature, implemented recently by a plug-in named secure_submit, is meant to prevent CSRF (Cross-Site Request Forgery) attacks.

I am not going into much detail about these attacks, but they can be used to make your browser access sites and perform actions on your behalf even when you do not want such actions executed.

This kind of attack was used in the past to forge users' votes on certain Digg posts and even to add unwanted items to carts on shopping sites.
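The usual defense is a secret token stored in the session and echoed in every form. The sketch below is a generic version of that synchronizer-token pattern, not the secure_submit plug-in's API; the function names are hypothetical and random_bytes() requires PHP 7 or later.

[php]

<?php
// Generic synchronizer-token sketch (not the secure_submit API).
// $session stands in for $_SESSION so the logic can be tested without a server.
function csrf_token(array &$session)
{
    if (empty($session['csrf_token'])) {
        $session['csrf_token'] = bin2hex(random_bytes(32)); // 64 hex characters
    }
    return $session['csrf_token'];
}

function csrf_check(array $session, $submitted)
{
    // hash_equals() compares in constant time to avoid timing leaks.
    return isset($session['csrf_token'])
        && is_string($submitted)
        && hash_equals($session['csrf_token'], $submitted);
}

// In a page, after session_start():
// echo '<input type="hidden" name="csrf_token" value="'
//     . htmlspecialchars(csrf_token($_SESSION)) . '">';
// On submission:
// if (!csrf_check($_SESSION, isset($_POST['csrf_token']) ? $_POST['csrf_token'] : null)) {
//     exit; // refuse the forged request
// }

[/php]

A forged request coming from another site cannot read the token from the victim's session, so it fails the check.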

b) Encode your output

Are you properly encoding all your data when you serve HTML pages?

Some text characters need to be properly encoded when they are displayed in HTML pages. If you neglect the possibility that such characters appear in the data you want to display, you risk introducing cross-site scripting security holes.

In many cases the solution may be as simple as using the PHP function HtmlSpecialChars.

http://www.php.net/htmlspecialchars
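A minimal sketch, assuming the pages are served as UTF-8; h() is a hypothetical short helper name. ENT_QUOTES makes the function encode single quotes too, which matters inside attribute values delimited by single quotes.

[php]

<?php
// Hypothetical helper: escape text for HTML element content and quoted attributes.
function h($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$comment = '<script>alert("x")</script>';
echo '<p>' . h($comment) . '</p>', "\n";
// The script tag is displayed as text instead of being executed.

[/php]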

Are you using the same encoding for data taken from a database as the one used by the Web pages of your site that display it?

Nowadays, most browsers and databases support Unicode encodings such as UTF-8. If you take data from a database encoded in UTF-8, it must either be displayed in your Web pages as UTF-8 or be converted to whatever encoding the Web page uses.

Are you escaping literal text values when you execute database queries?

Text literal values used in SQL queries are usually delimited by single quote ' characters. Some less experienced developers just take whatever text they want to use in a query and add single quotes before and after the text.

If the text itself contains single quotes, the query may either fail with an error or execute something different from what was intended. This is often exploited by crackers to perform SQL injection attacks.

The solution is to use either text escaping functions or prepared queries. The PHPClasses site uses the Metabase database abstraction package. It provides a database-independent API to encode text literal values and also supports prepared queries. If you use a different database API, you should look for its equivalent features.

http://www.phpclasses.org/metabase
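Metabase's own calls are not shown here, so the sketch below swaps in PHP's PDO extension with an in-memory SQLite database (assuming the pdo_sqlite extension is available) to show both approaches, a prepared query and an escaped literal, on a value full of quotes.

[php]

<?php
// PDO sketch (not Metabase), using an in-memory SQLite database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (name TEXT)');

$name = "O'Reilly'; DROP TABLE users; --";

// Prepared query: the value travels separately from the SQL text.
$insert = $db->prepare('INSERT INTO users (name) VALUES (?)');
$insert->execute(array($name));

// Escaping alternative: PDO::quote() adds the delimiters and escapes the quotes.
$count = $db->query('SELECT COUNT(*) FROM users WHERE name = ' . $db->quote($name))
            ->fetchColumn();

[/php]

In both cases the quotes are stored as part of the name and the DROP TABLE text is never executed.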

- 3. Test your code

Regression tests are great. You just build scripts that execute your application components and then you verify whether the results are what is expected.

Usually you run regression tests before you install or update your application in production. If you changed something that breaks your application's behavior, you will be able to fix the application (or the tests) before any bugs you introduced cause major problems on your site.
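A regression test does not require a framework; tools such as PHPUnit help, but a plain script is enough to show the idea. format_price() is a hypothetical component standing in for your own code.

[php]

<?php
// Hypothetical component under test: formats an amount given in cents.
function format_price($cents)
{
    return number_format($cents / 100, 2, '.', ',');
}

// Each case pairs an input with the output the application expects.
$cases = array(
    array(0, '0.00'),
    array(199, '1.99'),
    array(123456, '1,234.56'),
);

$failures = 0;
foreach ($cases as $case) {
    list($input, $expected) = $case;
    $actual = format_price($input);
    if ($actual !== $expected) {
        $failures++;
        echo "FAIL: format_price($input) returned '$actual', expected '$expected'\n";
    }
}
echo $failures === 0 ? "All tests passed\n" : "$failures test(s) failed\n";

[/php]

Running such a script before each deployment catches changes that alter a known result.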

But I have to be honest with you. Regression tests are boring and expensive to produce. Often you do not have the time and the patience to write good regression tests.

There are some tools to make that task easier, but it is still boring and expensive to produce tests that cover all situations that your applications have to handle.

If you like to build test scripts and whoever pays your salary can afford the additional time that producing test scripts will take you, congratulations! Otherwise, I will not blame you for not bothering with that.

I have written many unit test scripts, but those were mostly to test base components, like the database abstraction package that I use, the e-mail composing components, forms generation, etc.

I see a lot of preaching in favor of test-driven development. But I also often notice that many regression test implementations are mostly for base components and were only produced after the fact, i.e. after someone reported a bug that had probably already caused trouble on a site. That is better than nothing, but it is far from what test-driven development theory preaches.

Alternatively, you can publish your application or its components as Open Source and let the users help you with the testing. If there are any problems, chances are that users will report the problems to you.

That is one of the reasons why I created the PHPClasses site: have my PHP components be tested by as many people as possible.

- 4. Monitor your site errors and act upon them

No matter how much you try to prevent possible errors, you should always be prepared to handle them.

You should monitor your site and your server to check their health. If possible, act proactively.

On the PHPClasses site, several scripts run periodically from the cron program to check things like the available disk space.

Exhausting the disk space is not a normal thing to happen on this site, so I do not wait until the disk is full to do something about it. Whenever the free disk space falls below a certain threshold, the script sends me an e-mail so I can promptly check what is going on.
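A sketch of such a check, assuming it runs from cron; the partition, the 1 GiB threshold and the e-mail address are placeholders.

[php]

<?php
// Hypothetical cron check, e.g. with a crontab line such as:
//   */15 * * * * php /path/to/check_disk.php
// Returns true when the free space at $path is below $min_free_bytes.
function disk_space_low($path, $min_free_bytes)
{
    $free = disk_free_space($path);
    if ($free === false) {
        return true; // the value could not be read: report that too
    }
    return $free < $min_free_bytes;
}

// Placeholders: adjust the partition, threshold and recipient.
// if (disk_space_low('/', 1024 * 1024 * 1024)) {
//     mail('admin@example.com', 'Low disk space on ' . php_uname('n'),
//         'Less than 1 GiB free on /');
// }

[/php]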

When the site is executing CPU intensive tasks, like delivering newsletters, I use a small, as yet unreleased class that monitors the CPU usage.

When the CPU usage reaches a very high value, the running script is forced to rest for a while using the PHP sleep function. It then checks whether the CPU usage is below a threshold before resuming.

This prevents the site from becoming too slow for the users who are browsing it while heavy, low-priority tasks run in the background.
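The author's CPU monitoring class is unreleased, so the following sketch swaps in the system load average from sys_getloadavg() (not available on Windows) as a rough stand-in for CPU usage. The thresholds and send_newsletter() are hypothetical.

[php]

<?php
// Pause a background job while the 1-minute load average is too high.
// Returns true when it is OK to continue, false if still busy after $max_waits pauses.
function wait_for_low_load($max_load, $pause_seconds = 10, $max_waits = 60)
{
    for ($i = 0; $i < $max_waits; $i++) {
        $load = sys_getloadavg();
        if ($load === false || $load[0] < $max_load) {
            return true; // load acceptable (or unknown): carry on
        }
        sleep($pause_seconds); // rest before checking again
    }
    return false;
}

// Usage inside a long loop (hypothetical newsletter delivery):
// foreach ($recipients as $recipient) {
//     wait_for_low_load(4.0);
//     send_newsletter($recipient);
// }

[/php]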

These cases above are well anticipated situations. When unexpected situations happen, they must be detected and notified so I can do something about them.

What I do is enable the PHP error log by setting options like these in php.ini:

error_reporting = E_ALL
display_errors = Off
display_startup_errors = Off
log_errors = On
log_errors_max_len = 0
ignore_repeated_errors = On
ignore_repeated_source = Off
report_memleaks = On
track_errors = On
html_errors = Off
error_log = /path/to/php_error_log

Then I use a small class named Log Watcher to keep monitoring the PHP error log file.

http://www.phpclasses.org/logwatcher

This class composes and sends a message to me with the latest lines added to the PHP error log.

This way I can act promptly whenever an error occurs. I have lost count of how many times this simple class has saved me from major trouble.
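The core of the log-watching idea can be sketched as remembering how far the log was read last time. This is not the Log Watcher class's API; read_new_log_lines() and the state file holding the byte offset are assumptions.

[php]

<?php
// Return only the lines added to $log_file since the previous call,
// keeping the byte offset already read in $state_file.
function read_new_log_lines($log_file, $state_file)
{
    $offset = is_file($state_file) ? (int) file_get_contents($state_file) : 0;
    clearstatcache();
    $size = is_file($log_file) ? filesize($log_file) : 0;
    if ($size < $offset) {
        $offset = 0; // the log was rotated or truncated: read it from the start
    }
    $lines = array();
    if ($size > $offset) {
        $handle = fopen($log_file, 'rb');
        fseek($handle, $offset);
        $new = rtrim(stream_get_contents($handle), "\n");
        $offset = ftell($handle); // everything up to here has now been reported
        fclose($handle);
        if ($new !== '') {
            $lines = explode("\n", $new);
        }
    }
    file_put_contents($state_file, (string) $offset);
    return $lines;
}

// From cron (paths and recipient are placeholders):
// $lines = read_new_log_lines('/path/to/php_error_log', '/path/to/offset.txt');
// if ($lines) { mail('admin@example.com', 'New PHP errors', implode("\n", $lines)); }

[/php]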

- 5. Do not disclose errors to the users

If a script that serves a page to the users unexpectedly fails with an error, just present a user-friendly message like this: "Sorry, for the time being the site is not available. The site administration is already aware of the problem. Please come back later."

Do not disclose any details of the error to the user, nor ask the user to contact you. It will only make the user panic, and telling users to send you panic messages will not solve the situation.

If you need to be notified, make your error handling code send you a message with the error details.

I have seen many sites that display ridiculous stack traces with the names of all the functions, classes and parameters that were called up to the point where the error occurred.

It is OK to dump that information to the page if you are running the site in your development environment. Do not do that in the production environment.

If you disclose too much sensitive error information in your site pages, malicious users may use that information to abuse your site.
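A sketch of this split for PHP 7 or later, assuming the details should go to the PHP error log (the error_log setting from section 4) while the browser only gets the generic message; friendly_error_page() is a hypothetical name.

[php]

<?php
// Log the full details for the administrator, return only a generic message.
function friendly_error_page(Throwable $e)
{
    error_log(get_class($e) . ': ' . $e->getMessage() . ' in ' . $e->getFile()
        . ':' . $e->getLine() . "\n" . $e->getTraceAsString());
    return 'Sorry, for the time being the site is not available. '
        . 'The site administration is already aware of the problem. '
        . 'Please come back later.';
}

// Any uncaught exception ends up here instead of printing a stack trace.
set_exception_handler(function (Throwable $e) {
    if (!headers_sent()) {
        http_response_code(503); // Service Unavailable
    }
    echo friendly_error_page($e);
});

[/php]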

- 6. Damage control

No matter how much care you take, bad things may still happen. Since you cannot know in advance what may go wrong, you should at least be concerned with minimizing losses.

For instance, sites are always subject to Denial of Service attacks that flood your server with excessive requests. In some cases you can take preventive measures.

That happened some time ago to the PHPClasses site. I noticed that too many users loved the site so much that they wanted to mirror it on their own computers.

Unfortunately that is not viable because it takes too much bandwidth and slows down the server for every user accessing the site at the same time.

In other cases it may not be possible to do much. If the site receives too many Web requests, one solution is to refuse connections from the machines that are making too many requests. I have used Apache mod_throttle in the past, but it was not very stable.
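Refusing excess requests can also be approximated inside the application, before it opens a database connection. The sketch below is a fixed-window counter per client address, a swapped-in technique rather than how mod_throttle works. allow_request() and its in-memory $store are assumptions; a real site would need shared storage because each request runs separately, and old windows would need periodic cleanup.

[php]

<?php
// Count requests per client address in fixed time windows.
// Returns false once a client exceeds $limit requests in the current window.
function allow_request(array &$store, $client_ip, $limit, $window_seconds, $now)
{
    $window = intdiv($now, $window_seconds);
    $key = $client_ip . '|' . $window;
    $store[$key] = isset($store[$key]) ? $store[$key] + 1 : 1;
    return $store[$key] <= $limit;
}

// At the top of a page script, before connecting to the database:
// if (!allow_request($store, $_SERVER['REMOTE_ADDR'], 120, 60, time())) {
//     http_response_code(429); // Too Many Requests
//     exit;
// }

[/php]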

Still, I need to log in to the server machine and take action. The problem is that if the site receives too many requests, it may exhaust the server memory with excessive database connections until it becomes unresponsive.

To prevent that problem, I had to configure Apache not to accept requests above a limit set according to the available memory, with a directive like this:

MaxClients 200

In this other article I explain a bit more about this and other directives:

http://www.meta-language.net/metabase-faq.html#7.2.2

If the site remains unresponsive for too long, there is also a script that restarts the Web server automatically. It is not an ideal solution but at least the site will not remain stuck for too long, especially when I am away and I cannot do anything about it.
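A sketch of such a watchdog, assuming allow_url_fopen is enabled; the URL, the 30 second timeout and the restart command are placeholders for your own setup. Setting ignore_errors means an HTTP error page still counts as a response, since the goal is only to detect a server that does not answer at all.

[php]

<?php
// Hypothetical watchdog check: does the site answer within the timeout?
function site_responds($url, $timeout_seconds)
{
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeout_seconds, 'ignore_errors' => true),
    ));
    $body = @file_get_contents($url, false, $context);
    return $body !== false;
}

// From cron (placeholders):
// if (!site_responds('http://www.example.com/', 30)) {
//     exec('apachectl -k restart'); // the right command depends on your system
// }

[/php]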

- 7. Backup

Data loss is one of the bad things that may happen. It may be caused by a buggy application, an erroneous database schema upgrade, damaged disks, or even crackers invading your servers. In any case, it is always good to have a fresh backup at hand so you can minimize the loss.

If you use MySQL, running the mysqldump program or a similar script at least once a day is better than not doing any backup at all. If possible, transfer the backup files to several other machines, ideally in a data center other than the one where the site server is running.

But I have an additional tip. Some time ago I read an article about a security company whose backup tapes of credit card databases were stolen while being transported in a security vehicle to a different building. It seemed to me that some insider knew what the vehicle was transporting and arranged the robbery.

All the hassle could have been avoided if the backup data had been encrypted. That is what my backup scripts do. The important detail is that the backups are encrypted using GnuPG/PGP.

The data is encrypted with the public key of a special recipient. So, if the backup data is stolen, it cannot be recovered by anyone who does not have the recipient's private key. Obviously, the private key file is not kept on the server where the backup is taken.
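A sketch of a backup command that combines both tips, assuming MySQL credentials come from an option file such as ~/.my.cnf and that the recipient's public key is already imported into GnuPG. The database name, key and paths are placeholders; backup_command() is a hypothetical helper.

[php]

<?php
// Build a shell command that dumps a database and encrypts the dump
// with the backup recipient's public key. Only the public key is needed here.
function backup_command($database, $recipient, $output_file)
{
    return 'mysqldump --single-transaction ' . escapeshellarg($database)
        . ' | gpg --batch --yes --encrypt --recipient ' . escapeshellarg($recipient)
        . ' --output ' . escapeshellarg($output_file);
}

$command = backup_command('site_db', 'backup@example.com',
    '/var/backups/site_db-' . date('Y-m-d') . '.sql.gpg');
// exec($command, $output, $status);

[/php]

When exec() runs this through /bin/sh, the exit status is the one of gpg, the last command in the pipe, so a failing mysqldump can go unnoticed unless you also check the dump itself.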

- 8. Do what you can as you can never get defensive enough

Despite all my efforts to develop and run a site that does not give me any trouble, sometimes I have to deal with problems because I have not used practices that are defensive enough.

The problem I mentioned at the beginning of this article was that the newsletter service got jammed by a spam message. I suppose that the spammer guessed the address of the newsletter service queue mailbox.

The newsletter service skipped that message. However, it also stopped processing all the newsletters that were queued after it. The result was over 100 newsletters and alert messages left pending for 5 days.
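The general fix for this kind of jam is to isolate failures per message. A sketch, assuming messages are processed by a hypothetical handler callable that throws an exception when it cannot handle a message.

[php]

<?php
// Process every queued message; a failing one is logged and set aside
// instead of blocking the messages queued after it.
function process_queue(array $queue, callable $handler)
{
    $failed = array();
    foreach ($queue as $id => $message) {
        try {
            $handler($message);
        } catch (Exception $e) {
            error_log('queue message ' . $id . ' failed: ' . $e->getMessage());
            $failed[] = $id; // keep going with the next message
        }
    }
    return $failed;
}

[/php]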

Nothing was lost, but I only realized what was going on when some users who missed their newsletters wrote to me asking if there was a problem with their accounts.

All the newsletters were processed over the weekend. I would like to apologize to all users who missed their newsletters and only got them all at once several days later.

Meanwhile, the system has been improved. This particular kind of newsletter jam is no longer possible, but the system may still get stuck for other reasons, for instance due to a lack of disk space, so the site now sends me alert messages when more than 10 messages are waiting in the queue to be processed.
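The backlog alert can be sketched as a cron check. Counting files in a queue directory is an assumption made for illustration, since the site's actual queue storage is not described; queue_backlog_exceeds() is hypothetical.

[php]

<?php
// True when more than $max_pending messages are waiting in $queue_dir.
function queue_backlog_exceeds($queue_dir, $max_pending)
{
    $pending = glob(rtrim($queue_dir, '/') . '/*');
    if ($pending === false) {
        $pending = array(); // unreadable or missing directory
    }
    return count($pending) > $max_pending;
}

// From cron (path and recipient are placeholders):
// if (queue_backlog_exceeds('/path/to/newsletter_queue', 10)) {
//     mail('admin@example.com', 'Newsletter queue backlog', 'More than 10 messages pending.');
// }

[/php]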

The bottom line is: relax, do what you can, and handle new problems later as they happen.

* Recommended reading

I think I have covered most of the defensive practices that I apply. Still, I would like to point you to a couple of pages where you can learn more about this subject. One is the Wikipedia page about this matter:

http://en.wikipedia.org/wiki/Defensive_programming

The other is a chapter of the Getting Real book written by the fine folks of 37 Signals. They prefer developing in Ruby. I prefer PHP. That does not mean we cannot agree on important matters.

http://gettingreal.37signals.com/ch09_Get_Defensive.php
