Valid File Name Characters

File names are composed of characters. Most simple printable characters are valid within file names. Some are not.

When you work with only one operating system, it is fairly easy to know if a certain character is valid when used within a file name. Try it and see if it works.

However, when your file may be transported to other operating systems, it is possible for a file name valid on your operating system to contain characters invalid on the destination system.

The destination operating system might automatically convert certain invalid characters into something it can use during the copying process, but it also might not. What happens when transferring a file with invalid characters depends on the characters, the destination operating system, and the software being used for copying.

Instead of listing all valid file name characters, a general rule for cross-OS file names is this: File names may be composed of any ASCII printable characters (including the space character) except these eleven: * " / \ < > : | ? ^ ~
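The rule above can be sketched as a quick check. The function name HasOnlyCrossOsCharacters() is hypothetical, made up for this illustration. It tests characters only and knows nothing about reserved file names.

```php
<?php
// Hypothetical helper: true when every byte of $fn is an ASCII printable
// character and none of them is one of the eleven excluded characters.
function HasOnlyCrossOsCharacters($fn)
{
   // Any byte outside space (0x20) through "~" (0x7E) is not ASCII printable.
   if(preg_match('/[^\x20-\x7E]/',$fn)) { return false; }
   // strpbrk() returns false when none of the listed characters is present.
   return strpbrk($fn,'*"/\\<>:|?^~') === false;
}

var_dump(HasOnlyCrossOsCharacters('my report.txt'));         // bool(true)
var_dump(HasOnlyCrossOsCharacters('will:will@example.com')); // bool(false)
```

The second name fails because of the colon.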

Note that file names themselves can be invalid even when they are composed only of valid file name characters. This can happen when the operating system reserves a certain file name for its own use.

With all Unix and Linux versions I am familiar with, these two file names are reserved by the operating system for its own use:

. ..

With various versions of Windows, file names may not end with a space or a dot. Also in Windows, these file names are reserved, with or without file name extensions:

CON, PRN, AUX, NUL
COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9
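The reserved names, together with the Windows rule about a trailing space or dot, can be checked before a name is used. This is only a sketch. The function name IsProblemFileName() is hypothetical, and the list it checks is just the one given in this article.

```php
<?php
// Hypothetical helper: true when $fn is a name this article lists as
// reserved, or when it ends with a space or a dot.
function IsProblemFileName($fn)
{
   // Unix/Linux reserved names.
   if($fn === '.' or $fn === '..') { return true; }
   // Windows: a file name may not end with a space or a dot.
   if(preg_match('/[ .]\z/',$fn)) { return true; }
   // Windows reserved names, alone or followed by an extension, in any letter case.
   return preg_match('/^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\.|\z)/i',$fn) === 1;
}

foreach(array('NUL','nul.txt','COM1','notes.','console.log','report.txt') as $name)
{
   echo $name, ' => ', (IsProblemFileName($name) ? 'avoid' : 'ok'), "\n";
}
```

The first four names are reported as "avoid" and the last two as "ok". Note that console.log passes because CON is reserved only when it is the whole name before the first dot.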

Although a leading or trailing space may be valid for an operating system, some software has issues with file names that begin or end with a space character.

Recently, I needed to use email addresses as file names for storing data. This was for PHP software that might run on any of various operating systems. Its data files might also be transported to other operating systems.

I made a function to sanitize the addresses for use as file names. This needed to be done because the software had no control over what the person might type in as an email address.

This is the function.

function SanitizeFileName($fn)
{
   $newfn = '';
   $ouch = array();
   // Build a lookup of the eleven disallowed characters, keyed by byte value.
   foreach( unpack("C*",'*"/\<>:|?^~') as $c ) { $ouch[$c]=true; }
   foreach(unpack("C*",$fn) as $ord)
   {
      $newfn .= ( ($ord<32 or $ord>126 or isset($ouch[$ord]) ) ? '-' : chr($ord) );
      // Invalid characters and parts of characters (like UTF8 has) are replaced with "-".
   }
   $newfn = trim($newfn);
   // Space characters at either end of file name are removed.
   if($newfn == '.') { $newfn = '_-_'; }
   // Unix/Linux reserved file name "." is replaced with "_-_".
   if($newfn == '..') { $newfn = '_-_-_'; }
   // Unix/Linux reserved file name ".." is replaced with "_-_-_".
   $newfn = preg_replace('/\.+\z/','-',$newfn);
   // Trailing dots, which Windows does not allow, are replaced with "-".
   $newfn = preg_replace('/^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\.|\Z)/i','_-_.$1$2',$newfn);
   // Windows reserved words, whether or not followed by a period, are prefixed with "_-_." and the rest of the name is kept.
   if($newfn === '') { $newfn = '_-_'; }
   // An empty result is replaced with "_-_" because an empty string cannot be a file name.
   return $newfn;
}

The SanitizeFileName() function is designed to accept any string of characters and return a valid file name for Linux, Unix, and Windows operating systems.
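Because the function walks the string one byte at a time, a character stored as several bytes in UTF-8 ends up as several hyphens, one per byte. Here is a short sketch of why, using the same unpack() call. The sample string is made up for this illustration.

```php
<?php
// "é" is written as its two UTF-8 bytes so the example does not depend
// on how this source file is encoded.
$sample = "caf\xC3\xA9";
// unpack() returns an array indexed from 1; array_values() reindexes from 0.
$bytes = array_values(unpack("C*",$sample));
print_r($bytes); // The byte values 99, 97, 102, 195, 169.
// Count the bytes outside the printable ASCII range (32 through 126).
$outside = 0;
foreach($bytes as $ord) { if($ord<32 or $ord>126) { $outside++; } }
echo "$outside bytes would each become a hyphen\n";
```

The count is 2, so a name like this one comes out of the sanitizer as "caf--".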

Here is code to test the function with.

$tentativeFileName = 'will:will@example.com';
echo "<h1>$tentativeFileName</h1>";
$sanitizedFileName = SanitizeFileName($tentativeFileName);
echo "<h1>$sanitizedFileName</h1>";

The above will output (as H1 headers) these two lines:

will:will@example.com
will-will@example.com

When your file needs to be, or might be, transported to different computers, verify the file name contains only characters that are valid across operating systems. Also verify that the file name as a whole is not one that an operating system reserves for its own use.

If you wish, you can pass your file name to the SanitizeFileName() function to obtain a sanitized file name. The test code above shows how to call it.

(This content first appeared in Possibilities newsletter.)

Will Bontrager


© 1998-2001 William and Mari Bontrager
© 2001-2011 Bontrager Connection, LLC
© 2011-2025 Will Bontrager Software LLC