Understanding CSV File Format
CSV stands for Comma Separated Values. It is a plain text format that can be used to store data. The format can also be used to transfer information from one program to another.
Here is the format:
- The values within a CSV file are separated with commas.
- If any value itself contains a comma or a linefeed, the value is delimited with quotation marks (which means the value has a quotation mark at its beginning and at its end).
- Further, if the value itself contains a quotation mark, it is delimited with quotation marks and each quotation mark within the value is escaped, usually with another quotation mark (some software uses a backslash character instead).
Records are separated with a linefeed.
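The rules above can be sketched in a few lines of Python. This is only an illustration, not a complete CSV implementation: the helper names `quote_field` and `format_record` are made up for this example, and the sketch assumes the doubled-quotation-mark style of escaping rather than the backslash style.

```python
def quote_field(value: str) -> str:
    """Return value formatted as a single CSV field (hypothetical helper)."""
    # Quote only when the value contains a comma, a quotation mark, or a linefeed.
    if any(ch in value for ch in ',"\n'):
        # Double each embedded quotation mark, then wrap the whole value
        # in delimiting quotation marks.
        return '"' + value.replace('"', '""') + '"'
    return value


def format_record(values: list[str]) -> str:
    """Join the formatted fields of one record with commas (hypothetical helper)."""
    return ",".join(quote_field(v) for v in values)


print(format_record(["red", "white", "also, white", "blue"]))
print(format_record(["blue, too", 'and also "grayish", next to the blue']))
```

Values without a comma, linefeed, or quotation mark pass through unchanged; only the values that need it are delimited.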
All that can sound confusing. Here is an example CSV file with three records.
red,white,"also, white",blue
"blue, too","and also ""grayish"", next to the blue"
green,yellow,purple
Let's address each of the above records separately (one line per record).
The first record has four value items:
- red [the value contains no comma or quotation mark]
- white [the value contains no comma or quotation mark]
- also, white [in the CSV file, the value is delimited with quotation marks because the value contains a comma.]
- blue [the value contains no comma or quotation mark]
The second record has two value items:
- blue, too [in the CSV file, the value is delimited with quotation marks because the value contains a comma.]
- and also "grayish", next to the blue [in the CSV file, the value is delimited with quotation marks because the value contains a comma and also because the value contains at least one quotation mark. Quotation marks within the value are doubled to distinguish them from the delimiting quotation marks.]
The third record has three value items:
- green [the value contains no comma or quotation mark]
- yellow [the value contains no comma or quotation mark]
- purple [the value contains no comma or quotation mark]
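To check the walkthrough, the example file can be handed to a CSV parser. The sketch below uses Python's standard `csv` module; the variable name `EXAMPLE` and the in-memory `io.StringIO` buffer (standing in for a real file) are assumptions made for this illustration.

```python
import csv
import io

# The three-record example file, one record per line.
EXAMPLE = (
    'red,white,"also, white",blue\n'
    '"blue, too","and also ""grayish"", next to the blue"\n'
    'green,yellow,purple\n'
)

# newline="" lets the csv module handle line endings itself.
records = list(csv.reader(io.StringIO(EXAMPLE, newline="")))

# Show how many values each record holds and what they are.
for number, record in enumerate(records, start=1):
    print(f"record {number}: {len(record)} values -> {record}")
```

The parser should report four, two, and three values respectively, with the delimiting quotation marks removed and the doubled quotation marks around grayish reduced to single ones.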
Although they can be created manually, CSV files generally are created with software.
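As one illustration of software creating a CSV file, here is a minimal sketch with Python's standard `csv` module. It writes to an in-memory buffer rather than a real file, and it sets `lineterminator="\n"` so records are separated with a linefeed as described in this article (the module's default is a carriage return plus linefeed). The names `rows`, `buffer`, and `csv_text` are just for this example.

```python
import csv
import io

# The plain values, before any CSV formatting is applied.
rows = [
    ["red", "white", "also, white", "blue"],
    ["blue, too", 'and also "grayish", next to the blue'],
    ["green", "yellow", "purple"],
]

# In-memory buffer; a real script would open a file with newline="" instead.
buffer = io.StringIO(newline="")

# The writer adds delimiting quotation marks only where a value needs them
# and doubles any quotation marks inside a value.
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text, end="")
```

The printed text should match the three-record example file, so the quoting never has to be done by hand.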
CSV files are somewhat readable by humans. However, the files generally are meant to be read by software.
You now know how CSV files are formatted.
Each record is a row of values, one record per line. Values are separated with commas. Values that contain a comma, a linefeed, or a quotation mark are delimited with quotation marks. Quotation marks within values are doubled to distinguish them from the delimiting quotation marks.
(This content first appeared in Possibilities newsletter.)
Will Bontrager