Tab-separated values
Tab-separated values (TSV) is a simple text format for storing tabular data, such as a database table or spreadsheet,[1] and for exchanging information between databases.[2] Each record in the table is one line of the text file, and each field value within a record is separated from the next by a tab character. TSV is thus a form of the more general delimiter-separated values format.
Filename extension | .tsv, .tab |
---|---|
Internet media type | text/tab-separated-values |
Type of format | multiplatform, serial data streams |
Container for | database information organized as field separated lists |
Standard | IANA MIME type |
TSV is a simple, widely supported file format, so it is often used to move tabular data between different computer programs. For example, a TSV file might be used to transfer information from a database program to a spreadsheet.
TSV is an alternative to the common comma-separated values (CSV) format, which often causes difficulties because commas must be escaped: literal commas are very common in text data, whereas literal tab characters are rare in running text. The IANA standard for TSV[2] achieves simplicity by disallowing tabs within fields.
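The rule of disallowing tabs within fields can be illustrated with a minimal Python sketch of a writer. The function names `to_tsv_line` and `to_tsv` are hypothetical and chosen only for this illustration. The sketch assumes every value is already a string and that records end with a line feed.

```python
def to_tsv_line(fields):
    """Join field values into one TSV record, rejecting tabs and line breaks."""
    for value in fields:
        # A tab would be read back as a field separator and a line break
        # as a record separator, so neither may appear inside a value.
        if "\t" in value or "\n" in value or "\r" in value:
            raise ValueError(f"field contains a tab or line break: {value!r}")
    return "\t".join(fields)


def to_tsv(rows):
    """Serialize rows (lists of strings) as TSV text, one record per line."""
    return "".join(to_tsv_line(row) + "\n" for row in rows)


rows = [
    ["Name", "Note"],
    ["Smith, John", "commas need no escaping"],
]
text = to_tsv(rows)
```

Because the comma has no special meaning, the value `Smith, John` is written unchanged, whereas a value containing a tab is rejected instead of being quoted or escaped.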
Example
For example, the head of the Iris flower data set can be stored as a TSV using the following plain text (note that the HTML rendering may convert tabs to spaces):
Sepal length	Sepal width	Petal length	Petal width	Species
5.1	3.5	1.4	0.2	I. setosa
4.9	3.0	1.4	0.2	I. setosa
4.7	3.2	1.3	0.2	I. setosa
4.6	3.1	1.5	0.2	I. setosa
5.0	3.6	1.4	0.2	I. setosa
The TSV plain text above corresponds to the following tabular data:
Sepal length | Sepal width | Petal length | Petal width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | I. setosa |
4.9 | 3.0 | 1.4 | 0.2 | I. setosa |
4.7 | 3.2 | 1.3 | 0.2 | I. setosa |
4.6 | 3.1 | 1.5 | 0.2 | I. setosa |
5.0 | 3.6 | 1.4 | 0.2 | I. setosa |
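As a minimal sketch, the sample above could be read in Python by splitting the text into lines and each line on tab characters. The function name `parse_tsv` is hypothetical. The sketch assumes that the first line is a header, that records end with a line feed, and that no escaping is in use.

```python
tsv_text = (
    "Sepal length\tSepal width\tPetal length\tPetal width\tSpecies\n"
    "5.1\t3.5\t1.4\t0.2\tI. setosa\n"
    "4.9\t3.0\t1.4\t0.2\tI. setosa\n"
    "4.7\t3.2\t1.3\t0.2\tI. setosa\n"
    "4.6\t3.1\t1.5\t0.2\tI. setosa\n"
    "5.0\t3.6\t1.4\t0.2\tI. setosa\n"
)


def parse_tsv(text):
    """Parse TSV text into a list of dicts keyed by the header row."""
    # Drop the final line terminator, then split into records.
    lines = text.rstrip("\n").split("\n")
    # The first record holds the column names.
    header = lines[0].split("\t")
    # Every remaining record is split on tabs and paired with the header.
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


records = parse_tsv(tsv_text)
```

All values are returned as strings. The spaces in `I. setosa` and in the column names need no special handling, because only the tab character separates fields.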
Conventions for lossless conversion to TSV
Since the values in the TSV format cannot contain literal tabs or newline characters, a convention is needed to convert text values containing these characters without loss. A common convention is to perform the following escapes:[3][4]
\n for newline, \t for tab, \r for carriage return, \\ for backslash.
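The four escapes listed above can be sketched in Python as a pair of functions. The names `escape_field` and `unescape_field` are hypothetical. Leaving an unrecognized backslash sequence untouched when unescaping is an assumption of this sketch, not part of the convention as stated.

```python
import re

# Mapping from each special character to its escaped form, and back.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {escaped: raw for raw, escaped in _ESCAPES.items()}


def escape_field(value):
    """Replace backslash, tab, newline and carriage return with escapes."""
    return re.sub(r"[\\\t\n\r]", lambda m: _ESCAPES[m.group(0)], value)


def unescape_field(value):
    """Reverse escape_field in a single left-to-right pass."""
    # Scanning once ensures an escaped backslash followed by "t" is not
    # mistaken for an escaped tab.
    return re.sub(r"\\[\\tnr]", lambda m: _UNESCAPES[m.group(0)], value)


original = "a\tb\nc\\d\re"
escaped = escape_field(original)
restored = unescape_field(escaped)
```

Escaping the backslash itself is what makes the conversion lossless: a value that already contains a backslash followed by `t` is written with a doubled backslash and is read back unchanged.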
References
- "How To Use Tab Separated Value (TSV) Files". International Monetary Fund.
- "Definition of tab-separated-values (tsv)". Internet Assigned Numbers Authority (IANA).
- "Linear TSV". Data Protocols - Open Knowledge Foundation.
- "jq Manual". stedolan.github.io.
Bibliography
- IANA, Text Media Types, Definition of tab-separated-values (tsv), Paul Lindner, U of MN Internet Gopher Team, June 1993
- Tab Separated Values (TSV): a format for tabular data exchange, Jukka Korpela, created 2000-09-01, last update 2005-02-12.