Viewing Language Data in CSV Files
The MATScore and the MATReport tools both produce CSV files which
contain snippets of your input document. Viewing these CSV files is a
bit complicated, and deserves some attention.
The short answer is: use OpenOffice rather than Excel.
Excel 2007 CSV import has some very unpleasant features which will
compromise your ability to view the data cleanly.
- If the file extension of your CSV file is .csv, when you open the
file normally (either by double-clicking or selecting "Open" from the
main menu), Excel will not offer you an import wizard, and will try
to digest and interpret dates. So if your annotation happens to
span a date, Excel will recognize it as such and process it. Changing
the column format to Text after you import is useless, because Excel
has already discarded the original data. To avoid this, create a new
workbook, then select the
"From Text" option in the "Data" tab. This option is available only on
Windows; Mac Excel doesn't allow you to do this at all with a .csv
file, as far as we can tell.
- If the file extension of your CSV file is .txt, Excel will offer
you the import wizard (although now you can't open the file with a
double-click). The import wizard allows you to select the column
delimiter (comma), and also allows you to change the column format for
the columns you select. You
should change the columns which contain actual text to Text. However,
the import wizard mishandles newlines in column data; even if they're
enclosed in double quotes, the import wizard treats them as separate
entries. So if there are any newlines in the spans of text displayed,
this strategy won't work.
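To check what a CSV file actually contains, independent of any spreadsheet, a short Python script can help. The following is a minimal sketch, not part of the MAT tools: the sample data and the function name read_rows are hypothetical. It shows that a standard CSV parser leaves a date-like value as the original string and keeps a newline inside double quotes as part of a single field.

```python
import csv
import io

# Hypothetical sample, not real MATScore or MATReport output: one field looks
# like a date, and another contains a newline inside double quotes.
SAMPLE = 'file,span\r\ndoc1,"3/4/2008"\r\ndoc2,"first line\nsecond line"\r\n'


def read_rows(stream):
    """Return every row as a list of strings.

    csv.reader never converts values, so date-like text is left untouched,
    and a quoted newline stays inside its field.
    """
    return list(csv.reader(stream))


# newline="" disables newline translation, as the csv module recommends.
rows = read_rows(io.StringIO(SAMPLE, newline=""))
for row in rows:
    print(row)
```

To read a real file instead of the sample, pass open(path, newline="", encoding="utf-8") to read_rows, where path is the location of your CSV file.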
Another issue is character encoding. All the CSV documents created
by the MAT tools are encoded in UTF-8. In order to view this data
correctly in Excel 2007, you must use the import wizard. Again, this
option is only available on Windows.
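The effect of opening a UTF-8 file with the wrong encoding can be illustrated with a small Python sketch. The sample bytes and the function name decode_rows are hypothetical, and cp1252 is used here only as an example of a wrong guess; this is an assumption for illustration, not a statement about what any particular Excel version does.

```python
import csv
import io

# Hypothetical sample bytes standing in for a MAT-produced CSV file.
RAW = 'label,span\r\nPERSON,"José Martí"\r\n'.encode("utf-8")


def decode_rows(raw, encoding):
    """Decode raw CSV bytes with the given encoding and parse them into rows."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return list(csv.reader(text))


good = decode_rows(RAW, "utf-8")   # the encoding the MAT tools use
bad = decode_rows(RAW, "cp1252")   # an illustrative wrong guess

print(good[1][1])
print(bad[1][1])
```

With the correct encoding the accented characters survive; with the wrong one, each accented character is replaced by two unrelated characters, which is the kind of damage the import wizard's encoding setting avoids.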
Because there's no consistent way of viewing the data in its clean
form, Excel isn't an appropriate tool, especially on the Mac.
Fortunately, OpenOffice 3 does
do the right thing. You'll be offered an import wizard when you open a
.csv file. Select the column delimiter (comma), and make sure to change
the column format to Text for each column which contains spans of text.