module Csv : sig .. end
type t = string list list
class type in_obj_channel = object .. end
class type out_obj_channel = object .. end
exception Failure of int * int * string
Failure(nrecord, nfield, msg) is raised to indicate a parsing error in field number nfield of record number nrecord; the description msg says what is wrong. The first record, and the first field of each record, are numbered 1 (matching the usual spreadsheet numbering but differing from List.nth on the OCaml representation, which counts from 0).
type in_channel
val of_in_obj : ?separator:char -> ?excel_tricks:bool -> in_obj_channel -> in_channel
of_in_obj ?separator ?excel_tricks in_chan creates a new "channel" to access the data in CSV form available from the channel in_chan.
separator: The character used as the separator. The default is ','. Be aware, however, that in countries where the comma is used as a decimal separator, Excel uses ';' as the separator.
excel_tricks: Enables Excel tricks, namely that '"' followed by '0' in a quoted string means ASCII NULL, and that a field of the form ="..." returns only the string inside the quotes. Default: true.
val of_channel : ?separator:char -> ?excel_tricks:bool -> Pervasives.in_channel -> in_channel
Same as Csv.of_in_obj except that the data is read from a standard channel.
val of_string : ?separator:char -> ?excel_tricks:bool -> string -> in_channel
Same as Csv.of_in_obj except that the data is read from a string.
val load : ?separator:char -> ?excel_tricks:bool -> string -> t
load fname loads the CSV file fname. If fname is "-" then the data is loaded from stdin.
separator: The character used as the separator. The default is ','. Be aware, however, that in countries where the comma is used as a decimal separator, Excel uses ';' as the separator.
excel_tricks: Enables Excel tricks, namely that '"' followed by '0' in a quoted string means ASCII NULL, and that a field of the form ="..." returns only the string inside the quotes. Default: true.
val load_in : ?separator:char -> ?excel_tricks:bool -> Pervasives.in_channel -> t
load_in ch loads a CSV file from the input channel ch. See Csv.load for the meaning of separator and excel_tricks.
val to_in_obj : in_channel -> in_obj_channel
An in_channel buffers the data from the original channel. If you want to examine the data by means other than the functions below (say, after a failure), you need to use this function in order not to lose the data held in the buffer.
val close_in : in_channel -> unit
close_in ic closes the channel ic. The underlying channel is closed as well.
val next : in_channel -> string list
next ic returns the next record in the CSV file.
Raises End_of_file if no more records can be read.
Raises Csv.Failure if the CSV format is not respected. The partial record read is available with Csv.current_record.
val fold_left : ('a -> string list -> 'a) -> 'a -> in_channel -> 'a
fold_left f a ic computes (f ... (f (f a r1) r2) ... rN) where r1,...,rN are the records in the CSV file. If f raises an exception, the record available at that moment is accessible through Csv.current_record.
val fold_right : (string list -> 'a -> 'a) -> in_channel -> 'a -> 'a
fold_right f ic a computes (f r1 ... (f rN-1 (f rN a)) ...) where r1,...,rN-1, rN are the records in the CSV file. All records are read before f is applied, so this function is not convenient if your file is large.
val iter : f:(string list -> unit) -> in_channel -> unit
iter f ic iterates f on all remaining records. If f raises an exception, the record available at that moment is accessible through Csv.current_record.
val input_all : in_channel -> t
input_all ic returns a list of the CSV records up to the end of the file.
val current_record : in_channel -> string list
The record currently under examination. This is useful for recovering the data parsed so far in case of Failure.
val load_rows : ?separator:char -> ?excel_tricks:bool -> (string list -> unit) -> Pervasives.in_channel -> unit
type out_channel
val to_out_obj : ?separator:char -> ?excel_tricks:bool -> out_obj_channel -> out_channel
to_out_obj ?separator ?excel_tricks out_chan creates a new "channel" to output the data in CSV form.
separator: The character used as the separator. The default is ','.
excel_tricks: Enables Excel tricks, namely that '\000' is represented as '"' followed by '0', and that a field with leading or trailing spaces or a leading '0' is encoded as ="..." (to avoid Excel "helping" you). Default: false.
val to_channel : ?separator:char -> ?excel_tricks:bool -> Pervasives.out_channel -> out_channel
Same as Csv.to_out_obj but output to a standard channel.
val output_record : out_channel -> string list -> unit
output_record oc r writes the record r in CSV form to the channel oc.
val output_all : out_channel -> t -> unit
output_all oc csv outputs all records in csv to the channel oc.
val save_out : ?separator:char -> ?excel_tricks:bool -> Pervasives.out_channel -> t -> unit
val save : ?separator:char -> ?excel_tricks:bool -> string -> t -> unit
save fname csv saves the csv data to the file fname.
val print : ?separator:char -> ?excel_tricks:bool -> t -> unit
val print_readable : t -> unit
Prints the CSV data to stdout in a human-readable format. Not much is guaranteed about how the CSV is printed, except that it will be easier to follow than a "raw" output done with Csv.print. This is a one-way operation: there is no easy way to parse the output of this command back into CSV data.
val save_out_readable : Pervasives.out_channel -> t -> unit
Same as Csv.print_readable, but allowing the output to be sent to a channel.
val lines : t -> int
val columns : t -> int
val trim : ?top:bool -> ?left:bool -> ?right:bool -> ?bottom:bool -> t -> t
All four of the optional arguments (~top, ~left, ~right, ~bottom) default to true.
The exact behaviour is:
~right: If true, remove any empty cells at the right-hand end of any row. The number of columns in the resulting CSV structure will not necessarily be the same for each row.
~top: If true, remove any empty rows (no cells, or containing just empty cells) from the top of the CSV structure.
~bottom: If true, remove any empty rows from the bottom of the CSV structure.
~left: If true, remove any empty columns from the left of the CSV structure. Note that ~left and ~right are quite different: ~left considers the whole CSV structure, whereas ~right considers each row in isolation.
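To make the difference between ~right and ~left concrete, here is a minimal sketch written against the string list list representation using only the standard library. It is not the library's implementation: the helper names trim_right_row and trim_left_sketch are hypothetical, and treating a row with no cells as "empty in the first column" is an assumption.

```ocaml
(* Sketch only: approximates the documented ~right and ~left behaviour.
   This is NOT the Csv implementation; names are hypothetical. *)

(* ~right: drop empty cells at the right-hand end of ONE row. *)
let trim_right_row (row : string list) : string list =
  let rec drop_leading_empty = function
    | "" :: rest -> drop_leading_empty rest
    | cells -> cells
  in
  List.rev (drop_leading_empty (List.rev row))

(* ~left: drop the first column only while it is empty in EVERY row.
   Assumption: a row with no cells counts as empty in that column. *)
let rec trim_left_sketch (csv : string list list) : string list list =
  let first_is_empty = function [] -> true | c :: _ -> c = "" in
  let has_cells = List.exists (fun row -> row <> []) csv in
  if has_cells && List.for_all first_is_empty csv then
    trim_left_sketch (List.map (function [] -> [] | _ :: rest -> rest) csv)
  else csv

let example = [ [ ""; "a"; "" ]; [ ""; ""; "b" ] ]

(* Each row is handled on its own, so the rows end up with different lengths. *)
let right_trimmed = List.map trim_right_row example

(* Only the first column is removed: the second column is not empty in row 1. *)
let left_trimmed = trim_left_sketch example
```

In this sketch the right trim leaves rows of different lengths, while the left trim removes a column only when every row agrees that it is empty.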
val square : t -> t
Csv.columns.
val is_square : t -> bool
val set_columns : int -> t -> t
set_columns cols csv makes the CSV data square by forcing the width to the given number of cols. Any short rows are padded with blank cells. Any long rows are truncated.
val set_rows : int -> t -> t
set_rows rows csv makes the CSV data have exactly rows rows by adding empty rows or truncating rows as necessary.
Note that set_rows does not make the CSV square. If you want it to be square, call either Csv.square or Csv.set_columns afterwards.
val set_size : int -> int -> t -> t
set_size rows cols csv makes the CSV data square by forcing the size to rows * cols, adding blank cells or truncating as necessary. It is the same as calling set_columns cols (set_rows rows csv).
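The padding and truncation rules for set_columns, set_rows and set_size can be illustrated with a short standard-library sketch. The names below (resize and the _sketch functions) are hypothetical and this is not the library's code; it assumes that an added empty row is the empty list and that a blank cell is the empty string.

```ocaml
(* Sketch only: mirrors the documented padding/truncation rules.
   NOT the Csv implementation; all names here are hypothetical. *)

(* Force a list to exactly n elements: truncate it, or pad with `fill`. *)
let rec resize (n : int) (fill : 'a) (l : 'a list) : 'a list =
  if n <= 0 then []
  else
    match l with
    | [] -> fill :: resize (n - 1) fill []
    | x :: rest -> x :: resize (n - 1) fill rest

(* Every row gets exactly `cols` cells (blank cell assumed to be ""). *)
let set_columns_sketch cols csv = List.map (resize cols "") csv

(* Exactly `rows` rows (added row assumed to be []); widths are untouched. *)
let set_rows_sketch rows csv = resize rows [] csv

(* Composition stated in the text: set_columns cols (set_rows rows csv). *)
let set_size_sketch rows cols csv =
  set_columns_sketch cols (set_rows_sketch rows csv)

let ragged = [ [ "a"; "b"; "c" ]; [ "d" ] ]
let sized = set_size_sketch 3 2 ragged
```

Applying the row adjustment first and the column adjustment second is what makes the added rows come out padded to the full width.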
val sub : int -> int -> int -> int -> t -> t
sub r c rows cols csv returns a subset of csv. The subset is defined as having its top left corner at row r, column c (counting from 0) and being rows deep and cols wide.
The returned CSV will be "square".
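As an illustration of the 0-based corner and the "square" result, the following sketch extracts such a block with the standard library alone. The name sub_sketch is hypothetical, and filling cells that fall outside the source data with empty strings is an assumption: the text above implies it (the result is always square) but does not state it.

```ocaml
(* Sketch only: extract a rows x cols block whose top left corner is at
   row r, column c, both counted from 0. NOT the Csv implementation.
   Assumes r, c, rows and cols are non-negative (List.nth_opt and
   List.init raise Invalid_argument on negative arguments). *)
let sub_sketch r c rows cols (csv : string list list) : string list list =
  (* Assumption: a cell outside the source data is the empty string. *)
  let cell row j = match List.nth_opt row j with Some s -> s | None -> "" in
  let row_at i = match List.nth_opt csv i with Some row -> row | None -> [] in
  List.init rows (fun i ->
      let row = row_at (r + i) in
      List.init cols (fun j -> cell row (c + j)))

let data = [ [ "a"; "b"; "c" ]; [ "d"; "e" ]; [ "f" ] ]

(* Block fully inside the data. *)
let top_left = sub_sketch 0 0 2 2 data

(* Block that runs past the ragged edge; missing cells become "". *)
let past_edge = sub_sketch 1 1 2 2 data
```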
val compare : t -> t -> int
val concat : t list -> t
(To concatenate CSV files so that they appear from top to bottom, just use List.concat.)
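Because t is string list list, top-to-bottom concatenation needs nothing from the Csv module. A minimal sketch, with made-up sample data:

```ocaml
(* Two CSV values in the in-memory representation (sample data). *)
let top = [ [ "h1"; "h2" ]; [ "a"; "b" ] ]
let bottom = [ [ "c"; "d" ] ]

(* List.concat appends the rows of each value in order, top to bottom. *)
let stacked : string list list = List.concat [ top; bottom ]
```

Note that List.concat does not pad or square anything: rows keep whatever width they had.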
val transpose : t -> t
val to_array : t -> string array array
val of_array : string array array -> t
to_array will produce a ragged matrix (not all rows will have the same length) unless you call Csv.square first.
val associate : string list -> t -> (string * string) list list
associate header data takes a block of data and converts each row in turn into an assoc list which maps column header to data cell.
Typically a spreadsheet will have the format:
  header1   header2   header3
  data11    data12    data13
  data21    data22    data23
  ...
This function arranges the data into a more usable form which is robust against changes in column ordering. The output of the function is:
  [ ["header1", "data11"; "header2", "data12"; "header3", "data13"];
    ["header1", "data21"; "header2", "data22"; "header3", "data23"];
    etc. ]
Each row is turned into an assoc list (see List.assoc).
If a row is too short, it is padded with empty cells (""). If a row is too long, it is truncated.
You would typically call this function as:
  let header, data = match csv with h :: d -> h, d | [] -> assert false;;
  let data = Csv.associate header data;;
The header strings are shared, so the actual space in memory consumed
by the spreadsheet is not much larger.
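The pairing, padding and truncation rules described for associate can be sketched with the standard library alone. The function associate_sketch below is a hypothetical stand-in written for illustration, not the library's implementation, and the sample data is made up.

```ocaml
(* Sketch only: pairs each header with the cell in the same position.
   NOT the Csv implementation; associate_sketch is a hypothetical name. *)
let associate_sketch (header : string list) (data : string list list)
    : (string * string) list list =
  let rec pair hs cells =
    match hs, cells with
    | [], _ -> []                                 (* row too long: truncate *)
    | h :: hs', [] -> (h, "") :: pair hs' []      (* row too short: pad with "" *)
    | h :: hs', c :: cells' -> (h, c) :: pair hs' cells'
  in
  List.map (pair header) data

(* Sample data: one complete row, one short row, one long row. *)
let csv =
  [ [ "name"; "qty" ]; [ "apple"; "3" ]; [ "pear" ]; [ "fig"; "7"; "extra" ] ]

(* Split off the header row, as in the calling pattern shown above. *)
let rows = match csv with h :: d -> associate_sketch h d | [] -> []

(* Look a cell up by column name rather than by position. *)
let qty_of_pear = List.assoc "qty" (List.nth rows 1)
```

Looking cells up with List.assoc is what makes the result robust against column reordering: the lookup depends on the header name, not on the column index.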