Select from where
WE SHOULD REWRITE THIS SECTION TO GET RID OF THE MANY TYPE DEFINITIONS. JUST USE THE ONE OF USE CASESCDuce is endowed with a select_from_where syntax to perform some SQL-like queries. The general form of select expressions is
select e from p1 in e1, p2 in e2, : pn in en where b
where e is an expression b a boolean expression, the pi's are patterns, and the ei's are sequence expressions.
The select_from_where construction is translated into:
transform e1 with p1 -> transform e2 with p2 -> ... transform en with pn -> if b then [e] else []
XPath-like expressions
XPath-like expressions are of two kind : e/t , e/@a , (and e//t ) where e is an expression, t a type, and a an attribute.
They are syntactic sugar for :
flatten(select x from <_ ..>[(x::t | _ )*] in e)
and
select x from <_ a=x ..>_ in e
Examples
Types and data for the examples
Let us consider the following DTD and the CDuce types representing a Bibliography
<!ELEMENT bib (book* )> <!ELEMENT book (title, (author+ | editor+ ), publisher, price )> <!ATTLIST book year CDATA #REQUIRED > <!ELEMENT author (last, first )> <!ELEMENT editor (last, first, affiliation )> <!ELEMENT title (#PCDATA )> <!ELEMENT last (#PCDATA )> <!ELEMENT first (#PCDATA )> <!ELEMENT affiliation (#PCDATA )> <!ELEMENT publisher (#PCDATA )> <!ELEMENT price (#PCDATA )> type bib = <bib>[book*] type book = <book year=String>[title (author+ | editor+ ) publisher price ] type author = <author>[last first ] type editor = <editor>[last first affiliation ] type title = <title>[PCDATA ] type last = <last>[PCDATA] type first = <first>[PCDATA] type affiliation = <affiliation>[PCDATA] type publisher = <publisher>[PCDATA] type price = <price>[PCDATA]
and some values
let biblio : bib = <bib>[ <book year="1994">[ <title>['TCP/IP Illustrated'] <author>[ <last>['Stevens'] <first>['W.']] <publisher>['Addison-Wesley'] <price>['65.95'] ] <book year="1992">[ <title>['Advanced Programming in the Unix environment'] <author>[ <last>['Stevens'] <first>['W.']] <publisher>['Addison-Wesley'] <price>['65.95'] ] <book year="2000">[ <title>['Data on the Web'] <author>[ <last>['Abiteboul'] <first>['Serge']] <author>[ <last>['Buneman'] <first>['Peter']] <author>[ <last>['Suciu'] <first>['Dan']] <publisher>['Morgan Kaufmann Publishers'] <price>['39.95'] ] <book year="1999">[ <title>['The Economics of Technology and Content for Digital TV'] <editor>[ <last>['Gerbarg'] <first>['Darcy'] <affiliation>['CITI'] ] <publisher>['Kluwer Academic Publishers'] <price>['129.95']] ]
Projections
- All titles in the bibliography biblio
let titles = [biblio]/book/title
Which yields to:
val titles : [ title* ] = [ <title>[ 'TCP/IP Illustrated' ] <title>[ 'Advanced Programming in the Unix environment' ] <title>[ 'Data on the Web' ] <title>[ 'The Economics of Technology and Content for Digital TV' ] ]
- All authors in the bibliography biblio
let authors = [biblio]/book/<author>_
Yielding the result:
val authors : [ <author>[ last first ]* ] = [ <author>[ <last>[ 'Stevens' ] <first>[ 'W.' ] ] <author>[ <last>[ 'Stevens' ] <first>[ 'W.' ] ] <author>[ <last>[ 'Abiteboul' ] <first>[ 'Serge' ] ] <author>[ <last>[ 'Buneman' ] <first>[ 'Peter' ] ] <author>[ <last>[ 'Suciu' ] <first>[ 'Dan' ] ] ]
Note the difference between this two projections.
In the fist one, we use the preset type title (type title = <title>[PCDATA ] ).
In the second one, the type <author>_ means all the xml fragments beginning by the tag author ( _ means Any), and this tag is without attribute. In contrary, we write note <author ..>_. - All books having an editor in the bibliography biblio
let edibooks = [biblio]/<book year=_>[_* editor _*]
Yielding:
val edibooks : [ <book year=String>[ title editor+ publisher price]* ] = [ <book year="1999">[ <title>[ 'The Economics of Technology and Content for Digital TV'] <editor>[ <last>[ 'Gerbarg' ] <first>[ 'Darcy' ] <affiliation>[ 'CITI' ] ] <publisher>[ 'Kluwer Academic Publishers' ] <price>[ '129.95' ] ] ]
- All books without authors.
let edibooks2 = [biblio]/<book ..>[(Any\author)*]
Yielding:
val edibooks2 : [ <book year=String>[ title editor+ publisher price]* ] = [ <book year="1999">[ <title>[ 'The Economics of Technology and Content for Digital TV'] <editor>[ <last>[ 'Gerbarg' ] <first>[ 'Darcy' ] <affiliation>[ 'CITI' ] ] <publisher>[ 'Kluwer Academic Publishers' ] <price>[ '129.95' ] ] ]
- The year of books having a price of 65.95 in the bibliography biblio
let books = [biblio]/<book ..>[_* <price>['65.95']]/@year
Yielding:
val books : [ String* ] = [ "1994" "1992" ]
- All the authors and editors in biblio
let aebooks = [biblio]/book/(author|editor)
Yielding:
val aebooks : [ (editor | author)* ] = [ <author>[ <last>[ 'Stevens' ] <first>[ 'W.' ] ] <author>[ <last>[ 'Stevens' ] <first>[ 'W.' ] ] <author>[ <last>[ 'Abiteboul' ] <first>[ 'Serge' ] ] <author>[ <last>[ 'Buneman' ] <first>[ 'Peter' ] ] <author>[ <last>[ 'Suciu' ] <first>[ 'Dan' ] ] <editor>[ <last>[ 'Gerbarg' ] <first>[ 'Darcy' ] <affiliation>[ 'CITI' ] ] ]
An interesting point in Cduce is the static typing of an expression. By example if we consider the third projection, "All books having an editor in the bibliography", CDuce knows the type of the result of the value edibooks:
val edibooks : [ <book year=String>[ title editor+ publisher price]* ] = ...
This type represents a book without author (see the book type in the type declaration in the top of this section). Now if we want to know all authors of this list of books edibooks:
let authorsofedibooks = edibooks/author
Yelding:
Warning at chars 24-39: This projection always returns the empty sequence val authorsofedibooks : [ ] = ""
In fact the value edibooks must be a subtype of [<_ ..>[Any* author Any*] *], and here this is not the case. If you want to be sure, test this:
match edibooks with [<_ ..>[_* author _*] *] -> "this is a subtype" | _ -> "this is not a subtype" ;;
An other projection should be convince you, is the query:
let freebooks = [biblio]/book/<price>['0'] ]
Yelding:
val freebooks : [ <price>[ '0' ]* ] = ""
There is no free books in this bibliography, That is not indicated by the type of biblio. Then, this projection returns the empty sequence ("")
Select_from_where
The same queries we wrote above can of course be programmed with the select_from_where construction
All the titles
let tquery = select y from x in [biblio]/book , y in [x]/title
This query is programmed in a XQuery-like style largely relying on the projections. Note that x and y are CDuce's patterns. The result is:
val tquery : [ title* ] = [ <title>[ 'TCP/IP Illustrated' ] <title>[ 'Advanced Programming in the Unix environment' ] <title>[ 'Data on the Web' ] <title>[ 'The Economics of Technology and Content for Digital TV' ] ]
Now let's program the same query with the translation given previously thus eliminating the y variable
let withouty = flatten(select [x] from x in [biblio]/book/title)
Yielding:
val tquery : [ title* ] = [ <title>[ 'TCP/IP Illustrated' ] <title>[ 'Advanced Programming in the Unix environment' ] <title>[ 'Data on the Web' ] <title>[ 'The Economics of Technology and Content for Digital TV' ] ]
But the select_from_where expressions are likely to be used for more complex queries such as the one that selects all titles whose at least one author is "Peter Buneman" or "Dan Suciu"
let sel = select y from x in [biblio]/book , y in [x]/title, z in [x]/author where ( (z = <author>[<last>['Buneman']<first>['Peter']]) || (z = <author>[<last>['Suciu'] <first>['Dan']]) )
Which yields:
val sel : [ title* ] = [ <title>[ 'Data on the Web' ] <title>[ 'Data on the Web' ] ]
Note that the corresponding semantics, as in SQL, is a multiset one. Thus duplicates are not eliminated. To discard them, one has to use the distinct_values operator.
A pure pattern example
This example computes the same result as the previous query except that duplicates are eliminated. It is written in a pure pattern form (i.e., without any XPath-like projections)
let sel = select t from <_ ..>[(x::book| _ )*] in [biblio], <_ ..>[ t&title _* (<author>[<last>['Buneman']<first>['Peter']] | <author>[<last>['Suciu'] <first>['Dan']]) ; _] in x
Note the pattern on the second line in the from clause. As the type of an element in x is type book = <book year=String>[title (author+ | editor+ ) publisher price ], we skip the tag : <_ ..>, then we then capture the corresponding title (t &title) then we skip authors _* until we find either Peter Buneman or Dan Suciu (<author>[<last>['Buneman']<first>['Peter']]| <author>[<last>['Suciu'] <first>['Dan']]), then we skip the remaining authors and the tail of the sequence by writing ; _
Result:
val sel : [ title* ] = [ <title>[ 'Data on the Web' ] ]
This pure pattern form of the query yields (in general) better performance than the same one written in an XQuery-like programming style. However, the query optimiser automatically translates the latter into a pure pattern one
Joins
This example is the exact transcription of query Q5 of XQuery use cases. On top of this section we give the corresponding CDuce types. We give here the type of the document to be joined, and the sample value.
type Reviews =<reviews>[Entry*] type Entry = <entry> [ Title Price Review] type Title = <title>[PCDATA] type Price= <price>[PCDATA] type Review =<review>[PCDATA] let bstore2 : Reviews = <reviews>[ <entry>[ <title>['Data on the Web'] <price>['34.95'] <review> ['A very good discussion of semi-structured database systems and XML.'] ] <entry>[ <title>['Advanced Programming in the Unix environment'] <price>['65.95'] <review> ['A clear and detailed discussion of UNIX programming.'] ] <entry>[ <title>['TCP/IP Illustrated'] <price>['65.95'] <review> ['One of the best books on TCP/IP.'] ] ]
The queries are expressed first in an XQuery-like style, then in a pure pattern style: the first pattern-based query is the one produced by the automatic translation from the first one. The last query correponds to a pattern aware programmer's version.
XQuery style
<books-with-prices> select <book-with-price>[t1 <price-bstore1>([p1]/Char) <price-bstore2>([p2]/Char)] from b in [biblio]/book , t1 in [b]/title, e in [bstore2]/Entry, t2 in [e]/Title, p2 in [e]/Price, p1 in [b]/price where t1=t2
Automatic translation of the previous query into a pure pattern (thus more efficient) one
<books-with-prices> select <book-with-price>[t1 <price-bstore1>x10 <price-bstore2>x11 ] from <_ ..>[(x3::book|_)*] in [biblio], <_ ..>[(x9::price|x5::title|_)*] in x3, t1 in x5, <_ ..>[(x6::Entry|_)*] in [bstore2], <_ ..>[(x7::Title|x8::Price|_)*] in x6, t2 in x7, <_ ..>[(x10::Char)*] in x9, <_ ..>[(x11::Char)*] in x8 where t1=t2
Pattern aware programmer's version of the same query (hence hand optimised). This version of the query is very efficient. Be aware of patterns.
<books-with-prices> select <book-with-price>[t2 <price-bstore1>p1 <price-bstore2>p2] from <bib>[b::book*] in [biblio], <book ..>[t1&title _* <price>p1] in b, <reviews>[e::Entry*] in [bstore2], <entry>[t2&Title <price>p2 ;_] in e where t1=t2
More complex Queries: on the power of patterns
let biblio = [biblio]/book;; <bib> select <book (a)> x from <book (a)>[ (x::(Any\editor)|_ )* ] in biblio
This expression returns all book in bib but removoing the editor element. If one wants to write more explicitly:
select <book (a)> x from <book (a)>[ (x::(Any\<editor ..>_)|_ )* ] in biblio
Or even:
select <book (a)> x from <book (a)>[ (x::(<(_\`editor) ..>_)|_ )* ] in biblio
Back to the first one:
<bib> select <book (a)> x from <(book) (a)>[ (x::(Any\editor)|_ )* ] in biblio
This query takes any element in bib, tranforms it in a book element and removes sub-elements editor, but you will get a warning as capture variable book in the from is never used: we should have written <(_) (a)> instead of <(book) (a)> the from
select <(book) (a)> x from <(book) (a)>[ (x::(Any\editor)|_ )* ] in biblio
Same thing but without tranforming tag to "book".
More interestingly:
select <(b) (a\id)> x from <(b) (a)>[ (x::(Any\editor)|_ )* ] in biblio
removes all "id" attribute (if any) from the attributes of the element in bib.
select <(b) (a\id+{bing=a.id})> x from <(b) (a)>[ (x::(Any\editor)|_ )* ] in biblio
Changes attribute id=x into bing=x However, one must be sure that each element in bib has an id attribute if such is not the case the expression is ill-typed. If one wants to perform this only for those elements which certainly have an id attribute then:
select <(b) (a\id+{bing=a.id})> x from <(b) (a&{id=_})>[ (x::(Any\editor)|_ )* ] in biblio
An unorthodox query: Formatted table generation
The following program generates a 10x10 multiplication table:
let bg ((Int , Int) -> String) (y, x) -> if (x mod 2 + y mod 2 <= 0) then "lightgreen" else if (y mod 2 <= 0) then "yellow" else if (x mod 2 <= 0) then "lightblue" else "white";; <table border="1"> select <tr> select <td align="right" style=("background:"@bg(x,y)) >[ (x*y) ] from y in [1 2 3 4 5 6 7 8 9 10] : [1--10*] from x in [1 2 3 4 5 6 7 8 9 10] : [1--10*];;
The result is the xhtml code that generates the following table:
|