Changes between Initial Version and Version 1 of SilkRouteIssues

05/08/06 13:34:25 (13 years ago)



= Issues with SilkRoute in production use =
The ability to query a non-trivial XML view of a relational database is a powerful idea.
It allows users to keep existing data sources and integrate them with various kinds of data
efficiently using a single query language.  However, for all of its promise, the current
implementation has deficiencies that make it difficult to use in a production
environment.  The problems fall into two broad categories: uninformative errors and an
incomplete implementation of XQuery.
== Query Syntax Errors ==
Take the example of a query with a simple syntax error, such as a misspelled "where" keyword:
$ /usr/bin/silkroute -db pg_bmap_repo -u xbrain_user -p Yo_YoMa userquery51255.uq
SilkRoute error was: Fatal error: exception Error.Query(_)
The above error tells me almost nothing except that something is wrong with the query.  It gives no indication that I misspelled “where”.  I get the same error when the query uses the `number(object)` function to cast a value to a number, so a missing feature is reported no differently from a syntax error.  A possible solution is to validate the syntax before translating the query to SQL. Saxon provides some syntax validation:
$ java -cp $CP net.sf.saxon.Query myquery.xq
Error on line 11 column 2 of file:myquery.xq:
  XPST0003: XQuery syntax error in $a/results/patient whare $#:
    expected "return", found name "whare"
Failed to compile query
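The validate-first idea can be sketched as a small wrapper that runs a validator command and only calls the translator if validation succeeds. This is a minimal sketch, not a tested integration: `run_with_precheck` is a hypothetical helper, both command lines are supplied by the caller, and it assumes the validator exits with a non-zero status when the query does not compile.

```python
import subprocess
import sys

def run_with_precheck(validate_cmd, translate_cmd):
    """Run validate_cmd first; run translate_cmd only if validation passes.

    Returns (ok, text). On a validation failure, text is the validator's
    diagnostic, so the user sees it instead of an opaque translator error.
    Assumption: the validator signals a bad query with a non-zero exit code.
    """
    check = subprocess.run(validate_cmd, capture_output=True, text=True)
    if check.returncode != 0:
        return False, (check.stderr or check.stdout).strip()
    result = subprocess.run(translate_cmd, capture_output=True, text=True)
    return result.returncode == 0, (result.stdout or result.stderr).strip()

# Stand-in commands so the sketch runs anywhere. In practice these would be
# the Saxon and SilkRoute command lines shown above.
fake_validator = [sys.executable, "-c",
                  "import sys; sys.exit('syntax error near whare')"]
fake_translator = [sys.executable, "-c", "print('<results/>')"]

ok, message = run_with_precheck(fake_validator, fake_translator)
```

Here the translator is never started, and `message` carries the validator's complaint. Whether a SilkRoute query file can be handed to Saxon unchanged is a separate question that this sketch does not answer.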
== XQuery Problems ==
A worse problem is SilkRoute's incomplete implementation of XQuery.
One major failing is that SilkRoute does not implement all of the built-in functions.  This makes one of our queries nearly impossible to write, because it requires a cast (find all labels with a value less than 10, where `label` is a varchar).  To make matters worse, this query initially returned incorrect results because SilkRoute silently did a string comparison.
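To see why a silent string comparison gives wrong answers for "label less than 10", here is a small Python illustration. The label values are made up for the example and are not taken from our database.

```python
# Hypothetical varchar label values.
labels = ["9", "10", "2", "100", "1"]

# String comparison is lexicographic: "9" < "10" is false, because the
# character "9" sorts after "1".
as_strings = [v for v in labels if v < "10"]

# Numeric comparison after an explicit cast, which is what the query intends.
as_numbers = [v for v in labels if float(v) < 10]

print(as_strings)  # ['1']
print(as_numbers)  # ['9', '2', '1']
```

The string version drops "9" and "2", so the result looks plausible but is incomplete, which is exactly the kind of error that goes unnoticed.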
Another failing is an incorrect implementation of XQuery comparisons: equality between two sequences appears to be broken.  We should be able to select all trials with error code 2 or 9 by using {{{trial/trialcode = ('2', '9')}}}.  Instead, this returns no results in SilkRoute.
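For reference, the XQuery general comparison `=` is existential: it is true if any item on the left equals any item on the right. The Python sketch below models that rule on plain lists; `general_eq` and the sample trial records are hypothetical names and data used only for illustration.

```python
def general_eq(left, right):
    """Model of the XQuery general comparison '=' on two sequences:
    true if at least one pair of items, one from each side, is equal."""
    return any(a == b for a in left for b in right)

# Hypothetical trials, each with a sequence of trialcode values.
trials = [
    {"id": "t1", "trialcode": ["2"]},
    {"id": "t2", "trialcode": ["5"]},
    {"id": "t3", "trialcode": ["9"]},
]

# Equivalent in spirit to: trial[trialcode = ('2', '9')]
selected = [t["id"] for t in trials if general_eq(t["trialcode"], ["2", "9"])]
print(selected)  # ['t1', 't3']
```

A conforming engine should therefore return the trials with code 2 and the trials with code 9, not an empty result.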
With little active development, it also appears unlikely that SilkRoute will keep up with XQuery or Galax developments.  This will only put SilkRoute further behind.
== Possible Workarounds ==
=== Fixing SilkRoute ===
The obvious solution is to apply our programming expertise to fix SilkRoute
and/or add the required functionality to Galax, the XQuery engine.
Unfortunately, SilkRoute and Galax are written in OCaml, and none of us
has OCaml programming experience, not to mention that we are not familiar
with the codebase.
=== Database functions ===
Another possible solution would be to implement the XQuery/XPath built-in
functions in the database using stored routines and provide no helpful
casting in the translation. This would require a lot of work and
=== In-Memory Alternative ===
The solution that seems most promising is to continue to use
SilkRoute for its strength (an XQuery view over an RDB), but
process queries through a different XQuery engine. This could
be done either by retaining SilkRoute for rough initial results
and then post-processing, or by fully materializing the XML
view of the database and using an in-memory XQuery engine.
We are already performing some post-processing of results; the
main shortcoming of this approach is that we never know exactly
what will cause SilkRoute to fail until we try it or notice some
inconsistencies in results.
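As an illustration of the post-processing route, the sketch below takes a rough XML result and applies the numeric label filter outside SilkRoute. The element names (`results`, `site`, `name`, `label`) and the sample data are assumptions made for the example; they are not our actual view schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical rough result, as if returned by an unfiltered query.
ROUGH_RESULT = """<results>
  <site><name>A</name><label>9</label></site>
  <site><name>B</name><label>10</label></site>
  <site><name>C</name><label>n/a</label></site>
</results>"""

def sites_with_label_below(xml_text, limit):
    """Return the names of sites whose label, read as a number, is < limit."""
    root = ET.fromstring(xml_text)
    kept = []
    for site in root.iter("site"):
        raw = site.findtext("label", default="")
        try:
            value = float(raw)
        except ValueError:
            continue  # non-numeric label: skip it rather than guess
        if value < limit:
            kept.append(site.findtext("name"))
    return kept

print(sites_with_label_below(ROUGH_RESULT, 10))  # ['A']
```

The filter is easy to get right here, but it only helps for failures we already know about, which is the shortcoming described above.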
Eider explored using the materialized database (currently a 22 MB
XML file) with an in-memory query processor. To ensure that it has
adequate performance, he benchmarked Saxon with our database
pre-loaded into memory:
||<rowstyle="background-color: #C0C0C0;">Query || Saxon time || SilkRoute speed ||
||P50 All mapped sites||0.3 s||fast||
||All trials with error code ‘2’||1.1 s||fast||
||All trials with error code ‘2’ or ‘9’||1.2 s||slow or incorrect||
||All sites with label < 10||0.9 s||incorrect||
SilkRoute was not accurately benchmarked, but “fast” should be interpreted as under 1 s and “slow” as greater than 15 s.  In a Web Service context Saxon is slightly faster than above and comparable to SilkRoute’s fast results. The disadvantage of this approach is that it requires materializing the view and takes a lot of memory (currently 200-300 MB for the 22 MB file).
== Conclusion ==
Both expert and novice users are likely to gain much of their experience in XQuery from tutorials and from trial and error.  Uninformative error messages make the trial and error process burdensome.  An incomplete implementation makes it difficult to learn from tutorials, since users keep finding that described features are missing.  To make the problem worse, no information is given that a query failed because a feature was missing (as opposed to a syntax error).  These issues are problematic, but it is possible to work around them.  The cases where incorrect results are returned are a much more vexing obstacle.  While a fully conforming SilkRoute remains the ideal solution, the current implementation is not adequate for our needs.  Unless SilkRoute sees further improvements, a Saxon-based Web Service is a more viable alternative for us.