Changes between Version 7 and Version 8 of minutes_05-13-08


Timestamp: 05/16/08 15:55:52
Author: onardmejino
  • minutes_05-13-08

    v7 v8  
    21 21 Marianne's AMIA paper.
    22 22
    23 2. Jiun-Hung presented the initial results of the algorithm he used to identify the liver in a set of 5000 CT scans from 20 patients. See the attached slides. 
     23 2. Marianne has been looking at several different types of optimizations for the vSPARQL extensions.
     24 
     25 * Schema variants * 
     26  She looked at several different schemas for speeding up recursive queries. ARQ/Jena has support for a db format called RDB; this is 
     27  basically a triple store with long strings and URLs put in a separate table. SDB/Jena has support for a db format called layout2/index (and layout2/hash); this puts all strings and URLs into a separate nodes table and uses their sequence ID number in a table that contains all triple statements.
     28 
     29  She ran transitive closure over the entire graph in two ways:
     30       - ALL runs transitive closure over the entire graph for all properties at the same time. 
     31 
     32       - EACH runs transitive closure over the entire graph for each property, one after the other. 
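
The two modes above can be sketched as follows. This is only an in-memory illustration, not the vSPARQL or ARQ implementation. It assumes that ALL follows edges of every property together in one pass, while EACH computes a separate closure per property. The function names and the sample triples are hypothetical.

```python
from collections import defaultdict, deque


def closure(edges):
    """Return all (start, reachable) pairs for a list of (subj, obj) edges."""
    succ = defaultdict(set)
    for s, o in edges:
        succ[s].add(o)
    pairs = set()
    for start in list(succ):
        seen = set()
        queue = deque(succ[start])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            # .get() avoids creating empty entries for leaf nodes
            queue.extend(succ.get(node, ()))
        pairs.update((start, n) for n in seen)
    return pairs


def closure_all(triples):
    # ALL (assumed reading): one pass over the whole graph, every property at once
    return closure([(s, o) for s, _p, o in triples])


def closure_each(triples):
    # EACH: one closure per property, computed one after the other
    by_prop = defaultdict(list)
    for s, p, o in triples:
        by_prop[p].append((s, o))
    return {p: closure(edges) for p, edges in by_prop.items()}


# Hypothetical sample data
triples = [
    ("a", "part_of", "b"),
    ("b", "part_of", "c"),
    ("c", "is_a", "d"),
]
all_pairs = closure_all(triples)
each_pairs = closure_each(triples)
```

Under this reading, ALL can reach nodes through mixed-property paths, so its result can be larger than the union of the EACH results.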
     33 
     34  The different schemas that she used: 
     35       - RDB, standard:  
     36         - standard: out-of-the-box ARQ DBMS schema 
     37 
     38         - statement table has subj, prop, obj columns of type varchar(250); long URLs and strings are in a separate table
     39 
     40       - RDB, short URL: 
     41         - same schema as RDB, standard 
     42 
     43         - Remove "bioontology.org/projects/ontologies" from URLs to shorten their length 
     44                - This could potentially decrease memory footprint and the cost of comparisons 
     45 
     46       - RDB, string(10): 
     47         - same schema as RDB, standard, except varchar(10); this causes almost all URLs and strings to be placed in a separate table 
     48               - the idea was to see how much the size of strings affected our performance 
     49 
     50       - SDB, layout2/index: 
     51         - all strings and URLs are stored in a separate nodes table; the statement table uses sequence IDs from the nodes table 
     52 
     53         - sequence IDs are 4-byte integers
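
To make the difference between the layouts concrete, here is a rough sketch of an inline-string statement table next to a nodes-table layout. It is not the actual Jena RDB or SDB DDL. It uses SQLite from the Python standard library rather than PostgreSQL, and all table names, column names, and example URLs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Inline layout: the statement table stores the strings themselves.
    CREATE TABLE stmt_inline (
        subj VARCHAR(250), prop VARCHAR(250), obj VARCHAR(250)
    );

    -- Node-ID layout: each string or URL is stored once in a nodes table,
    -- and the statement table holds only integer IDs.
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, lex TEXT UNIQUE);
    CREATE TABLE stmt_ids (s INTEGER, p INTEGER, o INTEGER);
""")


def node_id(value):
    """Return the ID for a string, inserting it into nodes if it is new."""
    conn.execute("INSERT OR IGNORE INTO nodes (lex) VALUES (?)", (value,))
    row = conn.execute("SELECT id FROM nodes WHERE lex = ?", (value,)).fetchone()
    return row[0]


def add_triple(s, p, o):
    """Store the same triple in both layouts."""
    conn.execute("INSERT INTO stmt_inline VALUES (?, ?, ?)", (s, p, o))
    conn.execute(
        "INSERT INTO stmt_ids VALUES (?, ?, ?)",
        (node_id(s), node_id(p), node_id(o)),
    )


def objects_of(subject, prop):
    """Look up objects in the node-ID layout by joining back to nodes."""
    rows = conn.execute(
        """
        SELECT o.lex FROM stmt_ids t
        JOIN nodes s ON s.id = t.s
        JOIN nodes p ON p.id = t.p
        JOIN nodes o ON o.id = t.o
        WHERE s.lex = ? AND p.lex = ?
        """,
        (subject, prop),
    )
    return [r[0] for r in rows]


add_triple("http://example.org/Liver", "http://example.org/part_of",
           "http://example.org/Abdomen")
add_triple("http://example.org/Abdomen", "http://example.org/part_of",
           "http://example.org/Body")
node_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
```

In the node-ID layout, comparisons inside the statement table are on integers, and each distinct string is stored only once; the cost is the extra joins needed to turn IDs back into strings.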
     54 
     55 SEE ATTACHED SPREADSHEET for PostgreSQL tuning parameters and results. 
     56 
     57 * Optimizing queries generated by ARQ  
     58   - ARQ frequently generates a large number (1000s) of SQL queries for queries
     59     that could be answered more efficiently using a higher-level db 
     60     construct, like a join. She is looking to see if there is benefit 
     61     to placing a "middleware" box that interposes on the ARQ queries 
     62     and determines whether or not smarter queries can be sent to the 
     63     DBMS. If a smarter query can be constructed, it will be submitted 
     64     to the DBMS and the results will be returned on subsequent ARQ 
     65     queries without consulting the DBMS.  
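
A minimal sketch of the interposing idea, under several assumptions: the minutes do not describe how the middleware would be built, so the class, table, and column names here are hypothetical, and a single bulk SELECT per property stands in for whatever smarter query (such as a join) would really be constructed. SQLite is used only to keep the example self-contained.

```python
import sqlite3


class PrefetchingMiddleware:
    """Hypothetical interposer between a query engine and the DBMS."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}      # prop -> {subj: [obj, ...]}
        self.db_queries = 0  # statements actually sent to the DBMS

    def objects(self, subj, prop):
        """Answer a per-subject lookup, hitting the DBMS once per property."""
        if prop not in self.cache:
            # One bulk query replaces many per-subject lookups.
            self.db_queries += 1
            by_subj = {}
            rows = self.conn.execute(
                "SELECT subj, obj FROM stmt WHERE prop = ?", (prop,)
            )
            for s, o in rows:
                by_subj.setdefault(s, []).append(o)
            self.cache[prop] = by_subj
        # Later lookups for the same property are served from the cache.
        return self.cache[prop].get(subj, [])


# Hypothetical sample data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stmt (subj TEXT, prop TEXT, obj TEXT)")
conn.executemany(
    "INSERT INTO stmt VALUES (?, ?, ?)",
    [("a", "part_of", "b"), ("b", "part_of", "c")],
)

mw = PrefetchingMiddleware(conn)
first = mw.objects("a", "part_of")   # triggers the bulk query
second = mw.objects("b", "part_of")  # answered from the cache
```

Whether this pays off depends on how often the prefetched rows are actually requested later; fetching a whole property up front is wasted work if only a few subjects are ever looked up.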
     66 
     67 **** RADLEX from FMA ****
     68 
     69 * She has been working with Onard to use Protege to create a concrete 
     70   "view" of the FMA that is equivalent to what Onard has previously 
     71   produced. Whereas Onard imposed structure on RADLEX, this would 
     72   simply be modifying the structure of the FMA to be appropriate for 
     73   RADLEX. 
     74 
     75 3. Jiun-Hung presented the initial results of the algorithm he used to identify the liver in a set of 5000 CT scans from 20 patients. See the attached slides.