Last modified on 05/16/08 15:55:52

Meeting minutes from 05/13/2008

Attendance:

  • UW: Jim Brinkley, Dan Cook, Linda Shapiro, Todd Detwiler, Dan Suciu, Marianne Shaw, Jiun-Hung Chen, Onard Mejino, Wolfgang Gatterbauer
  • Stanford: Daniel Rubin

Summary

  1. Todd introduced a new property function in Gleen, "Subgraph", that binds all of the triples traversed during the evaluation of a path expression. He illustrated what would be returned using an example expression and a graph that he drew on the board.

A multi-source query which combines results from Antoine's spatial query database with results from the FMA. The spatial query service tells us what is anterior to a given structure, in this case "Abdominal part of esophagus", while the FMA tells us which of these results are organs.

A sample call to Subgraph which demonstrates that cyclic paths can be properly handled.

A vSPARQL/Gleen hybrid query which produces the Liver view as described in Marianne's AMIA paper.
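As a rough illustration of what a Subgraph-style binding contains, the following Python sketch collects every triple traversed while following a single property transitively from a start node, and uses a visited set so that cyclic paths terminate. It is not Gleen's implementation: the function name `subgraph`, the tuple representation of triples, and the restriction to one property (rather than a full path expression) are all assumptions made for the example.

```python
from collections import deque


def subgraph(triples, start, prop):
    """Return the (s, p, o) triples traversed while following `prop`
    transitively from `start`. Hypothetical helper, not the Gleen API."""
    # Index outgoing edges for the property of interest.
    outgoing = {}
    for s, p, o in triples:
        if p == prop:
            outgoing.setdefault(s, []).append(o)

    visited = {start}      # nodes already queued for expansion
    traversed = []         # triples walked so far, in breadth-first order
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in outgoing.get(node, []):
            # Record the edge even when it closes a cycle.
            traversed.append((node, prop, nxt))
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return traversed


# Made-up example graph containing a cycle a -> b -> c -> a.
example = [
    ("a", "part", "b"),
    ("b", "part", "c"),
    ("c", "part", "a"),
    ("a", "other", "d"),
]
cyclic_result = subgraph(example, "a", "part")
```

Each node is expanded at most once, so every matching edge is recorded exactly once and the traversal stops even when the path loops back to its start.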

  1. Marianne has been looking at several different types of optimizations for the vSPARQL extensions.
  • Schema variants
    • She looked at several different schemas for speeding up recursive queries. ARQ/Jena has support for a database format called RDB; this is basically a triple store with long strings and URLs put in a separate table. SDB/Jena has support for a database format called layout2/index (and layout2/hash); this puts all strings and URLs into a separate nodes table and uses their sequence ID numbers in a table that contains all triple statements.

She ran transitive closure over the entire graph in two ways:

  • ALL runs transitive closure over the entire graph for all properties at the same time.
  • EACH runs transitive closure over the entire graph for each property, one after the other.
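The difference between the two modes can be sketched in Python as follows. This is only one reading of the minutes: it assumes that ALL means a single fixpoint computation in which every property is closed simultaneously, and that EACH means a separate fixpoint per property. The function names and the in-memory set representation are hypothetical; the actual runs were made against a database.

```python
def closure_all(triples):
    """Close every property at the same time in one fixpoint loop."""
    closed = set(triples)
    while True:
        # Index objects by (subject, property) for the join step.
        by_subject = {}
        for s, p, o in closed:
            by_subject.setdefault((s, p), set()).add(o)
        # Join (s, p, o) with (o, p, o2) to derive (s, p, o2).
        new = {
            (s, p, o2)
            for s, p, o in closed
            for o2 in by_subject.get((o, p), ())
        } - closed
        if not new:
            return closed
        closed |= new


def closure_each(triples):
    """Close one property after the other, then merge the results."""
    closed = set()
    for prop in sorted({p for _, p, _ in triples}):
        closed |= closure_all({t for t in triples if t[1] == prop})
    return closed


# Made-up example data with two properties.
example_triples = {
    ("a", "part_of", "b"),
    ("b", "part_of", "c"),
    ("x", "is_a", "y"),
    ("y", "is_a", "z"),
}
```

Under these assumptions both functions return the same set of triples; what differs is how the work is batched, which is the kind of difference a timing comparison would measure.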

The different schemas that she used:

  • RDB, standard:
    • out-of-the-box ARQ DBMS schema
    • the statement table has subj, prop, obj varchar(250); long URLs and strings are in a separate table
  • RDB, short URL:
    • same schema as RDB, standard
    • "bioontology.org/projects/ontologies" is removed from URLs to shorten their length
    • this could potentially decrease the memory footprint and the cost of comparisons
  • RDB, string(10):
    • same schema as RDB, standard, except varchar(10); this causes almost all URLs and strings to be placed in a separate table
    • the idea was to see how much the size of strings affected our performance
  • SDB, layout2/index:
    • all strings and URLs are stored in a separate nodes table; the statement table uses sequence IDs from the nodes table
    • sequence IDs are 4 byte integers
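To make the nodes-table idea concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. It is not the SDB layout2/index schema itself: the table names, column names, and example URIs are made up for illustration, SQLite stands in for PostgreSQL, and SQLite's integer keys are not fixed at 4 bytes.

```python
import sqlite3


def build_store(triples):
    """Store strings once in `nodes`; `triples` holds only integer IDs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, lex TEXT UNIQUE NOT NULL);
        CREATE TABLE triples (s INTEGER NOT NULL, p INTEGER NOT NULL, o INTEGER NOT NULL);
        CREATE INDEX triples_spo ON triples (s, p, o);
    """)

    def node_id(value):
        # Insert the string if it is new, then look up its sequence ID.
        conn.execute("INSERT OR IGNORE INTO nodes (lex) VALUES (?)", (value,))
        row = conn.execute("SELECT id FROM nodes WHERE lex = ?", (value,)).fetchone()
        return row[0]

    for s, p, o in triples:
        conn.execute(
            "INSERT INTO triples VALUES (?, ?, ?)",
            (node_id(s), node_id(p), node_id(o)),
        )
    return conn


def objects_of(conn, subject, prop):
    """Resolve IDs back to strings by joining against the nodes table."""
    rows = conn.execute(
        """
        SELECT o_node.lex
        FROM triples t
        JOIN nodes s_node ON s_node.id = t.s
        JOIN nodes p_node ON p_node.id = t.p
        JOIN nodes o_node ON o_node.id = t.o
        WHERE s_node.lex = ? AND p_node.lex = ?
        ORDER BY o_node.lex
        """,
        (subject, prop),
    )
    return [r[0] for r in rows]


# Hypothetical example data; the URIs are placeholders.
store = build_store([
    ("ex:Liver", "ex:part_of", "ex:Abdomen"),
    ("ex:Liver", "ex:part_of", "ex:Body"),
])
```

In a layout like this, comparisons in the statement table are on small integers, and each distinct string is stored once no matter how many statements mention it.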

See the attached spreadsheet for PostgreSQL tuning parameters and results.

  • Optimizing queries generated by ARQ
    • ARQ frequently generates a large number (1000s) of SQL queries for a query that could be answered more efficiently using a higher-level database construct, like a join. She is looking to see if there is a benefit to placing a "middleware" box that interposes on the ARQ queries and determines whether or not smarter queries can be sent to the DBMS. If a smarter query can be constructed, it will be submitted to the DBMS, and the results will then be returned for subsequent ARQ queries without consulting the DBMS again.
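The middleware idea can be sketched in Python as follows. This is an illustrative toy, not Marianne's design or any ARQ interface: `CountingBackend` is a hypothetical stand-in for the DBMS that counts round trips, and `PrefetchingMiddleware` assumes that the "smarter query" is a single bulk fetch of all statements for a property.

```python
class CountingBackend:
    """Hypothetical stand-in for the DBMS; counts how many queries it receives."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.queries = 0

    def fetch_one(self, subject, prop):
        # One round trip per (subject, property) lookup.
        self.queries += 1
        return [o for s, p, o in self.triples if s == subject and p == prop]

    def fetch_all(self, prop):
        # One round trip that returns every statement for the property.
        self.queries += 1
        result = {}
        for s, p, o in self.triples:
            if p == prop:
                result.setdefault(s, []).append(o)
        return result


class PrefetchingMiddleware:
    """Answers per-subject lookups from one bulk query per property."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}  # property -> {subject: [objects]}

    def objects(self, subject, prop):
        if prop not in self.cache:
            # First lookup for this property: send the bulk query once.
            self.cache[prop] = self.backend.fetch_all(prop)
        # Later lookups are served without consulting the backend.
        return self.cache[prop].get(subject, [])


# Made-up example data.
backend = CountingBackend([
    ("a", "part_of", "b"),
    ("b", "part_of", "c"),
    ("c", "part_of", "d"),
])
middleware = PrefetchingMiddleware(backend)
answers = [middleware.objects(s, "part_of") for s in ("a", "b", "c")]
```

Here three lookups cost one backend query instead of three. A real interposer would also have to decide when a bulk query is worthwhile and when cached results become stale, which this sketch does not attempt.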

  • RADLEX from FMA
    • She has been working with Onard to use Protege to create a concrete "view" of the FMA that is equivalent to what Onard has previously produced. Whereas Onard imposed structure on RADLEX, this would simply be modifying the structure of the FMA to be appropriate for RADLEX.
  1. Jiun-Hung presented the initial results of the algorithm he used to identify the liver in a set of 5000 CT scans from 20 patients. See the attached slides.

Attachments