Last modified on 09/01/10 17:25:49

VIQUEN: A Visual Query Engine for RDF

VIQUEN is a graphical tool for semantic query construction, execution and visualization that is based on the IML data flow graph transformation language for manipulating RDF data. VIQUEN enables the formulation of queries using a set of graphical query components and GUI-based editing actions. The formulated queries are automatically compiled into the IML query language, before being executed over local or online RDF data sets. The RDF data set resulting from the IML query is then visualized as a graph.

This document provides information about VIQUEN's main components and how to use the system.

For background information on IML, RDF, SPARQL and vSPARQL, please refer to the IML documentation.

Download and install VIQUEN

You can download a copy of the application at http://sig.biostr.washington.edu/share/downloads/VIQUEN_executable.zip. Unzip the file and then double-click the VIQUEN_querybuilder application icon to run the application. Sample queries may be found in the Queries folder, and sample RDF may be found in the RDF folder.

To get started building queries, try one of the following tutorials:
Tutorial 1: Simple extraction
Tutorial 2: Binding variables
Tutorial 3: Combining operations

Alternatively, a number of sample queries have been provided with the application (located in the VIQUEN Queries directory).
Simple queries to try include: Abdominal cavity query, Left frontal lobe query, Lung parts query, Craniofacial query, Soft palate query.
Complex queries include: Mitotic cell cycle query, Organ spatial location query, NCI Thesaurus (NCIt) simplification query, Blood contained in the heart query, Biosimulation model editor query, Blood fluid properties query, Radiologist liver ontology query.

Implementation

VIQUEN has been implemented as a platform-independent Java application. The GUI components of the system have been built using the Java Swing toolkit, and both the query builder environment and the visualization environment use the JGraph visualization library. After the queries have been compiled into IML, they are executed over online RDF data sets using the Java AMF connection protocol, which connects to the Query Manager server. The RDF data sets returned after executing queries are parsed using the Jena Framework for building semantic web applications.

Query-building Environment



The query-building environment is used to graphically formulate semantic queries. The user interface, shown above, may be divided into four main parts:

  1. The toolbar and system menus
  2. The operation library palettes
  3. The main query-building workspace
  4. The query-building workspace outline

1. The toolbar and system menus

The toolbar and system menus have been designed to provide easy, single-click options for managing the workspace, including saving and loading queries, copying, pasting or deleting query operations, compiling queries into IML and changing the look-and-feel of the application. Several layout options have been provided which automatically structure the flow of the query operations in a space-efficient manner. These may be accessed through the Diagram menu (Diagram -> Layout).

Several toolbar buttons have specialized query-building functionality. These include:
The data sources button: used to add, remove or edit the data sources and namespaces specified in the query.
The compile query button: used to automatically compile the query into IML and open the query execution environment.
The visualization button: opens the visualization environment in a separate window to enable the visualization of local RDF files.

2. The operation library palettes

The operation library palettes contain icons which represent query operations that may be added to the workspace. The query operations have been divided into five different palettes, with similar operations being grouped together. Operations are added to the main query-building workspace by dragging and dropping the appropriate icon from the relevant palette. Each palette additionally contains an Edge icon for adding directed edges to the data flow workspace.

The Extract palette contains shortcuts for the 5 Extract operations: Extract Edges, Extract Tree, Extract Reachable, Extract Path and Extract Recursive.
The Delete palette contains shortcuts for the 4 Delete operations: Delete Edges, Delete Node, Delete Property and Delete Tree.
The Replace palette contains shortcuts for the 7 Replace operations: Replace Edge Subject, Replace Edge Object, Replace Edge Property, Replace Edge Literal, Replace Node, Replace Property and Replace Literal.
The Where palette contains shortcuts for the 4 Where operations: Match Statements, Union Statements, Filter Statements and Optional Statements.
The Basic palette contains shortcuts for the remainder of the query operations: Start, Input, Output, Add Edges and Union Graphs.

3. The main query-building workspace

The main query-building workspace has been designed to take advantage of the data flow graph transformation style of IML. Each high-level query operation is represented in its own visual node. Nodes of the same type are the same color for easy identification. The visual nodes are then chained together, using directed edges, to compose the entire query. Each node contains maximize (+) and minimize (-) buttons that are used to expand and collapse the node. The bottom right corner of the node may be clicked and dragged to resize the node.

START NODE

A query must begin with a Start node, which indicates the point from which the system will start to compile the query. By positioning the Start node appropriately, different chunks of the query may be executed individually before combining them into a larger query. After the Start node, the query is defined by adding one or more subquery blocks to the workspace.

INPUT NODE

Each subquery block begins with an Input node, which defines the data sources to be used as input to the query. Clicking on the "Select input sources" button will bring up a list of available data sources which may be selected for inclusion in the query.

OUTPUT NODE

A subquery block must end with an Output node, which specifies the output graph for the block. This output graph may easily be added to the list of potential input data sources by clicking on the "Add to data sources" button and specifying a name for the output graph.

Between the Input node and the Output node, a number of different query operations may be added. These operations are applied to the input graphs to produce the desired output. The results of each operation within a block are passed to the next operation via a default graph. At the top of the block, the default graph is empty; the first operation will begin to populate that graph. The output of the block is the default graph after the last operation has been performed. The query operations that may be added to a block are described in detail below.
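The way a block threads its default graph through a chain of operations can be sketched in a few lines of Python. This is only an illustration of the data flow described above, not VIQUEN or IML code: the set-of-tuples graph model, the run_block function, the "anatomy" source name and the two sample operations are all hypothetical.

```python
def run_block(input_graphs, operations):
    """Apply operations in order, threading the default graph through them."""
    default_graph = frozenset()  # the default graph is empty at the top of the block
    for operation in operations:
        # Each operation sees the named inputs and the current default graph,
        # and returns the default graph that is handed to the next operation.
        default_graph = frozenset(operation(input_graphs, default_graph))
    return default_graph  # the block's output graph


# Two hypothetical operations, written as plain functions.
def copy_part_edges(input_graphs, default_graph):
    # Replaces the default graph with the "part" edges of the input named "anatomy".
    return {t for t in input_graphs["anatomy"] if t[1] == "part"}


def drop_lung_edges(input_graphs, default_graph):
    # Works on the default graph produced by the previous operation.
    return {t for t in default_graph if "Lung" not in (t[0], t[2])}


anatomy = {
    ("Thorax", "part", "Lung"),
    ("Lung", "part", "Upper lobe"),
    ("Thorax", "part", "Heart"),
    ("Heart", "label", "heart"),
}
output = run_block({"anatomy": anatomy}, [copy_part_edges, drop_lung_edges])
```

In this sketch the first operation populates the default graph and the second narrows it, so only the edge between Thorax and Heart remains in the output.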

EXTRACT OPERATIONS

These are a set of operations provided specifically for extracting information from an RDF graph. The operations include extract edges, extract tree, extract reachable, extract path, and extract recursive.

Extract Edges

This operation specifies a subset of RDF edges that should be added to the default graph. "From Graph" indicates the graph that should be used to locate the specified RDF triples. The table indicates the triple pattern to be found. Another triple may be added to the table by clicking on the (+) button. Selecting a triple and clicking on the (-) button will delete that triple. If the specified triple pattern is found in the specified input graph, the triples are added to the default graph. (Note that this operation completely overwrites the default graph coming in to the operation.) Variables may be specified in the table using a "?", for example "?Object". The variables are then bound to sets of triples by adding a WHERE clause (see WHERE operations below).

Extract Tree

Allows a user to indicate a root node and a set of properties that should be recursively followed to extract a subgraph of the input. This operation will maintain the structure of the extracted subgraph. "From Graph" indicates the graph that should be used to construct the tree. "Root" indicates the node in the graph to be used as the root of the tree. The table specifies the properties/edges that should be followed, and the direction in which to follow the edges: outgoing (from the root node), incoming (to the root node) or both. Another property may be added to the table by clicking on the (+) button. Selecting a property and clicking on the (-) button will delete that property. Variables may be specified in the table using a "?", for example "?property". The variables are then bound to sets of triples by adding a WHERE clause (see WHERE operations below).
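As a rough illustration of the traversal described for Extract Tree, the following Python sketch follows the chosen properties from a root and keeps the edges it crosses. It assumes a graph modelled as a set of (subject, property, object) tuples and a hypothetical extract_tree function; it is not how VIQUEN or IML implement the operation.

```python
from collections import deque


def extract_tree(from_graph, root, properties):
    """Collect the edges reached from root by following the given properties.

    properties maps a property name to "outgoing", "incoming" or "both".
    """
    result, seen, queue = set(), {root}, deque([root])
    while queue:
        node = queue.popleft()
        for s, p, o in from_graph:
            direction = properties.get(p)
            if direction is None:
                continue  # this property is not in the table
            if s == node and direction in ("outgoing", "both"):
                neighbour = o
            elif o == node and direction in ("incoming", "both"):
                neighbour = s
            else:
                continue
            result.add((s, p, o))  # keep the edge, so the structure is preserved
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return result


graph = {
    ("Thorax", "part", "Lung"),
    ("Lung", "part", "Upper lobe"),
    ("Upper lobe", "part", "Apical segment"),
    ("Lung", "label", "lung"),
}
tree = extract_tree(graph, "Lung", {"part": "outgoing"})
```

Here tree holds the two "part" edges below Lung. Passing "both" instead of "outgoing" would also pick up the edge from Thorax to Lung.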

Extract Reachable

Allows a user to indicate a root node and a set of properties that should be recursively followed to identify the set of nodes that can be reached by traversing those properties. Unlike "Extract Tree", this operation does not maintain the structure of the extracted subgraph, but rather produces a flat list of the nodes that can be reached. "From Graph" indicates the graph that should be used to find the nodes. "Root" indicates the node in the graph from which to begin. The table specifies the properties/edges that should be followed, and the direction in which to follow the edges: outgoing (from the root node), incoming (to the root node) or both. Another property may be added to the table by clicking on the (+) button. Selecting a property and clicking on the (-) button will delete that property. Variables may be specified in the table using a "?", for example "?property". The variables are then bound to sets of triples by adding a WHERE clause (see WHERE operations below).
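The difference from Extract Tree can be made concrete with a small Python sketch that returns only the reached nodes and discards the edges. The tuple-based graph model and the extract_reachable function are assumptions made for illustration. Whether the real operation includes the root in its result is not stated here, so this sketch leaves it out unless a followed edge leads back to it.

```python
def extract_reachable(from_graph, root, properties):
    """Return the flat set of nodes reachable from root via the given properties.

    properties maps a property name to "outgoing", "incoming" or "both".
    """
    reached, stack = set(), [root]
    while stack:
        node = stack.pop()
        for s, p, o in from_graph:
            direction = properties.get(p)
            if direction is None:
                continue  # this property is not in the table
            targets = []
            if s == node and direction in ("outgoing", "both"):
                targets.append(o)
            if o == node and direction in ("incoming", "both"):
                targets.append(s)
            for target in targets:
                if target not in reached:
                    reached.add(target)
                    stack.append(target)
    return reached


graph = {
    ("Thorax", "part", "Lung"),
    ("Lung", "part", "Upper lobe"),
    ("Upper lobe", "part", "Apical segment"),
    ("Lung", "label", "lung"),
}
nodes = extract_reachable(graph, "Lung", {"part": "outgoing"})
```

The result is a plain set of node names, with no record of which edge connected them.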

Extract Path

Allows a user to specify a root node, a leaf node and a list of properties; the operation returns the subgraph containing the path from the root to the leaf by recursively traversing the list of properties. "From Graph" indicates the graph that should be used to find the path. "Root" indicates the node in the graph from which to begin. "Leaf" indicates the node in the graph which is the end of the path. The table specifies the properties/edges that should be followed, and the direction in which to follow the edges: outgoing (from the root node), incoming (to the root node) or both. Another property may be added to the table by clicking on the (+) button. Selecting a property and clicking on the (-) button will delete that property. Variables may be specified in the table using a "?", for example "?property". The variables are then bound to sets of triples by adding a WHERE clause (see WHERE operations below).

Extract Recursive

This operation provides a general recursion mechanism that allows users to specify precisely the edges that they want to follow and the output that should be produced. The operation is broken down into two parts: a set of base cases, and a set of recursive cases.

Base case: "From Graph" indicates the graph that should be used to locate the specified base case RDF triples. The table indicates the triple pattern to be found. Another triple may be added to the table by clicking on the (+) button. Selecting a triple and clicking on the (-) button will delete that triple. Variables may be specified in the table using a "?", for example "?Object". The variables are then bound to sets of triples by adding a WHERE clause (see WHERE operations below). The operation evaluates the base case and the results are added to the output <recursive> graph using set union.

Recursion: "From Graph" indicates the graph that should be used to locate the specified RDF triples. A recursive case may access both the incoming set of RDF graphs and the result set being produced, referenced by the graph name <recursive>. The table indicates the triple pattern to be found. Another triple may be added to the table by clicking on the (+) button. Selecting a triple and clicking on the (-) button will delete that triple. Variables may be specified in the table using a "?", for example "?Object". The variables are then bound to sets of triples by adding a WHERE clause (see WHERE operations below). The operation evaluates each recursive case and adds any new results to the output <recursive> graph using set union; the recursive cases are then evaluated again and their results are added to <recursive>. This iteration continues until a stable state is reached, that is, until an evaluation adds no new results to <recursive>.
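The base-case and recursive-case evaluation described above is a fixed-point iteration, which can be sketched as follows. This is a minimal sketch under assumptions: graphs are sets of (subject, property, object) tuples, and each case is a hypothetical Python function rather than a triple-pattern table.

```python
def extract_recursive(input_graph, base_cases, recursive_cases):
    """Evaluate base cases once, then repeat recursive cases until nothing new appears."""
    recursive = set()  # plays the role of the <recursive> graph
    for case in base_cases:
        recursive |= case(input_graph)  # set union of the base-case results
    while True:
        new = set()
        for case in recursive_cases:
            new |= case(input_graph, recursive)
        if new <= recursive:  # stable state: no new results were produced
            return recursive
        recursive |= new


# Hypothetical cases: collect every "part" edge below Lung.
def base_parts_of_lung(graph):
    return {(s, p, o) for s, p, o in graph if s == "Lung" and p == "part"}


def parts_of_known_parts(graph, recursive):
    known = {o for _, _, o in recursive}
    return {(s, p, o) for s, p, o in graph if p == "part" and s in known}


graph = {
    ("Thorax", "part", "Lung"),
    ("Lung", "part", "Upper lobe"),
    ("Upper lobe", "part", "Apical segment"),
}
result = extract_recursive(graph, [base_parts_of_lung], [parts_of_known_parts])
```

Because results are only ever added and the input graph is finite, the loop is guaranteed to stop.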

DELETE OPERATIONS

These are a set of operations for removing information from an RDF graph: delete edge, delete node, delete property, delete tree.

Delete Edge

Allows a user to indicate the specific RDF triples that should be deleted from a graph. "From Graph" indicates the graph that should be used to locate the specified RDF triples. The table indicates the triple pattern to be found. Another triple may be added to the table by clicking on the (+) button. Selecting a triple and clicking on the (-) button will delete that triple. If the specified triple pattern is found in the specified input graph, the triples are removed from the graph. Variables may be specified in the table using a "?", for example "?Object". The variables are then bound to sets of triples by adding a WHERE clause (see WHERE operations below).

Delete Node

Allows a user to indicate a specific node that should be deleted from the graph. "From Graph" indicates the graph that should be used to locate the specified node. "Node" indicates the node to be deleted. All edges that contain the node as either a subject or object are deleted from the graph. The user can specify either a URI or a variable (bound in a WHERE clause - see WHERE operations below) to indicate the node to be deleted.

Delete Property

Allows a user to indicate a specific property that should be deleted from the graph. "From Graph" indicates the graph that should be used to locate the specified property. "Property" indicates the property to be deleted. All edges with the specified property are deleted from the graph. The user can specify either a URI or a variable (bound in a WHERE clause - see WHERE operations below) to indicate the property to be deleted.
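The effect of Delete Node and Delete Property on a graph can be illustrated with two short Python functions. The tuple-based graph model and the function names are assumptions for the sake of the example, not part of VIQUEN.

```python
def delete_node(graph, node):
    """Drop every edge that has node as its subject or its object."""
    return {(s, p, o) for s, p, o in graph if node not in (s, o)}


def delete_property(graph, prop):
    """Drop every edge whose property is prop."""
    return {(s, p, o) for s, p, o in graph if p != prop}


graph = {
    ("Thorax", "part", "Lung"),
    ("Lung", "part", "Upper lobe"),
    ("Lung", "label", "lung"),
    ("Thorax", "part", "Heart"),
}
without_lung = delete_node(graph, "Lung")
without_labels = delete_property(graph, "label")
```

Both functions return a new set and leave the input untouched, which mirrors the way each operation passes a result graph on to the next one.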

Delete Tree

Allows a user to indicate a subgraph that should be deleted from the input graph. "From Graph" indicates the graph from which to remove the tree. "Root" indicates the node in the graph to be used as the root of the tree. The table specifies the properties/edges that should be followed, and the direction in which to follow the edges: outgoing (from the root node), incoming (to the root node) or both. Another property may be added to the table by clicking on the (+) button. Selecting a property and clicking on the (-) button will delete that property. Variables may be specified in the table using a "?", for example "?property". The variables are then bound to sets of triples by adding a WHERE clause (see WHERE operations below).

REPLACE OPERATIONS

These are a set of operations for replacing information in an RDF graph: replace property, replace node, replace literal, replace edge subject, replace edge property, replace edge object and replace edge literal.

Replace Property

This operation is used to replace all instances of a property. "From Graph" indicates the graph in which to find the property to replace. "Replace Property" indicates the name of the property that will be replaced. "New property" indicates the name of the new property that will be the replacement. All other edges in the input graph are unchanged and exist in the operation's output graph. Variables may be specified using a "?", for example "?property". Variables are then bound by adding a WHERE clause (see WHERE operations below).

Replace Node

This operation is used to replace all instances of a node. "From Graph" indicates the graph in which to find the node to replace. "Replace Node" indicates the name of the node that will be replaced. "New node" indicates the name of the new node that will be the replacement. All other edges in the input graph are unchanged and exist in the operation's output graph. Variables may be specified using a "?", for example "?node". Variables are then bound by adding a WHERE clause (see WHERE operations below).
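A minimal sketch of what replacing all instances of a node amounts to is given below, assuming a graph held as a set of (subject, property, object) tuples and a hypothetical replace_node function.

```python
def replace_node(graph, old_node, new_node):
    """Rewrite old_node to new_node wherever it appears as a subject or an object."""
    return {
        (new_node if s == old_node else s, p, new_node if o == old_node else o)
        for s, p, o in graph
    }


graph = {
    ("Thorax", "part", "Lung"),
    ("Lung", "part", "Upper lobe"),
    ("Thorax", "part", "Heart"),
}
renamed = replace_node(graph, "Lung", "Left lung")
```

Edges that do not mention the replaced node, such as the one between Thorax and Heart, pass through unchanged.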

Replace Literal

This operation is used to replace all instances of a literal. "From Graph" indicates the graph in which to find the literal to replace. "Replace Literal" indicates the name of the literal that will be replaced. "New literal" indicates the name of the new literal that will be the replacement. All other edges in the input graph are unchanged and exist in the operation's output graph. Variables may be specified using a "?", for example "?literal". Variables are then bound by adding a WHERE clause (see WHERE operations below).

Replace Edge Subject

This operation can be used to replace the subject of an RDF triple. "From Graph" indicates the graph in which to find the edges whose subjects will be replaced. The table indicates the triple pattern specifying the edges whose subjects will be replaced. Another triple may be added to the table by clicking on the (+) button. Selecting a triple in the table and clicking on the (-) button will remove that triple. Variables may be specified in the table using a "?", for example "?Object". The variables are then bound to statements in the graph by adding a WHERE clause (see WHERE operations below). "New Subject" indicates the replacement subject for the specified edges. All other edges in the input graph are unchanged and exist in the operation's output graph.

Replace Edge Property

This operation can be used to replace the property of an RDF triple. "From Graph" indicates the graph in which to find the edges whose properties will be replaced. The table indicates the triple pattern specifying the edges whose properties will be replaced. Another triple may be added to the table by clicking on the (+) button. Selecting a triple in the table and clicking on the (-) button will remove that triple. Variables may be specified in the table using a "?", for example "?Property". The variables are then bound to statements in the graph by adding a WHERE clause (see WHERE operations below). "New Property" indicates the replacement property for the specified edges. All other edges in the input graph are unchanged and exist in the operation's output graph.

Replace Edge Object

This operation can be used to replace the object of an RDF triple. "From Graph" indicates the graph in which to find the edges whose objects will be replaced. The table indicates the triple pattern specifying the edges whose objects will be replaced. Another triple may be added to the table by clicking on the (+) button. Selecting a triple in the table and clicking on the (-) button will remove that triple. Variables may be specified in the table using a "?", for example "?Object". The variables are then bound to statements in the graph by adding a WHERE clause (see WHERE operations below). "New Object" indicates the replacement object for the specified edges. All other edges in the input graph are unchanged and exist in the operation's output graph.

Replace Edge Literal

This operation can be used to replace a literal in an RDF triple. "From Graph" indicates the graph in which to find the edges whose literals will be replaced. The table indicates the triple pattern specifying the edges whose literals will be replaced. Another triple may be added to the table by clicking on the (+) button. Selecting a triple in the table and clicking on the (-) button will remove that triple. Variables may be specified in the table using a "?", for example "?Object". The variables are then bound to statements in the graph by adding a WHERE clause (see WHERE operations below). "New Literal" indicates the replacement literal for the specified edges. All other edges in the input graph are unchanged and exist in the operation's output graph.

Replace Edge Literal ensures that the node being replaced is a literal (only permitted in the object position of an RDF triple) and that it is indeed replaced by a literal. This is necessary because replacing an object resource (URI) with a literal (string) can produce invalid RDF.
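The literal check can be sketched in Python as follows. Everything in the sketch is an assumption made for illustration: the Literal wrapper class, the use of None as a wildcard in the pattern, and the choice to leave matching edges with non-literal objects unchanged rather than report an error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical marker that distinguishes literal values from resource names."""
    value: str


def replace_edge_literal(graph, pattern, new_literal):
    """Replace the literal object of every edge that matches pattern."""
    if not isinstance(new_literal, Literal):
        raise TypeError("the replacement must be a literal")
    result = set()
    for s, p, o in graph:
        # None in the pattern stands for "any value" in that position.
        matches = all(want is None or want == got for want, got in zip(pattern, (s, p, o)))
        if matches and isinstance(o, Literal):
            result.add((s, p, new_literal))
        else:
            result.add((s, p, o))  # non-literal objects are never touched
    return result


graph = {
    ("Lung", "label", Literal("lung")),
    ("Lung", "part", "Upper lobe"),
}
relabelled = replace_edge_literal(graph, ("Lung", None, None), Literal("pulmo"))
```

Although the pattern matches both edges of Lung, only the edge whose object is a literal is rewritten.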

UNION GRAPHS

This is an operation used to combine information from two or more RDF graphs. Clicking on the "Select sources to union" button will open a list of available data sources which may be selected for inclusion in the union operation. The operation produces the result of combining all of the selected graphs. Duplicate edges in multiple RDF graphs will only be seen once in the result graph.
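If graphs are modelled as Python sets of (subject, property, object) tuples, which is an assumption made here for illustration, the union behaviour described above falls out of ordinary set union. The union_graphs function name is hypothetical.

```python
def union_graphs(*graphs):
    """Combine any number of graphs; an edge present in several graphs appears once."""
    return set().union(*graphs)


anatomy = {("Thorax", "part", "Lung"), ("Thorax", "part", "Heart")}
radiology = {("Thorax", "part", "Lung"), ("Lung", "imaged_by", "CT")}
combined = union_graphs(anatomy, radiology)
```

The edge shared by both inputs is stored only once, so combined holds three edges rather than four.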

ADD EDGES

This is an operation used to create new edges that will be added to an RDF graph. "From Graph" indicates the graph that the new edges should be added to. All the edges in this graph are combined with the new edges in the output graph. The table indicates the triple pattern to be added. Another triple may be added to the table by clicking on the (+) button. Selecting a triple in the table and clicking on the (-) button will remove that triple. Variables may be specified in the table using a "?", for example "?Object". The variables are then bound to statements in the graph by adding a WHERE clause (see WHERE operations below).

WHERE OPERATIONS

The WHERE clause describes patterns that the solutions need to match. Where operations are used to bind sets of RDF triples to unknown variables. Variables are specified using a "?" symbol, for example "?property". There are 4 different types of operations that may be performed in a WHERE clause: match statements, union statements, filter statements and optional statements. Note that WHERE clauses may be nested, and therefore it is possible to add additional WHERE clause operations from within a WHERE clause operation. An edge to the next operation in the query dataflow should not come from the WHERE operation, but rather from the operation to which the WHERE operation is attached.

Where Match Statements

Allows a user to specify an RDF triple pattern containing variables to be matched to sets of RDF statements. "From Graph" indicates the graph that should be used to locate the specified RDF triples. The table indicates the triple pattern to be found. Another triple may be added to the table by clicking on the (+) button. Selecting a triple and clicking on the (-) button will delete that triple. If the triple pattern matches sets of triples in the specified input graph, the triples are bound to the relevant variable and added as output for the operation.
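How triple patterns with variables might bind to statements can be sketched as below. The sketch assumes that the rows of the table are matched together, in the manner of a SPARQL basic graph pattern, and that a variable must take the same value wherever it appears; neither point is confirmed here. The function names and the tuple-based graph model are hypothetical.

```python
def match_pattern(graph, pattern, binding):
    """Yield each extension of binding under which pattern equals a triple in graph."""
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def match_statements(graph, patterns):
    """Return every variable binding that satisfies all patterns at once."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [ext for b in solutions for ext in match_pattern(graph, pattern, b)]
    return solutions


graph = {
    ("Lung", "part", "Upper lobe"),
    ("Lung", "part", "Lower lobe"),
    ("Upper lobe", "label", "upper lobe"),
}
solutions = match_statements(graph, [("Lung", "part", "?lobe"), ("?lobe", "label", "?name")])
```

Lower lobe has no label edge in this data, so it drops out and a single binding remains.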

Where Union Statements

Allows multiple different triple patterns to be matched to the same variables. Operations to be included are added from the "Union Statements" operation (there should be at least two additional operations for the union).

Where Filter Statements

This operation tests values within a graph. The "From Graph" indicates the graph that will be used to evaluate the constraints. The "AND statements" and "OR statements" specify the logic used to combine multiple constraints. The table specifies a list of constraints. Constraints may be added to the table by clicking on the (+) button. Selecting a constraint and clicking on the (-) button will delete that constraint. The "Filter Statements" must evaluate to TRUE in order for the pattern to match.
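The AND and OR combination of constraints can be illustrated with a short Python sketch. It assumes that each constraint is a function applied to one set of variable bindings, which simplifies the table of constraints described above; the filter_solutions function and the sample data are hypothetical.

```python
def filter_solutions(solutions, constraints, mode="AND"):
    """Keep the bindings for which the combined constraints evaluate to True."""
    combine = all if mode == "AND" else any
    return [b for b in solutions if combine(check(b) for check in constraints)]


solutions = [
    {"?lobe": "Upper lobe", "?segments": 3},
    {"?lobe": "Lower lobe", "?segments": 5},
]
has_many_segments = lambda b: b["?segments"] > 4
is_upper = lambda b: b["?lobe"].startswith("Upper")

both = filter_solutions(solutions, [has_many_segments, is_upper], mode="AND")
either = filter_solutions(solutions, [has_many_segments, is_upper], mode="OR")
```

With AND, no binding satisfies both constraints, so both is empty; with OR, each binding satisfies one of them, so either keeps both bindings.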

Where Optional Statements

Allows specified patterns to be made optional. Additional WHERE operations added from the "Optional Statements" operation will be added to the output if they exist, but will not make the query fail if they do not exist.

4. The query-building workspace outline

An outline of the main query-building workspace is provided at the bottom left-hand side of the screen, with a dark blue rectangle indicating the fraction of the workspace currently being viewed. The query workspace may be navigated by clicking and dragging on this rectangle.

Execution Environment



The query execution environment, shown above, consists of three components: (1) the query component, (2) the results component and (3) a simple menu bar. The query component displays the generated query in IML. Clicking on the "execute query" button executes the query and displays the results in raw RDF/XML format in the results component. The system also provides an alert indicating the number of RDF triples returned by the query.

Both the generated query and the resulting RDF/XML may be saved to a local file using the "save query" and "save results" buttons. Clicking on the "visualize results" button will open VIQUEN’s visualization environment.

Visualization Environment



The visualization environment, shown above, facilitates exploration and manipulation of an RDF graph. The user interface may be divided into 5 main components:

  1. The toolbar and system menus
  2. The tree and list views
  3. The main visualization workspace
  4. The visualization workspace pop-up menu
  5. The visualization workspace outline

1. The toolbar and system menus

The visualization environment has been designed in a fashion consistent with the query-building environment, and utilizes similar layouts, menus and toolbars. As in the query-building environment, several automatic graph layout options are available from the menu (Diagram -> Layout). Additionally, VIQUEN visualizations can be loaded from and saved to disk using the same file format as that for saving visual queries.

The load RDF button allows locally saved RDF files to be loaded and visualized.

2. The tree and list views

The tree and list views of the RDF data set are located in the upper left-hand side of the workspace. The tree view depicts the RDF using a tree structure showing the graph of nodes. The list view provides an alphabetized list of the nodes. Clicking on a node in either view makes the node available for viewing and manipulation in the main visualization workspace: if the node is currently visible, the system selects it and scrolls to it; if it is not currently visible, the system makes it visible, along with its children and parent nodes.

3. The main visualization workspace

The main visualization workspace depicts the RDF visually as a graph consisting of nodes connected by edges. The nodes represent the subjects and objects of the RDF triples, while the edges represent the properties. Since queries may return a large number of RDF triples, VIQUEN does not attempt to display the entire results graph on the screen at one time, as doing so would make the graph difficult to understand and navigate. Instead, the system finds one or more likely root nodes from which to start the visualization. The most appropriate of these root nodes may then be chosen using the tree or list view of the graph.
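How the likely root nodes are chosen is not described here. One plausible heuristic, shown purely as an assumption, is to treat nodes that appear as a subject but never as an object as candidate roots. The candidate_roots function and the tuple-based graph model are hypothetical.

```python
def candidate_roots(graph):
    """Return nodes with outgoing edges but no incoming edges, in alphabetical order."""
    subjects = {s for s, _, _ in graph}
    objects = {o for _, _, o in graph}
    return sorted(subjects - objects)


graph = {
    ("Thorax", "part", "Lung"),
    ("Thorax", "part", "Heart"),
    ("Lung", "part", "Upper lobe"),
    ("Abdomen", "part", "Liver"),
}
roots = candidate_roots(graph)
```

A heuristic of this kind finds no candidates in a graph where every node has an incoming edge, for example a pure cycle, so a real implementation would need a fallback.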

Properties in the visualization workspace are displayed as directed edges, starting at the subject of the RDF triple and going to the object, with the edge label consisting of the property name. Nodes are displayed in blue colored rectangles labeled with the name of the node. A node may be selected and moved by clicking and dragging it in the workspace. Positioning the mouse pointer over a node’s information icon will display the total number of incoming and outgoing edges for the node and the full name of the node. Clicking on the show children button will make all of the node's child nodes visible. Note that this button is only displayed in nodes that have children.

4. The visualization workspace pop-up menu

Additional functionality for further visualization and exploration of the RDF is made available to the user in a pop-up menu which is accessed by right clicking in the main visualization workspace. As well as the basic cut, copy, paste, delete and undo actions, three submenus group actions into select actions, group actions and show/hide actions. The select submenu has options to select all of the nodes, none of the nodes, the children of a particular node or the entire subtree rooted at a particular node. The group submenu has options which allow for a number of nodes to be grouped together and then collapsed into a single representative group node. The group may then be expanded and collapsed as a single unit, or opened in a separate visualization workspace for more detailed manipulation. The show/hide submenu provides a variety of choices for manipulating currently visible nodes: show or hide the child nodes, parent nodes or the subtree rooted at that node. There are also options to show or hide the entire graph, or just the selected portion of the graph.

5. The visualization workspace outline

An outline of the main visualization workspace is provided at the bottom left-hand side of the screen, with a dark blue rectangle indicating the fraction of the workspace currently being viewed. The visualization workspace may be navigated by clicking and dragging on this rectangle.
