Last modified on 11/06/10 03:59:53

Installing vSPARQL on a Linux Machine

vSPARQL is a set of extensions to the SPARQL query language to support the creation of view definitions. The extensions provide subqueries, recursive subqueries, and Skolem functions for extracting, modifying, and combining RDF data. vSPARQL has been implemented on top of Jena's ARQ http://jena.sourceforge.net/ARQ/.

On this page, we describe how to get a vSPARQL installation up and running on a Linux machine. The steps are:

1) installing PostgreSQL http://www.postgresql.org/
2) installing and patching SDB http://openjena.org/SDB/
3) creating and initializing an SDB store for an RDF data set
4) installing and patching ARQ http://jena.sourceforge.net/ARQ/
5) running an example vSPARQL query

Throughout this wiki page we refer to $VSPARQL_HOME. This is the directory in which all of the software is being installed.
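
The page does not say how $VSPARQL_HOME gets its value. A minimal sketch, assuming a bash-like shell; the location $HOME/vsparql is a hypothetical default, and any writable directory should work:

```shell
# Keep an existing VSPARQL_HOME if one is already set; otherwise fall back
# to a hypothetical default under the home directory.
export VSPARQL_HOME="${VSPARQL_HOME:-$HOME/vsparql}"

# Create the installation directory if it does not exist yet.
mkdir -p "$VSPARQL_HOME"
```

Placing the export line in your shell startup file keeps the variable available in later sessions.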

Installing PostgreSQL

vSPARQL uses SDB to store and access RDF data sets. Users can choose any SDB-supported RDBMS. Here we describe how we installed PostgreSQL for use with vSPARQL.

Download a copy of PostgreSQL from http://www.postgresql.org/ and compile and install. The following instructions install the binaries in the $VSPARQL_HOME/software directory. (If you already have an existing PostgreSQL installation, you can likely avoid this step.)

% cd $VSPARQL_HOME
% mkdir $VSPARQL_HOME/software
% tar xvfz postgresql-8.2.5.tar.gz
% cd postgresql-8.2.5
% ./configure --prefix=$VSPARQL_HOME/software
% gmake
% gmake install

Next, get PostgreSQL up and running. First, add the PostgreSQL binaries to your path. With the --prefix setting used above, they are installed in $VSPARQL_HOME/software/bin.

% export PATH=$VSPARQL_HOME/software/bin:$PATH

Create a database cluster (the data directory) for the postgres server.

% mkdir $VSPARQL_HOME/DBMS
% mkdir $VSPARQL_HOME/DBMS/pgsql-8.2.5
% mkdir $VSPARQL_HOME/DBMS/pgsql-8.2.5/data
% export PGPORT=7777
% initdb -D $VSPARQL_HOME/DBMS/pgsql-8.2.5/data

Start running the postgres server. Setting PGPORT=7777 indicates that incoming connections to the postgres server should be on port 7777.

% export PGPORT=7777
% postgres -D $VSPARQL_HOME/DBMS/pgsql-8.2.5/data > logfile 2>&1 &
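
Before moving on, it can help to confirm that the server actually started. The following is a sketch rather than part of the original instructions: pg_is_up is a hypothetical helper name, and it relies only on psql -l, which lists the databases of a running server.

```shell
# Hypothetical helper: report whether a PostgreSQL server answers on a port.
# The port defaults to $PGPORT, or to 7777 as used on this page.
pg_is_up() {
    port="${1:-${PGPORT:-7777}}"
    # psql -l lists the databases; it fails if no server is listening.
    if psql -p "$port" -l >/dev/null 2>&1; then
        echo "postgres is accepting connections on port $port"
    else
        echo "postgres is not reachable on port $port; check logfile"
        return 1
    fi
}
```

Call it as pg_is_up, or as pg_is_up 7777 to name the port explicitly. If it reports a failure, the logfile written by the postgres command above is the first place to look.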

Installing and Patching SDB

vSPARQL uses SDB for its backend RDF store. The extensions require a small modification of SDB v1.3.0, which can be applied via a patch file. Additional information for installing SDB can be found on the SDB website http://openjena.org/SDB/. The postgresql server is accessed via JDBC.

Download sdb-1.3.0.zip from http://openjena.org/SDB/. Download the patch file [vsparql_for_SDB-1.3.0.patch].

% cd $VSPARQL_HOME
% unzip sdb-1.3.0.zip
% cd SDB-1.3.0
% patch -p1 < $VSPARQL_HOME/vsparql_for_SDB-1.3.0.patch
% cd ..
% mv SDB-1.3.0 SDB-1.3.0-vsparql

Although the code has now been patched, it still needs to be built. There are two ways to do this: 1) run 'ant' http://ant.apache.org/ in the SDB-1.3.0-vsparql directory, or 2) load the code as a Java Project in Eclipse, open the build.xml file, and tell Eclipse to run it.

Download a JDBC driver from http://jdbc.postgresql.org/. We have found postgresql-8.3-603.jdbc3.jar to work well.

% cd $VSPARQL_HOME
% mkdir $VSPARQL_HOME/software/JDBC
% cp postgresql-8.3-603.jdbc3.jar $VSPARQL_HOME/software/JDBC

Set up the environment for running SDB. Substitute in your user name for $USERNAME.

% cd $VSPARQL_HOME/SDB-1.3.0-vsparql
% export SDBROOT=`pwd`
% export SDB_USER="$USERNAME"
% export SDB_JDBC="$VSPARQL_HOME/software/JDBC/postgresql-8.3-603.jdbc3.jar"
% export PATH=$SDBROOT/bin:$PATH
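
A missing or misspelled variable here tends to surface later as a confusing SDB error. As a sketch, the check below verifies the three variables set above; check_sdb_env is a hypothetical helper name, not something shipped with SDB.

```shell
# Hypothetical helper: verify the SDB environment described above.
# Prints one line per problem and returns non-zero if anything is wrong.
check_sdb_env() {
    status=0
    for name in SDBROOT SDB_USER SDB_JDBC; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        fi
    done
    # SDB_JDBC must point at the JDBC driver jar itself, not a directory.
    if [ -n "${SDB_JDBC:-}" ] && [ ! -f "$SDB_JDBC" ]; then
        echo "SDB_JDBC does not point to a file: $SDB_JDBC"
        status=1
    fi
    return $status
}
```

Running check_sdb_env with a correct setup prints nothing and returns 0.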

Creating and initializing an SDB store for an RDF data set

Creating an SDB store requires the creation of an SDB configuration file. More information about this is available at the SDB website; here we simply describe the bits that need to be changed for a basic installation. For this page, we assume that you have downloaded a copy of the FMA http://sig.biostr.washington.edu/projects/fm/ in OWL, which is stored in the file fma3.0.owl. The name of this model is http://sig.biostr.washington.edu/fma3.0.

First, create a postgresql database for storing the RDF data. The name of the database is "fma_3_0".

% createdb -E utf8 fma_3_0

In an editor of your choice, create a file named sdb-fma-3.0.ttl. This is a configuration file for SDB. In an SDB configuration file, you describe the format of the SDB store, the name of the RDF data being stored, and the postgresql database that the data should be stored in.

Copy the following text into your text editor.

@prefix sdb:     <http://jena.hpl.hp.com/2007/sdb#> .
@prefix rdfs:	 <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] ja:loadClass "com.hp.hpl.jena.sdb.SDB" .
sdb:DatasetStore rdfs:subClassOf ja:RDFDataset .

<#namedGraph> rdf:type sdb:Model ;
    sdb:namedGraph   <http://sig.biostr.washington.edu/fma3.0> ;
    sdb:dataset <#dataset> .

<#store> rdf:type sdb:Store ;
    sdb:layout         "layout2" ;
    sdb:connection     <#conn> ;
.
<#conn> rdf:type sdb:SDBConnection ;
    sdb:sdbType        "postgresql" ;
    sdb:sdbHost        "localhost:7777" ;
    sdb:sdbUser        "$USERNAME" ;
    sdb:sdbName	       "fma_3_0" ;
    sdb:driver         "org.postgresql.Driver" ;
    .

To customize this for your installation, substitute in your user name for $USERNAME. Change the sdb:namedGraph URI to the name of the RDF graph that you are storing. Change sdb:sdbHost to the machine name and port that your postgresql server is running on. Change the sdb:sdbName to the name of the database you just created.
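
The user name substitution can also be scripted. This is a minimal sketch, assuming the text above was saved unchanged (with the literal $USERNAME placeholder); fill_username is a hypothetical helper name.

```shell
# Hypothetical helper: print a copy of a configuration file with the literal
# placeholder $USERNAME replaced by a user name.
#   $1: configuration file to read
#   $2: user name to insert (defaults to the current user)
# Assumes the user name contains no '/' characters.
fill_username() {
    user="${2:-$(id -un)}"
    # \$ matches a literal dollar sign in the sed pattern.
    sed 's/\$USERNAME/'"$user"'/g' "$1"
}
```

For example, fill_username template.ttl > sdb-fma-3.0.ttl writes the customized file, where template.ttl is whatever name you gave the unedited copy. The other settings (graph URI, host, database name) still need to be edited by hand.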

SDB uses the sdb-fma-3.0.ttl configuration file for configuring and storing your RDF data. Execute the following commands to configure and load your RDF data, substituting the name of your RDF model and the file containing the data in the sdbload command. The relative path ../sdb-fma-3.0.ttl assumes that the configuration file was saved in $VSPARQL_HOME and that the commands are run from the SDB directory.

% cd $VSPARQL_HOME/SDB-1.3.0-vsparql
% sdbconfig --sdb=../sdb-fma-3.0.ttl --create
% sdbconfig --sdb=../sdb-fma-3.0.ttl --format
% sdbload --sdb=../sdb-fma-3.0.ttl --graph='http://sig.biostr.washington.edu/fma3.0' $VSPARQL_HOME/fma3.0.owl
% sdbconfig --sdb=../sdb-fma-3.0.ttl --index

Installing and Patching ARQ

vSPARQL has extended Jena's ARQ http://jena.sourceforge.net/ARQ/. The extensions require modification of ARQ-2.8.0, which can be applied via a patch file.

Download arq-2.8.0.zip from ARQ's website http://jena.sourceforge.net/ARQ/. Download the patch file [vsparql_for_ARQ-2.8.0.patch].

% cd $VSPARQL_HOME
% unzip arq-2.8.0.zip
% cd $VSPARQL_HOME/ARQ-2.8.0
% patch -p1 < $VSPARQL_HOME/vsparql_for_ARQ-2.8.0.patch
% cd $VSPARQL_HOME
% mv ARQ-2.8.0 ARQ-2.8.0-vsparql
% cd ARQ-2.8.0-vsparql
% cp $VSPARQL_HOME/software/JDBC/postgresql-8.3-603.jdbc3.jar lib/.

Copy the library files that you built for SDB into the ARQ installation.

% cp $VSPARQL_HOME/SDB-1.3.0-vsparql/lib/sdb.jar $VSPARQL_HOME/ARQ-2.8.0-vsparql/lib/.

As with SDB above, we now need to compile the ARQ code into Java libraries. The easiest way to do this is to use Maven http://maven.apache.org/.

% cd $VSPARQL_HOME/ARQ-2.8.0-vsparql
% mvn package

Configure your shell so that you can run ARQ from the command line.

% cd $VSPARQL_HOME/ARQ-2.8.0-vsparql
% export ARQROOT=`pwd`
% export PATH=$ARQROOT/bin:$PATH

Running an example vSPARQL query

To execute vSPARQL queries against your RDF data, you need to create a ds-config.ttl file that contains all of the information that was used in constructing your SDB store.

% mkdir $VSPARQL_HOME/test
% cd $VSPARQL_HOME/test

In a text editor, open a file called ds-config.ttl. (This needs to be stored in the same directory from which you invoke vSPARQL.) The ds-config.ttl file starts with the <ds:/root> triple; its dsc:dataSource property indicates the SDB configurations that should be accessible in vSPARQL. The dsc:assemblerModel property is the starting point for describing our SDB store (as in the SDB configuration file above). Modify the ds-config.ttl file below to contain information about your SDB store.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix sdb: <http://jena.hpl.hp.com/2007/sdb#> .
@prefix dsc: <http://localhost/ds-config#> .
@prefix fma: <http://sig.biostr.washington.edu/> .

[] ja:loadClass "com.hp.hpl.jena.sdb.SDB" .
sdb:DatasetStore rdfs:subClassOf ja:RDFDataset .

<ds:/root> rdf:type dsc:root ;
	dsc:dataSource <#source1> ;
    .

<#source1> dsc:modelName "http://sig.biostr.washington.edu/fma3.0"
	 ; dsc:assemblerModel <#model1>
	 .

<#model1> rdf:type sdb:Model ;
	sdb:namedGraph   fma:fma3.0 ;
    sdb:dataset <#dataset1> 
    .
    
<#dataset1> rdf:type sdb:DatasetStore ;
    sdb:store <#store1> 
    .
    
<#store1> rdf:type sdb:Store ;
    sdb:layout         "layout2" ;
    sdb:connection     <#conn1> ;
	.
	
<#conn1> rdf:type sdb:SDBConnection ;
    sdb:sdbType        "postgresql" ;
    sdb:sdbHost        "localhost:7777" ;
    sdb:sdbUser        "vsparql" ;
    sdb:sdbName	       "fma_3_0" ;
    sdb:driver         "org.postgresql.Driver" ;
    .

Create a simple query for testing that you can correctly query your RDF data.

In a text editor, create a new file called current.arq. Enter your vSPARQL query.

PREFIX fma:<http://sig.biostr.washington.edu/fma3.0#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>

SELECT *
FROM NAMED <liver_properties> [
  CONSTRUCT { fma:Liver ?b ?c }
  FROM <http://sig.biostr.washington.edu/fma3.0>
  WHERE {
      fma:Liver ?b ?c .
  }
] 
WHERE { GRAPH <liver_properties> { fma:Liver fma:regional_part ?c} }

Now run the vSPARQL query from the command line.

% sparql --query current.arq
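
Because ds-config.ttl must be in the directory from which the query is invoked, a small wrapper can catch the most common mistake before sparql starts. This is a sketch only; run_vsparql is a hypothetical helper name, and it calls sparql exactly as shown above.

```shell
# Hypothetical helper: run a vSPARQL query file after checking that
# ds-config.ttl is present in the current directory.
#   $1: query file (defaults to current.arq)
run_vsparql() {
    query="${1:-current.arq}"
    if [ ! -f ds-config.ttl ]; then
        echo "ds-config.ttl not found in $(pwd)" >&2
        return 1
    fi
    if [ ! -f "$query" ]; then
        echo "query file not found: $query" >&2
        return 1
    fi
    sparql --query "$query"
}
```

From $VSPARQL_HOME/test, run_vsparql behaves like the command above; from any other directory it stops with a message instead.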