The System Biology Graph Extender


\chapter{Introduction}

The System Biology Graph Extender (SBGE)~\cite{eckman} extends a relational database management system with graph objects and operations to support SQL queries over biological networks. It uses SQL to manage node and edge property data, and it manages graph operations through functional extensions to SQL in the form of \textit{user-defined functions (UDFs)}. A UDF can be used anywhere in an SQL query where an expression of its return type is allowed. SBGE defines the graph as a first-class SQL data type, so graph instances can be created and manipulated in SQL in the same way as strings and numbers. Graph operations such as comparing graphs, combining graphs, finding components, and computing shortest paths are implemented as UDFs. An SBGE query, however, is still a text-based, SQL-like query: it is less intuitive than a visual query diagram and harder for users to learn. For example, a query that explores a neighborhood in function space results in a lengthy SQL script.
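To make the UDF mechanism concrete, the following is a minimal sketch, not SBGE's actual interface. It assumes Python's built-in \texttt{sqlite3} module as a stand-in RDBMS, serializes each graph as JSON text to emulate a graph-valued column, and registers a hypothetical UDF \texttt{shortest\_path\_len} (our name, computed by breadth-first search) through \texttt{Connection.create\_function}. The UDF then appears in both the SELECT list and the WHERE clause, as any other expression of its return type could.

```python
import json
import sqlite3
from collections import deque


def shortest_path_len(graph_json, src, dst):
    """Hypothetical UDF: BFS hop count from src to dst in an undirected graph.

    The graph arrives as JSON text {"edges": [[u, v], ...]}, an assumed
    stand-in for a first-class graph type. Returns None (SQL NULL) when
    dst is unreachable from src.
    """
    adj = {}
    for u, v in json.loads(graph_json)["edges"]:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def build_db():
    conn = sqlite3.connect(":memory:")
    # Register the Python function as a 3-argument SQL function.
    conn.create_function("shortest_path_len", 3, shortest_path_len)
    # Each row stores one whole network as a graph-valued (JSON text) column.
    conn.execute("CREATE TABLE network (name TEXT PRIMARY KEY, graph TEXT)")
    conn.executemany(
        "INSERT INTO network VALUES (?, ?)",
        [
            ("pathway_a", json.dumps({"edges": [["A", "B"], ["B", "C"], ["C", "D"]]})),
            ("pathway_b", json.dumps({"edges": [["A", "D"], ["X", "Y"]]})),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_db()
    rows = conn.execute(
        "SELECT name, shortest_path_len(graph, 'A', 'D') AS hops "
        "FROM network WHERE shortest_path_len(graph, 'A', 'D') IS NOT NULL "
        "ORDER BY hops"
    ).fetchall()
    print(rows)
```

Here \texttt{pathway\_b} links A and D directly (1 hop) while \texttt{pathway\_a} needs 3 hops, so the query prints \texttt{[('pathway\_b', 1), ('pathway\_a', 3)]}. Returning \texttt{None} maps to SQL NULL, which lets the WHERE clause drop networks in which the two nodes are disconnected.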

The Pathway Query Language (PQL)~\cite{UlfLeser} is similar to SBGE in that it is built on top of an RDBMS and addresses the same use case: querying biological networks. PQL's syntax resembles SQL, but the semantics of its queries are quite different. Only node variables can appear in the FROM clause. Query evaluation considers each node variable in the FROM clause and, for each one, determines all possible assignments of that variable to nodes of the graph for which the conditions of the WHERE clause evaluate to true. PQL has special syntax that allows path expressions in both the SELECT and WHERE clauses. Unlike an SBGE query, a PQL query is compiled into a PL/SQL stored procedure. PQL is limited to its own graph data model, in which nodes are molecules and interactions. It can only answer queries about the properties of a node or a path finding quer...
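The evaluation semantics described for PQL can be illustrated with a deliberately naive sketch. It is not how PQL evaluates queries (PQL compiles to PL/SQL); it only enumerates every assignment of the node variables to graph nodes and keeps those that satisfy a WHERE predicate, written here as a Python function. The toy graph, the node attributes, and the names \texttt{evaluate} and \texttt{glucose\_interactions} are hypothetical, chosen to mirror a model in which nodes are molecules and interactions.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Node = Dict[str, str]        # a graph node with attributes such as id, kind, name
Binding = Dict[str, Node]    # maps each node variable to one graph node


def evaluate(variables: List[str], nodes: List[Node],
             where: Callable[[Binding], bool]) -> List[Binding]:
    """Try every assignment of variables to nodes; keep those where WHERE holds.

    Cost is O(len(nodes) ** len(variables)), so this only illustrates the
    declarative meaning of a query, not an efficient evaluation strategy.
    """
    results: List[Binding] = []
    for combo in product(nodes, repeat=len(variables)):
        binding = dict(zip(variables, combo))
        if where(binding):
            results.append(binding)
    return results


# Hypothetical toy network: molecule nodes linked to an interaction node.
NODES: List[Node] = [
    {"id": "m1", "kind": "molecule", "name": "glucose"},
    {"id": "m2", "kind": "molecule", "name": "ATP"},
    {"id": "i1", "kind": "interaction", "name": "hexokinase reaction"},
]
EDGES: Set[Tuple[str, str]] = {("m1", "i1"), ("m2", "i1")}


def glucose_interactions(b: Binding) -> bool:
    # Predicate: A is named glucose, B is an interaction, and an edge A -> B exists.
    return (b["A"]["name"] == "glucose"
            and b["B"]["kind"] == "interaction"
            and (b["A"]["id"], b["B"]["id"]) in EDGES)


if __name__ == "__main__":
    for binding in evaluate(["A", "B"], NODES, glucose_interactions):
        print(binding["A"]["name"], "->", binding["B"]["name"])
```

Of the nine candidate bindings for A and B, only one survives: glucose bound to A and the hexokinase reaction bound to B. The exponential enumeration is one reason a practical system would generate optimized database code rather than enumerate bindings directly.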

... middle of paper ...

...ivalent translation to SQL3. The algebra thus provides semantics for a query language and facilitates an implementation of that language based on the algebra. A unifying database architecture that uses a visual query language based on this algebra is also proposed.

\end{itemize}

\section{Thesis Organization}

The rest of this thesis is organized as follows. Chapter 2 presents the background knowledge related to this work. Chapter 3 describes the implemented visual graph database system. Chapter 4 describes a query benchmark in the genomics domain that is used to test the performance of the system. Chapter 5 presents and analyzes the query optimization experiment and its results. Chapter 6 introduces an algebra for a visual query language over topological relationships between gene features. Chapter 7 presents conclusions and future work.
