FMQL – FileMan Query Language
Goal
An endpoint that puts VistA on the Semantic Web.
Background
VistA
The VA’s VistA EHR is a set of loosely coupled modules built around a MUMPS-based datastore called FileMan.
Accessing FileMan Now
FileMan provides a set of routines built over MUMPS that implement a data store for networked data. It has a Terminal Interface (ScreenMan) and a local API (DataBase Server API). The VA’s RPC mechanism provides some remote access but, like most RPCs, it was designed for a custom, non-web client (CPRS). FileMan has also been projected as a relational datastore.
RDF stores – FileMan’s Siblings
FileMan has much in common with Graph Databases. Both provide access to flexible “networked” data as opposed to the strictly-defined, tabular information of relational stores. The W3C’s Resource Description Framework (RDF) is a standard for such stores.
RDF has an SQL-like query language called SPARQL, an endpoint specification and response serializations for XML and JSON. Currently, FileMan has no equivalents but given its closeness to RDF stores, it can leverage their specifications.
Being SQL-like as opposed to being SQL is an important distinction. Try to pretend that a graph store is relational and you lose much of the format’s advantages. Such stores need their own query forms. But SQL cannot be ignored. Drawing on SQL’s well-known terms makes for acceptance – words like “SELECT” or “WHERE” should be reused whenever possible.
Some advantages of providing this form of access to FileMan are:
- Infrastructure that accesses SPARQL/Semantic Web endpoints can just as readily access FileMan. (This tackles head-on the notion that FileMan is old or about to go or …)
- FileMan reports can be written in a variety of programming and scripting languages, inside and outside a browser.
- Flexible FileMan access enables VistA federation and synchronization of Health Data Repositories with VistA patient data.
- A query language succinctly exposes the underlying model of a data store, making that store much more accessible. It provides complete, data-driven access as opposed to hard-coded, selective access.
caregraf
caregraf.org promotes the deployment of the Semantic Web to enable the next generation of Healthcare applications.
Many organizations are defining Health-Care in the Semantic Web – the makeup of its drugs, its many procedures, diseases – but there is a “patient gap”. Patient information – encounters, specific diagnoses, concrete prescriptions – remains locked in proprietary EMRs or, worse, in piles of paper.
caregraf is exploring different mechanisms to put the patient onto the Semantic Web so that Health-Care information – both clinical and conceptual – interlinks seamlessly.
FMQL – a SPARQL Profile
A SPARQL profile defines the subset of the SPARQL query language supported by a particular endpoint. FMQL defines a subset of SPARQL suitable for querying VA VistA FileMan.
FMQL breaks into levels of semantic access, each one building on the last.
- Linked Data Traversal: allow a walk through all data and schema information stored in FileMan
- OWL/RDF: represent FileMan’s naming schemes and schema as OWL/RDF.
- Filter: support granular queries through a subset of SPARQL.
Other access issues include …
- SPARQL-HTTP: supported operations follow SPARQL’s syntax and HTTP-use conforms to its profile for endpoints
- FileMan as a collection of interlinked subgraphs for Patient, Institution, Terminology, System
- Linking out FileMan terminologies to standard equivalents
Note: This list is subject to change.
Linked Data Traversal
With linked data, a client walks URLs to gather the information it needs. To satisfy this level of FMQL, an endpoint must provide such access to all data and schema information in a FileMan instance.
It must support the following operations, four for FileMan data and three for its schema.
FileMan data access …
- Describe
- Return a chunk of information about an entity – its literal values and blank nodes, the references it has to other entities.
- SelectAllOfType
- Return a list of the nodes of a particular type
- SelectAllReferrers
- Return the list of nodes that refer to a particular node
- SelectReferrersOfType
- Return the list of nodes of a type that refer to a particular node
FileMan schema access …
- DescribeType
- Return a chunk of information about a (file) type – its fields and their types, its description, name and numeric identifier
- SelectAllTypes
- Return a list of all (file) types supported in a system
- SelectAllReferrersToType
- Return the list of (file) types whose instances may refer to the instances of a type
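One way to picture the seven operations is as SPARQL query templates. The sketch below is an illustration under assumptions, not part of the profile: the function names, the use of owl:Class and rdfs:domain/rdfs:range for the schema operations, and the example URIs are all hypothetical, and an actual endpoint may phrase these queries differently.

```python
# Hypothetical mapping of the Linked Data Traversal operations to SPARQL text.
# Nothing here is mandated by FMQL; it only shows that each operation has a
# natural SPARQL form.

OWL = "PREFIX owl: <http://www.w3.org/2002/07/owl#> "
RDFS = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "

def describe(uri):
    # Describe and DescribeType: ask for everything known about one resource.
    return "DESCRIBE <%s>" % uri

def select_all_of_type(type_uri):
    # SelectAllOfType: every node whose type is the given file type.
    return "SELECT ?s WHERE { ?s a <%s> }" % type_uri

def select_all_referrers(uri):
    # SelectAllReferrers: every node with any property pointing at the node.
    return "SELECT ?s WHERE { ?s ?p <%s> }" % uri

def select_referrers_of_type(uri, type_uri):
    # SelectReferrersOfType: referrers restricted to one file type.
    return "SELECT ?s WHERE { ?s a <%s> ; ?p <%s> }" % (type_uri, uri)

def select_all_types():
    # SelectAllTypes: assumes file types are published as OWL classes.
    return OWL + "SELECT ?t WHERE { ?t a owl:Class }"

def select_all_referrers_to_type(type_uri):
    # SelectAllReferrersToType: assumes pointer fields are properties whose
    # range is the target type and whose domain is the referring type.
    return RDFS + "SELECT ?t WHERE { ?p rdfs:range <%s> ; rdfs:domain ?t }" % type_uri

# Hypothetical URIs: entry 9 of file 2, and a type URI for file 2.
patient = "http://vista.example.org/2-9"
patient_type = "http://vista.example.org/2"
query = select_referrers_of_type(patient, patient_type)
```

Keeping each operation as a plain query string means the same text can be sent to any SPARQL endpoint, which is the point of defining FMQL as a profile.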
FileMan data types must be represented as follows
- IEN (internal entry number)
- uri (form [institution url]/[FILE #]-[Entry #])
- DATE/TIME (1)
- xsd:dateTime
- NUMERIC (2)
- xsd:int
- SET (3)
- xsd:string
- FREE TEXT (4)
- xsd:string
- WORD-PROCESSING (5)
- typed literal (xml)
- COMPUTED (6)
- TBD
- POINTER (7)
- uri (same format as IEN)
- VARIABLE-POINTER (8)
- uri (same format as IEN)
- SUBFILE (9)
- blank node (subordinate data) or node
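The mapping above can be sketched as a lookup table plus two small helpers. This is a minimal sketch under assumptions: the names `FIELD_TYPE_TO_RDF`, `entry_uri` and `fileman_date_to_xsd` and the institution URL are hypothetical, "typed literal (xml)" is read as rdf:XMLLiteral, and the date helper relies on FileMan's internal date form YYYMMDD.HHMMSS (YYY is the year minus 1700). Imprecise dates (month or day of 00) and a time of 2400 are not handled.

```python
from datetime import datetime

# FileMan field type -> RDF representation, following the list above.
# COMPUTED is left out because the list marks it TBD; SUBFILE is left out
# because it maps to a node rather than to a literal type or a uri.
FIELD_TYPE_TO_RDF = {
    "DATE/TIME": "xsd:dateTime",
    "NUMERIC": "xsd:int",
    "SET": "xsd:string",
    "FREE TEXT": "xsd:string",
    "WORD-PROCESSING": "rdf:XMLLiteral",  # assumption for "typed literal (xml)"
    "POINTER": "uri",
    "VARIABLE-POINTER": "uri",
}

def entry_uri(institution_url, file_number, ien):
    """Build a URI of the form [institution url]/[FILE #]-[Entry #]."""
    return "%s/%s-%s" % (institution_url.rstrip("/"), file_number, ien)

def fileman_date_to_xsd(fm_date):
    """Convert an internal FileMan date/time to an xsd:dateTime string."""
    date_part, _, time_part = str(fm_date).partition(".")
    # FileMan drops trailing zeros from the time, so pad back to HHMMSS.
    time_part = time_part.ljust(6, "0")
    year = 1700 + int(date_part[:3])
    month, day = int(date_part[3:5]), int(date_part[5:7])
    hour, minute, second = (int(time_part[i:i + 2]) for i in (0, 2, 4))
    return datetime(year, month, day, hour, minute, second).isoformat()

# Hypothetical institution URL; entry 9 of file 2.
uri = entry_uri("http://vista.example.org/", 2, 9)
when = fileman_date_to_xsd("3100415.143")
```

Here `uri` is `http://vista.example.org/2-9` and `when` is `2010-04-15T14:30:00`.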
FMQL Reference Implementation
The FMQL Reference Implementation is the open source code and configuration needed for an FMQL endpoint. Its goal is correct behavior and proving the usefulness of the different levels of FMQL for VistA application development.
It uses
- Resource RPC: the MUMPS-side of Medsphere’s OVID, this RPC provides remote access to FileMan’s Database APIs.
- Apache and its wsgi module for Python endpoints.
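For orientation, a Python endpoint served through Apache's wsgi module is just a WSGI callable. The following is a minimal sketch, not the reference implementation: the function name `fmql_app` is hypothetical, the empty result set is a placeholder for the step that would call the Resource RPC, and only the `query` parameter and the `application/sparql-results+json` media type come from the SPARQL specifications.

```python
import json
from urllib.parse import parse_qs

def fmql_app(environ, start_response):
    """Minimal WSGI endpoint: read ?query=..., reply with SPARQL JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("query", [""])[0]
    if not query:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"missing 'query' parameter"]
    # Placeholder: a real endpoint would turn the query into Resource RPC
    # calls here and fill in the variables and bindings from the replies.
    body = json.dumps({"head": {"vars": []}, "results": {"bindings": []}})
    start_response("200 OK",
                   [("Content-Type", "application/sparql-results+json")])
    return [body.encode("utf-8")]

# mod_wsgi looks for a callable named "application" by default.
application = fmql_app
```

Because the callable takes only an environ dictionary and a `start_response` function, it can be exercised without Apache, for example with the standard library's `wsgiref.simple_server`.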
Each level of FMQL conformance will bring a new release. For example, the Reliable Linked Data (RLD) release will implement Linked Data Delivery support.
Appendix – (FileMan) Resource RPC
In granularity, the protocol mirrors the main functions of the Database API. MSCFM (Entry Point) leads to handlers for GETS, LIST etc.
Each handler invokes Database API calls such as
D FIND^DIC(THIS("FILE"),THIS("IENS"),THIS("FIELDS"),THIS("FLAGS"),THIS("IDXVALUE"),THIS("NUMBER"),THIS("IDXNAME"),THIS("SCREEN"),,"TARGET")
In format, the RPC protocol is a sequence of length-prefixed name-value pairs. Some pairs are meta (for example, the operation type); others carry query values (see NEXTPROP^MSCRES).
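To make "length-prefixed name-value pairs" concrete, here is a sketch of the general technique in Python. It is not the Resource RPC wire format, which is defined by the MUMPS code in MSCRES: the `length:text` layout, the function names and the `OPERATION` property name are all assumptions chosen for illustration.

```python
def encode_pairs(pairs):
    """Serialize (name, value) pairs, each field prefixed by its length.

    Hypothetical layout: <decimal length>:<text>, names and values alternating.
    """
    out = []
    for name, value in pairs:
        for text in (name, value):
            out.append("%d:%s" % (len(text), text))
    return "".join(out)

def decode_pairs(data):
    """Inverse of encode_pairs: walk the string one prefixed field at a time."""
    fields, pos = [], 0
    while pos < len(data):
        colon = data.index(":", pos)     # end of the length prefix
        length = int(data[pos:colon])
        start = colon + 1
        fields.append(data[start:start + length])
        pos = start + length             # jump straight to the next prefix
    # Even positions are names, odd positions are values.
    return list(zip(fields[0::2], fields[1::2]))

# One meta pair (the operation type) followed by query values.
request = [("OPERATION", "GETS"), ("FILE", "2"), ("IENS", "9,")]
wire = encode_pairs(request)
```

Because every field carries its own length, values may contain the separator character or commas (as IENS strings do) without any escaping.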
Appendix – SPARQL JSON Responses
SPARQL JSON has a header and results section. This is similar to the structure of the MUMPS arrays returned by the DataBase Server API …
Let’s query for patients and their doctors …
SELECT ?name ?doctor ?doctorName
WHERE {?r :name ?name ; :attending_physician ?doctor . ?doctor :name ?doctorName}
leads to this response …
{
"head": { "vars": [ "name" , "doctor", "doctorName" ]
} ,
"results": {
"bindings": [
{
"name": { "type": "literal" , "value": "Joe Blogs" },
"doctor": { "type": "uri", "value": "..." },
"doctorName": { "type": "literal", "value": "Fred Smith" }
} ,
....
]
}
}
Key points about this JSON …
- “head” contains a list of every variable in the response. Not every one need be set in every binding. Think of a table where some row cells are empty but the header’s cells each have labels.
- there is at least one “binding” for every patient. A patient has as many bindings as she has doctors.
- in FileMan terms, both the internal and external forms of “doctor” are in the response: “doctor” carries the internal form (the entry’s identifier, here a uri) and “doctorName” the external form. “doctor” is included because the querier presumably wants the doctor’s id for another query.
- every variable binding is typed for ease of processing. Beyond the basic literal, there are typed literals, which include XMLLiteral. That type could hold FileMan’s Word Processing fields.
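The table analogy in the first point can be sketched in a few lines of Python. This is a minimal sketch, assuming a response shaped like the one above: the function name `bindings_to_rows`, the doctor URI and the second patient (included only to show an unset variable) are made up for illustration.

```python
import json

def bindings_to_rows(response_text):
    """Flatten a SPARQL JSON response into (header, rows).

    Variables that a binding leaves unset become None, like empty table cells.
    """
    doc = json.loads(response_text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append(tuple(
            binding[var]["value"] if var in binding else None
            for var in variables
        ))
    return variables, rows

# Made-up sample data; the URI follows the [FILE #]-[Entry #] form.
sample = """
{
  "head": { "vars": ["name", "doctor", "doctorName"] },
  "results": {
    "bindings": [
      { "name": { "type": "literal", "value": "Joe Blogs" },
        "doctor": { "type": "uri", "value": "http://vista.example.org/200-11" },
        "doctorName": { "type": "literal", "value": "Fred Smith" } },
      { "name": { "type": "literal", "value": "Jane Doe" } }
    ]
  }
}
"""
header, rows = bindings_to_rows(sample)
```

The "type" member of each binding is ignored here; a client that needs to tell a uri from a literal, or to treat an XMLLiteral specially, would branch on it.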
