Appendix A – Remote Database Access¶
Note
You are reading a digital copy of SQL-99 Complete, Really, a book that documents the SQL-99 standard.
The book and the standard does not reflect the features of CrateDB, we are just publishing it as a service for the community and for reference purposes.
On the one hand, CrateDB does not implement the SQL-99 standard thoroughly, on the other hand, it extends the standard for implementing cluster features and others.
For more information specific to CrateDB, check out the CrateDB Reference documentation.
Remote Database Access (RDA) is a standard for interfacing to databases via some network. The standards for networks are usually bound up with a hierarchy called the Open Systems Interconnect (OSI) layers, which look like this:
LAYERS (from top to bottom) WHAT'S DEFINED
--------------- the programs the end-users see
- Application -
---------------
---------------- negotiating syntax of messages
- Presentation -
----------------
---------------- naming, session establishment
- Session -
----------------
---------------- acknowledgments, sequence numbers, time-outs
- Transport -
----------------
---------------- routing through network addresses
- Network -
----------------
---------------- packet structure
- Datalink -
----------------
---------------- bit stream, transmission rate
- Physical -
----------------
Typically, a DBMS programmer worries about Application, the top layer. RDA is a specification for the next lower layers – Presentation and Session. In other words: END USERS AND APPLICATION PROGRAMMERS HAVE NO DIRECT INTEREST IN RDA BECAUSE IT CONCERNS LOW-LEVEL CODING. Typically, the DBMS vendor supplies some mediating software which covers or obscures the network communication job.
All right: for most people, it’s not vital to understand RDA in any detail. What we think is useful is a short summary and tutorial. From this chapter, you’ll get an idea of:
What RDA’s job is.
What sort of information goes back and forth on the network.
What “client/server” actually means.
Table of Contents
ISO/IEC 9759¶
The standard for RDA is ISO/IEC 9759 “Remote Database Access For SQL”. This is a completely different document from the standard SQL document (ISO/IEC 9075), so it’s not correct to say that RDA is part of the SQL standard. But it’s a related standard, and the respective committees work together closely.
Much of the impetus for RDA came from an industry consortium known as the SQL
Access Group (SAG). In 1991 members of the SAG consortium produced the first
prototypes for version 1 of RDA. The latest edition is Version 3, which is
still unofficial at the time we’re writing this. Version 3 is quite different
from the earlier versions. It operates very much like the SQL/CLI
specification, using similar or even the same names and parameters for most of
its functions. For example, the SQL/CLI function SQLExecDirect
has an RDA
equivalent: RDAExecDirect
. The parameters are similar (a handle and a
string with an SQL statement and a size word). The effect is similar (an SQL
statement is parsed and executed). The big difference, and the whole point of
RDA, is architecture: SQLExecDirect
is a function which you call directly,
RDAExecDirect
is a message which you pass to a “remote” entity.
The nearby functions are the “client”, the messages they send out are called “requests”. The remote entity is the “server”, the messages that it sends back are called “responses”.
TCP/IP¶
In the early RDA versions, the main concern was to interface via OSI. The latest version is quite specific: the main transport layer should be Transaction Connect Protocol / Internet Protocol, or TCP/IP. Actually TCP/IP is a family of protocols, made famous by the fact that The Internet runs on TCP/IP.
The TCP/IP protocol family is primarily responsible for the level “below” RDA – the Transport Layer. And, TCP/IP interfaces in its turn with the lower levels. The OSI stack works like a bucket brigade: the top level sends buckets down to the next level which passes to the next level and so on; meanwhile, buckets are coming back the other way. Which buckets are empty or full, depends on where the fire is at the moment.
The RDA is what’s between you (the application) and TCP/IP (the transport).
The Client/Server Model¶
The client sends requests to the server. The server sends responses to the client.
client ----------requests------------->
<---------responses------------- server
There are only eleven RDA requests. Each request designates a distinct operation that the client wants the server to perform. Here’s the list:
Request What the client wants the server to perform
RDAConnect Set up connection between client and server
RDADisconnect Cancel the connection made with RDAConnect
RDAEndTran COMMIT, ROLLBACK, or "prepare for COMMIT"
RDAClientAttribute Note what client's resource attributes are
RDAStatementPrepare Prepare an SQL statement
RDAStatementDeallocate Unprepare, close Cursor, free statement resource
RDAStatementExecute Execute a prepared SQL statement
RDAStatementExecDirect Prepare and execute a SQL statement in one go
RDAStatementFetchRows FETCH selected rows and send contents to client
RDAStatementCloseCursor No more selected rows will be fetched, clean up
RDAStatementCancel Cancel any of the "RDAStatement..." requests
There is only one RDA response. A response designates the results of a server’s performance – or attempted performance – of a client request. Here’s the list:
Response What client should do with the response
"RDAResponse" Read diagnostics and/or fetched row values
The eleven requests, and the response are described below. You may assume that all the messages contain identifiers and handles which are necessary to synchronize operations. We have omitted such “protocol” information in our descriptions. We have included only the “parameter” information, which is necessary to illustrate the essential point of each particular operation.
RDAConnect¶
Message Content:
DestinationServerName character string name of server
UserName character string name of server
Authentication character string name of server
What the server is supposed to do:
Answer the phone! The client is sending enough information for the server to
set up a connection – notice that the parameters here are the same as the
parameters used for the SQL/CLI SQLConnect
function.
The server will be responsible for a particular “cluster of Catalogs” – which we may think of as a “database”. There’s no standard way to pass a database name with this call, so we can assume that the usual expectation is that different databases will have different servers. In other words: by specifying the server, you are indirectly specifying the database that should be opened.
RDADisconnect¶
Message Content:
<nothing>
What the server is supposed to do:
Hang up the phone! Cut off the connection that was made with RDAConnect
.
The disconnection will probably free the server so it can do other tasks, or perhaps shut down. The connection could be important for the network too – you’re freeing up a line so that the network server can allow other users to get on. But TCP/IP is stateless, so the concept of “freeing up a line” is less meaningful than it would be with other network protocols (such as NETBIOS).
RDAEndTran¶
Message Content:
CompletionType 32-bit INTEGER flag: COMMIT OR ROLLBACK
What the server is supposed to:
The RDAEndTran
marks a transaction boundary, and the flag will usually
indicate COMMIT
or ROLLBACK
, so the usual job is simple: the server
should end a current transaction with COMMIT
or with ROLLBACK
.
There is a reason that RDAEndTran
is a distinct message – that makes it
easier for the network to notice it, without having to know how to parse SQL
statements. The network might intercept RDAEndTran
, because the job of
“committing” might affect more than just one server job.
One other flag value that CompletionType
may contain (besides COMMIT
or
ROLLBACK
) is: PREPARE TO COMMIT
. As part of what we call the “two-phase
commit” process, the first step is to co-ordinate: get all the servers on the
network to acknowledge that they are all ready to commit. The second step is
the COMMIT
itself. Two-phase commit is advanced stuff; many DBMSs will
refuse to support the PREPARE TO COMMIT
operation.
RDAClientAttribute¶
Message Content:
ClientAttributes array List of attributes
What the server is supposed to do:
Take note of what attributes the client has. An “attribute” is an env
or
dbc
or stmt
attribute. That is, it’s a number or string that was
earlier passed to the client, via a SQLSetEnvAttr
or SQLSetConnectAttr
or SQLSetStmtAttr
function call.
The client won’t send a new RDAClientAttribute
message every time its own
attributes change. That would result in too much network traffic. One of the
features of client/server architecture is: the server doesn’t do everything.
The client is capable of storing data locally for a particular connection and
only passing on what’s necessary when it’s necessary.
RDAStatementPrepare¶
Message Content:
StatementIdent 32-bit integer sort of a handle
StatementText character string SQL preparable statement
What the server is supposed to do:
Prepare an SQL statement. The client might be able to do some primitive syntax checking, but for the full job of binding and parsing, the client has to pass the SQL statement to the server. Only the server has direct access to the database and its metadata.
If we were calling SQLPrepare
, we’d pass a hstmt
(handle of a
stmt
). But a handle is a local identifier. There’s no guarantee that a
handle value will be unique over the whole network. Therefore, instead of a
hstmt
, the client must pass a StatementIdent
. This is a handle of a
handle. Given a StatementIdent
, the client and server can each look up what
their respective hstmt values are for the same stmt
.
RDAStatementDeallocate¶
Message Content:
StatementIdent 32-bit integer sort of a handle
What the server is supposed to do:
Free a stmt
. Compare the SQL/CLI function
SQLFreeHandle(SQL_HANDLE_STMT,...)
.
With client-server, any statement resource (stmt
) will be duplicated in two
places: the client and the server. When the client is done with a stmt
, it
should send an RDAStatementDeallocate
message to the server, so that the
duplicate will be destroyed too.
RDAStatementExecute¶
Message Content:
StatementIdent 32-bit integer sort of a handle
ParameterDescriptor <array> info re "host variable"
ParameterData <array> "host variable" content
What the server is supposed to do:
Execute an SQL statement. Presumably the statement was prepared earlier, due to
an RDAStatementPrepare
message with the same StatementIdent
.
The ParameterDescriptor
and ParameterData
fields – which weren’t
necessary for SQLPrepare
– represent the solution to a rather tough
question: what should be done with input parameters? There are no direct RDA
equivalents for the SQL/CLI functions that handle “parameter descriptors”.
They’d be useless anyway, because SQL/CLI parameter descriptors require
pointers and pointers have no meaning to a job on a different machine. So what
happens is: the client bundles up all the input parameter values, and sends
them as part of the message.
This might mean that requests get monstrous. But RDA’s designers thought it
would be a bad idea to split RDAStatementExecute
into multiple separate
messages. What if the messages didn’t all arrive in the right order?
RDAStatementExecDirect¶
Message Content:
StatementIdent 32-bit integer sort of a handle
StatementText character string SQL preparable statement
ParameterDescriptor <array> info re "host variable"
ParameterData <array> "host variable" content
What the Server is supposed to do:
Prepare what’s in StatementText
, then execute it (using the input
parameters if there are any).
RDAStatementExecDirect
is logically redundant: we could accomplish the same
thing by sending RDAStatementPrepare
followed by RDAStatementExecute
.
Such redundancy is occasionally justifiable because it’s more efficient to send
one message instead of two. Network traffic is expensive.
RDAStatementFetchRows¶
Message Content:
StatementIdent 32-bit integer sort of a handle
FetchOrientation 32-bit integer "next", "first", etc.
FetchOffset 32-bit integer start, relative to last fetch
FetchCount 32-bit integer how many rows to fetch
What the server is supposed to do:
Get the values in each of the indicated rows, and ship them to the client.
Presumably there was an early RDAStatementExecute
or
RDAStatementExecDirect
message for this StatementIdent
, which caused a
query to be executed. So the server has a result set that it can fetch the rows
from.
The FetchOrientation
and FetchOffset
parameters contain values
equivalent to those which are used by the SQL/CLI function SQLFetchScroll
.
For example, FetchOrientation
might equal SQL_FETCH_ABSOLUTE
and
FetchOffset
might equal 55.
In standard SQL, it’s only possible to fetch one row at a time. With RDA – and
once again this can be explained by “efficiency” – it’s possible to ask for
multiple rows to be fetched at once. That’s what the FetchCount
parameter
is for.
RDAStatementCloseCursor¶
Message Content:
StatementIdent 32-bit integer sort of a handle
What the server is supposed to do:
Well, close the Cursor. This isn’t an end-of-transaction signal, but it does
tell the server that it can forget about a result set that it created with a
recent SELECT
statement.
The client could pass messages in this sequence: (a)
RDAStatementExecDirect
with a SELECT
statement so that the server
creates a result set and Cursor, (b) RDAStatementFetchRows
so that the
server will fetch row values and send them to the client and (c)
RDAStatementCloseCursor
so that the server will free (and possibly unlock)
resources that were allocated for Cursor maintenance.
RDAStatementCancel¶
Message Content:
StatementIdent 32-bit integer sort of a handle
What the server is supposed to do:
Abort any activity that is taking place for the statement identified by
StatementIdent
. Since statement execution is atomic, the statement won’t be
half-done – the result is as if the statement was never executed at all.
Why would the client send the statement in the first place, if it didn’t want to finish it? The usual answer is that the client can “change its mind”, perhaps because the execution is taking too long or because some new information has arrived (such as a Shutdown message from Windows). Or, consider this scenario:
There are two servers. Each server is responsible for a copy of the same database.
The client sends an
RDAExecDirect
message to both servers, with the same query.Inevitably, one of the servers will respond more quickly than the other. Perhaps it’s less busy, or perhaps its message packets are just moving by a shorter route.
Once the client receives the response from the fast server, it sends an
RDAStatementCancel
message to the slow one, meaning “oh forget it – I already have the data”.The Server’s Response:
RDAResponse
Message Content:
ServerAttributes <array> server's env/dbc/stmt attributes
Diagnostics <array> errors or warnings
ParameterDescriptor <array> IPD, for RDAStatementPrepare
RowDescriptor <array> metadata, for RDAStatementFetchRows
Rows <array> data, for RDAStatementFetchRows
The server sends one response message to the client for every request message
that it receives from the client. The contents of responses can vary quite a
bit, but usually you know what sort of responses to expect, by observing what
the original request was. For example, if the request was an
RDAStatementFetchRows
message, a reasonable hope is that the response will
contain data from the rows that got fetched.
A single response contains the entire set of results. This means that the
contents of a response are useful for one or many operations that get
performed by the client alone: get diagnostics
, get attributes
or get
data
. For example, the SQL/CLI function SQLGetDiagRec
does not involve a
network message containing RDAGetDiagRec
– there is no such thing. Instead,
when an application calls the SQLGetDiagRec
function, the DBMS looks in the
Diagnostics section of the last response that was received from the server.
This means that SQLGetDiagRec
can be called millions of times without
affecting the network or the server. And the same goes for the other get
functions. This also means that the client has to buffer all responses, but
that’s okay – local storage is cheap.
A Short Session¶
Let’s walk through a short client/server session which involves a single query. We show client action on the left, server action on the right.
CLIENT SERVER
... Client is receiving input from ... Server is idle, just waiting for
a user via a dialog box incoming messages
Client sends RDAConnect message -->
<-- Server sends response, diagnostic="OK"
Client sends -->
RDAStatementExecDirect,
"SELECT * FROM Employees;" message
<-- Server sends response, diagnostic="OK"
Client sends -->
RDAStatementFetchRows message
<-- Server sends response, diagnostic="OK"
and 2 rows: { ('A',1) ('B',2) }
Client sends -->
RDAStatementCloseCursor message
<-- Server sends response, diagnostic="OK"
Client sends -->
RDAEndTran message
<-- Server sends response, diagnostic="OK"
Client sends -->
RDADisconnect message
<-- Server sends response, diagnostic="OK"
The network session is over, but Server is idle again ...
the client can continue: "FETCH"
the first row, "FETCH" the second
row (from local buffers)...
Making it hard¶
In our initial tour of RDA, we skipped some non-essential points. We note them now so that you’ll at least be aware of the existence of anything that is potentially significant.
Clients can send multiple requests without waiting for responses. This makes for hairy programming, because when responses eventually do arrive, they might not be in the right order.
There might be multiple clients for one server, which leads to the various special considerations that we mentioned in our chapter on concurrency. Or, there might be several servers for one client.
The network can be heterogeneous, that is, you should be able to connect a brand
X
client to a brandY
server. In theory.There is an optional client-side feature: a standard defined CLI (the “RDA API”). This is simply a wrapper, so that you can call a function which sends a message, without worrying about the details of “how to send a message” over TCP/IP.
There are optional Schemas which contain Tables which contain information about available servers, logs of requests and so on. The Tables can be accessed in the same way that Views can be accessed in
INFORMATION_SCHEMA
.
What good is RDA?¶
In theory, RDA would be a relatively painless way to transfer information via
TCP/IP or other OSI protocols. Programmers could write their own applications
or utilities, such as Intranet servers. But most SQL programmers would prefer
it if RDA was silently included in the DBMS. They want to call SQLExecDirect
,
for instance. If the DBMS does the job by silently sending RDAExecDirect
messages to some server, fine.
So all a SQL programmer really needs to know is what you – we hope! – have just gleaned from this discussion of RDA. You should now be as competent as you need to be. Specifically:
You can appreciate database architecture now that you can visualize what “clients” and “servers” do.
You can write better SQL/CLI code since you know what some of the underlying effects are.
You can figure out whether direct use of RDA would help you around some specific network problem.
From those examples, it might seem that we’re talking about “low-level code” solutions – an analogue would be the way that people sometimes write assembler code snippets in their C programs. We’re big fans of that technique [[ footnote: see our book OPTIMIZING C WITH ASSEMBLY LANGUAGE, R&D Books ]]. But the analogy doesn’t really hold: writing for RDA is not inherently more arcane or low-level than writing any SQL/CLI code. That’s clear when you observe that the parameters are, in almost all cases, the same.