From b3a4ceba3bd803a5eaedb82e6bea808fa092823e Mon Sep 17 00:00:00 2001 From: Tony Hannan Date: Wed, 13 Jul 2011 15:35:44 -0400 Subject: [PATCH] design doc --- doc/Article1.md | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) create mode 100644 doc/Article1.md diff --git a/doc/Article1.md b/doc/Article1.md new file mode 100644 index 0000000..993bc45 --- /dev/null +++ b/doc/Article1.md @@ -0,0 +1,26 @@ +# Design of the Haskell driver for MongoDB +#### by Tony Hannan, July 2011 + +The [Haskell driver](http://hackage.haskell.org/package/mongoDB) is a production-quality MongoDB driver for the [Haskell language](http://www.haskell.org/). This article highlights the design of the driver. For detailed documentation with coding examples see the driver package and homepage (follow driver link above). + +### BSON + +BSON is a binary format for documents used by MongoDB but defined independently at [bsonspec.org](http://bsonspec.org). Each language has its own representation for documents but they all serialize to this format. In the Haskell [bson package](http://hackage.haskell.org/package/bson), I chose to represent a *Document* as a list of *Field*s, where each Field has a *Label* and a *Value*. This is isomorphic to an association list, but I chose a custom pair (Field) over the standard pair to make printing of documents nicer. + +A *Value* is one of several basic types, which includes primitive types like Bool, Int32, Double, and UString (UTF-8 string); a handful of special BSON types like Javascript and ObjectId; and two compound types: list of Values (BSON array) and Document itself (embedded documents). To easily wrap/unwrap basic types to/from the Value sum type, I defined a type class called *Val* with *val* (wrap) and *cast* (unwrap) methods. Every basic value type is an instance of this class, so you can simply say `val "hello"` or `val 42` to get a Value, and `cast aValue :: Maybe Bool` to extract the Bool, or Nothing if it is not a Bool. The function *(=:)* constructs a Field but converts the second arg using *val* so you can construct fields directly from basic values, as in `["name" =: "Tony", "score" =: 42]`. Types that are not technically a BSON basic type but compatible with one of them are also instance of Val so they can be used as if they were. Integer, Float, and String are examples of this, their Val instance converts to/from their compatible BSON basic type. + +*UString* is a type synonym for CompactString which is a UTF-8 encoded string from the [compact-string package](http://hackage.haskell.org/package/compact-string-fix). I chose this package over the [text package](http://hackage.haskell.org/package/text) because its native format is UTF-8 while text's native format is UTF-16 and thus would spend more time serializing to BSON which requires UTF-8. If and when the text package changes its native format to UTF-8 I will switch to it. In the meantime, you can make *Text* an instance of Val to automatically convert it to/from UString. + +UString is an instance of *IsString* so literal strings can be interpreted as UStrings. Use the Language extension [*OverloadedStrings*](http://www.haskell.org/ghc/docs/7.0.4/html/users_guide/type-class-extensions.html#Overloaded+string+literals) to enable this. If you don't use this extension, use the *u* function to convert a String to a UString. Field labels are UStrings. + +You may want to define fields ahead of time to help catch typos. For example, you can define `name = ("name" =:) :: UString -> Field` and `score = ("score" =:) :: Int -> Field`, and then construct a document as `[name "Tony", score 42]`. This will ensure your fields have the correct label and type, and is more succinct. + +### Pipelining + +To increase concurrency on a server connection and thus speed up threads sharing it, I pipeline requests over a connection, a' la [HTTP pipelining](http://en.wikipedia.org/wiki/HTTP_pipelining). Pipelining means sending multiple requests over the socket and receiving the responses later in the same order. This is faster than sending one request, waiting for the response, then sending the next request, and so on. The pipelining implementation uses [futures/promises](http://en.wikipedia.org/wiki/Futures_and_promises), which are simply implemented as an IO actions. You are not exposed to the pipelining, because it is internal to *Cursor*, which iterates over the results of a query. More specifically, a query returns its cursor right away, locking the socket only briefly to write the request (allowing other threads to issue their queries). When a cursor is asked for its first result, it waits for the query response from the server. Also, when a cursor returns the last result of the current batch, it asynchronously requests the next batch from the server. This asynchronicity is automatic because the request returns a promise right away that the cursor will wait on when asked for the next result. + +### DB Action + +Every database read and write operation requires a connection to access, a database to effect, and an access mode to use. Furthermore, every operation may fail because of a connection failure or invalid operation. This context and failure is captured in a Reader and Error monad stacked on top of IO, called the *Action* monad. To access MongoDB, you sequence together several operations/actions that together accomplish a high-level task, and execute that task against a connection, database, and access mode. The execution will return *Left* if an operation failed or *Right* if all operations succeeded. Access mode indicates if stale reads (from slaves) are OK and if writes should be ensured and how. + +You may notice that a DB action/task is analogous to a DB transaction in that the action aborts when one of its operations fails. However, for scalability reasons, MongoDB does not support ACID across multiple read/write operations, so the operations before the failed operation remain in effect. Your failure handler must be prepared to recover from this intermediate state. If your DB action is conceptually a single high-level task, then it should not be too hard to undo and redo that task even from an intermediate state.