2010-02-28 12:19:02 +00:00
MongoDB Haskell Mini Tutorial
-----------------------------
__Author:__ Brian Gianforcaro (b.gianfo@gmail.com)
__Updated:__ 2/28/2010
This is a mini tutorial to get you up and running with the basics
of the Haskell mongoDB driver. It is modeled after the
[pymongo tutorial](http://api.mongodb.org/python/1.4%2B/tutorial.html).
You will need both the mongoDB bindings and MongoDB itself installed.
$ = command line prompt
> = ghci repl prompt
Installing Haskell Bindings
---------------------------
From Source:
$ git clone git://github.com/srp/mongoDB.git
$ cd mongoDB
$ runhaskell Setup.hs configure
$ runhaskell Setup.hs build
$ runhaskell Setup.hs install
From Hackage using cabal:
$ cabal install mongoDB
Getting Ready
-------------
Start a MongoDB instance for us to play with:
$ mongod
Start up a Haskell REPL:
$ ghci
Now we'll need to bring in the MongoDB/BSON bindings and set
OverloadedStrings so literal strings are converted to UTF-8 automatically.
> import Database.MongoDB
> :set -XOverloadedStrings
Making A Connection
-------------------
Open up a connection to your DB instance, using the standard port:
> Right con <- connect $ server "127.0.0.1"
Or, for a non-standard port:
> Right con <- connect $ Server "127.0.0.1" (PortNumber 666)
*connect* returns a Left IOError if the connection fails. The pattern
*Right con* above assumes it succeeds; if it fails, you will get a pattern
match error.
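To avoid the pattern match error, both outcomes can be handled explicitly. The following is a minimal sketch that assumes only what is stated above, namely that *connect* yields an Either with an IOError on the left. `describeConnect` is a hypothetical helper, not part of the driver, and the connection type is left abstract so the snippet compiles with base alone.

```haskell
import System.IO.Error (ioeGetErrorString)

-- Hypothetical helper (not part of the mongoDB package): summarise the
-- Either that connect is described as returning. The connection type is
-- left polymorphic so this compiles without the driver installed.
describeConnect :: Either IOError conn -> String
describeConnect (Left err) = "connection failed: " ++ ioeGetErrorString err
describeConnect (Right _)  = "connected"
```

In ghci one would bind the result first (`r <- connect $ server "127.0.0.1"`) and then inspect it with `putStrLn (describeConnect r)` before extracting the connection.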
Connected monad
-------------------
The current connection is held in a Connected monad, and the current database
is held in a Reader monad on top of that. To run a connected monad, supply
it and a connection to *runConn*. To access a database within a connected
monad, call *useDb*.
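One way to picture the "current database" layer is as an action that is waiting to be told which database it runs in. The toy model below is not the driver's implementation: `DbAction`, `withDb`, and `qualify` are hypothetical names, and the database is reduced to a plain String.

```haskell
-- Toy model only; these are not the driver's types.
-- An action that needs the current database name before it can run in IO.
newtype DbAction a = DbAction (String -> IO a)

-- Hypothetical stand-in for useDb: supply the database name to an action.
withDb :: String -> DbAction a -> IO a
withDb name (DbAction f) = f name

-- An action that reads the current database name to build the full
-- "database.collection" namespace for a collection.
qualify :: String -> DbAction String
qualify coll = DbAction (\db -> pure (db ++ "." ++ coll))
```

Here `withDb "test" (qualify "posts")` yields `"test.posts"`: the action never names its database itself, it only reads whatever the enclosing call supplied.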
Since we are working in ghci, which requires us to start from the
IO monad every time, we'll define a convenient *run* function that takes a
db-action and executes it against our "test" database on the server we
just connected to:
> let run act = runConn (useDb "test" act) con
*run* (*runConn*) will return either Left Failure or Right result. Failure
means the connection failed (e.g. a network problem) or the server failed
(e.g. disk full).
Databases and Collections
-----------------------------
A MongoDB server can host multiple databases: separate namespaces
under which collections reside.
You can obtain the list of databases available on a connection:
> runConn allDatabases con
You can also use the *run* function we just created:
> run allDatabases
The "test" database is ignored in this case because *allDatabases*
is not a query on a specific database but on the server as a whole.
Databases and collections do not need to be created explicitly; just start
using them and MongoDB will automatically create them for you.
In the examples below we'll be using the database "test" (captured in *run*
above) and the collection "posts".
You can obtain a list of collections available in the "test" database:
> run allCollections
Documents
---------
Data in MongoDB is represented (and stored) using JSON-style
documents. In mongoDB we use the BSON *Document* type to represent
these documents. A document is simply a list of *Field*s, where each field is
a named value. A value is a basic type like Bool, Int, Float, String, Time;
a special BSON value like Binary, Javascript, ObjectId; an (embedded)
Document; or a list of values. Here's an example document which could
represent a blog post:
> import Data.Time
> now <- getCurrentTime
> :{
let post = ["author" =: "Mike",
            "text" =: "My first blog post!",
            "tags" =: ["mongoDB", "Haskell"],
            "date" =: now]
:}
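The shape described above (a list of named values, where a value may itself be a list or an embedded document) can be sketched with plain Haskell types. This is a toy model that runs with base alone, not the BSON package's actual definitions; `Value`, `Document`, `examplePost`, `field`, and the "meta" entry are all hypothetical and exist only for illustration.

```haskell
-- Toy model only; the real BSON types have many more value kinds.
data Value
  = VString String
  | VInt Int
  | VBool Bool
  | VList [Value]   -- a list of values
  | VDoc Document   -- an embedded document
  deriving (Eq, Show)

-- A document is an ordered list of named values.
type Document = [(String, Value)]

-- A cut-down blog post; "meta" is invented to show an embedded document.
examplePost :: Document
examplePost =
  [ ("author", VString "Mike")
  , ("tags",   VList [VString "mongoDB", VString "Haskell"])
  , ("meta",   VDoc [("draft", VBool False)])
  ]

-- Look up a top-level field by name; Nothing if it is absent.
field :: String -> Document -> Maybe Value
field = lookup
```

With this model, `field "author" examplePost` is `Just (VString "Mike")` and `field "title" examplePost` is `Nothing`, which mirrors how a document can simply lack a field.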
Inserting a Document
-------------------
To insert a document into a collection we can use the *insert* function:
> run $ insert "posts" post
Right (Oid 4c16d355 c80c560858000000)
When a document is inserted, a special field, *_id*, is automatically
added if the document doesn't already contain that field. The value
of *_id* must be unique across the collection. *insert* returns the
value of *_id* for the inserted document. For more information, see
the [documentation on _id](http://www.mongodb.org/display/DOCS/Object+IDs).
After inserting the first document, the posts collection has actually
been created on the server. We can verify this by listing all of the
collections in our database:
> run allCollections
* Note: The system.indexes collection is a special internal collection
that was created automatically.
Getting a single document with findOne
-------------------------------------
The most basic type of query that can be performed in MongoDB is
*findOne*. This function returns a single document matching a query (or
*Nothing* if there are no matches). It is useful when you know there is
only one matching document, or are only interested in the first
match. Here we use *findOne* to get the first document from the posts
collection:
> run $ findOne (select [] "posts")
Right (Just [ _id: Oid 4c16d355 c80c560858000000, author: "Mike", text: "My first blog post!", tags: ["mongoDB","Haskell"], date: 2010-06-15 01:09:28.364 UTC])
The result is a document matching the one that we inserted previously.
* Note: The returned document contains an *_id*, which was automatically
added on insert.
*findOne* also supports querying on specific elements that the
resulting document must match. To limit our results to a document with
author "Mike" we do:
> run $ findOne (select ["author" =: "Mike"] "posts")
Right (Just [ _id: Oid 4c16d355 c80c560858000000, author: "Mike", text: "My first blog post!", tags: ["mongoDB","Haskell"], date: 2010-06-15 01:09:28.364 UTC])
If we try with a different author, like "Eliot", we'll get no result:
> run $ findOne (select ["author" =: "Eliot"] "posts")
Right Nothing
Bulk Inserts
------------
In order to make querying a little more interesting, let's insert a
few more documents. In addition to inserting a single document, we can
also perform bulk insert operations by using the *insertMany* function,
which accepts a list of documents to be inserted. It sends only a single
command to the server:
> now <- getCurrentTime
> :{
let post1 = ["author" =: "Mike",
             "text" =: "Another post!",
             "tags" =: ["bulk", "insert"],
             "date" =: now]
:}
> :{
let post2 = ["author" =: "Eliot",
             "title" =: "MongoDB is fun",
             "text" =: "and pretty easy too!",
             "date" =: now]
:}
> run $ insertMany "posts" [post1, post2]
Right [Oid 4c16d67e c80c560858000001,Oid 4c16d67e c80c560858000002]
* Note that post2 has a different shape than the other posts - there
is no "tags" field and we've added a new field, "title". This is what we
mean when we say that MongoDB is schema-free.
Querying for More Than One Document
------------------------------------
To get more than a single document as the result of a query we use the
*find* function. *find* returns a cursor, which allows us to
iterate over all matching documents. There are several ways in which
we can iterate: we can call *next* to get documents one at a time,
or we can get all the results by applying *rest* to the cursor:
> Right cursor <- run $ find (select ["author" =: "Mike"] "posts")
> run $ rest cursor
Of course you can use bind (*>>=*) to combine these into one line:
> run $ find (select ["author" =: "Mike"] "posts") >>= rest
* Note: *next* automatically closes the cursor when the last
document has been read out of it. Similarly, *rest* automatically
closes the cursor after returning all the results.
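The relationship between the two styles of iteration can be sketched with a toy cursor built on an IORef. This is not the driver's Cursor: `ToyCursor`, `newToyCursor`, `nextItem`, and `restItems` are hypothetical names, and the sketch assumes that exhaustion is signalled with Nothing, which this tutorial does not state for *next*.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- Toy cursor: a mutable reference to the items not yet handed out.
newtype ToyCursor a = ToyCursor (IORef [a])

newToyCursor :: [a] -> IO (ToyCursor a)
newToyCursor xs = ToyCursor <$> newIORef xs

-- Hand out one item, or Nothing once the cursor is exhausted
-- (assumed behaviour, see the note above).
nextItem :: ToyCursor a -> IO (Maybe a)
nextItem (ToyCursor ref) = atomicModifyIORef' ref step
  where
    step []       = ([], Nothing)
    step (x : xs) = (xs, Just x)

-- Drain whatever is left by calling nextItem until it returns Nothing.
restItems :: ToyCursor a -> IO [a]
restItems c = nextItem c >>= maybe (pure []) (\x -> (x :) <$> restItems c)
```

In this model, taking one item with `nextItem` and then calling `restItems` returns only the remaining items, which is the behaviour to expect when mixing the two styles on one cursor.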
Counting
--------
We can count how many documents are in an entire collection:
> run $ count (select [] "posts")
Or count how many documents match a query:
> run $ count (select ["author" =: "Mike"] "posts")
Range Queries
-------------
To do
Indexing
--------
To do