MongoDB Haskell Mini Tutorial
-----------------------------

__Updated:__ Oct 2010

This is a mini tutorial to get you up and going with the basics
of the Haskell mongoDB driver. You will need the mongoDB driver
installed as well as MongoDB itself. Prompts used in this tutorial are:

    $ = command line prompt
    > = ghci repl prompt

Installing Haskell Bindings
---------------------------

From Hackage using cabal:

    $ cabal install mongoDB

From Source:

    $ git clone git://github.com/TonyGen/mongoDB-haskell.git mongoDB
    $ cd mongoDB
    $ runhaskell Setup.hs configure
    $ runhaskell Setup.hs build
    $ runhaskell Setup.hs install

Getting Ready
-------------

Start a MongoDB instance for us to play with in a separate terminal window:

    $ mongod --dbpath <directory where Mongo will store the data>

Start up a Haskell repl:

    $ ghci

Import the MongoDB driver library, and set
OverloadedStrings so literal strings are converted to UTF-8 automatically.

    > import Database.MongoDB
    > :set -XOverloadedStrings

Making A Connection
-------------------

Create a connection pool for your mongo server, using the standard port (27017):

    > pool <- newConnPool 1 $ host "127.0.0.1"

or, for a non-standard port:

    > pool <- newConnPool 1 $ Host "127.0.0.1" (PortNumber 30000)

*newConnPool* takes the connection pool size and the host to connect to. It returns
a *ConnPool*, which is a potential pool of TCP connections. The connections are not created
until first access, so it is not possible to get a connection error here.

Note, plain IO code in this driver never raises an exception unless it invokes third party IO
code that does. Driver code that may throw an exception says so in its Monad type,
for example, *ErrorT IOError IO a*.

Access monad
------------

A query/update executes in an *Access* monad, which has access to a
*Pipe*, *WriteMode*, and read-mode (*MasterOrSlaveOk*), and may throw a *Failure*.
A Pipe is a single TCP connection.

To run an Access action (monad), supply WriteMode, MasterOrSlaveOk, connection pool,
and action to *access*. For example, to get a list of all the databases on the server:

    > access safe Master pool allDatabases

*access* returns either Left Failure or Right result. Failure means there was a connection failure
or a read or write exception like cursor expired or duplicate key insert.
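
Because the result is an ordinary *Either*, the caller decides what to do with each case by pattern matching. The following is a minimal, driver-free sketch of that idea. The *Failure* type and its constructors here are hypothetical stand-ins so that the code compiles on its own, and *describe* is a helper name made up for this example; check the driver's documentation for the real *Failure* constructors.

```haskell
-- Stand-in for the driver's Failure type. These constructors are
-- hypothetical and exist only so this sketch is self-contained.
data Failure
  = ConnectionFailure String
  | QueryFailure String
  deriving (Show, Eq)

-- Hypothetical helper: turn the Either produced by a run into a message.
describe :: Show a => Either Failure a -> String
describe (Left failure) = "failed: " ++ show failure
describe (Right result) = "ok: " ++ show result

main :: IO ()
main = do
  -- A successful run, e.g. a list of database names.
  putStrLn (describe (Right ["admin", "local", "test"] :: Either Failure [String]))
  -- A failed run.
  putStrLn (describe (Left (ConnectionFailure "connection refused") :: Either Failure [String]))
```

In a ghci session the same pattern match can be applied directly to whatever *access* returns.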

Since we are working in ghci, which requires us to start from the
IO monad every time, we'll define a convenient *run* function that takes an
action and executes it against our "test" database on the server we
just connected to, with typical write and read mode:

    > let run action = access safe Master pool $ use (Database "test") action

*use* adds a *Database* to the action context, so query/update operations know which
database to operate on.

Databases and Collections
-------------------------

MongoDB can store multiple databases -- separate namespaces
under which collections reside.

As before, you can obtain the list of databases available on a connection:

    > run allDatabases

The "test" database in context is ignored in this case because *allDatabases*
is not a query on a specific database but on the server as a whole.

Databases and collections do not need to be created; just start using
them and MongoDB will automatically create them for you. In the examples below
we'll be using the database "test" (captured in *run* above) and the collection "posts".

You can obtain a list of all collections in the "test" database:

    > run allCollections

Documents
---------

Data in MongoDB is represented (and stored) using JSON-style
documents, called BSON documents. A *Document* is simply a list of *Field*s,
where each field is a named value. A *Value* is a basic type like Bool, Int, Float, String, Time;
a special BSON value like Binary, Javascript, ObjectId; an (embedded)
Document; or a list of values. Here's an example document which could
represent a blog post:

    > import Data.Time
    > now <- getCurrentTime
    > :{
      let post = ["author" =: "Mike",
                  "text" =: "My first blog post!",
                  "tags" =: ["mongoDB", "Haskell"],
                  "date" =: now]
      :}
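
To make the "list of named values" idea concrete, here is a rough model of it using plain Haskell types. These are simplified stand-ins with assumed shapes, not the driver's actual definitions, and *lookupField* is a hypothetical helper written for this sketch.

```haskell
import Data.List (find)

-- Simplified stand-ins (assumed shapes, not the driver's real types):
-- a value is one of a few cases, a field is a named value, and a
-- document is an ordered list of fields.
data Value
  = Str String
  | Num Int
  | Flag Bool
  | List [Value]
  | Doc Document
  deriving (Show, Eq)

data Field = Field { label :: String, value :: Value }
  deriving (Show, Eq)

type Document = [Field]

-- Hypothetical helper: the value of the first field with the given name.
lookupField :: String -> Document -> Maybe Value
lookupField name doc = value <$> find ((== name) . label) doc

-- The blog post from above, minus the date, in the stand-in model.
post :: Document
post =
  [ Field "author" (Str "Mike")
  , Field "text" (Str "My first blog post!")
  , Field "tags" (List [Str "mongoDB", Str "Haskell"])
  ]

main :: IO ()
main = do
  print (lookupField "author" post)  -- a field that exists
  print (lookupField "title" post)   -- a field that does not
```

The point of the model is only that a document is ordinary list data: fields keep their order, and embedded documents and lists nest naturally.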

Inserting a Document
--------------------

To insert a document into a collection we can use the *insert* function:

    > run $ insert "posts" post

When a document is inserted a special field, *_id*, is automatically
added if the document doesn't already contain that field. The value
of *_id* must be unique across the collection. *insert* returns the
value of *_id* for the inserted document. For more information, see
the [documentation on _id](http://www.mongodb.org/display/DOCS/Object+IDs).

After inserting the first document, the posts collection has actually
been created on the server. We can verify this by listing all of the
collections in our database:

    > run allCollections

Note, the system.indexes collection is a special internal collection
that was created automatically.

Getting a single document with findOne
--------------------------------------

The most basic type of query that can be performed in MongoDB is
*findOne*. This function returns a single document matching a query (or
*Nothing* if there are no matches). It is useful when you know there is
only one matching document, or are only interested in the first
match. Here we use *findOne* to get the first document from the posts
collection:

    > run $ findOne (select [] "posts")

The result is a document matching the one that we inserted previously.
Note, the returned document contains the *_id* field, which was automatically
added on insert.

*findOne* also supports querying on specific elements that the
resulting document must match. To limit our results to a document with
author "Mike" we do:

    > run $ findOne (select ["author" =: "Mike"] "posts")

If we try with a different author, like "Eliot", we'll get no result:

    > run $ findOne (select ["author" =: "Eliot"] "posts")
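
The three queries above differ only in their selector. As a rough, in-memory illustration of what "the resulting document must match" means, the sketch below treats a document as a list of name/value pairs and says a document matches when every selector field appears in it with an equal value. The types and the names *matches* and *findFirst* are assumptions made for this example; the real server applies richer rules (arrays, query operators, embedded documents) that this sketch ignores.

```haskell
-- Simplified stand-in: a document is a list of (name, value) pairs with
-- string values only. This is an assumption made for the sketch.
type Document = [(String, String)]

-- A document matches when every selector field is present in it with an
-- equal value. An empty selector therefore matches every document.
matches :: Document -> Document -> Bool
matches selector doc = all (`elem` doc) selector

-- In-memory analogue of findOne: the first match, or Nothing.
findFirst :: Document -> [Document] -> Maybe Document
findFirst selector docs =
  case filter (matches selector) docs of
    []      -> Nothing
    (d : _) -> Just d

posts :: [Document]
posts = [[("author", "Mike"), ("text", "My first blog post!")]]

main :: IO ()
main = do
  print (findFirst [] posts)                     -- empty selector
  print (findFirst [("author", "Mike")] posts)   -- matching author
  print (findFirst [("author", "Eliot")] posts)  -- no such author
```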

Bulk Inserts
------------

In order to make querying a little more interesting, let's insert a
few more documents. In addition to inserting a single document, we can
also perform bulk insert operations, by using the *insertMany* function
which accepts a list of documents to be inserted. It sends only a single
command to the server:

    > now <- getCurrentTime
    > :{
      let post1 = ["author" =: "Mike",
                   "text" =: "Another post!",
                   "tags" =: ["bulk", "insert"],
                   "date" =: now]
      :}
    > :{
      let post2 = ["author" =: "Eliot",
                   "title" =: "MongoDB is fun",
                   "text" =: "and pretty easy too!",
                   "date" =: now]
      :}
    > run $ insertMany "posts" [post1, post2]

* Note that post2 has a different shape than the other posts - there
is no "tags" field and we've added a new field, "title". This is what we
mean when we say that MongoDB is schema-free.

Querying for More Than One Document
-----------------------------------

To get more than a single document as the result of a query we use the
*find* function. *find* returns a *Cursor*, which allows us to
iterate over all matching documents. There are several ways in which
we can iterate: we can call *next* to get documents one at a time
or we can get all the results by applying *rest* to the cursor:

    > Right cursor <- run $ find (select ["author" =: "Mike"] "posts")
    > run $ rest cursor

Of course you can use bind (*>>=*) to combine these into one line:

    > run $ find (select ["author" =: "Mike"] "posts") >>= rest

Note, *next* automatically closes the cursor when the last
document has been read out of it. Similarly, *rest* automatically
closes the cursor after returning all the results.
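
The relationship between *next* and *rest* can be sketched without a server. The cursor below is an in-memory stand-in backed by an *IORef*; it only mimics the iteration behaviour described above and is not the driver's *Cursor* (which fetches from the server and has to be closed). The names *newCursor*, *next*, and *rest* are defined locally for the sketch.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Stand-in cursor: a mutable reference to the items not yet read.
-- The real driver's Cursor talks to the server; this one is in memory.
newtype Cursor a = Cursor (IORef [a])

newCursor :: [a] -> IO (Cursor a)
newCursor xs = Cursor <$> newIORef xs

-- Return the next item, or Nothing once the cursor is exhausted.
next :: Cursor a -> IO (Maybe a)
next (Cursor ref) = do
  remaining <- readIORef ref
  case remaining of
    []       -> return Nothing
    (x : xs) -> writeIORef ref xs >> return (Just x)

-- Drain the cursor by calling next until it returns Nothing.
rest :: Cursor a -> IO [a]
rest cursor = do
  item <- next cursor
  case item of
    Nothing -> return []
    Just x  -> (x :) <$> rest cursor

main :: IO ()
main = do
  cursor <- newCursor ["post1", "post2", "post3"]
  first  <- next cursor   -- consumes one item
  others <- rest cursor   -- consumes whatever is left
  print first
  print others
```

Note that *rest* only returns what has not already been consumed, so mixing it with earlier calls to *next* on the same cursor yields the remainder rather than the full result set.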

Counting
--------

We can count how many documents are in an entire collection:

    > run $ count (select [] "posts")

Or count how many documents match a query:

    > run $ count (select ["author" =: "Mike"] "posts")

Advanced Queries
----------------

To do

Indexing
--------

To do