MongoDB Haskell Mini Tutorial
-----------------------------

__Author:__ Brian Gianforcaro (b.gianfo@gmail.com)

__Updated:__ 2/28/2010

This is a mini tutorial to get you up and running with the basics
of the Haskell mongoDB driver. It is modeled after the
[pymongo tutorial](http://api.mongodb.org/python/1.4%2B/tutorial.html).

You will need the mongoDB bindings installed, as well as MongoDB itself.

    $ = command line prompt
    > = ghci repl prompt

Installing Haskell Bindings
---------------------------

From Source:

    $ git clone git://github.com/srp/mongoDB.git
    $ cd mongoDB
    $ runhaskell Setup.hs configure
    $ runhaskell Setup.hs build
    $ runhaskell Setup.hs install

From Hackage using cabal:

    $ cabal install mongoDB

Getting Ready
-------------

Start a MongoDB instance for us to play with:

    $ mongod

Start up a Haskell REPL:

    $ ghci

Now we'll need to bring in the MongoDB/BSON bindings and set
OverloadedStrings so literal strings are converted to UTF-8 automatically.

    > import Database.MongoDB
    > :set -XOverloadedStrings

Making A Connection
-------------------

Open up a connection to your DB instance, using the standard port:

    > Right con <- connect $ server "127.0.0.1"

or for a non-standard port:

    > Right con <- connect $ server "127.0.0.1" (PortNumber 666)

*connect* returns Left IOError if the connection fails. We are assuming above
that it won't fail. If it does, you will get a pattern match error.
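
If you would rather not rely on the partial `Right con` pattern, you can inspect the result explicitly. The following is a minimal sketch, assuming (as described above) that *connect* yields Left IOError on failure and Right with the connection on success. The printed messages are only illustrative, and you would still bind `con` as shown earlier before continuing.

```haskell
> result <- connect $ server "127.0.0.1"
> :{
  case result of
    -- connect failed; the IOError describes the reason
    Left err -> putStrLn ("Could not connect: " ++ show err)
    -- connect succeeded; the connection itself is ignored here
    Right _  -> putStrLn "Connected"
  :}
```
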

Task and Db monad
-----------------

The current connection is held in a Connected monad, and the current database
is held in a Reader monad on top of that. To run a connected monad, supply
it and a connection to *runConn*. To access a database within a connected
monad, call *useDb*.

Since we are working in ghci, which requires us to start from the
IO monad every time, we'll define a convenient *run* function that takes a
db-action and executes it against our "test" database on the server we
just connected to:

    > let run act = runConn (useDb "test" act) con

*run* (*runConn*) will return either Left Failure or Right result. Failure
means the connection failed (e.g. a network problem) or the server failed
(e.g. a full disk).

Databases and Collections
-------------------------

A MongoDB server can host multiple databases, which are separate namespaces
under which collections reside.

You can obtain the list of databases available on a connection:

    > runConn allDatabases con

You can also use the *run* function we just created:

    > run allDatabases

The "test" database is ignored in this case because *allDatabases*
is not a query on a specific database but on the server as a whole.
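
Because *run* returns either Left Failure or Right result, a failed query can be handled without a pattern match error. This is a small sketch, assuming the *run* helper defined above and that the Failure value can be shown (which ghci already relies on when it prints these results). The message text is illustrative.

```haskell
> result <- run allDatabases
> :{
  case result of
    -- connection or server problem
    Left failure -> putStrLn ("Query failed: " ++ show failure)
    -- success: print the list of databases
    Right dbs    -> print dbs
  :}
```
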

Databases and collections do not need to be created explicitly; just start
using them and MongoDB will create them for you automatically.

In the examples below we'll be using the database "test" (captured in *run*
above) and the collection "posts".

You can obtain a list of collections available in the "test" database:

    > run allCollections

Documents
---------

Data in MongoDB is represented (and stored) using JSON-style
documents. In mongoDB we use the BSON *Document* type to represent
these documents. A document is simply a list of *Field*s, where each field is
a named value. A value is a basic type like Bool, Int, Float, String, Time;
a special BSON value like Binary, Javascript, ObjectId; an embedded
Document; or a list of values. Here's an example document which could
represent a blog post:

    > import Data.Time
    > now <- getCurrentTime
    > :{
      let post = ["author" =: "Mike",
                  "text" =: "My first blog post!",
                  "tags" =: ["mongoDB", "Haskell"],
                  "date" =: now]
      :}
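
Since a value may itself be a document, fields can be nested. The sketch below assumes an embedded document is written as a nested list of fields, as the description above suggests. The name `postWithComment` and the fields "comment", "by", and "body" are made up for illustration and are not used elsewhere in this tutorial.

```haskell
> :{
  let postWithComment = ["author"  =: "Mike",
                         "text"    =: "A post with an embedded comment",
                         -- the value of "comment" is itself a document
                         "comment" =: ["by"   =: "Eliot",
                                       "body" =: "Nice post!"]]
  :}
```
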

Inserting a Document
--------------------

To insert a document into a collection we can use the *insert* function:

    > run $ insert "posts" post
    Right (Oid 4c16d355 c80c560858000000)

When a document is inserted, a special field, *_id*, is automatically
added if the document doesn't already contain that field. The value
of *_id* must be unique across the collection. *insert* returns the
value of *_id* for the inserted document. For more information, see
the [documentation on _id](http://www.mongodb.org/display/DOCS/Object+IDs).

After inserting the first document, the posts collection has actually
been created on the server. We can verify this by listing all of the
collections in our database:

    > run allCollections

* Note: The system.indexes collection is a special internal collection
that was created automatically.

Getting a single document with findOne
--------------------------------------

The most basic type of query that can be performed in MongoDB is
*findOne*. This function returns a single document matching a query (or
*Nothing* if there are no matches). It is useful when you know there is
only one matching document, or are only interested in the first
match. Here we use *findOne* to get the first document from the posts
collection:

    > run $ findOne (select [] "posts")
    Right (Just [ _id: Oid 4c16d355 c80c560858000000, author: "Mike", text: "My first blog post!", tags: ["mongoDB","Haskell"], date: 2010-06-15 01:09:28.364 UTC])

The result is a document matching the one that we inserted previously.

* Note: The returned document contains an *_id*, which was automatically
added on insert.

*findOne* also supports querying on specific elements that the
resulting document must match. To limit our results to a document with
author "Mike" we do:

    > run $ findOne (select ["author" =: "Mike"] "posts")
    Right (Just [ _id: Oid 4c16d355 c80c560858000000, author: "Mike", text: "My first blog post!", tags: ["mongoDB","Haskell"], date: 2010-06-15 01:09:28.364 UTC])

If we try with a different author, like "Eliot", we'll get no result:

    > run $ findOne (select ["author" =: "Eliot"] "posts")
    Right Nothing
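
The same kind of selector can look a document up by its *_id*. This is only a sketch: it assumes the value returned by *insert* can be passed directly to `=:`, and it uses a hypothetical "scratch" collection with a made-up "note" field so that the "posts" collection is left untouched.

```haskell
> -- insert a throwaway document and keep the _id that insert returns
> Right noteId <- run $ insert "scratch" ["note" =: "temporary"]
> -- fetch the same document back by its _id
> run $ findOne (select ["_id" =: noteId] "scratch")
```
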

Bulk Inserts
------------

In order to make querying a little more interesting, let's insert a
few more documents. In addition to inserting a single document, we can
also perform bulk insert operations by using the *insertMany* function,
which accepts a list of documents to be inserted. It sends only a single
command to the server:

    > now <- getCurrentTime
    > :{
      let post1 = ["author" =: "Mike",
                   "text" =: "Another post!",
                   "tags" =: ["bulk", "insert"],
                   "date" =: now]
      :}
    > :{
      let post2 = ["author" =: "Eliot",
                   "title" =: "MongoDB is fun",
                   "text" =: "and pretty easy too!",
                   "date" =: now]
      :}
    > run $ insertMany "posts" [post1, post2]
    Right [Oid 4c16d67e c80c560858000001,Oid 4c16d67e c80c560858000002]

* Note that post2 has a different shape than the other posts: there
is no "tags" field and we've added a new field, "title". This is what we
mean when we say that MongoDB is schema-free.
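
A field that only some documents carry can be queried like any other. The sketch below uses only the calls already shown and relies on post2 having been inserted as above. It selects on the "title" field, which the other posts do not have.

```haskell
> -- only documents that have a matching "title" field can be returned
> run $ findOne (select ["title" =: "MongoDB is fun"] "posts")
```
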

Querying for More Than One Document
-----------------------------------

To get more than a single document as the result of a query we use the
*find* function. *find* returns a cursor, which allows us to
iterate over all matching documents. There are several ways in which
we can iterate: we can call *next* to get documents one at a time,
or we can get all the results by applying *rest* to the cursor:

    > Right cursor <- run $ find (select ["author" =: "Mike"] "posts")
    > run $ rest cursor

Of course you can use bind (*>>=*) to combine these into one line:

    > run $ find (select ["author" =: "Mike"] "posts") >>= rest

* Note: *next* automatically closes the cursor when the last
document has been read out of it. Similarly, *rest* automatically
closes the cursor after returning all the results.
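
To step through the results one document at a time instead, call *next* on the cursor. This is a sketch, assuming *next* is run through *run* in the same way as *rest*. Each call advances the cursor by one document.

```haskell
> Right cursor <- run $ find (select ["author" =: "Mike"] "posts")
> -- first matching document
> run $ next cursor
> -- second matching document
> run $ next cursor
```
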

Counting
--------

We can count how many documents are in an entire collection:

    > run $ count (select [] "posts")

Or count how many documents match a query:

    > run $ count (select ["author" =: "Mike"] "posts")

Range Queries
-------------

To do

Indexing
--------

To do