Change comments to reflect new knowledge that a cursor persists across connections, and that map/reduce temp output is accessible from all connections as long as the original connection remains alive

2010-06-15 16:15:37 -04:00 · d0ddc814a9
commit d0ddc814a9
parent 3e4065cd97
4 changed files with 9 additions and 7 deletions
--- a/Database/MongoDB/Connection.hs
+++ b/Database/MongoDB/Connection.hs
@@ -1,4 +1,4 @@
-{- | A replica set is a set of servers that mirror each other (a non-replicated server can act like a replica set of one). One server in a replica set is the master and the rest are slaves. When the master goes down, one of the slaves becomes master. The ReplicaSet object in this client maintains a list of servers that it currently knows are in the set. It refreshes this list every time it establishes a new connection with one of the servers in the set. Each server in the set knows who the other member in the set are, and who is master. The user asks the ReplicaSet object for a new master or slave connection. When a connection fails, the user must ask the ReplicaSet for a new connection (which most likely will connect to another server since the previous one failed). When you loose a connection you loose all session state that was stored with that connection on the server, which includes open cursors and temporary map-reduce output collections. Attempting to read from a lost cursor (on a new connection) will only returning the remaining documents in the last batch returned to this client. It will not fetch the remaining documents from the server. Likewise, attempting to read a lost map-reduce output will return an empty set of documents. Notice, in both cases, no error is raised, just empty results. -}
+{- | A replica set is a set of servers that mirror each other (a non-replicated server can act like a replica set of one). One server in a replica set is the master and the rest are slaves. When the master goes down, one of the slaves becomes master. The ReplicaSet object in this client maintains a list of servers that it currently knows are in the set. It refreshes this list every time it establishes a new connection with one of the servers in the set. Each server in the set knows who the other members in the set are, and who is master. The user asks the ReplicaSet object for a new master or slave connection. When a connection fails, the user must ask the ReplicaSet for a new connection (which most likely will connect to another server since the previous one failed). When connecting to a new server you lose all session state that was stored with the old server, which includes open cursors and temporary map-reduce output collections. Attempting to read from a lost cursor on a new server will raise a ServerFailure exception. Attempting to read a lost map-reduce temp output on a new server will return an empty set (not an error, as it arguably should). -}

 {-# LANGUAGE OverloadedStrings, ScopedTypeVariables #-}

--- a/Database/MongoDB/Internal/Protocol.hs
+++ b/Database/MongoDB/Internal/Protocol.hs
@@ -161,7 +161,7 @@ data Query = Query {
 data QueryOption =
 	TailableCursor |
 	SlaveOK |
-	NoCursorTimeout
+	NoCursorTimeout  -- Never time out the cursor. When not set, the cursor will die if idle for more than 10 minutes.
 	deriving (Show, Eq)

 data GetMore = GetMore {
--- a/Database/MongoDB/Query.hs
+++ b/Database/MongoDB/Query.hs
@@ -269,7 +269,7 @@ distinct k (Select sel col) = at "values" <$> runCommand ["distinct" =: col, "ke
 -- *** Cursor

 data Cursor = Cursor FullCollection BatchSize (MVar CursorState)
-- ^ Iterator over results of a query. Use 'next' to iterate. Cursor remains open during current connection and is closed when connection is closed, cursor is closed, or cursor is garbage collected.
+-- ^ Iterator over results of a query. Use 'next' to iterate or 'rest' to get all results. A cursor is closed when it is explicitly closed, when all results have been read from it, when it is garbage collected, or when it has not been used for over 10 minutes (unless the 'NoCursorTimeout' option was specified in the 'Query'). Reading from a closed cursor raises a ServerFailure exception. Note that a cursor is not closed when the connection is closed, so you can open another connection to the same server and continue using the cursor.

 data CursorState = CS Limit CursorId [Document]
 -- ^ CursorId = 0 means cursor is finished. Documents is remaining documents to serve in current batch. Limit is remaining limit for next fetch.
@@ -293,7 +293,8 @@ newCursor db col batch cs = do
 	return (Cursor (db <.> col) batch var)

 next :: (Conn m) => Cursor -> m (Maybe Document)
-- ^ Return next document in query result, or Nothing if finished
+-- ^ Return next document in query result, or Nothing if finished.
+-- This can run inside or outside a 'Db' monad (a 'useDb' block), since @Conn m => ReaderT r m@ is an instance of the 'Conn' type class, along with @Task@ and @Op@
 next (Cursor fcol batch var) = runOp . exposeIO $ \h -> modifyMVar var $ \cs ->
 	-- Get lock on connection (runOp) first then get lock on cursor, otherwise you could get in deadlock if already inside an Op (connection locked), but another Task gets lock on cursor first and then tries runOp (deadlock).
 	either ((cs,) . Left) (fmap Right) <$> hideIO (nextState cs) h
@@ -361,8 +362,8 @@ data MapReduce = MapReduce {
 	rSelect :: Selector,  -- ^ Default is []
 	rSort :: Order,  -- ^ Default is [] meaning no sort
 	rLimit :: Limit,  -- ^ Default is 0 meaning no limit
-	rOut :: Maybe Collection,  -- ^ Output to permanent collection. Default is Nothing.
-	rKeepTemp :: Bool,  -- ^ If True, the generated collection is made permanent. If False, the generated collection persists for the life of the current connection only. Default is False. When out is specified, the collection is automatically made permanent.
+	rOut :: Maybe Collection,  -- ^ Output to given permanent collection, otherwise output to a new temporary collection whose name is returned.
+	rKeepTemp :: Bool,  -- ^ If True, the temporary output collection is made permanent. If False, the temporary output collection persists for the life of the current connection only; however, other connections may read from it while the original one is still alive. Note that reading from a temporary collection after its original connection dies returns an empty result (not an error). The default for this attribute is False, unless 'rOut' is specified, in which case the collection is permanent.
 	rFinalize :: Maybe FinalizeFun,  -- ^ Function to apply to all the results when finished. Default is Nothing.
 	rScope :: Document,  -- ^ Variables (environment) that can be accessed from map/reduce/finalize. Default is [].
 	rVerbose :: Bool  -- ^ Provide statistics on job execution time. Default is False.
--- a/3
+++ b/3
@@ -43,7 +43,7 @@ MongoDB
  optional:
  - automatic reconnection
  - buffer pooling
-  - connection pooling. Although may not be desired because each connection maintains seperate session state (open cursors and temp map/reduce collections) and switching between connections automatically would change session state without the user knowing.
+  - connection pooling. Unsafe to shrink the pool and close connections because map/reduce temp tables that were created on a closed connection will get deleted. Note, other connections can access a map/reduce temp table as long as the original connection is still alive. Also, a connection can access cursors created on other connections, even after those connections die. A cursor is deleted on the server only if idle for more than 10 minutes. Accessing a deleted cursor returns an error.
 + support safe operations, although an operation with exclusive connection access is available, which can be used to call getLastError and check that the previous write was safe (successful).
 + auto-destroy connection (how?/when?). Although GHC will automatically close the connection (Handle) when it is garbage collected.
 + don't read into cursor until needed, but have cursor send getMore before
@@ -82,3 +82,4 @@ Questions:

 Notes:
 - Remember that in the new version of MongoDB (>= 1.6), "ok" field can be a number (0 or 1) or boolean (False or True). Use 'true1' function defined in Database.MongoDB.Util
+- A cursor will die on the server if not accessed (by any connection) within the past 10 minutes (unless the NoCursorTimeout option is set). Accessing a dead (or non-existent) cursor raises a ServerFailure exception.
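
The cursor lifetime rule described in these comments can be sketched as a small pure model. This is only an illustration and not the driver's code: `ServerCursor`, `isAlive`, `readNext`, `ReadError`, and the use of plain seconds are hypothetical names and simplifications introduced here. The model covers the idle timeout and explicit close only, and it deliberately has no notion of a connection, since the commit's point is that a cursor outlives the connection that created it.

```haskell
module Main where

-- Mirrors the option names in the diff; everything else is hypothetical.
data QueryOption = TailableCursor | SlaveOK | NoCursorTimeout
  deriving (Show, Eq)

type Seconds = Int

-- 10 minutes, the idle limit stated in the comments above.
idleLimit :: Seconds
idleLimit = 600

-- Hypothetical stand-in for the server's view of a cursor.
data ServerCursor = ServerCursor
  { options    :: [QueryOption]
  , lastAccess :: Seconds   -- time of the most recent read, by any connection
  , remaining  :: [Int]     -- stand-in for the documents still to be served
  , closed     :: Bool      -- set by an explicit close
  } deriving (Show, Eq)

-- Hypothetical error type, named after the exception mentioned in the comments.
data ReadError = ServerFailure String
  deriving (Show, Eq)

-- A cursor is alive if it was not closed and either never times out
-- or was accessed within the idle limit.
isAlive :: Seconds -> ServerCursor -> Bool
isAlive now c =
  not (closed c)
    && (NoCursorTimeout `elem` options c || now - lastAccess c <= idleLimit)

-- Read one document at time 'now'. A dead cursor yields an error;
-- a successful read refreshes the idle timer.
readNext :: Seconds -> ServerCursor -> Either ReadError (Maybe Int, ServerCursor)
readNext now c
  | not (isAlive now c) = Left (ServerFailure "cursor not found")
  | otherwise = case remaining c of
      []       -> Right (Nothing, c { lastAccess = now })
      (d : ds) -> Right (Just d, c { remaining = ds, lastAccess = now })

main :: IO ()
main = do
  let plain  = ServerCursor [] 0 [1, 2, 3] False
      sticky = ServerCursor [NoCursorTimeout] 0 [1, 2, 3] False
  print (fmap fst (readNext 300 plain))   -- Right (Just 1)
  print (fmap fst (readNext 601 plain))   -- Left (ServerFailure "cursor not found")
  print (fmap fst (readNext 601 sticky))  -- Right (Just 1)
```

In the real client, exhaustion is tracked separately (CursorId = 0 means the cursor is finished), so this model should not be read as a description of how 'next' reports the end of results.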