Hello All,
We are still facing issues while running mongodump against oplog. Below are yet another logs. As a part of remediation, we are restarting the mongodump after failure.
- Log from our backup script
2020-09-17T11:03:05.475+0000 E QUERY [conn995798] Plan executor error during find command: DEAD, stats: { stage: "COLLSCAN", filter: { $and: [ { ts: { $lte: Timestamp(1600340581, 1413) }
}, { ts: { $gt: Timestamp(1600335989, 5722) } } ] }, nReturned: 0, executionTimeMillisEstimate: 2900, works: 54234, advanced: 0, needTime: 54233, needYield: 0, saveState: 430, restoreState
: 430, isEOF: 0, invalidates: 0, direction: "forward", docsExamined: 54232 }
- Mongo Logs
[blazemeter] percona@prd-mon-bzm-shard-hd02: mongodb $ sudo grep -m 3 -B 5 ‘DEAD’ mongodb.log
2020-09-17T10:25:47.271+0000 I NETWORK [conn995231] received client metadata from 10.100.0.98:54928 conn995231: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "CentOS Linux 7.7.1908 Core", architecture: "x86_64", version: "3.10.0-1062.18.1.el7.x86_64" }, platform: "CPython 2.7.5.final.0" }`
2020-09-17T10:25:47.380+0000 I ACCESS [conn995231] Successfully authenticated as principal pmonuser on admin from client 10.100.0.98:54928
2020-09-17T10:25:47.505+0000 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 239ms
2020-09-17T10:25:47.510+0000 I NETWORK [conn995230] end connection 10.100.0.98:54926 (14 connections now open)
2020-09-17T10:25:47.510+0000 I NETWORK [conn995231] end connection 10.100.0.98:54928 (13 connections now open)
2020-09-17T10:25:47.537+0000 E QUERY [conn995226] Plan executor error during find command: DEAD, stats: { stage: "COLLSCAN", filter: { $and: [ { ts: { $lte: Timestamp(1600338342, 195) } }, { ts: { $gt: Timestamp(1600335989, 5722) } } ] }, nReturned: 0, executionTimeMillisEstimate: 4590, works: 385676, advanced: 0, needTime: 385675, needYield: 0, saveState: 3015, restoreState: 3015, isEOF: 0, invalidates: 0, direction: "forward", docsExamined: 385674 }
2020-09-17T11:03:02.342+0000 I ACCESS [conn995799] Successfully authenticated as principal pmonuser on admin from client 10.100.0.98:43326
2020-09-17T11:03:03.013+0000 I NETWORK [listener] connection accepted from 127.0.0.1:49168 #995800 (13 connections now open)
2020-09-17T11:03:03.013+0000 I NETWORK [conn995800] received client metadata from 127.0.0.1:49168 conn995800: { driver: { name: "mongo-go-driver", version: "v1.3.2" }, os: { type: "linux", architecture: "amd64" }, platform: "go1.13.10", application: { name: "QAN-mongodb-profiler-/agent_id/5c0d2b03-eedd-4124-abda-ee6cea3cd48e" } }
2020-09-17T11:03:03.022+0000 I ACCESS [conn995800] Successfully authenticated as principal pmonuser on admin from client 127.0.0.1:49168
2020-09-17T11:03:05.421+0000 I STORAGE [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 389ms
2020-09-17T11:03:05.475+0000 E QUERY [conn995798] Plan executor error during find command: DEAD, stats: { stage: "COLLSCAN", filter: { $and: [ { ts: { $lte: Timestamp(1600340581, 1413) } }, { ts: { $gt: Timestamp(1600335989, 5722) } } ] }, nReturned: 0, executionTimeMillisEstimate: 2900, works: 54234, advanced: 0, needTime: 54233, needYield: 0, saveState: 430, restoreState: 430, isEOF: 0, invalidates: 0, direction: "forward", docsExamined: 54232 }
Notice the Oplog window below -->
"logSizeMB" : 250000,
"usedMB" : 248310.45,
"timeDiff" : 135740,
"timeDiffHours" : 37.71,
"tFirst" : "Tue Sep 15 2020 21:28:10 GMT+0000 (UTC)",
"tLast" : "Thu Sep 17 2020 11:10:30 GMT+0000 (UTC)",
"now" : "Thu Sep 17 2020 11:10:30 GMT+0000 (UTC)"
Now, when I convert the last seen RecordId where the mongodump failed, it points to the time way before “tFirst” in oplog even though I am taking dump with higher “timestamps”
[blazemeter] percona@bzm-monitor: backups $ echo $((6872826050246617646 >> 32))
1600204513
[blazemeter] percona@bzm-monitor: backups $ date -d@1600204513
Tue Sep 15 21:15:13 UTC 2020
Can anyone please help as this is very crtical for us ?