Joe Drumgoole

19 results

Querying COVID-19 Data With MongoDB Atlas

MongoDB's Dev Rel team has adapted the Johns Hopkins COVID-19 data set for MongoDB Atlas. Learn how to access it for free.

April 22, 2020

Five security principles developers must follow

Find out about the principles that developers should be following to secure their applications.

February 18, 2020

Research: Developers are Trusted By The Business But The Alignment is Not Felt Evenly Across Different Generations

As organisations shift to becoming technology-focused, developers’ roles have evolved so that they are now playing a crucial role in decision making across their businesses. However, all this newfound alignment isn’t so keenly felt across the whole developer workforce.

November 26, 2019

PyMongo Monday - Episode 4 - Update

This is part 4 of PyMongo Monday. Previously we have covered:

EP1 - Setting up your MongoDB Environment
EP2 - Create - the C in CRUD
EP3 - Read - the R in CRUD

We are now into Update, the U in CRUD. The key aspect of update is the ability to change a document in place. For this to happen we must have some way to select the document and change parts of it. In the pymongo driver this is achieved with two functions:

update_one
update_many

Each update operation can take a range of update operators that define how we can mutate a document during update.

Let's get a copy of the zipcode database hosted on MongoDB Atlas. As our copy hosted in Atlas is not writable we can't test update directly on it. However, we can create a local copy with this simple script:

    $ mongodump --host demodata-shard-0/demodata-shard-00-00-rgl39.mongodb.net:27017,demodata-shard-00-01-rgl39.mongodb.net:27017,demodata-shard-00-02-rgl39.mongodb.net:27017 --ssl --username readonly --password readonly --authenticationDatabase admin --db demo
    2018-10-22T01:18:35.330+0100 writing demo.zipcodes to
    2018-10-22T01:18:36.097+0100 done dumping demo.zipcodes (29353 documents)

This will create a backup of the data in a dump directory in the current working directory. To restore the data to a local mongod, make sure you are running mongod locally and run mongorestore in the same directory in which you ran mongodump.

    $ mongorestore
    2018-10-22T01:19:19.064+0100 using default 'dump' directory
    2018-10-22T01:19:19.064+0100 preparing collections to restore from
    2018-10-22T01:19:19.066+0100 reading metadata for demo.zipcodes from dump/demo/zipcodes.metadata.json
    2018-10-22T01:19:19.211+0100 restoring demo.zipcodes from dump/demo/zipcodes.bson
    2018-10-22T01:19:19.943+0100 restoring indexes for collection demo.zipcodes from metadata
    2018-10-22T01:19:20.364+0100 finished restoring demo.zipcodes (29353 documents)
    2018-10-22T01:19:20.364+0100 done

You will now have a demo database on your local mongod with a single collection called zipcodes.

    $ python
    Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 03:03:55)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pymongo
    >>> client = pymongo.MongoClient()
    >>> database=client['demo']
    >>> zipcodes=database["zipcodes"]
    >>> zipcodes.find_one()
    {'_id': '01001', 'city': 'AGAWAM', 'loc': [-72.622739, 42.070206], 'pop': 15338, 'state': 'MA'}
    >>>

Each document in this database has the same format:

    {
        '_id': '01001',                  # ZIP code
        'city': 'AGAWAM',                # City name
        'loc': [-72.622739, 42.070206],  # Geospatial coordinates
        'pop': 15338,                    # Population within the ZIP code
        'state': 'MA',                   # Two-letter state code (MA = Massachusetts)
    }

Let's say we want to change the population to reflect the most current value. Today the population of 01001 is approximately 16769. To change the value we would execute the following update. (This call uses the legacy update method, which was deprecated in PyMongo 3.0 and removed in PyMongo 4.0; update_one takes the same two arguments here.)

    >>> zipcodes.update( {"_id" : "01001"}, {"$set" : { "pop" : 16769}})
    {'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
    >>> zipcodes.find_one({"_id" : "01001"})
    {'_id': '01001', 'city': 'AGAWAM', 'loc': [-72.622739, 42.070206], 'pop': 16769, 'state': 'MA'}
    >>>

Here we see the $set operator in action. The $set operator will set a field to a new value or create that field if it doesn't exist in the document.
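The same $set change can also be written with update_one, which returns an UpdateResult object instead of a raw dict. The following is a minimal sketch rather than code from the series: the helper name set_population is made up for illustration, and the collection argument is assumed to be a PyMongo collection such as the zipcodes object above.

```python
def set_population(collection, zipcode, population):
    """Set the 'pop' field of one ZIP code document using $set.

    `collection` is assumed to be a pymongo Collection (for example the
    `zipcodes` object above); `set_population` is a hypothetical helper.
    Returns (matched_count, modified_count) taken from the UpdateResult.
    """
    result = collection.update_one(
        {"_id": zipcode},               # filter: which document to change
        {"$set": {"pop": population}},  # update: overwrite or create 'pop'
    )
    return result.matched_count, result.modified_count


# Usage, assuming a local mongod holding the restored demo database:
#   import pymongo
#   zipcodes = pymongo.MongoClient()["demo"]["zipcodes"]
#   set_population(zipcodes, "01001", 16769)
```

A matched_count of 1 with a modified_count of 0 means the document was found but already held that value.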
We add a new field by doing:

    >>> zipcodes.update_one( {"_id" : "01001"}, {"$set" : { "population_record" : []}})
    <pymongo.results.UpdateResult object at 0x1042dc488>
    >>> zipcodes.find_one({"_id" : "01001"})
    {'_id': '01001', 'city': 'AGAWAM', 'loc': [-72.622739, 42.070206], 'pop': 16769, 'state': 'MA', 'population_record': []}
    >>>

Here we are adding a new field called population_record. This field is an array field and has been set to the empty array for now. Now we can update the array with a history of the population for the ZIP Code area.

    >>> zipcodes.update_one({"_id" : "01001"}, { "$push" : { "population_record" : { "pop" : 15338, "timestamp": None }}})
    <pymongo.results.UpdateResult object at 0x106c210c8>
    >>> zipcodes.find_one({"_id" : "01001"})
    {'_id': '01001', 'city': 'AGAWAM', 'loc': [-72.622739, 42.070206], 'pop': 16769, 'state': 'MA', 'population_record': [{'pop': 15338, 'timestamp': None}]}
    >>> from datetime import datetime
    >>> zipcodes.update_one({"_id" : "01001"}, { "$push" : { "population_record" : { "pop" : 16769, "timestamp": datetime.utcnow() }}})
    <pymongo.results.UpdateResult object at 0x106c21908>
    >>> zipcodes.find_one({"_id" : "01001"})
    {'_id': '01001', 'city': 'AGAWAM', 'loc': [-72.622739, 42.070206], 'pop': 16769, 'state': 'MA', 'population_record': [{'pop': 15338, 'timestamp': None}, {'pop': 16769, 'timestamp': datetime.datetime(2018, 10, 22, 11, 37, 5, 60000)}]}
    >>> x=zipcodes.find_one({"_id" : "01001"})
    >>> x
    {'_id': '01001', 'city': 'AGAWAM', 'loc': [-72.622739, 42.070206], 'pop': 16769, 'state': 'MA', 'population_record': [{'pop': 15338, 'timestamp': None}, {'pop': 16769, 'timestamp': datetime.datetime(2018, 10, 22, 11, 37, 5, 60000)}]}
    >>> import pprint
    >>> pprint.pprint(x)
    {'_id': '01001',
     'city': 'AGAWAM',
     'loc': [-72.622739, 42.070206],
     'pop': 16769,
     'population_record': [{'pop': 15338, 'timestamp': None},
                           {'pop': 16769,
                            'timestamp': datetime.datetime(2018, 10, 22, 11, 37, 5, 60000)}],
     'state': 'MA'}
    >>>

Here we have appended two documents to the array so that we have a history of the changes in population. The original value of 15338 was captured at an unknown time in the past so we set that timestamp to None. We updated the other value today so we can set that timestamp to the current time. In both cases we use the $push operator to push new elements onto the end of the array population_record. You can see how we use pprint to produce the output in a slightly more readable format.

If we want to apply an update to more than one document we use update_many. If the filter matches more than one document, the change is applied to each of them. So imagine we wanted to add the city sales tax to each city. First, we want to add the city sales tax to all the ZIP Code regions in New York.

    >>> zipcodes.update_many( {'city': "NEW YORK"}, { "$set" : { "sales tax" : 4.5 }})
    <pymongo.results.UpdateResult object at 0x1042dcd88>
    >>> zipcodes.find( {"city": "NEW YORK"})
    <pymongo.cursor.Cursor object at 0x101e09410>
    >>> cursor=zipcodes.find( {"city": "NEW YORK"})
    >>> cursor.next()
    {u'city': u'NEW YORK', u'loc': [-73.996705, 40.74838], u'sales tax': 4.5, u'state': u'NY', u'pop': 18913, u'_id': u'10001'}
    >>> cursor.next()
    {u'city': u'NEW YORK', u'loc': [-73.987681, 40.715231], u'sales tax': 4.5, u'state': u'NY', u'pop': 84143, u'_id': u'10002'}
    >>> cursor.next()
    {u'city': u'NEW YORK', u'loc': [-73.989223, 40.731253], u'sales tax': 4.5, u'state': u'NY', u'pop': 51224, u'_id': u'10003'}
    >>>

The final kind of update operation we want to talk about is upsert. We can add the upsert flag to any update operation so that the target document is inserted when the filter matches no existing document. When is this useful? Imagine we have a read-only collection of ZIP Code data and we want to create a new collection (call it zipcodes_new) that contains updates to the ZIP Codes that contain changes in population.
As we collect new population stats ZIP Code by ZIP Code we want to update the zipcodes_new collection with new documents containing the updated ZIP Code data. To simplify this process we can do the updates as an upsert. Below is a fragment of code from update_population.py:

    zip_doc = zipcodes.find_one({"_id": args.zipcode})
    zip_doc["pop"] = {"pop": args.pop, "timestamp": args.date}
    zipcodes_new.update({"_id":args.zipcode}, zip_doc, upsert=True)
    print("New zipcode data: " + zip_doc["_id"])
    pprint.pprint(zip_doc)

The upsert=True flag ensures that if we don't match the initial clause {"_id":args.zipcode} we will still insert the zip_doc doc. This is a common pattern for upsert usage: initially we insert based on a unique key. As the number of inserts grows, the likelihood that we will be updating an existing key as opposed to inserting a new key grows. The upsert=True flag allows us to handle both situations in a single update statement.

There is a lot more to update and I will return to update later in the series. For now just remember that update is generally used for mutating existing documents using a range of update operators. Next time we will complete our first pass over CRUD operations with the final function, delete.
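The fragment above passes a whole replacement document to the legacy update call. In current PyMongo the equivalent whole-document operation is replace_one with upsert=True. The sketch below shows that pattern under stated assumptions: upsert_zipcode is a hypothetical helper, not part of update_population.py, and collection is assumed to be a PyMongo collection such as zipcodes_new.

```python
def upsert_zipcode(collection, zip_doc):
    """Insert zip_doc, or replace the stored document with the same _id.

    `collection` is assumed to be a pymongo Collection; `upsert_zipcode`
    is a hypothetical helper used only for illustration.
    Returns "inserted" when the upsert created a new document and
    "replaced" when an existing document matched the filter.
    """
    result = collection.replace_one(
        {"_id": zip_doc["_id"]},  # filter on the unique key
        zip_doc,                  # full replacement document
        upsert=True,              # insert when nothing matches
    )
    # upserted_id is only set when the operation inserted a new document.
    return "inserted" if result.upserted_id is not None else "replaced"


# Usage, assuming `zipcodes_new` is a collection on a local mongod:
#   upsert_zipcode(zipcodes_new, {"_id": "01001", "city": "AGAWAM", "pop": 16769})
```

Running the helper twice with the same _id would be expected to report "inserted" the first time and "replaced" the second.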

October 29, 2018

Why You Need to Be at MongoDB Europe 2018

MongoDB Europe 2018 is just around the corner. On the 8th of November, our premier European event will bring together over 1000 members of the MongoDB developer community to learn about our existing technology, find out what’s around the corner and hear from our CTO, Eliot Horowitz. It is also a chance to celebrate the satisfaction of working with the world’s most developer focussed data platform.

This year we are back at Old Billingsgate, which is a fabulous venue for a tech event. There will be three technical tracks (or Shards as we call them) and, of course, this year we see the return of Shard N. Shard N is our track of high-end technical tutorial sessions where members of MongoDB technical staff get more time to cover more material in depth. These sessions are designed for our most seasoned developers to get new insights into how our products and offerings can be used to solve the most challenging business problems. This year's sessions include John Page on comparing RDBMS and MongoDB performance and the real skinny on Workload isolation from everyone’s favourite MongoDB Ninja, Asya Kamsky.

In the main Shards we have Keith Bostic talking about how we built the new transactions engine and lots of sessions on our new serverless platform MongoDB Stitch. Remember, regardless of whether you are a veteran of MongoDB or coming to the database for the first time, the four parallel tracks will ensure that there is always something on for everybody.

The people in white coats will be back again this year. Who are they? They are members of our MongoDB Consulting and Solution Architecture teams and nobody knows more about MongoDB than these folks. You can book a slot with them via a calendaring system that will be sent out after registration.

All attendees will receive:

- A MongoDB Europe 2018 hoodie and other exclusive swag such as MongoDB Europe stickers, buttons, and pins
- 3 months of free on-demand access to MongoDB University (courses in Java, Python, and Node.js are included)
- 50% off MongoDB Certification exams
- Future discounts on MongoDB events as Alumni

We will have the top of the line London Street Food initiative, Kerb, catering the day, and other fun stuff like a nitro-ice-cream parlour and all-day table tennis tournaments. The day will finish off with a drinks reception on us!

Register today for your tickets. Get a 25% discount per person for groups of 3 or more. And just for reading this far you get another 20% off by using the code JOED20. What’s not to like? See you all on the 8th of November at Old Billingsgate.

October 2, 2018

PyMongo Monday - Episode 3 - Read

Previously we covered:

Episode 1: Setting Up Your MongoDB Environment
Episode 2: CRUD - Create

In this episode (episode 3) we are going to cover the Read part of CRUD. MongoDB provides a query interface through the find function. We are going to demonstrate Read by doing find queries on a collection hosted in MongoDB Atlas. The MongoDB connection string is:

    mongodb+srv://demo:demo@demodata-rgl39.mongodb.net/test?retryWrites=true

This is a cluster running a database called demo with a single collection called zipcodes. Every ZIP code in the US is in this database. To connect to this cluster we are going to use the Python shell.

    $ cd ep003
    $ pipenv shell
    Launching subshell in virtual environment…
    JD10Gen:ep003 jdrumgoole$ . /Users/jdrumgoole/.local/share/virtualenvs/ep003-blzuFbED/bin/activate
    (ep003-blzuFbED) JD10Gen:ep003 jdrumgoole$ python
    Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 03:03:55)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    >>> from pymongo import MongoClient
    >>> client = MongoClient(host="mongodb+srv://demo:demo@demodata-rgl39.mongodb.net/test?retryWrites=true")
    >>> db = client["demo"]
    >>> zipcodes=db["zipcodes"]
    >>> zipcodes.find_one()
    {'_id': '01069', 'city': 'PALMER', 'loc': [-72.328785, 42.176233], 'pop': 9778, 'state': 'MA'}
    >>>

The find_one query will get the first record in the collection. You can see the structure of the fields in the returned document. The _id is the ZIP code. The city is the city name. The loc is the GPS coordinates of each ZIP code. The pop is the population size and the state is the two-letter state code. We are connecting with the default user demo with the password demo. This user only has read-only access to this database and collection.

So what if we want to select all the ZIP codes for a particular city? Querying in MongoDB consists of constructing a partial JSON document that matches the fields you want to select on. So to get all the ZIP codes in the city of PALMER we use the following query:

    >>> zipcodes.find({'city': 'PALMER'})
    <pymongo.cursor.Cursor object at 0x104c155c0>
    >>>

Note we are using find() rather than find_one() as we want to return all the matching documents. In this case find() will return a cursor. To print the cursor contents just keep calling .next() on the cursor as follows:

    >>> cursor=zipcodes.find({'city': 'PALMER'})
    >>> cursor.next()
    {'_id': '01069', 'city': 'PALMER', 'loc': [-72.328785, 42.176233], 'pop': 9778, 'state': 'MA'}
    >>> cursor.next()
    {'_id': '37365', 'city': 'PALMER', 'loc': [-85.564272, 35.374062], 'pop': 1685, 'state': 'TN'}
    >>> cursor.next()
    {'_id': '50571', 'city': 'PALMER', 'loc': [-94.543155, 42.641871], 'pop': 1119, 'state': 'IA'}
    >>> cursor.next()
    {'_id': '66962', 'city': 'PALMER', 'loc': [-97.112214, 39.619165], 'pop': 276, 'state': 'KS'}
    >>> cursor.next()
    {'_id': '68864', 'city': 'PALMER', 'loc': [-98.241146, 41.178757], 'pop': 1142, 'state': 'NE'}
    >>> cursor.next()
    {'_id': '75152', 'city': 'PALMER', 'loc': [-96.679429, 32.438714], 'pop': 2605, 'state': 'TX'}
    >>> cursor.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/jdrumgoole/.local/share/virtualenvs/ep003-blzuFbED/lib/python3.6/site-packages/pymongo/cursor.py", line 1197, in next
        raise StopIteration
    StopIteration

As you can see, cursors follow the Python iterator protocol and will raise a StopIteration exception when the cursor is exhausted. However, calling .next() continuously is a bit of a drag. Instead, you can import the pymongo_shell package and call the print_cursor() function. It will print out twenty records at a time.

    >>> from pymongo_shell import print_cursor
    >>> print_cursor(zipcodes.find({'city': 'PALMER'}))
    {'_id': '01069', 'city': 'PALMER', 'loc': [-72.328785, 42.176233], 'pop': 9778, 'state': 'MA'}
    {'_id': '37365', 'city': 'PALMER', 'loc': [-85.564272, 35.374062], 'pop': 1685, 'state': 'TN'}
    {'_id': '50571', 'city': 'PALMER', 'loc': [-94.543155, 42.641871], 'pop': 1119, 'state': 'IA'}
    {'_id': '66962', 'city': 'PALMER', 'loc': [-97.112214, 39.619165], 'pop': 276, 'state': 'KS'}
    {'_id': '68864', 'city': 'PALMER', 'loc': [-98.241146, 41.178757], 'pop': 1142, 'state': 'NE'}
    {'_id': '75152', 'city': 'PALMER', 'loc': [-96.679429, 32.438714], 'pop': 2605, 'state': 'TX'}
    >>>

If we don't need all the fields in the doc we can use projection to remove some fields. This is a second doc argument to the find() function. This doc can specify the fields to return explicitly.

    >>> print_cursor(zipcodes.find({'city': 'PALMER'}, {'city':1,'pop':1}))
    {'_id': '01069', 'city': 'PALMER', 'pop': 9778}
    {'_id': '37365', 'city': 'PALMER', 'pop': 1685}
    {'_id': '50571', 'city': 'PALMER', 'pop': 1119}
    {'_id': '66962', 'city': 'PALMER', 'pop': 276}
    {'_id': '68864', 'city': 'PALMER', 'pop': 1142}
    {'_id': '75152', 'city': 'PALMER', 'pop': 2605}

To include multiple fields in a query just add them to the query doc. The fields are combined with a boolean and to select the documents that will be returned.

    >>> print_cursor(zipcodes.find({'city': 'PALMER', 'state': 'MA'}, {'city':1,'pop':1}))
    {'_id': '01069', 'city': 'PALMER', 'pop': 9778}
    >>>

To pick documents with one field or the other we can use the $or operator.

    >>> print_cursor(zipcodes.find({ '$or' : [ {'city': 'PALMER' }, {'state': 'MA'}]}))
    {'_id': '01069', 'city': 'PALMER', 'loc': [-72.328785, 42.176233], 'pop': 9778, 'state': 'MA'}
    {'_id': '01002', 'city': 'CUSHMAN', 'loc': [-72.51565, 42.377017], 'pop': 36963, 'state': 'MA'}
    {'_id': '01012', 'city': 'CHESTERFIELD', 'loc': [-72.833309, 42.38167], 'pop': 177, 'state': 'MA'}
    {'_id': '01073', 'city': 'SOUTHAMPTON', 'loc': [-72.719381, 42.224697], 'pop': 4478, 'state': 'MA'}
    {'_id': '01096', 'city': 'WILLIAMSBURG', 'loc': [-72.777989, 42.408522], 'pop': 2295, 'state': 'MA'}
    {'_id': '01262', 'city': 'STOCKBRIDGE', 'loc': [-73.322263, 42.30104], 'pop': 2200, 'state': 'MA'}
    {'_id': '01240', 'city': 'LENOX', 'loc': [-73.271322, 42.364241], 'pop': 5001, 'state': 'MA'}
    {'_id': '01370', 'city': 'SHELBURNE FALLS', 'loc': [-72.739059, 42.602203], 'pop': 4525, 'state': 'MA'}
    {'_id': '01340', 'city': 'COLRAIN', 'loc': [-72.726508, 42.67905], 'pop': 2050, 'state': 'MA'}
    {'_id': '01462', 'city': 'LUNENBURG', 'loc': [-71.726642, 42.58843], 'pop': 9117, 'state': 'MA'}
    {'_id': '01473', 'city': 'WESTMINSTER', 'loc': [-71.909599, 42.548319], 'pop': 6191, 'state': 'MA'}
    {'_id': '01510', 'city': 'CLINTON', 'loc': [-71.682847, 42.418147], 'pop': 13269, 'state': 'MA'}
    {'_id': '01569', 'city': 'UXBRIDGE', 'loc': [-71.632869, 42.074426], 'pop': 10364, 'state': 'MA'}
    {'_id': '01775', 'city': 'STOW', 'loc': [-71.515019, 42.430785], 'pop': 5328, 'state': 'MA'}
    {'_id': '01835', 'city': 'BRADFORD', 'loc': [-71.08549, 42.758597], 'pop': 12078, 'state': 'MA'}
    {'_id': '01845', 'city': 'NORTH ANDOVER', 'loc': [-71.109004, 42.682583], 'pop': 22792, 'state': 'MA'}
    {'_id': '01851', 'city': 'LOWELL', 'loc': [-71.332882, 42.631548], 'pop': 28154, 'state': 'MA'}
    {'_id': '01867', 'city': 'READING', 'loc': [-71.109021, 42.527986], 'pop': 22539, 'state': 'MA'}
    {'_id': '01906', 'city': 'SAUGUS', 'loc': [-71.011093, 42.463344], 'pop': 25487, 'state': 'MA'}
    {'_id': '01929', 'city': 'ESSEX', 'loc': [-70.782794, 42.628629], 'pop': 3260, 'state': 'MA'}
    Hit Return to continue

We can do range selections by using the $lt and $gt operators.

    >>> print_cursor(zipcodes.find({'pop' : { '$lt':8, '$gt':5}}))
    {'_id': '05901', 'city': 'AVERILL', 'loc': [-71.700268, 44.992304], 'pop': 7, 'state': 'VT'}
    {'_id': '12874', 'city': 'SILVER BAY', 'loc': [-73.507062, 43.697804], 'pop': 7, 'state': 'NY'}
    {'_id': '32830', 'city': 'LAKE BUENA VISTA', 'loc': [-81.519034, 28.369378], 'pop': 6, 'state': 'FL'}
    {'_id': '59058', 'city': 'MOSBY', 'loc': [-107.789149, 46.900453], 'pop': 7, 'state': 'MT'}
    {'_id': '59242', 'city': 'HOMESTEAD', 'loc': [-104.591805, 48.429616], 'pop': 7, 'state': 'MT'}
    {'_id': '71630', 'city': 'ARKANSAS CITY', 'loc': [-91.232529, 33.614328], 'pop': 7, 'state': 'AR'}
    {'_id': '82224', 'city': 'LOST SPRINGS', 'loc': [-104.920901, 42.729835], 'pop': 6, 'state': 'WY'}
    {'_id': '88412', 'city': 'BUEYEROS', 'loc': [-103.666894, 36.013541], 'pop': 7, 'state': 'NM'}
    {'_id': '95552', 'city': 'MAD RIVER', 'loc': [-123.413994, 40.352352], 'pop': 6, 'state': 'CA'}
    {'_id': '99653', 'city': 'PORT ALSWORTH', 'loc': [-154.433803, 60.636416], 'pop': 7, 'state': 'AK'}
    >>>

Again, the $lt and $gt conditions are combined with a boolean and. If you need different logic you can use the boolean operators.

Conclusion

Today we have seen how to query documents using a query template, how to reduce the output using projections and how to create more complex queries using boolean and $lt and $gt operators. Next time we will talk about the Update portion of CRUD. MongoDB has a very rich and full-featured query language including support for querying using full-text, geospatial coordinates and nested queries. Give the query language a spin with the Python shell using the tools we outlined above. The complete zip codes dataset is publicly available for read queries at the MongoDB URI:

    mongodb+srv://demo:demo@demodata-rgl39.mongodb.net/test?retryWrites=true

Try MongoDB Atlas via the free-tier today.
A free MongoDB cluster for your own personal use forever!
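Because query documents are ordinary Python dicts and cursors are ordinary iterators, the patterns from this episode can be wrapped in small helpers. The sketch below is an illustration only: population_between, city_or_state and pages are hypothetical names, and pages is one possible way to batch results twenty at a time, not the actual pymongo_shell implementation.

```python
from itertools import islice


def population_between(low, high):
    """Query doc for low < pop < high; both bounds are exclusive."""
    return {"pop": {"$gt": low, "$lt": high}}


def city_or_state(city, state):
    """Query doc matching documents in the given city OR the given state."""
    return {"$or": [{"city": city}, {"state": state}]}


def pages(cursor, size=20):
    """Yield lists of up to `size` documents from any iterator or cursor."""
    iterator = iter(cursor)
    while True:
        page = list(islice(iterator, size))
        if not page:
            return
        yield page


# Usage, assuming `zipcodes` is the collection opened earlier in this episode:
#   projection = {"city": 1, "pop": 1}
#   for page in pages(zipcodes.find(population_between(5, 8), projection)):
#       for doc in page:
#           print(doc)
```

A plain for loop over the cursor also works; the paging helper only matters when you want to pause between batches.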

October 1, 2018

How To Pause and Resume Atlas Clusters

A quick look at using Python to script tasks in your MongoDB Atlas Cluster. This example shows how to pause and resume Atlas Clusters.

September 25, 2018

Listing Your MongoDB Atlas Resources

If you want to use the MongoDB Atlas API to manage your clusters one of the first things you will discover is that resource IDs are the keys to the kingdom. In order to use the API you will need an API key and you will need to grant access to your program via the API whitelist.

September 19, 2018

PyMongo Monday: PyMongo Create

Last time we showed you how to set up your environment. In the next few episodes we will take you through the standard CRUD operations that every database is expected to support. In this episode we will focus on the Create in CRUD.

Create

Let's look at how we insert JSON documents into MongoDB. First let's start a local single instance of mongod using m.

    $ m use stable
    2018-08-28T14:58:06.674+0100 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
    2018-08-28T14:58:06.689+0100 I CONTROL [initandlisten] MongoDB starting : pid=43658 port=27017 dbpath=/data/db 64-bit host=JD10Gen.local
    2018-08-28T14:58:06.689+0100 I CONTROL [initandlisten] db version v4.0.2
    2018-08-28T14:58:06.689+0100 I CONTROL [initandlisten] git version: fc1573ba18aee42f97a3bb13b67af7d837826b47
    2018-08-28T14:58:06.689+0100 I CONTROL [initandlisten] allocator: syste
    etc...

The mongod starts listening on port 27017 by default. As every MongoDB driver defaults to connecting on localhost:27017 we won't need to specify a connection string explicitly in these early examples.

Now, we want to work with the Python driver. These examples are using Python 3.6.5 but everything should work with versions as old as Python 2.7 without problems.

Unlike SQL databases, databases and collections in MongoDB only have to be named to be created. As we will see later this is a lazy creation process, and the database and corresponding collection are actually only created when a document is inserted.

    $ python
    Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 03:03:55)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    >>> import pymongo
    >>> client = pymongo.MongoClient()
    >>> database = client[ "ep002" ]
    >>> people_collection = database[ "people_collection" ]
    >>> result=people_collection.insert_one({"name" : "Joe Drumgoole"})
    >>> result.inserted_id
    ObjectId('5b7d297cc718bc133212aa94')
    >>> result.acknowledged
    True
    >>> people_collection.find_one()
    {'_id': ObjectId('5b62e6f8c3b498fbfdc1c20c'), 'name': 'Joe Drumgoole'}
    >>>

First we import the pymongo library (line 6). Then we create the local client proxy object, client = pymongo.MongoClient() (line 7). The client object manages a connection pool to the server and can be used to set many operational parameters related to server connections. We can leave the parameter list to the MongoClient call blank. Remember, the server by default listens on port 27017 and the client by default attempts to connect to localhost:27017.

Once we have a client object, we can now create a database, ep002 (line 8) and a collection, people_collection (line 9). Note that we do not need an explicit DDL statement.

[Figure: Using Compass to examine the database server]

A database is effectively a container for collections. A collection provides a container for documents. Neither the database nor the collection will be created on the server until you actually insert a document. If you check the server by connecting MongoDB Compass you will see that there are no databases or collections on this server before the insert_one call. These commands are lazily evaluated. So, until we actually insert a document into the collection, nothing happens on the server. Once we insert a document:

    >>> result=database.people_collection.insert_one({"name" : "Joe Drumgoole"})
    >>> result.inserted_id
    ObjectId('5b7d297cc718bc133212aa94')
    >>> result.acknowledged
    True
    >>> people_collection.find_one()
    {'_id': ObjectId('5b62e6f8c3b498fbfdc1c20c'), 'name': 'Joe Drumgoole'}
    >>>

We will see that the database, the collection, and the document are created.
And we can see the document in the database.

_id Field

Every object that is inserted into a MongoDB database gets an automatically generated _id field if it does not already have one. This field is guaranteed to be unique for every document inserted into the collection. Uniqueness is enforced because the _id field is automatically indexed and the index is unique. By default the value of the _id field is an ObjectId. It is generated on the client and you can see the PyMongo generation code in the objectid.py file. Just search for the def _generate string. All MongoDB drivers generate _id fields on the client side. The _id field allows us to insert the same JSON object many times and still identify each copy uniquely. The _id field even gives a temporal ordering, and you can get this from an ObjectId via the generation_time attribute.

    >>> from bson import ObjectId
    >>> x=ObjectId('5b7d297cc718bc133212aa94')
    >>> x.generation_time
    datetime.datetime(2018, 8, 22, 9, 14, 36, tzinfo=)
    >>> print(x.generation_time)
    2018-08-22 09:14:36+00:00
    >>>

Wrap Up

That is create in MongoDB. We started a mongod instance, created a MongoClient proxy, created a database and a collection and finally made them spring to life by inserting a document.

Next up we will talk more about the Read part of CRUD. In MongoDB this is the find query, which we saw a little bit of earlier on in this episode.

For direct feedback please pose your questions on twitter/jdrumgoole; that way everyone can see the answers.

The best way to try out MongoDB is via MongoDB Atlas, our Database as a Service. It’s free to get started with MongoDB Atlas so give it a try today.
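The generation_time value shown in this episode comes from the first four bytes of the ObjectId, which hold the creation time as seconds since the Unix epoch. As a rough sketch of that idea in plain Python, without the bson package, the hypothetical helper objectid_time below decodes the timestamp straight from the hex string.

```python
from datetime import datetime, timezone


def objectid_time(oid_hex):
    """Decode the creation time embedded in an ObjectId hex string.

    The first 4 bytes (8 hex characters) of an ObjectId are a big-endian
    count of seconds since the Unix epoch. `objectid_time` is a
    hypothetical helper, not part of PyMongo or bson.
    """
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_time("5b7d297cc718bc133212aa94"))  # 2018-08-22 09:14:36+00:00
```

In real code the generation_time attribute on bson's ObjectId is the simpler choice; the sketch only shows where that value comes from.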

September 17, 2018