Road to re:Invent


Developing a Facebook Chatbot with AWS Lambda and MongoDB Atlas

This post is part of our Road to re:Invent series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud.

Introduction

While microservices have been the hot trend over the past couple of years, serverless architectures have been gaining momentum by providing a new way to build scalable, responsive, and cost-effective applications. Serverless computing frees developers from the traditional cost and effort of building applications by automatically provisioning servers and storage, maintaining infrastructure, upgrading software, and only charging for consumed resources. More insight into serverless computing can be found in this whitepaper.

Amazon's serverless computing platform, AWS Lambda, lets you run code without provisioning and running servers.

MongoDB Atlas is hosted MongoDB as a service. MongoDB Atlas provides all the features of the database without the heavy operational lifting. Developers no longer need to worry about operational tasks such as provisioning, configuration, patching, upgrades, backups, and failure recovery. In addition, MongoDB Atlas offers elastic scalability, either by scaling up on a range of instance sizes or scaling out with automatic sharding, all with no application downtime.

Together, AWS Lambda and MongoDB Atlas allow developers to spend more time developing code and less time managing the infrastructure. Learn how to easily integrate an AWS Lambda Node.js function with a MongoDB database in this tutorial.

To demonstrate the power of serverless computing and managed database as a service, I'll use this blog post to show you how to develop a Facebook chatbot that responds to weather requests and stores the message information in MongoDB Atlas.

Setting Up MongoDB Atlas

MongoDB Atlas provides multiple size options for instances. Within an instance class, there is also the ability to customize storage capacity and storage speed, as well as to use encrypted storage volumes. The number of virtual CPUs (vCPUs) – where a vCPU is a shared physical core or one hyperthread – increases as the instance class grows larger. The M10, M20, and M30 instances are excellent for development and testing purposes, but for production it is recommended to use instances larger than M30. The base options for instances are:

M0 – Variable RAM, 512 MB Storage
M10 – 2 GB RAM, 10 GB Storage, 1 vCPU
M20 – 4 GB RAM, 20 GB Storage, 2 vCPUs
M30 – 8 GB RAM, 40 GB Storage, 2 vCPUs
M40 – 16 GB RAM, 80 GB Storage, 4 vCPUs
M50 – 32 GB RAM, 160 GB Storage, 8 vCPUs
M60 – 64 GB RAM, 320 GB Storage, 16 vCPUs
M100 – 160 GB RAM, 1000 GB Storage, 40 vCPUs

Register with MongoDB Atlas and use the intuitive user interface to select the instance size, region, and features you need.

Connecting MongoDB Atlas to AWS Lambda

Important note: VPC peering is not available with the MongoDB Atlas free tier (M0). If you use an M0 cluster, allow any IP to connect to your M0 cluster and skip directly to the Set Up AWS Lambda section.

MongoDB Atlas enables VPC (Virtual Private Cloud) peering, which allows you to easily create a private networking connection between your application servers and backend database. Traffic is routed between the VPCs using private IP addresses. Instances in either VPC can communicate with each other as if they were within the same network. Note that VPC peering requires both VPCs to be in the same region.
Below is an architecture diagram of how to connect MongoDB Atlas to AWS Lambda and route traffic to the Internet.

Figure 1: AWS Peering Architecture

Architecture

For our example, a Network Address Translation (NAT) Gateway and an Internet Gateway (IGW) are needed, as the Lambda function will require internet access to query data from the Yahoo weather API, which will be used to fetch real-time weather data for the chatbot. The Lambda function we will create resides in the private subnet of our VPC. Because the subnet is private, the IP addresses assigned to the Lambda function cannot be used publicly. To solve this issue, a NAT Gateway can be used to translate private IP addresses to public ones, and vice versa. An IGW is also needed to provide access to the internet.

The first step is to set up an Elastic IP address, which will be the static IP address of your Lambda functions to the outside world. Go to Services->VPC->Elastic IPs and allocate a new Elastic IP address.

Next, we will create a new VPC, which you will attach to your Lambda function. Go to Services->VPC->Start VPC Wizard. After clicking VPC Wizard, select VPC with Public and Private Subnets.

Let's configure our VPC. Give the VPC a name (e.g., "Chatbot App VPC"), select an IP CIDR block, choose an Availability Zone, and select the Elastic IP you created in the previous step. Note that the IP CIDR you select for your VPC must not overlap with the Atlas IP CIDR. Click Create VPC to set up your VPC. The AWS VPC wizard will automatically set up the NAT gateway and IGW.

You should see the VPC you created in the VPC dashboard. Go to the Subnets tab to check that your private and public subnets have been set up correctly. Click on the private subnet and go to the Route Table tab in the lower window. You should see the NAT gateway set to 0.0.0.0/0, which means that traffic sent to IPs outside of the private subnet will be routed to the NAT gateway.

Next, let's check that the public subnet is configured correctly. Select the public subnet and the Route Table tab in the lower window. You should see 0.0.0.0/0 connected to your IGW. The IGW will enable outside internet traffic to be routed to your Lambda functions.

Now, the final step is initiating a VPC peering connection between MongoDB Atlas and your Lambda VPC. Log in to MongoDB Atlas and go to Clusters->Security->Peering->New Peering Connection. After successfully initiating the peering connection, you will see its status as Waiting for Approval.

Go back to AWS and select Services->VPC->Peering Connections. Select the VPC peering connection; you should see the connection request pending. Go to Actions and select Accept Request. Once the request is accepted, the connection status should change to active.

We will now verify that the routing is set up correctly. Go to the route table of the private subnet in the VPC you just set up (in this example, it is rtb-58911e3e). You will need to modify the Main Route Table (see Figure 1) to add the VPC peering connection; this will allow traffic to be routed to MongoDB Atlas. Go to the Routes tab and select Edit->Add another route. In the Destination field, add your Atlas CIDR block, which you can find in the Clusters->Security tab of the MongoDB Atlas web console. Click in the Target field; a dropdown list will appear, where you should see the peering connection you just created. Select it and click Save.
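Before moving on, it can be useful to confirm that a host inside the private subnet can actually reach the Atlas cluster over the peering connection. Below is a minimal Node.js sketch for such a check; the connection string is a placeholder (use the one from the Atlas "Connect" dialog) and the 2.x MongoDB Node.js driver callback API is assumed, so adapt it to your cluster and driver version.

```javascript
// ping-atlas.js – quick connectivity check from inside the peered VPC
// (hypothetical URI – replace with the connection string from the Atlas "Connect" dialog)
var MongoClient = require("mongodb").MongoClient;

var uri = "mongodb://user:password@cluster0-shard-00-00-xxxxx.mongodb.net:27017," +
          "cluster0-shard-00-01-xxxxx.mongodb.net:27017," +
          "cluster0-shard-00-02-xxxxx.mongodb.net:27017/test" +
          "?ssl=true&replicaSet=Cluster0-shard-0&authSource=admin";

MongoClient.connect(uri, function(err, db) {
  if (err) {
    console.error("Could not connect to Atlas over the peering connection:", err.message);
    process.exit(1);
  }
  // A simple ping confirms that the private route is working end to end
  db.command({ ping: 1 }, function(pingErr, result) {
    if (pingErr) {
      console.error("Ping failed:", pingErr.message);
    } else {
      console.log("Ping succeeded:", result);
    }
    db.close();
  });
});
```

Run it from an EC2 instance (or a test Lambda invocation) attached to the private subnet; if the ping fails, re-check the private subnet's route table entry for the Atlas CIDR block.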
Now that the VPC peering connection is established between the MongoDB Atlas and AWS Lambda VPCs, let's set up our AWS Lambda function.

Set Up AWS Lambda

Now that our MongoDB Atlas cluster is connected to AWS Lambda, let's develop our Lambda function. Go to Services->Lambda->Create Lambda Function. Select your runtime environment (here it's Node.js 4.3) and select the hello-world starter function. Select API Gateway in the box next to the Lambda symbol and click Next. Create your API name, select dev as the deployment stage and Open as the security, then click Next. In the next step, make these changes to the following fields:

Name: provide a name for your function – for example, lambda-messenger-chatbot
Handler: leave as is (index.handler)
Role: create a basic execution role and use it (or use an existing role that has permissions to execute Lambda functions)
Timeout: change to 10 seconds. This is not strictly necessary, but it gives the Lambda function more time to spin up its container on initialization (if needed)
VPC: select the VPC you created in the previous step
Subnet: select the private subnet of the VPC (don't worry about adding other subnets for now)
Security Groups: the default security group is fine for now

Press Next, then review and create your new Lambda function. In the code editor of your Lambda function, paste the following code snippet and press the Save button:

```javascript
'use strict';

var VERIFY_TOKEN = "mongodb_atlas_token";

exports.handler = (event, context, callback) => {
    var method = event.context["http-method"];

    // process GET request
    if (method === "GET") {
        var queryParams = event.params.querystring;
        var rVerifyToken = queryParams['hub.verify_token'];
        if (rVerifyToken === VERIFY_TOKEN) {
            var challenge = queryParams['hub.challenge'];
            callback(null, parseInt(challenge));
        } else {
            callback(null, 'Error, wrong validation token');
        }
    }
};
```

This is the piece of code we'll need later on to set up the Facebook webhook to our Lambda function.

Set Up AWS API Gateway

Next, we will need to set up the API Gateway for our Lambda function. The API Gateway will let you create, manage, and host a RESTful API to expose your Lambda function to Facebook Messenger. The API Gateway acts as an abstraction layer that maps application requests to the format your integration endpoint expects to receive. For our example, the endpoint will be our Lambda function.

Go to Services->API Gateway->[your Lambda function]->Resources->ANY. Click on Integration Request. This will configure the API Gateway to properly integrate Facebook with your backend application (AWS Lambda). We will set the integration endpoint to lambda-messenger-bot, which is the name I chose for our Lambda function. Uncheck Use Lambda Proxy Integration and navigate to the Body Mapping Templates section. Select When there are no templates defined as the Request body passthrough option and add a new template called application/json.
Don't select any value in the Generate template section, add the code below, and press Save:

```
## See http://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-mapping-template-reference.html
## This template will pass through all parameters including path, querystring, header, stage variables, and context through to the integration endpoint via the body/payload
#set($allParams = $input.params())
{
  "body-json" : $input.json('$'),
  "params" : {
    #foreach($type in $allParams.keySet())
    #set($params = $allParams.get($type))
    "$type" : {
      #foreach($paramName in $params.keySet())
      "$paramName" : "$util.escapeJavaScript($params.get($paramName))"
      #if($foreach.hasNext),#end
      #end
    }
    #if($foreach.hasNext),#end
    #end
  },
  "stage-variables" : {
    #foreach($key in $stageVariables.keySet())
    "$key" : "$util.escapeJavaScript($stageVariables.get($key))"
    #if($foreach.hasNext),#end
    #end
  },
  "context" : {
    "account-id" : "$context.identity.accountId",
    "api-id" : "$context.apiId",
    "api-key" : "$context.identity.apiKey",
    "authorizer-principal-id" : "$context.authorizer.principalId",
    "caller" : "$context.identity.caller",
    "cognito-authentication-provider" : "$context.identity.cognitoAuthenticationProvider",
    "cognito-authentication-type" : "$context.identity.cognitoAuthenticationType",
    "cognito-identity-id" : "$context.identity.cognitoIdentityId",
    "cognito-identity-pool-id" : "$context.identity.cognitoIdentityPoolId",
    "http-method" : "$context.httpMethod",
    "stage" : "$context.stage",
    "source-ip" : "$context.identity.sourceIp",
    "user" : "$context.identity.user",
    "user-agent" : "$context.identity.userAgent",
    "user-arn" : "$context.identity.userArn",
    "request-id" : "$context.requestId",
    "resource-id" : "$context.resourceId",
    "resource-path" : "$context.resourcePath"
  }
}
```

The mapping template will structure the Facebook response in the format specified by the application/json template. The Lambda function will then extract information from the response and return the required output to the chatbot user. For more information on AWS mapping templates, see the AWS documentation.

Go back to Services->API Gateway->[your Lambda function]->Resources->ANY and select Method Request. In the Settings section, make sure NONE is selected in the Authorization dropdown list. If not, change it to NONE and press the small Update button.

Go back to the Actions button for your API Gateway and select Deploy API to make your API Gateway accessible from the internet. Your API Gateway is ready to go.

Set Up Facebook Messenger

Facebook makes it possible to use Facebook Messenger as the user interface for your chatbot, and that is what we will use for our chatbot example. To create a Facebook page and Facebook app, go to the Facebook App Getting Started Guide to set up your Facebook components.

To connect your Facebook app to AWS Lambda, you will need to go back to your API Gateway. Go to your Lambda function and find the API endpoint URL (obscured in the picture below).

Go back to your Facebook app page and, on the Add Product page, click the Get Started button next to the Messenger section. Scroll down and, in the Webhooks section, press the Setup Webhooks button. A New Page Subscription window should pop up. Enter your API endpoint URL in the Callback URL text box and, in the Verify Token text box, enter a token name that you will use in your Lambda verification code (e.g., mongodb_atlas_token).
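Before letting Facebook call the webhook, you can exercise the verification endpoint yourself. The short Node.js sketch below issues the same kind of GET request Facebook will send; the endpoint URL is a placeholder for your deployed API Gateway invoke URL and the hub.challenge value is arbitrary, so treat this only as a sanity check of the Lambda code shown earlier.

```javascript
// verify-webhook.js – manually exercise the GET verification handler
// (hypothetical endpoint URL – replace with your API Gateway invoke URL)
var https = require("https");

var endpoint = "https://abcd1234.execute-api.us-east-1.amazonaws.com/dev";
var url = endpoint +
    "?hub.mode=subscribe" +
    "&hub.verify_token=mongodb_atlas_token" +
    "&hub.challenge=42";

https.get(url, function(res) {
  var body = "";
  res.on("data", function(chunk) { body += chunk; });
  res.on("end", function() {
    // The Lambda function should echo back the challenge value ("42")
    console.log("Status:", res.statusCode, "Body:", body);
  });
}).on("error", function(err) {
  console.error("Request failed:", err.message);
});
```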
As the Facebook docs explain, your code should look for the Verify Token and respond with the challenge sent in the verification request. Last, select the messages and messaging_postbacks subscription fields. Press the Verify and Save button to start the validation process. If everything went well, the Webhooks section should show up again and you should see a Complete confirmation in green.

In the Webhooks section, click on Select a Page to select a page you already created. If you don't have any page on Facebook yet, you will first need to create a Facebook page. Once you have selected an existing page, press the Subscribe button.

Scroll up and, in the Token Generation section, select the same page you selected above to generate a page token. The first time you complete that action, Facebook might pop up a consent page to request your approval to grant your Facebook application some necessary page-related permissions. Press the Continue as [your name] button and the OK button to approve these permissions. Facebook then generates a page token, which you should copy and paste into a separate document. We will need it when we complete the configuration of our Lambda function.

Connect the Facebook Messenger UI to the AWS Lambda Function

We will now connect the Facebook Messenger UI to AWS Lambda and begin sending weather queries through the chatbot. Below is the index.js code for our Lambda function. The index.js file will be packaged into a compressed archive file later on and loaded into our AWS Lambda function.

```javascript
"use strict";

var assert = require("assert");
var https = require("https");
var request = require("request");
var MongoClient = require("mongodb").MongoClient;

var facebookPageToken = process.env["PAGE_TOKEN"];
var VERIFY_TOKEN = "mongodb_atlas_token";
var mongoDbUri = process.env["MONGODB_ATLAS_CLUSTER_URI"];

let cachedDb = null;

exports.handler = (event, context, callback) => {
  context.callbackWaitsForEmptyEventLoop = false;

  var httpMethod;
  if (event.context != undefined) {
    httpMethod = event.context["http-method"];
  } else {
    // used to test with lambda-local
    httpMethod = "PUT";
  }

  // process GET request (for Facebook validation)
  if (httpMethod === "GET") {
    console.log("In Get if loop");
    var queryParams = event.params.querystring;
    var rVerifyToken = queryParams["hub.verify_token"];
    if (rVerifyToken === VERIFY_TOKEN) {
      var challenge = queryParams["hub.challenge"];
      callback(null, parseInt(challenge));
    } else {
      callback(null, "Error, wrong validation token");
    }
  } else {
    // process POST request (Facebook chat messages)
    var messageEntries = event["body-json"].entry;
    console.log("message entries are " + JSON.stringify(messageEntries));
    for (var entryIndex in messageEntries) {
      var messageEntry = messageEntries[entryIndex].messaging;
      for (var messageIndex in messageEntry) {
        var messageEnvelope = messageEntry[messageIndex];
        var sender = messageEnvelope.sender.id;
        if (messageEnvelope.message && messageEnvelope.message.text) {
          var onlyStoreinAtlas = false;
          if (
            messageEnvelope.message.is_echo &&
            messageEnvelope.message.is_echo == true
          ) {
            console.log("only store in Atlas");
            onlyStoreinAtlas = true;
          }
          if (!onlyStoreinAtlas) {
            var location = messageEnvelope.message.text;
            var weatherEndpoint =
              "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%20in%20(select%20woeid%20from%20geo.places(1)%20where%20text%3D%22" +
              location +
              "%22)&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys";
            request(
              {
                url: weatherEndpoint,
                json: true
              },
              function(error, response, body) {
                try {
                  var condition = body.query.results.channel.item.condition;
                  var response =
                    "Today's temperature in " + location + " is " +
                    condition.temp + ". The weather is " + condition.text + ".";
                  console.log("The response to send to Facebook is: " + response);
                  sendTextMessage(sender, response);
                  storeInMongoDB(messageEnvelope, callback);
                } catch (err) {
                  console.error(
                    "error while sending a text message or storing in MongoDB: ",
                    err
                  );
                  sendTextMessage(sender, "There was an error.");
                }
              }
            );
          } else {
            storeInMongoDB(messageEnvelope, callback);
          }
        } else {
          process.exit();
        }
      }
    }
  }
};

function sendTextMessage(senderFbId, text) {
  var json = {
    recipient: { id: senderFbId },
    message: { text: text }
  };
  var body = JSON.stringify(json);
  var path = "/v2.6/me/messages?access_token=" + facebookPageToken;
  var options = {
    host: "graph.facebook.com",
    path: path,
    method: "POST",
    headers: { "Content-Type": "application/json" }
  };
  var callback = function(response) {
    var str = "";
    response.on("data", function(chunk) {
      str += chunk;
    });
    response.on("end", function() {});
  };
  var req = https.request(options, callback);
  req.on("error", function(e) {
    console.log("problem with request: " + e);
  });
  req.write(body);
  req.end();
}

function storeInMongoDB(messageEnvelope, callback) {
  if (cachedDb && cachedDb.serverConfig.isConnected()) {
    sendToAtlas(cachedDb, messageEnvelope, callback);
  } else {
    console.log(`=> connecting to database ${mongoDbUri}`);
    MongoClient.connect(mongoDbUri, function(err, db) {
      assert.equal(null, err);
      cachedDb = db;
      sendToAtlas(db, messageEnvelope, callback);
    });
  }
}

function sendToAtlas(db, message, callback) {
  db.collection("records").insertOne(
    {
      facebook: {
        messageEnvelope: message
      }
    },
    function(err, result) {
      if (err != null) {
        console.error("an error occurred in sendToAtlas", err);
        callback(null, JSON.stringify(err));
      } else {
        var message = `Inserted a message into Atlas with id: ${result.insertedId}`;
        console.log(message);
        callback(null, message);
      }
    }
  );
}
```

We are passing the MongoDB Atlas connection string (or URI) and Facebook page token as environment variables, so we'll configure them in our Lambda function later on. For now, clone this GitHub repository and open the README file to find the instructions to deploy and complete the configuration of your Lambda function.

Save your Lambda function and navigate to your Facebook page chat window to verify that your function works as expected. Bring up the Messenger window and enter the name of a city of your choice (such as New York, Paris, or Mumbai).

Store Message History in MongoDB Atlas

AWS Lambda functions are stateless; thus, if you require data persistence with your application, you will need to store that data in a database. For our chatbot, we will save message information (text, senderID, recipientID) to MongoDB Atlas (if you look at the code carefully, you will notice that the response with the weather information comes back to the Lambda function and is also stored in MongoDB Atlas). Before writing data to the database, we will first need to connect to MongoDB Atlas. Note that this code is already included in the index.js file.
```javascript
function storeInMongoDB(messageEnvelope, callback) {
  if (cachedDb && cachedDb.serverConfig.isConnected()) {
    sendToAtlas(cachedDb, messageEnvelope, callback);
  } else {
    console.log(`=> connecting to database ${mongoDbUri}`);
    MongoClient.connect(mongoDbUri, function(err, db) {
      assert.equal(null, err);
      cachedDb = db;
      sendToAtlas(db, messageEnvelope, callback);
    });
  }
}
```

sendToAtlas will write the chatbot message information to your MongoDB Atlas cluster:

```javascript
function sendToAtlas(db, message, callback) {
  db.collection("records").insertOne(
    {
      facebook: {
        messageEnvelope: message
      }
    },
    function(err, result) {
      if (err != null) {
        console.error("an error occurred in sendToAtlas", err);
        callback(null, JSON.stringify(err));
      } else {
        var message = `Inserted a message into Atlas with id: ${result.insertedId}`;
        console.log(message);
        callback(null, message);
      }
    }
  );
}
```

Note that the storeInMongoDB and sendToAtlas methods implement MongoDB's recommended performance optimizations for AWS Lambda and MongoDB Atlas, including not closing the database connection so that it can be reused in subsequent calls to the Lambda function.

The Lambda input contains the message text, timestamp, senderID, and recipientID, all of which will be written to your MongoDB Atlas cluster. Here is a sample document as stored in MongoDB:

```javascript
{
  "_id": ObjectId("58124a83c976d50001f5faaa"),
  "facebook": {
    "message": {
      "sender": { "id": "1158763944211613" },
      "recipient": { "id": "129293977535005" },
      "timestamp": 1477593723519,
      "message": {
        "mid": "mid.1477593723519:81a0d4ea34",
        "seq": 420,
        "text": "San Francisco"
      }
    }
  }
}
```

If you'd like to see the documents as they are stored in your MongoDB Atlas database, download MongoDB Compass, connect to your Atlas cluster, and visualize the documents in your fbchats collection. Note that we're storing both the message as typed by the user and the response sent back by our Lambda function (which comes back to the Lambda function as noted above).

Using MongoDB Atlas with Other AWS Services

In this blog, we demonstrated how to build a Facebook chatbot using MongoDB Atlas and AWS Lambda. MongoDB Atlas can also be used as the persistent data store with many other AWS services, such as Elastic Beanstalk and Kinesis. To learn more about developing an application with AWS Elastic Beanstalk and MongoDB Atlas, read Develop & Deploy a Node.js App to AWS Elastic Beanstalk & MongoDB Atlas. To learn how to orchestrate Lambda functions and build serverless workflows, read Integrating MongoDB Atlas, Twilio, and AWS Simple Email Service with AWS Step Functions. For information on developing an application with AWS Kinesis and MongoDB Atlas, read Processing Data Streams with Amazon Kinesis and MongoDB Atlas. To learn how to use your favorite language or framework with MongoDB Atlas, read Using MongoDB Atlas From Your Favorite Language or Framework.

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure, and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner.

November 18, 2016

Processing Data Streams with Amazon Kinesis and MongoDB Atlas

This post is part of our Road to re:Invent series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud.

This post provides an introduction to Amazon Kinesis: its architecture, what it provides, and how it's typically used. It goes on to step through how to implement an application where data is ingested by Amazon Kinesis before being processed and then stored in MongoDB Atlas. This is part of a series of posts which examine how to use MongoDB Atlas with a number of complementary technologies and frameworks.

Introduction to Amazon Kinesis

The role of Amazon Kinesis is to get large volumes of streaming data into AWS, where it can then be processed, analyzed, and moved between AWS services. The service is designed to ingest and store terabytes of data every hour, from multiple sources. Kinesis provides high availability, including synchronous replication within an AWS region. It also transparently handles scalability, adding and removing resources as needed.

Once the data is inside AWS, it can be processed or analyzed immediately, as well as stored using other AWS services (such as S3) for later use. By storing the data in MongoDB, it can be used both to drive real-time, operational decisions and for deeper analysis.

As the number, variety, and velocity of data sources grow, new architectures and technologies are needed. Technologies like Amazon Kinesis and Apache Kafka are focused on ingesting the massive flow of data from multiple fire hoses and then routing it to the systems that need it – optionally filtering, aggregating, and analyzing it en route.

_Figure 1: AWS Kinesis Architecture_

Typical data sources include:

IoT assets and devices (e.g., sensor readings)
Online purchases from an ecommerce store
Log files
Video game activity
Social media posts
Financial market data feeds

Rather than leave this data to fester in text files, Kinesis can ingest the data, allowing it to be processed to find patterns, detect exceptions, drive operational actions, and provide aggregations to be displayed through dashboards.

There are actually three services which make up Amazon Kinesis:

Amazon Kinesis Firehose is the simplest way to load massive volumes of streaming data into AWS. The capacity of your Firehose is adjusted automatically to keep pace with the stream throughput. It can optionally compress and encrypt the data before it's stored.

Amazon Kinesis Streams are similar to the Firehose service but give you more control, allowing for: multi-stage processing, custom stream partitioning rules, and reliable storage of the stream data until it has been processed.

Amazon Kinesis Analytics is the simplest way to process the data once it has been ingested by either Kinesis Firehose or Streams. The user provides SQL queries, which are then applied to analyze the data; the results can then be displayed, stored, or sent to another Kinesis stream for further processing.

This post focuses on Amazon Kinesis Streams and, in particular, on implementing a consumer that ingests the data, enriches it, and then stores it in MongoDB.
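To make the Streams model concrete before surveying the libraries, here is a minimal sketch of writing a single record to a stream with the AWS SDK for Node.js (the aws-sdk package is assumed to be installed, and the stream name and region match the configuration used later in this post). The full producer built below does considerably more, including creating the stream and logging.

```javascript
// put-one-record.js – minimal Kinesis Streams producer sketch
// (assumes the "ClusterDBStream" stream used later in this post already exists)
var AWS = require('aws-sdk');

var kinesis = new AWS.Kinesis({ region: 'eu-west-1' });

var payload = JSON.stringify({ sensor: 'sensor-42', reading: 123456, time: Date.now() });

kinesis.putRecord(
  {
    StreamName: 'ClusterDBStream',
    PartitionKey: 'sensor-42', // records with the same key land on the same shard
    Data: payload
  },
  function(err, data) {
    if (err) {
      console.error('putRecord failed:', err);
    } else {
      console.log('Record stored in shard', data.ShardId, 'seq', data.SequenceNumber);
    }
  }
);
```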
Accessing Kinesis Streams – the Libraries

There are multiple ways to read (consume) and write (produce) data with Kinesis Streams:

Amazon Kinesis Streams API – provides two operations for putting data into an Amazon Kinesis stream: PutRecord and PutRecords.

Amazon Kinesis Producer Library (KPL) – an easy to use and highly configurable Java library that helps you put data into an Amazon Kinesis stream. The KPL presents a simple, asynchronous, high throughput, and reliable interface.

Amazon Kinesis Agent – the agent continuously monitors a set of files and sends new entries to your Stream or Firehose.

Amazon Kinesis Client Library (KCL) – a Java library that helps you easily build Amazon Kinesis applications for reading and processing data from an Amazon Kinesis stream. The KCL handles issues such as adapting to changes in stream volume, load-balancing streaming data, coordinating distributed services, providing fault tolerance, and processing data.

Amazon Kinesis Client Library MultiLangDaemon – the MultiLangDaemon is used as a proxy by non-Java applications to use the Kinesis Client Library.

Amazon Kinesis Connector Library – a library that helps you easily integrate Amazon Kinesis with other AWS services and third-party tools.

Amazon Kinesis Storm Spout – a library that helps you easily integrate Amazon Kinesis Streams with Apache Storm.

The example application in this post uses the Kinesis Agent and the Kinesis Client Library MultiLangDaemon (with Node.js).

Role of MongoDB Atlas

MongoDB is a distributed database delivering a flexible schema for rapid application development, rich queries, idiomatic drivers, and built-in redundancy and scale-out. This makes it the go-to database for anyone looking to build modern applications.

MongoDB Atlas is a hosted database service for MongoDB. It provides all of the features of MongoDB, without the operational heavy lifting required for any new application. MongoDB Atlas is available on demand through a pay-as-you-go model and billed on an hourly basis, letting you focus on what you do best. It's easy to get started – use a simple GUI to select the instance size, region, and features you need. MongoDB Atlas provides:

Security features to protect access to your data
Built-in replication for always-on availability, tolerating complete data center failure
Backups and point-in-time recovery to protect against data corruption
Fine-grained monitoring to let you know when to scale; additional instances can be provisioned with the push of a button
Automated patching and one-click upgrades for new major versions of the database, enabling you to take advantage of the latest and greatest MongoDB features
A choice of regions and billing options

Like Amazon Kinesis, MongoDB Atlas is a natural fit for users looking to simplify their development and operations work, letting them focus on what makes their application unique rather than on commodity (albeit essential) plumbing. Also like Kinesis, you only pay for MongoDB Atlas when you're using it, with no upfront costs and no charges after you terminate your cluster.

Example Application

The rest of this post focuses on building a system to process log data.
There are two sources of log data:

A simple client that acts as a Kinesis Streams producer, generating sensor readings and writing them to a stream
The Amazon Kinesis Agent monitoring a SYSLOG file and sending each log event to a stream

In both cases, the data is consumed from the stream using the same consumer, which adds some metadata to each entry and then stores it in MongoDB Atlas.

Create Kinesis IAM Policy in AWS

From the IAM section of the AWS console, use the wizard to create a new policy. The policy should grant permission to perform specific actions on a particular stream (in this case "ClusterDBStream") and the results should look similar to this:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1476360711000",
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
                "kinesis:PutRecord",
                "kinesis:PutRecords",
                "kinesis:CreateStream"
            ],
            "Resource": [
                "arn:aws:kinesis:eu-west-1:658153047537:stream/ClusterDBStream"
            ]
        },
        {
            "Sid": "Stmt1476360824000",
            "Effect": "Allow",
            "Action": [
                "dynamodb:CreateTable",
                "dynamodb:DeleteItem",
                "dynamodb:DescribeTable",
                "dynamodb:GetItem",
                "dynamodb:PutItem",
                "dynamodb:Scan",
                "dynamodb:UpdateItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:eu-west-1:658153047537:table/ClusterDBStream"
            ]
        },
        {
            "Sid": "Stmt1476360951000",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
```

Next, create a new user and associate it with the new policy. Important: take a note of the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

Create MongoDB Atlas Cluster

Register with MongoDB Atlas and use the simple GUI to select the instance size, region, and features you need (Figure 2).

_Figure 2: Create MongoDB Atlas Cluster_

Create a user with read and write privileges for just the database that will be used for your application, as shown in Figure 3.

_Figure 3: Creating an Application user in MongoDB Atlas_

You must also add the IP address of your application server to the IP Whitelist in the MongoDB Atlas security tab (Figure 4). Note that if multiple application servers will be accessing MongoDB Atlas, then an IP address range can be specified in CIDR format (IP Address/number of significant bits).

_Figure 4: Add App Server IP Address(es) to MongoDB Atlas_

If your application server(s) are running in AWS, then an alternative to IP whitelisting is to configure a VPC (Virtual Private Cloud) peering relationship between your MongoDB Atlas group and the VPC containing your AWS resources. This removes the requirement to add and remove IP addresses as AWS reschedules functions, and is especially useful when using highly dynamic services such as AWS Lambda.

Click the "Connect" button and make a note of the URI that should be used when connecting to the database (note that you will substitute the user name and password with the ones that you've just created).

App Part 1 – Kinesis/Atlas Consumer

The code and configuration files in Parts 1 & 2 are based on the sample consumer and producer included with the client library for Node.js (MultiLangDaemon).
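Before wiring up the full consumer, it can be worth a quick check that the new IAM user's credentials can actually see the stream. The sketch below uses the AWS SDK for Node.js and assumes the credentials are exported as environment variables (as described later) and that the stream name and region match the configuration used in this post.

```javascript
// check-stream.js – confirm the IAM user can describe the Kinesis stream
// (assumes AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are set in the environment)
var AWS = require('aws-sdk');

var kinesis = new AWS.Kinesis({ region: 'eu-west-1' });

kinesis.describeStream({ StreamName: 'ClusterDBStream' }, function(err, data) {
  if (err) {
    // A ResourceNotFoundException here just means the stream hasn't been created yet;
    // an access-denied style error means the IAM policy needs another look.
    console.error('describeStream failed:', err.code, err.message);
  } else {
    console.log('Stream status:', data.StreamDescription.StreamStatus);
    console.log('Open shards:', data.StreamDescription.Shards.length);
  }
});
```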
Install the Node.js client library:

```
git clone https://github.com/awslabs/amazon-kinesis-client-nodejs.git
cd amazon-kinesis-client-nodejs
npm install
```

Install the MongoDB Node.js driver:

```
npm install --save mongodb
```

Move to the consumer sample folder:

```
cd samples/basic_sample/consumer/
```

Create a configuration file ("logging_consumer.properties"), taking care to set the correct stream and application names and AWS region:

```
# The script that abides by the multi-language protocol. This script will
# be executed by the MultiLangDaemon, which will communicate with this script
# over STDIN and STDOUT according to the multi-language protocol.
executableName = node logging_consumer_app.js

# The name of an Amazon Kinesis stream to process.
streamName = ClusterDBStream

# Used by the KCL as the name of this application. Will be used as the name
# of an Amazon DynamoDB table which will store the lease and checkpoint
# information for workers with this application name
applicationName = ClusterDBStream

# Users can change the credentials provider the KCL will use to retrieve credentials.
# The DefaultAWSCredentialsProviderChain checks several other providers, which is
# described here:
# http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html
AWSCredentialsProvider = DefaultAWSCredentialsProviderChain

# Appended to the user agent of the KCL. Does not impact the functionality of the
# KCL in any other way.
processingLanguage = nodejs/0.10

# Valid options are TRIM_HORIZON or LATEST.
# See http://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetShardIterator.html#API_GetShardIterator_RequestSyntax
initialPositionInStream = TRIM_HORIZON

# The following properties are also available for configuring the KCL Worker that is created
# by the MultiLangDaemon.
# The KCL defaults to us-east-1
regionName = eu-west-1
```

The code for working with MongoDB can be abstracted to a helper file ("db.js"):

```javascript
var MongoClient = require('mongodb').MongoClient;
var assert = require('assert');
var logger = require('../../util/logger');
var util = require('util');

function DB() {
  this.db = "empty";
  this.log = logger().getLogger('mongoMange-DB');
}

DB.prototype.connect = function(uri, callback) {
  this.log.info(util.format('About to connect to DB'));
  if (this.db != "empty") {
    callback();
    this.log.info('Already connected to database.');
  } else {
    var _this = this;
    MongoClient.connect(uri, function(err, database) {
      if (err) {
        _this.log.info(util.format('Error connecting to DB: %s', err.message));
        callback(err);
      } else {
        _this.db = database;
        _this.log.info(util.format('Connected to database.'));
        callback();
      }
    })
  }
}

DB.prototype.close = function(callback) {
  this.log.info('Closing database');
  this.db.close();
  this.log.info('Closed database');
  callback();
}

DB.prototype.addDocument = function(coll, doc, callback) {
  var collection = this.db.collection(coll);
  var _this = this;
  collection.insertOne(doc, function(err, result) {
    if (err) {
      _this.log.info(util.format('Error inserting document: %s', err.message));
      callback(err.message);
    } else {
      _this.log.info(util.format('Inserted document into %s collection.', coll));
      callback();
    }
  });
};

module.exports = DB;
```

Create the application Node.js file ("logging_consumer_app.js"), making sure to replace the database user and host details in "mongodbConnectString" with your own:

```javascript
'use strict';

var fs = require('fs');
var path = require('path');
var util = require('util');
var kcl = require('../../..');
var logger = require('../../util/logger');
var DB = require('./DB.js')

var mongodbConnectString = 'mongodb://kinesis-user:??????@cluster0-shard-00-00-qfovx.mongodb.net:27017,cluster0-shard-00-01-qfovx.mongodb.net:27017,cluster0-shard-00-02-qfovx.mongodb.net:27017/clusterdb?ssl=true&replicaSet=Cluster0-shard-0&authSource=admin'
var mongodbCollection = 'logdata'
var database = new DB;

function recordProcessor() {
  var log = logger().getLogger('recordProcessor');
  var shardId;

  return {
    initialize: function(initializeInput, completeCallback) {
      shardId = initializeInput.shardId;
      // WARNING – the connection string may contain the password and so consider removing logging for any production system
      log.info(util.format('About to connect to %s.', mongodbConnectString));
      database.connect(mongodbConnectString, function(err) {
        log.info(util.format('Back from connecting to %s', mongodbConnectString));
        if (err) {
          log.info(util.format('Error connecting to %s: %s', mongodbConnectString, err));
        }
        completeCallback();
      })
    },

    processRecords: function(processRecordsInput, completeCallback) {
      log.info('In processRecords');
      if (!processRecordsInput || !processRecordsInput.records) {
        completeCallback();
        return;
      }
      var records = processRecordsInput.records;
      var record, data, sequenceNumber, partitionKey, objectToStore;
      for (var i = 0 ; i < records.length ; ++i) {
        record = records[i];
        data = new Buffer(record.data, 'base64').toString();
        sequenceNumber = record.sequenceNumber;
        partitionKey = record.partitionKey;
        log.info(util.format('ShardID: %s, Record: %s, SequenceNumber: %s, PartitionKey: %s', shardId, data, sequenceNumber, partitionKey));
        objectToStore = {};
        try {
          objectToStore = JSON.parse(data);
        } catch(err) {
          // Looks like it wasn't JSON so store the raw string
          objectToStore.payload = data;
        }
        objectToStore.metaData = {};
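        // For reference, a record from logging_producer.js ends up stored in Atlas looking
        // roughly like this once the metadata below has been added (values are illustrative):
        //   { program: "logging_producer", time: 702, sensor: "sensor-81057", reading: 639075,
        //     metaData: { mongoLabel: "Added by MongoMange", timeAdded: ISODate("...") } }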
        objectToStore.metaData.mongoLabel = "Added by MongoMange";
        objectToStore.metaData.timeAdded = new Date();
        database.addDocument(mongodbCollection, objectToStore, function(err) {})
      }
      if (!sequenceNumber) {
        completeCallback();
        return;
      }
      // If checkpointing, completeCallback should only be called once checkpoint is complete.
      processRecordsInput.checkpointer.checkpoint(sequenceNumber, function(err, sequenceNumber) {
        log.info(util.format('Checkpoint successful. ShardID: %s, SequenceNumber: %s', shardId, sequenceNumber));
        completeCallback();
      });
    },

    shutdown: function(shutdownInput, completeCallback) {
      // Checkpoint should only be performed when shutdown reason is TERMINATE.
      if (shutdownInput.reason !== 'TERMINATE') {
        completeCallback();
        return;
      }
      // Whenever checkpointing, completeCallback should only be invoked once checkpoint is complete.
      database.close(function(){
        shutdownInput.checkpointer.checkpoint(function(err) {
          completeCallback();
        });
      });
    }
  };
}

kcl(recordProcessor()).run();
```

Note that this code adds some metadata to the received object before writing it to MongoDB. At this point, it is also possible to filter objects based on any of their fields. Note also that this Node.js code logs a lot of information to the "application log" file (including the database password!); this is for debugging and would be removed from a real application.

The simplest way to have the application use the user credentials (noted when creating the user in AWS IAM) is to export them from the shell where the application will be launched:

```
export AWS_ACCESS_KEY_ID=????????????????????
export AWS_SECRET_ACCESS_KEY=????????????????????????????????????????
```

Finally, launch the consumer application:

```
../../../bin/kcl-bootstrap --java /usr/bin/java -e -p ./logging_consumer.properties
```

Check the "application.log" file for any errors.

App Part 2 – Kinesis Producer

As for the consumer, export the credentials for the user created in AWS IAM:

```
cd amazon-kinesis-client-nodejs/samples/basic_sample/producer

export AWS_ACCESS_KEY_ID=????????????????????
export AWS_SECRET_ACCESS_KEY=????????????????????????????????????????
```

Create the configuration file ("config.js") and ensure that the correct AWS region and stream are specified:

```javascript
'use strict';

var config = module.exports = {
  kinesis : {
    region : 'eu-west-1'
  },

  loggingProducer : {
    stream : 'ClusterDBStream',
    shards : 2,
    waitBetweenDescribeCallsInSeconds : 5
  }
};
```

Create the producer code ("logging_producer.js"):

```javascript
'use strict';

var util = require('util');
var logger = require('../../util/logger');

function loggingProducer(kinesis, config) {
  var log = logger().getLogger('loggingProducer');

  function _createStreamIfNotCreated(callback) {
    var params = {
      ShardCount : config.shards,
      StreamName : config.stream
    };

    kinesis.createStream(params, function(err, data) {
      if (err) {
        if (err.code !== 'ResourceInUseException') {
          callback(err);
          return;
        } else {
          log.info(util.format('%s stream is already created. Re-using it.', config.stream));
        }
      } else {
        log.info(util.format("%s stream doesn't exist. Created a new stream with that name ..", config.stream));
      }

      // Poll to make sure stream is in ACTIVE state before start pushing data.
      _waitForStreamToBecomeActive(callback);
    });
  }

  function _waitForStreamToBecomeActive(callback) {
    kinesis.describeStream({StreamName : config.stream}, function(err, data) {
      if (!err) {
        log.info(util.format('Current status of the stream is %s.', data.StreamDescription.StreamStatus));
        if (data.StreamDescription.StreamStatus === 'ACTIVE') {
          callback(null);
        } else {
          setTimeout(function() {
            _waitForStreamToBecomeActive(callback);
          }, 1000 * config.waitBetweenDescribeCallsInSeconds);
        }
      }
    });
  }

  function _writeToKinesis() {
    var currTime = new Date().getMilliseconds();
    var sensor = 'sensor-' + Math.floor(Math.random() * 100000);
    var reading = Math.floor(Math.random() * 1000000);

    var record = JSON.stringify({
      program: "logging_producer",
      time : currTime,
      sensor : sensor,
      reading : reading
    });

    var recordParams = {
      Data : record,
      PartitionKey : sensor,
      StreamName : config.stream
    };

    kinesis.putRecord(recordParams, function(err, data) {
      if (err) {
        log.error(err);
      } else {
        log.info('Successfully sent data to Kinesis.');
      }
    });
  }

  return {
    run: function() {
      _createStreamIfNotCreated(function(err) {
        if (err) {
          log.error(util.format('Error creating stream: %s', err));
          return;
        }
        var count = 0;
        while (count < 10) {
          // Schedule roughly one record per second
          setTimeout(_writeToKinesis, count * 1000);
          count++;
        }
      });
    }
  };
}

module.exports = loggingProducer;
```

The producer is launched from "logging_producer_app.js":

```javascript
'use strict';

var AWS = require('aws-sdk');
var config = require('./config');
var producer = require('./logging_producer');
var kinesis = new AWS.Kinesis({region : config.kinesis.region});

producer(kinesis, config.loggingProducer).run();
```

Run the producer:

```
node logging_producer_app.js
```

Check the consumer and producer "application.log" files for errors. At this point, data should have been written to MongoDB Atlas. Using the connection string provided after clicking the "Connect" button in MongoDB Atlas, connect to the database and confirm that the documents have been added:

```
mongo "mongodb://cluster0-shard-00-00-qfovx.mongodb.net:27017,cluster0-shard-00-01-qfovx.mongodb.net:27017,cluster0-shard-00-02-qfovx.mongodb.net:27017/admin?replicaSet=Cluster0-shard-0" --ssl --username kinesis-user --password ??????
use clusterdb
db.logdata.find()

{ "_id" : ObjectId("5804d1d0aa1f330731204597"), "program" : "logging_producer", "time" : 702, "sensor" : "sensor-81057", "reading" : 639075, "metaData" : { "mongoLabel" : "Added by MongoMange", "timeAdded" : ISODate("2016-10-17T13:27:44.142Z") } }
{ "_id" : ObjectId("5804d1d0aa1f330731204598"), "program" : "logging_producer", "time" : 695, "sensor" : "sensor-805", "reading" : 981144, "metaData" : { "mongoLabel" : "Added by MongoMange", "timeAdded" : ISODate("2016-10-17T13:27:44.142Z") } }
{ "_id" : ObjectId("5804d1d0aa1f330731204599"), "program" : "logging_producer", "time" : 699, "sensor" : "sensor-2581", "reading" : 752020, "metaData" : { "mongoLabel" : "Added by MongoMange", "timeAdded" : ISODate("2016-10-17T13:27:44.143Z") } }
{ "_id" : ObjectId("5804d1d0aa1f33073120459a"), "program" : "logging_producer", "time" : 700, "sensor" : "sensor-56194", "reading" : 455700, "metaData" : { "mongoLabel" : "Added by MongoMange", "timeAdded" : ISODate("2016-10-17T13:27:44.144Z") } }
{ "_id" : ObjectId("5804d1d0aa1f33073120459b"), "program" : "logging_producer", "time" : 706, "sensor" : "sensor-32956", "reading" : 113233, "metaData" : { "mongoLabel" : "Added by MongoMange", "timeAdded" : ISODate("2016-10-17T13:27:44.144Z") } }
{ "_id" : ObjectId("5804d1d0aa1f33073120459c"), "program" : "logging_producer", "time" : 707, "sensor" : "sensor-96487", "reading" : 179047, "metaData" : { "mongoLabel" : "Added by MongoMange", "timeAdded" : ISODate("2016-10-17T13:27:44.144Z") } }
{ "_id" : ObjectId("5804d1d0aa1f33073120459d"), "program" : "logging_producer", "time" : 697, "sensor" : "sensor-37595", "reading" : 935647, "metaData" : { "mongoLabel" : "Added by MongoMange", "timeAdded" : ISODate("2016-10-17T13:27:44.144Z") } }
{ "_id" : ObjectId("5804d1d15f0fbb074446ad6d"), "program" : "logging_producer", "time" : 704, "sensor" : "sensor-92660", "reading" : 756624, "metaData" : { "mongoLabel" : "Added by MongoMange", "timeAdded" : ISODate("2016-10-17T13:27:45.263Z") } }
{ "_id" : ObjectId("5804d1d15f0fbb074446ad6e"), "program" : "logging_producer", "time" : 701, "sensor" : "sensor-95222", "reading" : 850749, "metaData" : { "mongoLabel" : "Added by MongoMange", "timeAdded" : ISODate("2016-10-17T13:27:45.263Z") } }
{ "_id" : ObjectId("5804d1d15f0fbb074446ad6f"), "program" : "logging_producer", "time" : 704, "sensor" : "sensor-1790", "reading" : 271359, "metaData" : { "mongoLabel" : "Added by MongoMange", "timeAdded" : ISODate("2016-10-17T13:27:45.266Z") } }
```

App Part 3 – Capturing Live Logs Using Amazon Kinesis Agent

Using the same consumer, the next step is to stream real log data. Fortunately, this doesn't require any additional code, as the Kinesis Agent can be used to monitor files and add every new entry to a Kinesis Stream (or Firehose).
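Before switching to live logs, note that rather than scanning raw documents you can also summarize what the producer wrote with a quick aggregation in the mongo shell (a sketch using the collection and field names from the output above):

```javascript
// Count documents and compute summary statistics per producing program
db.logdata.aggregate([
  { $group: {
      _id: "$program",              // e.g. "logging_producer", or a SYSLOG program name later on
      count: { $sum: 1 },
      avgReading: { $avg: "$reading" },
      lastAdded: { $max: "$metaData.timeAdded" }
  } },
  { $sort: { count: -1 } }
])
```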
Install the Kinesis Agent:

```
sudo yum install -y aws-kinesis-agent
```

and edit the configuration file ("/etc/aws-kinesis/agent.json") to use the correct AWS region, user credentials, and stream:

```json
{
  "cloudwatch.emitMetrics": true,
  "kinesis.endpoint": "kinesis.eu-west-1.amazonaws.com",
  "cloudwatch.endpoint": "monitoring.eu-west-1.amazonaws.com",
  "awsAccessKeyId": "????????????????????",
  "awsSecretAccessKey": "????????????????????????????????????????",
  "flows": [
    {
      "filePattern": "/var/log/messages*",
      "kinesisStream": "ClusterDBStream",
      "dataProcessingOptions": [{
        "optionName": "LOGTOJSON",
        "logFormat": "SYSLOG"
      }]
    }
  ]
}
```

"/var/log/messages" is a SYSLOG file, so a "dataProcessingOptions" field is included in the configuration to automatically convert each log entry into a JSON document before writing it to the Kinesis Stream.

The agent will not run as root, so the permissions for "/var/log/messages" need to be made more permissive:

```
sudo chmod og+r /var/log/messages
```

The agent can now be started:

```
sudo service aws-kinesis-agent start
```

Monitor the agent's log file to see what it's doing:

```
sudo tail -f /var/log/aws-kinesis-agent/aws-kinesis-agent.log
```

If there aren't enough logs being generated on the machine, then extra ones can be injected manually for testing:

```
logger -i This is a test log
```

This will create a log with the "program" field set to your username (in this case, "ec2-user"). Check that the logs get added to MongoDB Atlas:

```
mongo "mongodb://cluster0-shard-00-00-qfovx.mongodb.net:27017,cluster0-shard-00-01-qfovx.mongodb.net:27017,cluster0-shard-00-02-qfovx.mongodb.net:27017/admin?replicaSet=Cluster0-shard-0" --ssl --username kinesis-user --password ??????

use clusterdb
db.logdata.findOne({program: "ec2-user"})

{
  "_id" : ObjectId("5804c9ed5f0fbb074446ad5f"),
  "timestamp" : "Oct 17 12:53:48",
  "hostname" : "ip-172-31-40-154",
  "program" : "ec2-user",
  "processid" : "6377",
  "message" : "This is a test log",
  "metaData" : {
    "mongoLabel" : "Added by MongoMange",
    "timeAdded" : ISODate("2016-10-17T12:54:05.456Z")
  }
}
```

Checking the Data with MongoDB Compass

To visually navigate through the MongoDB schema and data, download and install MongoDB Compass. Use your MongoDB Atlas credentials to connect Compass to your MongoDB database (the hostname should refer to the primary node in your replica set or a "mongos" process if your MongoDB cluster is sharded).

Navigate through the structure of the data in the "clusterdb" database (Figure 5) and view the JSON documents.

_Figure 5: Explore Schema Using MongoDB Compass_

Clicking on a value builds a query and then clicking "Apply" filters the results (Figure 6).

_Figure 6: View Filtered Documents in MongoDB Compass_

Add Document Validation Rules

One of MongoDB's primary attractions for developers is that it gives them the ability to start application development without first needing to define a formal schema. Operations teams appreciate the fact that they don't need to perform a time-consuming schema upgrade operation every time the developers need to store a different attribute. This is well suited to the application built in this post, as logs from different sources are likely to include different attributes.

There are, however, some attributes that we always expect to be there (e.g., the metadata that the application is adding). For applications reading the documents from this collection to be able to rely on those fields being present, the documents should be validated before they are written to the database.
Prior to MongoDB 3.2, those checks had to be implemented in the application, but they can now be performed by the database itself. Executing a single command from the "mongo" shell adds the document checks:

```javascript
db.runCommand({
  collMod: "logdata",
  validator: {
    $and: [
      { program: { $type: "string" } },
      { "metaData.mongoLabel": { $type: "string" } },
      { "metaData.timeAdded": { $type: "date" } }
    ]
  }
})
```

The above command adds multiple checks:

The "program" field exists and contains a string
There's a sub-document called "metaData" containing at least two fields:
  "mongoLabel", which must be a string
  "timeAdded", which must be a date

Test that the rules are correctly applied when attempting to write to the database:

```javascript
db.logdata.insert({
  "program" : "dummy_entry",
  "time" : 666,
  "sensor" : "sensor-6666",
  "reading" : 66666,
  "metaData" : {
    "mongoLabel" : "Test Data",
    "timeAdded" : ISODate("2016-10-17T13:27:44.142Z")
  }
})
WriteResult({ "nInserted" : 1 })

db.logdata.insert({
  "program" : "dummy_entry",
  "time" : 666,
  "sensor" : "sensor-6666",
  "reading" : 66666,
  "metaData" : {
    "mongoLabel" : "Test Data",
    "timeAdded" : "Just now"
  }
})
WriteResult({
  "nInserted" : 0,
  "writeError" : {
    "code" : 121,
    "errmsg" : "Document failed validation"
  }
})
```

Cleaning Up (IMPORTANT!)

Remember that you will continue to be charged for the services even when you're no longer actively using them. If you no longer need to use the services, then clean up:

From the MongoDB Atlas GUI, select your cluster, click on the ellipsis, and select "Terminate".
From the AWS management console, select the Kinesis service, then Kinesis Streams, and then delete your stream.
From the AWS management console, select the DynamoDB service, then Tables, and then delete your table.

Using MongoDB Atlas with Other Frameworks and Services

We have detailed walkthroughs for using MongoDB Atlas with several programming languages and frameworks, as well as generic instructions that can be used with others. They can all be found in Using MongoDB Atlas From Your Favorite Language or Framework.

MongoDB Atlas, the cloud database service for MongoDB, is the easiest way to deploy and run MongoDB, allowing you to get started in minutes. Click here to learn more.

The MongoDB team will be at AWS re:Invent this November in Las Vegas and our CTO Eliot Horowitz will be speaking Thursday (12/1) at 11 am PST. If you're attending re:Invent, be sure to attend the session & visit us at booth #1344!

Learn more about AWS re:Invent

November 2, 2016

5 Blogs to Read Before You Head to AWS re:Invent Next Month

This post is part of our Road to re:Invent series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud.

![Road to AWS re:Invent](https://webassets.mongodb.com/_com_assets/cms/AWS_ReInvent-683wqzsi2z.jpg)

Before you head to AWS re:Invent next month, we've pulled together our most popular blog posts about running MongoDB alongside different AWS solutions.

1. Virtualizing MongoDB on Amazon EC2 and GCE

As part of a migration to a cloud hosting environment, David Mytton, Founder and CTO of Server Density, did an investigation into the best ways to deploy MongoDB onto two popular platforms, Amazon EC2 and Google Compute Engine. In this two-part series, we will review David's general pros and cons of virtualization along with the challenges and methods of virtualizing MongoDB on EC2 and GCE. Read the post >

2. Maximizing MongoDB Performance on AWS

You have many choices to make when running MongoDB on AWS: from instance type and security, to how you configure MongoDB processes, and more. In addition, you now have options for tooling and management. In this post we'll take a look at several recommendations that can help you get the best performance out of AWS. Read the post >

3. Develop & Deploy a Node.js App to AWS Elastic Beanstalk & MongoDB Atlas

AWS Elastic Beanstalk is a service offered by Amazon to make it simple for developers to deploy and manage their cloud-based applications. In this post, Andrew Morgan will walk you through how to build and deploy a Node.js app to AWS Elastic Beanstalk using MongoDB Atlas. Read the tutorial >

4. Oxford Nanopore Technologies Powers Real-Time Genetic Analysis Using Docker, MongoDB, and AWS

In this post, we take a look at how containerization, the public cloud, and MongoDB are helping a UK-based biotechnology firm track the spread of Ebola. Get the full story >

5. Selecting AWS Storage for MongoDB Deployments: Ephemeral vs. EBS

Last but not least, take a look at what we were writing about this time last year as Bryan Reinero explores how to select the right AWS solution for your deployment. Keep reading >

Want more? We'll be blogging about MongoDB and the cloud leading up to re:Invent again this year in our Road to re:Invent series. You can see the posts we've already published here.

Going to re:Invent? The MongoDB team will be in Las Vegas at re:Invent from 11/29 to 12/2. If you're attending re:Invent, be sure to visit us at booth 2620!

MongoDB Atlas, the cloud database service for MongoDB, is the easiest way to deploy and run MongoDB, allowing you to get started in minutes. Click here to learn more.

Get the guide for MongoDB on AWS

October 24, 2016

Crossing the Chasm: Looking Back on a Seminal Year of Cloud Technology

This post is part of our Road to re:Invent series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud.

![Road to AWS re:Invent](https://webassets.mongodb.com/_com_assets/cms/AWS_ReInvent-683wqzsi2z.jpg)

On the main stage of Amazon's AWS re:Invent conference in Las Vegas last year, Capital One's CIO, Rob Alexander, made his way into the headlines of tech publications when he explained that, under his leadership, the bank would be reducing its number of data centers from 8 in 2015 to just 3 in 2018. Capital One began using cloud-hosted infrastructure organically, with developers turning to the public cloud for a quick and easy way to provision development environments. The increase in productivity prompted IT leadership to adopt a cloud-first strategy not just for development and test environments, but for some of the bank's most vital production workloads.

What generated headlines just a short year ago has now become just one of many examples of large enterprises shifting mission-critical deployments to the cloud. In a recent report released by McKinsey & Company, the authors declared "the cloud debate is over—businesses are now moving a material portion of IT workloads to cloud environments." The report goes on to validate what many industry watchers (including MongoDB, in our own Cloud Brief this May) have noted: cloud adoption in the enterprise is gaining momentum and is driven primarily by benefits in time to market.

According to McKinsey's survey, almost half (48 percent) of large enterprises have migrated an on-premises workload to the public cloud. Based on the conventional model of innovation adoption, this marks the divide between the "early majority" of cloud adopters and the "late majority." This not only means that the cloud computing "chasm" has been crossed, but that we have entered the period where the near-term adoption of cloud-centric strategies will play a strong role in an organization's ability to execute and, as a result, its longevity in the market.

![](https://webassets.mongodb.com/_com_assets/cms/AWS_ReInvent_Adoption_Lifecycle-awjdat7emu.png)

Image source: [Technology Adoption Lifecycle](https://upload.wikimedia.org/wikipedia/commons/d/d3/Technology-Adoption-Lifecycle.png)

An additional indication that the "chasm" has been bridged comes as more heavily regulated industries put down oft-cited security concerns and pair public cloud usage with other broad-scale digitization initiatives. As Amazon, Google, and Microsoft (the three "hyperscale" public cloud vendors, as McKinsey defines them) continue to invest significantly in securing their services, the most memorable soundbite from Alexander's keynote continues to ring true: that Capital One can "operate more securely in the public cloud than we can in our own data centers."

As the concern over security in the public cloud continues to wane, other barriers to cloud adoption are becoming more apparent. Respondents to McKinsey's survey and our own Cloud Adoption Survey earlier this year reported concerns about vendor lock-in and about limited access to talent with the skills needed for cloud deployment. With just 4 vendors holding over half of the public cloud market, CIOs are careful to select technologies that have cross-platform compatibility as Amazon, Microsoft, IBM, and Google continue to release application and data services exclusive to their own clouds.
This reluctance to outsource certain tasks to the hyperscale vendors is tempered by a limited talent pool. Developers, DBAs, and architects with experience building and managing internationally distributed, highly available, cloud-based deployments are in high demand. In addition, it is becoming more complex for international businesses to comply with the changing landscape of local data protection laws as legislators try to keep pace with cloud technology. As a result, McKinsey predicts enterprises will increasingly turn to managed cloud offerings to offset these costs.

It is unclear whether the keynote at Amazon's re:Invent conference next month will once again presage the changing enterprise technology landscape for the coming year. However, we can be certain that the world's leading companies will be well represented as the public cloud continues to entrench itself even deeper in enterprise technology.

MongoDB Atlas, the cloud database service for MongoDB, is the easiest way to deploy and run MongoDB, allowing you to get started in minutes. Click here to learn more.

The MongoDB team will be at AWS re:Invent this November in Las Vegas and our CTO Eliot Horowitz will be speaking Thursday (12/1) afternoon. If you're attending re:Invent, be sure to attend the session & visit us at booth #2620!

Learn more about AWS re:Invent

October 18, 2016