MongoDB supports time series data for the full data lifecycle, including ingestion, storage, querying, analysis, visualization, and online archiving or automated expiration as data ages. In MongoDB 6.0, extensive support for time series data will be available to all MongoDB customers, not just MongoDB Atlas customers who have opted to participate in the Rapid Release program.
Time series data is a sequence of measurements over a period of time with common metadata. Managing time series workloads is required across a variety of industries. Examples include sensor readings for manufacturing; vehicle-tracking device logs for transportation and logistics; data from consumer IoT devices, such as smart watches; customer interaction data in e-commerce; and financial transactions data for the securities and cryptocurrency industries.
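As a rough illustration of that definition, the sketch below shows what a single measurement and the settings for a time series collection might look like in Python. The field names (timestamp, metadata, temperature), the collection name, and the helper timeseries_options are made up for this example; timeseries and expireAfterSeconds are the options MongoDB accepts when a time series collection is created.

```python
from datetime import datetime, timezone


def timeseries_options(time_field, meta_field, granularity="seconds",
                       expire_after_seconds=None):
    """Build keyword arguments for creating a time series collection.

    Hypothetical helper for this example; it only assembles a dict.
    """
    if granularity not in ("seconds", "minutes", "hours"):
        raise ValueError("granularity must be seconds, minutes, or hours")
    options = {
        "timeseries": {
            "timeField": time_field,
            "metaField": meta_field,
            "granularity": granularity,
        }
    }
    if expire_after_seconds is not None:
        # Automated expiration: documents older than this are removed.
        options["expireAfterSeconds"] = expire_after_seconds
    return options


# One measurement: a timestamp, metadata shared by the series, and a value.
reading = {
    "timestamp": datetime(2022, 6, 1, 12, 0, tzinfo=timezone.utc),
    "metadata": {"sensorId": "s-17", "site": "plant-a"},
    "temperature": 21.4,
}

options = timeseries_options("timestamp", "metadata",
                             granularity="minutes",
                             expire_after_seconds=30 * 24 * 3600)

# With PyMongo and a running server, this would be used roughly as:
#   db.create_collection("sensor_readings", **options)
#   db["sensor_readings"].insert_one(reading)
```

Keeping the shared metadata in one subdocument is what lets the database group measurements from the same source together.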
Time series workloads across numerous industries have rapidly increased. As a result, MongoDB has significantly invested in advancing our capabilities in this space and empowering developers to build best-in-class applications using time series data on MongoDB.
Visualization of time series data in MongoDB Charts: Many organizations working with time series data want to analyze it to diagnose issues and predict trends that affect their business. With time series support in MongoDB Charts, customers gained instant visibility into trends in their data.
Secondary indexing on metadata: MongoDB added support for creating secondary indexes on time series metadata. While the default time series indexes already supported queries on time and metadata as a whole, this new capability allowed users to create secondary indexes on specific metadata subfields required for more efficient query execution.
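A minimal sketch of such an index, assuming the metadata field is called metadata and has a sensorId subfield (both names are hypothetical, as is the helper below). The key list uses the form PyMongo's create_index accepts.

```python
def metadata_index_keys(meta_field, subfield, time_field):
    """Index keys: one metadata subfield first, then the time field."""
    return [(f"{meta_field}.{subfield}", 1), (time_field, 1)]


keys = metadata_index_keys("metadata", "sensorId", "timestamp")

# A query shape this index is meant to serve: one sensor, ordered by time.
query = {"metadata.sensorId": "s-17"}

# With PyMongo and a running server, roughly:
#   db["sensor_readings"].create_index(keys)
#   db["sensor_readings"].find(query).sort("timestamp", 1)
```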
Atlas Online Archive support in Preview: One common challenge with time series is that the rapid proliferation of data can lead to rising storage costs. With Atlas Online Archive support for time series collections, users can define archiving policies to automatically move aged data out of the database and into lower-cost, fully managed cloud object storage. Users simply define their own archiving policy and Atlas handles all the data movement for them. This allows users to retain all of their time series data for analysis or compliance purposes while also lowering costs.
Broader platform support for time series: MongoDB released broader platform support for time series data, including the ability to create time series collections directly from Atlas Data Explorer, MongoDB Compass, or the MongoDB for VS Code Extension.
Columnar compression: With the addition of columnar compression for time series, organizations are able to dramatically reduce their storage footprint. This new capability means that time series data in BSON is significantly compressed in time series buckets before undergoing even further compression for storage on disk. Columnar compression leads to a 10 to 30x improvement in WiredTiger cache usage; it also significantly improves query performance by reducing disk I/O since more data can fit in memory.
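To see why a columnar layout compresses well, consider the generic sketch below. It is not MongoDB's actual bucket format, which the text above does not describe in detail; it only pivots rows into columns and delta-encodes each column, using made-up sample readings.

```python
def to_columns(rows):
    """Pivot row dicts into one list per field.

    Assumes every row has the same fields.
    """
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def delta_encode(values):
    """Keep the first value, then store each difference from the previous one."""
    if not values:
        return []
    encoded = [values[0]]
    for previous, current in zip(values, values[1:]):
        encoded.append(current - previous)
    return encoded


def delta_decode(encoded):
    """Invert delta_encode by summing the differences back up."""
    decoded = []
    for i, delta in enumerate(encoded):
        decoded.append(delta if i == 0 else decoded[-1] + delta)
    return decoded


# Sample readings: seconds since some start, temperature in tenths of a degree.
rows = [
    {"ts": 1000, "temp": 215},
    {"ts": 1060, "temp": 215},
    {"ts": 1120, "temp": 216},
    {"ts": 1180, "temp": 216},
]

columns = to_columns(rows)
encoded = {name: delta_encode(values) for name, values in columns.items()}
# encoded["ts"]   is [1000, 60, 60, 60]
# encoded["temp"] is [215, 0, 1, 0]
```

Once values of the same field sit next to each other, the differences are small and repetitive, which is the kind of input general-purpose compressors handle very well.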
New aggregation operators: The release of MongoDB 5.2 included new operators and enhancements to improve aggregation performance and reduce the number of aggregation stages needed to unlock insights. These new operators are valuable for analyzing a variety of types of data, including time series.
Densification and gap-filling: It is common for time series data to be uneven (for example, when a sensor goes offline and several readings are missed). In order to perform analytics and ensure the correctness of results, however, the data being analyzed needs to be continuous. Densification and gap-filling allow users to better handle missing time series data by creating additional data points to compensate for missing values.
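The idea can be sketched in plain Python. This is an illustration of the concept only, not how MongoDB implements it (in MongoDB the work is done by aggregation stages such as $densify and $fill). It assumes integer timestamps aligned to a fixed step and uses linear interpolation as the fill method.

```python
def densify(points, step):
    """Insert (t, None) placeholders so timestamps advance by `step`.

    `points` is a list of (timestamp, value) pairs sorted by timestamp,
    with every timestamp aligned to `step` from the first one.
    """
    if not points:
        return []
    start, end = points[0][0], points[-1][0]
    if any((t - start) % step for t, _ in points):
        raise ValueError("timestamps must be aligned to the step")
    known = dict(points)
    return [(t, known.get(t)) for t in range(start, end + 1, step)]


def fill_linear(points):
    """Replace None values by interpolating between the nearest known values.

    Leading or trailing None values are left untouched.
    """
    filled = list(points)
    known = [i for i, (_, value) in enumerate(points) if value is not None]
    for left, right in zip(known, known[1:]):
        t0, v0 = points[left]
        t1, v1 = points[right]
        for i in range(left + 1, right):
            t = points[i][0]
            filled[i] = (t, v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return filled


# A sensor reporting every 60 seconds that missed two readings.
readings = [(0, 20.0), (60, 21.0), (240, 24.0)]

dense = densify(readings, step=60)
# [(0, 20.0), (60, 21.0), (120, None), (180, None), (240, 24.0)]

complete = fill_linear(dense)
# [(0, 20.0), (60, 21.0), (120, 22.0), (180, 23.0), (240, 24.0)]
```

Densification creates the missing rows; gap-filling then decides what value each new row should carry.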
Secondary indexes on measurements: MongoDB customers will be able to create a secondary or compound index on any field in a time series collection. This enables geo-indexing (for example, tracking changes over time on a fleet of vehicles or equipment). These new index types also provide improved read performance.
Read performance improvements for sort operations: MongoDB 6.0 comes with optimizations to last point queries, which let the user find the most recent entry in a time series collection. The query executes a distinct scan that looks only for the last point rather than scanning the full collection. This release will also let users leverage a clustered index or secondary index to perform sort operations on time and metadata fields more efficiently in most scenarios.
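A last point query is commonly written as a sort followed by a group that keeps the first document per series. The sketch below builds such a pipeline as plain Python data and, for comparison, computes the same result in memory. The field names and the helper functions are assumptions for this example, and whether the server applies the optimization depends on its version and the available indexes.

```python
def last_point_pipeline(series_key, time_field, value_fields):
    """Aggregation pipeline returning the newest document per series."""
    group = {"_id": f"${series_key}",
             time_field: {"$first": f"${time_field}"}}
    for name in value_fields:
        group[name] = {"$first": f"${name}"}
    return [
        # Newest document first within each series.
        {"$sort": {series_key: 1, time_field: -1}},
        {"$group": group},
    ]


def last_points(rows, series_of, time_of):
    """In-memory equivalent: the newest row for each series."""
    newest = {}
    for row in rows:
        key = series_of(row)
        if key not in newest or time_of(row) > time_of(newest[key]):
            newest[key] = row
    return newest


pipeline = last_point_pipeline("metadata.sensorId", "timestamp", ["temperature"])

rows = [
    {"metadata": {"sensorId": "s-1"}, "timestamp": 10, "temperature": 21.0},
    {"metadata": {"sensorId": "s-2"}, "timestamp": 12, "temperature": 19.5},
    {"metadata": {"sensorId": "s-1"}, "timestamp": 20, "temperature": 22.0},
]
latest = last_points(rows,
                     series_of=lambda r: r["metadata"]["sensorId"],
                     time_of=lambda r: r["timestamp"])

# With PyMongo and a running server, roughly:
#   list(db["sensor_readings"].aggregate(pipeline))
```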
Our work on time series will not stop with MongoDB 6.0. We will continue to empower developers to build best-in-class applications using time series data on MongoDB. In future releases, expect to hear about cluster-to-cluster replication for time series data, features to enhance scalability for time series data on MongoDB, and much more.
A big part of the explosion in data being created is time series data. This data is being generated by everything from servers inside data centers to IoT devices in homes or factories. Businesses are using this data to make faster and more efficient decisions to drive growth. As a result, the IoT industry is projected by McKinsey to grow to between $5.5 trillion and $12.6 trillion in value globally by 2030.
The goal of tracking and storing time series data is to enable faster corrective action if anything goes wrong and to increase efficiency by using past data to inform future decisions. This could be anything from monitoring your applications to taking automated actions based on IoT sensor data.
The first thing you need to do is get your time series data from your devices into InfluxDB. This data could be generated by IoT devices and sensors out in the field or your servers that are already hosted in an AWS data center. One of the easiest ways to ingest data is by using Telegraf. Telegraf is an open source server agent that has plugins for collecting data from over 250 different data sources and network protocols. Data can then be sent to 50 different output options for storage or processing.
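Telegraf normally sends data to InfluxDB as line protocol, a text format with one measurement per line. The sketch below formats a single point that way, so you can see what ends up on the wire. The helper names and the sample values are made up for this example, and the escaping covers only the common cases.

```python
def _escape(text, chars):
    """Backslash-escape each of the given characters."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    """Render a field value: bool, integer (suffix i), float, or quoted string."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tags fields timestamp."""
    if not fields:
        raise ValueError("line protocol needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(_escape(key, ",= ") + "=" + format_field(value)
                    for key, value in fields.items())
    return f"{head} {body} {timestamp_ns}"


line = to_line(
    "cpu",
    {"host": "server 01", "region": "us-west"},
    {"usage": 0.64, "cores": 8},
    1654084800000000000,
)
# cpu,host=server\ 01,region=us-west usage=0.64,cores=8i 1654084800000000000
```

Tags describe the source of the data and are indexed; fields hold the measured values.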
Before you put your time series data into long-term storage, you might want to enrich it by bringing in data from other sources or quickly process it for real-time insights. If you are using Telegraf, you can do this by taking advantage of some of the community-built processing plugins or even make your own custom processor to transform or enrich your data. Some commonly used processors include the regex processor, for transforming tag and field values, and the Execd processor plugin, which allows you to run any external program, such as a Python script, to transform your data.
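A script for the Execd processor reads metrics on standard input and writes the processed metrics to standard output. The sketch below assumes the metrics arrive as line protocol and simply adds a tag to each one. The tag name and value are hypothetical, the value is assumed to need no escaping, and the demo at the end uses in-memory streams instead of the real stdin and stdout.

```python
import io


def split_head(line):
    """Split a line-protocol line at the first unescaped space."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if line[i] == " ":
            return line[:i], line[i + 1:]
        i += 1
    return line, ""


def add_tag(line, key, value):
    """Append key=value to the tag set; leave lines without fields alone."""
    head, rest = split_head(line)
    if not rest:
        return line
    return f"{head},{key}={value} {rest}"


def run(stream_in, stream_out, key="site", value="plant-a"):
    """Process metrics one line at a time, flushing after each one."""
    for raw in stream_in:
        line = raw.rstrip("\n")
        if line:
            stream_out.write(add_tag(line, key, value) + "\n")
            stream_out.flush()


# Demo with in-memory streams. As a processor script the entry point
# would instead be: run(sys.stdin, sys.stdout)
demo_in = io.StringIO("cpu,host=server\\ 01 usage=0.64 1654084800000000000\n")
demo_out = io.StringIO()
run(demo_in, demo_out)
```

Flushing after every line matters here because Telegraf is waiting on the other end of the pipe for each processed metric.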
There are several time series databases available, but in this article we will focus on InfluxDB. InfluxDB is a time series database that can be deployed on your own hardware using the open source version, or you can choose to use the managed cloud version available through the AWS Marketplace.
The main advantages of InfluxDB are that it comes with an entire ecosystem of tools to help you work with your time series data and that it has a purpose-built query language called Flux which was designed to make typical time series queries easy to write. InfluxDB also provides built-in visualization tools and easy integration with other popular tools for visualizing data like Grafana. The InfluxDB community has developed a number of pre-built templates that you can install with a single click to start monitoring your data.
Now that your data is safely stored in the database of your choice, you can get to the fun part of actually using that data. AWS provides a number of services for working with time series data. You can use your preferred business intelligence tool, like Power BI or Tableau, to query your data to find valuable insights.
If you want to access your time series data from inside your applications, you can set up Amazon API Gateway and safely query your database from a web or mobile app. InfluxDB provides client libraries so you can read and write data in the programming language of your choice. Need more flexibility? InfluxDB also provides a direct HTTP API. For predictions and forecasting, you can use Amazon SageMaker to create machine learning models from your data. If you want to use those models on the edge with your IoT devices, you can optimize them using SageMaker Neo to reduce the runtime footprint of your models by up to 90% and increase performance by 25x. Combining SageMaker Neo with AWS IoT Greengrass makes running machine learning applications on the edge simple and efficient.
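As a sketch of what using the HTTP API directly can look like, the code below builds a write request with only the Python standard library. It assumes the InfluxDB 2.x write endpoint (/api/v2/write) with token authentication; the host name, organization, bucket, and token are placeholders, and the request is constructed but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(base_url, org, bucket, token, lines, precision="ns"):
    """Build a POST request that writes line-protocol data to a bucket."""
    query = urlencode({"org": org, "bucket": bucket, "precision": precision})
    return Request(
        url=f"{base_url.rstrip('/')}/api/v2/write?{query}",
        data="\n".join(lines).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )


request = build_write_request(
    "https://influx.example.com",  # placeholder host
    org="my-org",
    bucket="sensors",
    token="my-token",
    lines=["cpu,host=a usage=0.64 1654084800000000000"],
)

# To actually send it (needs a reachable server and a valid token):
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       print(response.status)
```

In practice the client libraries wrap this same endpoint and add batching and retries, so the raw API is mostly useful when no client exists for your environment.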
Hopefully this article helped you learn a little bit about time series data and how to get started using AWS to work with time series data in particular. Time series from a technical perspective is still in the early stages of growth, and there are tons of opportunities available.