3 Data Storage Technologies Explained
March 17, 2021 by Melissa Terry
Now a year into the coronavirus pandemic, public servants at all levels of government are facing tougher questions than ever before around the future of service and operations.
The increased use of facts and evidence over hunches and anecdotes during the crisis set leaders at the state and local levels up for post-pandemic success. Challenges remain, however, in effectively using technology tools to mine available data for useful, actionable information into the future. Barriers to full data use — before stakeholders even access it — include data quality, timeliness, and relevance.
To support successful programs for decision-makers, researchers, analysts, and the public, data leaders typically choose data lakes or data warehouses for their program needs. These options essentially offer data storage at scale, with the goal of breaking down data silos and providing access to data that would otherwise be stuck in source systems.
For government leaders looking to find a scalable data solution, a third option exists: data platforms built specifically for the public sector. Such vertically dedicated platforms, built with an opinion, are gaining traction among some of the most innovative local, state, and federal agencies in the U.S.
A closer examination of these three options is helpful in determining which one is right for your organization and strategic goals.
Data Warehouses
A data warehouse is a system to collect structured, historical data from siloed data records. Like goods in a warehouse, this is a highly organized and cataloged storage configuration.
This technology is designed to store structured data that have a pre-determined schema, which is directed by the specific use cases for the data. Data warehouses capably support data analysis as well as the use of BI tools, reporting, and building visualizations with aggregated data.
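To make the idea of a pre-determined schema concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a warehouse. The table, column names, and sample rows are hypothetical and chosen only for illustration; a production warehouse would use a dedicated engine, but the principle is the same: the structure is declared first, bad rows are rejected on the way in, and aggregated reporting queries are then straightforward.

```python
import sqlite3

# In-memory database standing in for a warehouse (illustration only).
conn = sqlite3.connect(":memory:")

# The schema is fixed up front, driven by the reporting use case.
# Table and column names are hypothetical.
conn.execute(
    """
    CREATE TABLE service_requests (
        request_id    INTEGER PRIMARY KEY,
        department    TEXT NOT NULL,
        opened_on     TEXT NOT NULL,
        days_to_close INTEGER NOT NULL CHECK (days_to_close >= 0)
    )
    """
)

rows = [
    (1, "Public Works", "2021-01-04", 3),
    (2, "Public Works", "2021-01-09", 7),
    (3, "Parks", "2021-01-11", 2),
]
conn.executemany("INSERT INTO service_requests VALUES (?, ?, ?, ?)", rows)

# A row that violates the schema (missing department) is rejected at write time.
try:
    conn.execute("INSERT INTO service_requests VALUES (4, NULL, '2021-01-12', 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# An aggregated query of the kind a BI tool or report would issue.
summary = conn.execute(
    """
    SELECT department, COUNT(*), AVG(days_to_close)
    FROM service_requests
    GROUP BY department
    ORDER BY department
    """
).fetchall()

print(rejected)
print(summary)
```

The trade-off shows up in the CREATE TABLE statement: someone has to know the use case well enough to write that schema before any data is loaded.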
Given their highly structured nature, data warehouses are less suited to self-service access by an organization’s non-technical personnel. A database analyst who understands the schema of the data is almost certainly needed to work with and derive value from a warehouse structure. Rapid time-to-outcome for predetermined analytics is possible, so long as the right experts are on hand to navigate the complex architecture. Tools and expertise are also needed to extract, transform, and load data before it is ready for reporting. To arrive at actionable insight, the data often travels from the warehouse to a data mart to an operational system or reporting tool, each of which adds cost.
Data warehouse projects can take months to years to implement. Most data warehouse service providers today use a pay-by-use model to alleviate some of the stress caused by earlier models of demand-based pricing.
Data Lakes
Data lakes are systems in which large volumes of raw data are stored in their natural, often unstructured state. Like water in a lake, the data in a data lake lacks imposed structure and organization. The lake serves as a “dump” for all operational and transactional data.
Because data lakes accept data in any form, structured or unstructured, without requiring it to be organized first, they can store large volumes at low cost. The volume and lack of structure can make gathering the data for a specific use a complicated process. The data, for instance, has to be transformed before it can be analyzed or applied to answer a question or solve a problem. Access to a technical team with enough time to contribute is crucial to reduce the typically high time-to-outcome for insights and analysis.
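The transformation step can be sketched in a few lines of Python. The raw records below are hypothetical examples of what might land in a lake: inconsistent field names, numbers stored as text, and one line that is not valid JSON. The `normalize` helper is likewise an assumption made for illustration, not a standard function.

```python
import json
from collections import defaultdict

# Hypothetical raw records as they might arrive in a lake.
raw_lines = [
    '{"dept": "Public Works", "days_to_close": "3"}',
    '{"department": "public works", "daysToClose": 7}',
    '{"dept": "Parks", "days_to_close": 2, "note": "duplicate?"}',
    "not json at all",
]


def normalize(line):
    """Return (department, days) for a usable record, otherwise None."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    dept = record.get("dept") or record.get("department")
    days = record.get("days_to_close", record.get("daysToClose"))
    if not isinstance(dept, str) or days is None:
        return None
    try:
        return dept.strip().title(), int(days)
    except (TypeError, ValueError):
        return None


# Structure is imposed only now, at read time, for this one question.
days_by_dept = defaultdict(list)
skipped = 0
for line in raw_lines:
    parsed = normalize(line)
    if parsed is None:
        skipped += 1
        continue
    dept, days = parsed
    days_by_dept[dept].append(days)

averages = {dept: sum(v) / len(v) for dept, v in days_by_dept.items()}

print(averages)
print(skipped)
```

Every new question asked of the lake tends to need its own version of `normalize`, which is where the time-to-outcome cost comes from.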
Data lakes should be accessed only by highly technical users to avoid the risk of significant errors that can impact data governance, security, and ETL processes. Without quality enforcement, there can often be challenges with data flow, which, in turn, makes data more error-prone, increases troubleshooting time, and increases time-to-outcome for use. Technical users may still need additional processes and software to access, troubleshoot, contextualize, and visualize the data, which inflate budgets.
Data Platforms
Enterprise data platforms that are purpose-built for the public sector support government decision-makers who want to lead with data. Unlike data warehouses and data lakes, data platforms streamline the processes of storing, transforming, and sharing data. This creates a low time-to-outcome for government-specific analysis and broadens the audience of that data from a few skilled analysts to a range of stakeholders.
Data platforms can help agencies and departments take control of their data in a few key ways. Specifically, data platforms:
- Facilitate self-service access and discovery of data
- Democratize data for a range of technical abilities
- Enable unlimited reuse of data via APIs
- Share contextualized data internally or with the public
These functionalities help governments at all levels apply data to a range of initiatives such as increasing equity in public safety, improving service delivery, and reporting on COVID-19 recovery.
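As a rough illustration of reuse via APIs, the sketch below builds two different queries against the same published dataset, one for a dashboard and one for a report. The URL shape and the `$select`, `$where`, and `$limit` parameters are modeled on the conventions of Socrata's SODA API; the domain `data.example.gov`, the dataset identifier `abcd-1234`, and the column names are placeholders, so check your platform's API documentation before relying on any of them. No network request is made here.

```python
from urllib.parse import urlencode


def build_query_url(domain, dataset_id, select=None, where=None, limit=100):
    """Build a read-only query URL for one dataset (SODA-style parameters)."""
    params = {"$limit": limit}
    if select:
        params["$select"] = select
    if where:
        params["$where"] = where
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


# Placeholder values, not a real portal or dataset.
DOMAIN = "data.example.gov"
DATASET_ID = "abcd-1234"

# Two consumers reuse the same dataset without copying it.
dashboard_url = build_query_url(
    DOMAIN, DATASET_ID, select="department, days_to_close", limit=5
)
report_url = build_query_url(DOMAIN, DATASET_ID, where="days_to_close > 5")

print(dashboard_url)
print(report_url)
```

Because each consumer asks the platform for exactly the slice it needs, there is one source of data rather than a separate extract for every application.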
Data platforms developed for government are built to handle the complexities of data-sharing and strict privacy requirements. In addition to holding the standard FedRAMP authorization, platforms such as Socrata Connected Government Cloud are designed to provide centralized administration and granular permissions over data access and controls. At the same time, such platforms maintain a simple UI that makes a data query as easy as a Google search, reducing the barriers to working with data and putting it into meaningful action faster.
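To show what granular permissions can mean in practice, here is a small, purely illustrative Python sketch. The role names, the `Dataset` class, and the `can` function are hypothetical and do not describe how any particular platform implements access control; they only show the idea of one central rule table combined with per-dataset grants.

```python
from dataclasses import dataclass, field

# Central definition of what each role may do (hypothetical roles).
ROLE_ACTIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "share"},
}


@dataclass
class Dataset:
    name: str
    public: bool = False
    grants: dict = field(default_factory=dict)  # user name -> role name


def can(user, action, dataset):
    """Return True if the user may perform the action on the dataset."""
    if action == "read" and dataset.public:
        return True  # public datasets are readable by anyone
    role = dataset.grants.get(user)
    return role is not None and action in ROLE_ACTIONS[role]


budget = Dataset("budget_2021", public=True, grants={"ana": "editor"})
cases = Dataset("case_files", grants={"ana": "viewer", "raj": "admin"})

print(can("guest", "read", budget))
print(can("guest", "read", cases))
print(can("ana", "write", cases))
print(can("raj", "share", cases))
```

The same person can hold different roles on different datasets, which is what lets sensitive records stay restricted while other data is shared widely.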
Determining Your Needs
These fundamental questions can help guide which approach makes the most sense for your data program:
- How important is data to your organization?
- How confident are you in your ability to make data-driven decisions?
- How many employees manage and analyze data on your team?
- How do those employees share insights across the organization?
- Would your decisions have more impact if you had access to real-time data and collaborative insights?
Data is certainly needed to manage crises, but beyond that, it can be a catalyst for improving how your organization functions, leading to stronger internal performance and better community outcomes. Enhanced access to data can improve decisions every day, making this an area of strategic priority for government leaders seeking innovation and long-term stability.
Data scientists are part of that effort, but a data platform built for government immediately expands your team of data consumers and experts. Platform technology gives public sector directors, program managers, elected officials, and even front-line employees the capability to tap into data and take action based on new and accurate insights.