Analyze Big Data in the cloud with BigQuery. Run fast, SQL-like queries against multi-terabyte datasets in seconds. Scalable and easy to use, BigQuery gives you real-time insights about your data.
Large Scale Data Analytics
BigQuery is Google’s fully managed, NoOps, low-cost analytics database. With BigQuery you have no infrastructure to manage, you don’t need a database administrator, you query with familiar SQL, and you pay only for what you use under a pay-as-you-go model. Together, these features let you focus on analyzing data to find meaningful insights. BigQuery is a powerful Big Data analytics platform used by all types of organizations, from startups to Fortune 500 companies.
Speed & Performance
Load your data from Google Cloud Storage or Google Cloud Datastore, or stream it into BigQuery to enable real-time analysis of your data. With BigQuery you can easily deploy petabyte-scale databases.
BigQuery separates storage from compute, allowing you to scale and pay for each independently. In addition, the first terabyte (1 TB) of data processed each month is free. Please consult the pricing page for more information.
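As a rough illustration of how the free monthly terabyte affects a query bill, the sketch below subtracts the allowance before applying a price. The function names and the `price_per_tb` parameter are hypothetical and chosen for this example only; actual rates come from the pricing page, and the calculation ignores storage charges entirely.

```python
def billable_terabytes(processed_tb: float, free_tb: float = 1.0) -> float:
    """Return the terabytes billed after the monthly free allowance.

    free_tb defaults to the 1 TB per month mentioned in the text.
    """
    if processed_tb < 0:
        raise ValueError("processed_tb must be non-negative")
    return max(0.0, processed_tb - free_tb)


def monthly_query_cost(processed_tb: float, price_per_tb: float) -> float:
    """Estimate a monthly query charge.

    price_per_tb is an assumption supplied by the caller, not a published rate.
    """
    return billable_terabytes(processed_tb) * price_per_tb
```

For example, a month with 0.4 TB processed would bill nothing under this model, while 3.5 TB would bill for 2.5 TB.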
Security & Reliability
BigQuery is built with a replicated storage strategy. You can protect your data with strong role-based ACLs that you configure and control.
Run your applications on a fully-managed Platform-as-a-Service (PaaS) using built-in services that make you more productive. Just download the SDK and start building immediately.
Managed & Unified
Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns including ETL, batch computation, and continuous computation. Cloud Dataflow frees you from operational tasks like resource management and performance optimization.
The managed service transparently handles resource lifetime and can dynamically provision resources to minimize latency while maintaining high utilization efficiency. Dataflow resources are allocated on demand, providing you with nearly limitless resource capacity to solve your big data processing challenges.
Unified Programming Model
Dataflow provides programming primitives, such as powerful windowing and correctness controls, that can be applied to both batch and stream-based data sources. Because developers express computational requirements the same way regardless of data source, Dataflow effectively eliminates the cost of switching programming models between batch and continuous stream processing.
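The idea of one windowing primitive serving both kinds of source can be sketched in plain Python. This is not the Dataflow SDK: the names `window_start`, `count_per_window`, and `live_source` are hypothetical, and the sketch drains its input completely, whereas a real streaming runner would emit results incrementally.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str]  # (event timestamp in seconds, key)


def window_start(timestamp: float, size: float) -> float:
    """Return the start of the fixed window that contains the timestamp."""
    return timestamp - (timestamp % size)


def count_per_window(events: Iterable[Event], size: float) -> Dict[Tuple[float, str], int]:
    """Count events per (window start, key), whatever the source type is."""
    counts: Dict[Tuple[float, str], int] = defaultdict(int)
    for timestamp, key in events:
        counts[(window_start(timestamp, size), key)] += 1
    return dict(counts)


def live_source() -> Iterator[Event]:
    """Stand-in for a stream; a real one would be unbounded."""
    yield (1.0, "click")
    yield (61.5, "click")
    yield (59.0, "click")  # arrives late but still belongs to the first window


# The same function handles a stored batch and a generator-based stream.
batch = [(1.0, "click"), (59.0, "click"), (61.5, "click")]
batch_counts = count_per_window(batch, size=60.0)
stream_counts = count_per_window(live_source(), size=60.0)
```

Because windows are assigned from the event timestamp rather than arrival order, both sources produce the same counts here.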
Integrated & Open Source
Built upon services like Google Compute Engine, Dataflow is an operationally familiar compute environment that seamlessly integrates with Cloud Storage, Cloud Pub/Sub, Cloud Datastore, Cloud Bigtable, and BigQuery. The open source Java-based Cloud Dataflow SDK enables developers to implement custom extensions and to extend Dataflow to alternate service environments.
Managed Spark and Managed Hadoop from the inventors of MapReduce: fast, easy to use, and low cost.
Managed Hadoop & Spark
Use Google Cloud Dataproc, a managed Hadoop MapReduce, Spark, Pig, and Hive service, to easily process big datasets at low cost. Control your costs by quickly creating managed clusters of any size and turning them off when you’re done. Cloud Dataproc integrates across Google Cloud Platform products, giving you a powerful and complete data processing platform.
Fast & Scalable Data Processing
Create Cloud Dataproc clusters quickly and resize them at any time—from three to hundreds of nodes—so you don’t have to worry about your data pipelines outgrowing your clusters. With each cluster action taking less than 90 seconds, you have more time to focus on insights, with less time lost to infrastructure.
Following Google Cloud Platform pricing principles, Cloud Dataproc has a low-cost, easy-to-understand price structure based on actual use, measured by the minute. Cloud Dataproc clusters can also include lower-cost preemptible instances, giving you powerful clusters at an even lower total cost.
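To see how by-the-minute billing and preemptible instances combine, consider the following back-of-the-envelope sketch. The function name, the rate parameters, and the choice to round partial minutes up are all assumptions made for illustration; consult the pricing page for actual rates and rounding rules.

```python
import math


def cluster_cost(
    runtime_seconds: float,
    standard_nodes: int,
    preemptible_nodes: int,
    standard_rate_per_min: float,
    preemptible_rate_per_min: float,
) -> float:
    """Estimate the cost of one cluster run billed by the minute.

    Both rates are hypothetical inputs, and partial minutes are rounded up
    (an assumption of this sketch).
    """
    billed_minutes = math.ceil(runtime_seconds / 60)
    per_minute = (
        standard_nodes * standard_rate_per_min
        + preemptible_nodes * preemptible_rate_per_min
    )
    return billed_minutes * per_minute
```

With made-up rates of 2.0 and 0.5 per node-minute, a 125-second job on 3 standard and 4 preemptible nodes is billed for 3 minutes under this model.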
Open Source Ecosystem
The Spark and Hadoop ecosystem provides tools, libraries, and documentation that you can leverage with Cloud Dataproc. Cloud Dataproc offers frequently updated, native versions of Spark, Hadoop, Pig, and Hive, so you can get started without needing to learn new tools or APIs, and you can move existing projects or ETL pipelines without redevelopment.
Have You Considered?
Cloud Platform can deliver even more scale, efficiency, and simplicity for key data processing and analysis scenarios. If you use Hive on Hadoop (or SparkSQL) you might consider Google BigQuery, an on-demand SQL analytics service with amazing performance. If you program data transformation pipelines with Spark or MapReduce, you may want to consider Google Cloud Dataflow, a fully-managed service that eliminates the busy work required by other tools and executes a wide range of data processing patterns, including ETL, batch, and streaming computation.
An easy to use interactive tool for large-scale data exploration, analysis, and visualization.
Powerful Data Exploration
Cloud Datalab is a powerful interactive tool created to explore, analyze and visualize data with a single click on Google Cloud Platform. It runs on Google App Engine and orchestrates multiple services automatically so you can focus on exploring your data.
Integrated & Open Source
Whether you’re analyzing megabytes or gigabytes, Cloud Datalab has you covered. Once you are satisfied with your transformation and analysis models, deploy them to BigQuery with the click of a button.
Data Management & Visualization
Use Cloud Datalab to gain insight from your data. Interactively explore, transform, analyze, and visualize your data using BigQuery, Cloud Storage and Python.
Connect your services with reliable, many-to-many, asynchronous messaging hosted on Google’s infrastructure. Cloud Pub/Sub automatically scales as you need it and provides a foundation for building your own robust, global services.
Scalable Messaging Middleware
Cloud Pub/Sub is a fully-managed real-time messaging service that allows you to send and receive messages between independent applications. You can leverage Cloud Pub/Sub’s flexibility to decouple systems and components hosted on Google Cloud Platform or elsewhere on the Internet. By building on the same technology Google uses, Cloud Pub/Sub is designed to provide “at least once” delivery at low latency with on-demand scalability to 1 million messages per second (and beyond).
Connect Anything to Everything
Use Cloud Pub/Sub to publish and subscribe to data from multiple sources, then use Google Cloud Dataflow to understand your data, all in real time. Use Cloud Pub/Sub to reduce dependencies between components of distributed applications. Cloud Pub/Sub is the same messaging technology used by many of Google’s apps, from Ads to Gmail.
Push and Pull
Cloud Pub/Sub is designed for quick integration with systems hosted on the Google Cloud Platform or elsewhere, whether you need one-to-one, one-to-many, or many-to-many communication, with push or pull delivery.
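The difference between push and pull delivery, and the way one topic can fan out to several subscribers, can be modeled in a few lines. This is an in-memory toy, not the Cloud Pub/Sub client library; the `Topic` and `Subscription` classes and their methods are hypothetical names for this sketch.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional


class Subscription:
    """Receives every message published to its topic, by push or by pull."""

    def __init__(self, push_endpoint: Optional[Callable[[str], None]] = None) -> None:
        self.push_endpoint = push_endpoint
        self.backlog: Deque[str] = deque()

    def deliver(self, message: str) -> None:
        if self.push_endpoint is not None:
            self.push_endpoint(message)   # push: the service calls the subscriber
        else:
            self.backlog.append(message)  # pull: held until the subscriber asks

    def pull(self, max_messages: int = 10) -> List[str]:
        batch: List[str] = []
        while self.backlog and len(batch) < max_messages:
            batch.append(self.backlog.popleft())
        return batch


class Topic:
    """Fans each published message out to all of its subscriptions."""

    def __init__(self) -> None:
        self.subscriptions: Dict[str, Subscription] = {}

    def subscribe(
        self, name: str, push_endpoint: Optional[Callable[[str], None]] = None
    ) -> Subscription:
        subscription = Subscription(push_endpoint)
        self.subscriptions[name] = subscription
        return subscription

    def publish(self, message: str) -> None:
        for subscription in self.subscriptions.values():
            subscription.deliver(message)


# One publisher, two subscribers: one uses push, the other pull.
pushed: List[str] = []
topic = Topic()
topic.subscribe("audit", push_endpoint=pushed.append)
worker = topic.subscribe("worker")
topic.publish("order-1")
topic.publish("order-2")
pulled = worker.pull()
```

The publisher never references its subscribers directly, which is what lets components be added or removed independently.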
Cloud Pub/Sub is designed to provide “at least once” delivery by storing copies of messages in multiple zones to ensure that subscribers can receive messages as swiftly as possible. All message data is encrypted and protected on the wire and at rest.
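“At least once” delivery means a subscriber may occasionally see the same message twice, so handlers are usually written to be idempotent. The sketch below models that with an in-memory queue that keeps each message until it is acknowledged. The class names are hypothetical, and the redelivery rule (everything unacknowledged is returned on the next pull) is a simplification, not a description of the service’s actual behavior.

```python
from typing import Dict, List, Set, Tuple


class AtLeastOnceQueue:
    """Keeps every message until acknowledged, so it can be delivered again."""

    def __init__(self) -> None:
        self._unacked: Dict[int, str] = {}
        self._next_id = 0

    def publish(self, data: str) -> int:
        message_id = self._next_id
        self._next_id += 1
        self._unacked[message_id] = data
        return message_id

    def pull(self) -> List[Tuple[int, str]]:
        # Simplification: every unacknowledged message is (re)delivered.
        return sorted(self._unacked.items())

    def ack(self, message_id: int) -> None:
        self._unacked.pop(message_id, None)


class IdempotentHandler:
    """Processes each message ID once, ignoring duplicate deliveries."""

    def __init__(self) -> None:
        self.seen: Set[int] = set()
        self.processed: List[str] = []

    def handle(self, message_id: int, data: str) -> None:
        if message_id in self.seen:
            return  # duplicate delivery, already processed
        self.seen.add(message_id)
        self.processed.append(data)


queue = AtLeastOnceQueue()
handler = IdempotentHandler()
queue.publish("a")
queue.publish("b")

# First delivery: both messages are processed, but only the ack for "a" arrives.
for message_id, data in queue.pull():
    handler.handle(message_id, data)
queue.ack(0)

# Second delivery: "b" comes back; the handler skips it and the ack succeeds.
redelivered = queue.pull()
for message_id, data in redelivered:
    handler.handle(message_id, data)
    queue.ack(message_id)
```

Tracking message IDs is one simple way to tolerate duplicates; another is to make the operation itself safe to repeat.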
Global and Scalable
Cloud Pub/Sub is fully managed and global by design, automatically taking advantage of dedicated resources in every Google Cloud Platform region to ensure high availability without degrading latency, even under heavy load. We also back this high availability with a Service Level Agreement.