BigQuery: How to Import Socialgist Datasets with Datastreamer’s Analytics Hub Connector

By Aaron Strat on March 5, 2024

Note: this article was contributed to our site by the Datastreamer team.

Google BigQuery has become a staple for data analytics in many organizations. However, most teams restrict themselves to internal data within their own four walls or to static external datasets purchased from vendors.

Powered by Datastreamer pipelines, Socialgist has released shared datasets on Google’s Analytics Hub, which enables a 1-click import of the world’s conversational data into your BigQuery environment. 

These datasets are updated daily with fresh data points from blogs, message boards, review sites, and other sources collected across the web. This helps your team discover trends and validate business hypotheses more efficiently against a vast repository of qualitative data.

In This Blog:

  • What is Analytics Hub?
  • The Convenience of Shared Datasets
  • Step by Step Guide: How to import Socialgist datasets into BigQuery
    • SQL Query Templates 
  • Build your own data pipelines into BigQuery (without Analytics Hub)
  • Frequently Asked Questions:
    • Can I export datasets to other analytics tools?
    • What data sources are available?

What is Analytics Hub?

Analytics Hub is a data exchange, run by Google, that enables data assets to be efficiently and securely exchanged across organizations. The advantage is the ability to perform a “1-click” import of datasets as a BigQuery table. Data can then be queried, analyzed, or exported in various formats such as JSON. 

Pipelines Powered by Datastreamer: While Socialgist collects and provides data from across the web, Datastreamer is the middle layer that structures this data into a BigQuery-friendly format and manages the pipeline that delivers it into your system.

You can also create a direct pipeline from Socialgist to your BigQuery environment (without Analytics Hub) which gives you full ownership and expanded capabilities for managing data flows (but has additional steps to deploy).

The Convenience of Shared Datasets

Keeping a pulse on the world’s conversations opens the doors to understanding real-world perspectives on niche topics or mainstream trends. Data products, research agencies, and in-house intelligence teams alike can utilize aggregated web data as a gateway to consumer opinions and global discussions. If your business acts on timely insights, static datasets may not provide the fresh or unfiltered data that you need to transform passive observation into active strategy formulation.

Integrating conversational data through Analytics Hub datasets helps organizations:

  • Accelerate data-centric projects by lowering the technical overhead of implementation. Launch pilot programs or robust analytics efforts with a significantly reduced project scope.
  • Democratize data access in the organization by making it easy to share data with departments that may not have engineering personnel to build pipelines or APIs.
  • Streamline external data procurement by having a single point of integration and avoiding the burden of managing multiple sources of data.
  • Enable deep analysis with familiar tools, such as building visualizations or alerts in Looker, or exporting datasets to other BigQuery-compatible applications (e.g. Tableau).

Datasets Applied for Real-World Analysis:

Our blog post, “SQL Recipes for Socialgist’s BigQuery Datasets”, gives you copy-and-paste SQL templates and elaborates on use case examples such as:

Monitoring the rise (or decline?) of the Chinese economy:

Amidst government efforts and assurances to rescue the economy, there’s an international belief that the Chinese economy is on a downward trajectory. See how a data team can query Socialgist’s Chinese Forums to tap into the local perspectives of the Chinese public and inform their market investment strategy. 

Read →

Step by Step: How to import Socialgist datasets into BigQuery

You will need a Google Cloud account: 

Analytics Hub lives on the Google Cloud console and requires a Google Cloud account to view and import datasets. This guide is intended for BigQuery users.

Alternatively, you can access Socialgist data with these methods:

  • Run a free query on Socialgist sample data via Datastreamer’s visual builder
  • Build your own pipeline from Socialgist to BigQuery (or any other internal system)

Step One: Log in to the BigQuery Cloud Console

Log In: First, go to the BigQuery Google Cloud Console (https://console.cloud.google.com/bigquery) and sign in with your Google Cloud account.

Select or Create a Project: Once logged in, you may need to select an existing project from the project drop-down list at the top of the page, or create a new project if you haven’t already.

Navigate to Analytics Hub: Inside the BigQuery console, look for the “Analytics Hub” option. It may be located in the left-hand side menu or you might have to open the “Resources” panel to find it. 

Enable API: You may see a screen for the “Analytics Hub API”. Click the “Enable” button and wait a couple of minutes.

Step Two: Search Listings

Search Listings: Click on “Search Listings” to explore available listings. You will see a dashboard with various datasets and data exchanges available for subscription. 

Search for Socialgist: You can browse through the listings or use the search functionality to find the Socialgist datasets that interest you.

Step Three: View Listing Details and Request Access

View Listing Details: When you find a listing that interests you, click on it to view more details about the dataset, including descriptions and terms of use.

Request Access: For now, you will have to request access from Datastreamer through a form using your Google account. After approval, you will see the option to subscribe to the dataset on the listing page.

Step Four: Add Dataset to Project

Add dataset to project: After approval, you will see a button to add the dataset to your project on the listing details page. The linked dataset and table will then appear on your Explorer in the BigQuery Studio.
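Once the linked dataset shows up in the Explorer, you can also confirm it from code. The following is a minimal sketch, assuming the `google-cloud-bigquery` Python client is installed and authenticated. The project and dataset names you pass in are placeholders for your own, not official Socialgist identifiers.

```python
from typing import Iterable, List


def table_names(tables: Iterable) -> List[str]:
    """Return the sorted table IDs of objects that expose a `table_id` attribute."""
    return sorted(t.table_id for t in tables)


def list_linked_tables(project_id: str, dataset_id: str) -> List[str]:
    """List the tables visible in a linked dataset (names are your own placeholders)."""
    # Imported lazily so table_names() can be used without the client library.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project=project_id)
    # list_tables accepts a "project.dataset" string and yields table list items.
    return table_names(client.list_tables(f"{project_id}.{dataset_id}"))
```

For example, `list_linked_tables("my-project", "my_linked_dataset")` should return the table names you see under the linked dataset in BigQuery Studio.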

Step Five: Run Your SQL Queries

SQL Query Templates: Read this blog to copy and paste queries to interact with the datasets immediately.
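If you prefer to run queries from a script rather than the console, a keyword search could be sketched as follows. This is an illustration under assumptions, not one of the published templates: the table path and the `content` column are hypothetical, so check the actual schema of your linked dataset first. The keyword is passed as a query parameter so that user input is never spliced into the SQL string.

```python
import re
from typing import Any, Dict, List

# Conservative pattern for a fully qualified `project.dataset.table` path.
_TABLE_RE = re.compile(r"[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+")
_COLUMN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_keyword_query(table: str, text_column: str = "content", limit: int = 100) -> str:
    """Build a parameterized keyword search; `content` is a hypothetical column name."""
    if not _TABLE_RE.fullmatch(table):
        raise ValueError(f"unexpected table path: {table!r}")
    if not _COLUMN_RE.fullmatch(text_column):
        raise ValueError(f"unexpected column name: {text_column!r}")
    return (
        "SELECT *\n"
        f"FROM `{table}`\n"
        f"WHERE LOWER({text_column}) LIKE CONCAT('%', LOWER(@keyword), '%')\n"
        f"LIMIT {int(limit)}"
    )


def run_keyword_query(project_id: str, table: str, keyword: str) -> List[Dict[str, Any]]:
    """Run the query with the BigQuery client and return rows as plain dicts."""
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project=project_id)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("keyword", "STRING", keyword)]
    )
    rows = client.query(build_keyword_query(table), job_config=job_config).result()
    return [dict(row.items()) for row in rows]
```

The table and column names are validated before being placed in the SQL text because BigQuery query parameters can stand in for values but not for identifiers.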

Build Your Own Pipeline (Socialgist → BigQuery)

Shared Dataset vs. Pipeline

The major advantage of a shared dataset is the convenience and rapidity of importing data into BigQuery, requiring just a few clicks to facilitate instant analytics. A pipeline is a bridge that transfers data directly from Socialgist into your own internal systems. This solution offers significantly expanded functionality, such as field enrichment and data flow management, but requires additional steps to deploy.

Datastreamer holds ownership of shared datasets, which you are granted permission to view and analyze. With your own pipeline, your organization owns the data flow and you have full control over the data streams (e.g. update frequency, advanced filtering, enriching metadata fields).

Self Assessment: Should I build a pipeline instead of accessing a shared dataset?

If any of the following criteria resonate with your requirements, a pipeline may be a better solution:

  • Faster data updates: Do you need data updated more frequently than every 24 hours for real-time or near real-time analytics?
  • Data enhancement: Do you want to enrich data with Datastreamer’s suite of operations that include sentiment analysis, PII redaction, location inference, and advanced Named Entity Recognition? 
  • Advanced data flow management: If incoming data flows are a core aspect of your product, advanced monitoring/debugging capabilities may be crucial to prevent any pipeline issues.
  • Integration of multiple sources: Are you working with multiple external (e.g. Dark Web) and internal data sources (customer surveys, unstructured text data, S3 buckets)?

Frequently Asked Questions:

What data sources are available?

For an up-to-date list and listing details, we recommend viewing listings directly on Analytics Hub.

As of March 1, 2024, the Socialgist data listings available on Analytics Hub are:

  • Chinese Blogs: Data from 2,000 Chinese blogs collected daily
  • Chinese Review Sites: Data from over 50 Chinese review platforms collected daily
  • Chinese Message Boards: Data from over 200 Chinese message boards (e.g. baidu.com, hupup.com)
  • Chinese Video Site Content: Data from a wide array of video content collected daily
  • English Blogs: Data from over 200,000 diverse English blogs collected daily
  • English Review Sites: Data from over 250 English review platforms (e.g. Tripadvisor) collected daily
  • English Forum Boards: Data from 3,000+ popular English message boards and forums collected daily
  • English Video Site Content: Data from a wide array of video content collected daily

Can I export BigQuery datasets to other analytics tools?

Yes, you can download datasets in a JSON format or export them to any BigQuery compatible third-party application.
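As an illustration of the JSON route, query results can be serialized to newline-delimited JSON with the Python standard library alone. This is a rough sketch, assuming each row is already a dict-like mapping (for instance, a BigQuery result row converted to a dict). The helper names are hypothetical.

```python
import json
from typing import Any, Iterable, Mapping


def rows_to_ndjson(rows: Iterable[Mapping[str, Any]]) -> str:
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(
        # default=str turns dates and timestamps into strings;
        # ensure_ascii=False keeps non-English text (e.g. Chinese) readable.
        json.dumps(dict(row), default=str, ensure_ascii=False)
        for row in rows
    )


def write_ndjson(rows: Iterable[Mapping[str, Any]], path: str) -> None:
    """Write the rows to a UTF-8 file at `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(rows_to_ndjson(rows))
```

Newline-delimited JSON is chosen here because many analytics tools can read it one record at a time, and BigQuery itself accepts it as a load format.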

Here are links to documentation on how to analyze BigQuery data in other popular tools:

Important Note: The connectors listed are not maintained by Socialgist or Datastreamer. Instead, they are native connectors that Google and Microsoft provide for moving structured data between their respective tools.

However, connectors directly from Socialgist to analytics tools (PowerBI, Tableau) are under development. If you are interested, reach out to us, as there may be an early access version that you can deploy.