GraphQL Observability with Hasura

When you’re trying to diagnose why a particular operation is taking a long time to complete, or just want to better understand what’s going on “behind-the-scenes” in your queries, the ability to debug and analyze metrics from your Hasura instance is a critical factor.

In this post we’ll take a look at how to do this by:

  • Walking through the tools we have available
  • Showcasing how to use these tools, with follow-along images and guided descriptions
  • Discussing “gotchas” and useful tips as we go, so that you’re better equipped to avoid pitfalls and common issues

Outline

This diagram highlights the boundaries we’ll be covering — the ones separating the network from the application.

Network Boundary

One straightforward way of doing this is with the inbuilt browser devtools.
To understand what the request timeline looks like, you can emulate a query made from your client to Hasura by using fetch() or GraphiQL.
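If you'd rather script the request than use GraphiQL, the same call can be reproduced from any HTTP client. Here's a minimal Python sketch that mirrors what a browser fetch() does (the endpoint URL and the commented-out admin-secret header are placeholders for your own instance):

```python
import json
import urllib.request

# Placeholder: substitute your own Hasura endpoint here
HASURA_URL = "https://your-app.hasura.app/v1/graphql"

def build_graphql_payload(query, variables=None):
    """Build the JSON body Hasura expects for a GraphQL POST request."""
    body = {"query": query}
    if variables is not None:
        body["variables"] = variables
    return json.dumps(body).encode("utf-8")

def run_query(query, variables=None):
    """POST the query to Hasura, mirroring a browser fetch() call."""
    request = urllib.request.Request(
        HASURA_URL,
        data=build_graphql_payload(query, variables),
        headers={
            "Content-Type": "application/json",
            # "x-hasura-admin-secret": "...",  # uncomment if your instance requires it
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Issuing the query this way won't show up in the devtools Network pane, of course, but it's a convenient way to repeat the same operation while watching the server-side metrics.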

Using the Network pane, we can look at the timing information for each segment of the request.
What happens inside (or behind) Hasura is not visible to us here, but one fairly important piece of info is: the time during which the request isn't actively being processed:

From the above images (timing info for a query, taken once in Edge and once in Firefox) we can see that the time spent establishing the connection is relatively brief in comparison to the time spent waiting for the first byte.

Using this info, we can likely conclude that there aren’t any network issues affecting these requests. If there were, the timing information would be skewed, with significant portions spent in the phases outside of waiting (the connection and handshake phases).

Application Boundary (Business Logic)

Under the Trace Propagation subsection, there is info outlining how tracing information is used to display timings in Metrics.

As noted there, if you aren’t trying to implement custom/advanced tracing functionality, likely the easiest way to get the most out of this feature is simply to forward the request headers sent by Hasura:

If you are unsure how to implement B3 propagation in your own web service, the simplest thing to do is to read these headers and pass them along to any HTTP services you call which also support B3 propagation, including Hasura itself.

In particular, if an event trigger webhook or action handler propagates these B3 headers back to Hasura, we will be able to trace the entire interaction.
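In practice this just means copying a handful of headers from the incoming request onto any outgoing ones. Here's a minimal Python sketch (the handler shape is my own illustration; only the header names come from the B3 propagation spec):

```python
# The standard B3 propagation headers (per the openzipkin/b3-propagation spec)
B3_HEADERS = ("x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid", "x-b3-sampled")

def extract_b3_headers(incoming_headers):
    """Pick out the B3 headers from an incoming request, case-insensitively,
    so they can be attached to any onward HTTP calls (including back to Hasura)."""
    lowered = {key.lower(): value for key, value in incoming_headers.items()}
    return {name: lowered[name] for name in B3_HEADERS if name in lowered}

# e.g. inside an action handler or event trigger webhook,
# Hasura sends headers like these...
incoming = {
    "Content-Type": "application/json",
    "X-B3-TraceId": "80f198ee56343ba864fe8b2a57d3eff7",
    "X-B3-SpanId": "e457b5a2e4d86bd1",
}
# ...and the extracted subset gets attached to any requests the handler makes:
forward_headers = extract_b3_headers(incoming)
```

Any downstream service that also supports B3 can then stitch its spans into the same trace.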

Application Boundary (Database)

We have several tools at our disposal here:

  • Hasura’s ANALYZE feature to display the compiled SQL query for an operation
  • Running EXPLAIN ANALYZE on the resulting query to get a Postgres plan for it
  • Using visualizers like https://explain.dalibo.com to help graphically examine the plan
  • The Cloud “Metrics” dashboard’s Operation modal, which provides timing spans for PG operations as part of its information
  • Using the new Query Tags feature along with a tool like https://pganalyze.com to deeply investigate operations or overall database health/activity

Let’s look at how we can use each of these tools to better understand how our application is performing.

Database Walkthrough, Pt. 1 (“ANALYZE” feature)

Using Hasura’s ANALYZE feature to display the compiled SQL query for an operation

One useful feature Hasura has is the ability to provide you with the compiled SQL query and a non-executed (theoretical/hypothesized) query plan for a GraphQL operation.

There are multiple ways to get this information. The compiled query is available on the Metrics dashboard, as part of operation info on several of the sub-pages, but another way we can get it is by simply pressing the ANALYZE button on the GraphiQL dashboard:

This will open a modal that contains:

  • The SQL Query itself
  • An execution plan (the query is not actually run to produce this)

If we copy the compiled query, we can use this to produce a report via EXPLAIN ANALYZE that can be very helpful when searching for performance-related issues or checking for proper use of indices, etc.

You can copy the query by pressing the COPY icon in the bottom right of the execution plan -- marked here by the red area circled in the image below:

If we paste the output of the compiled query into the query-editor on the DATA page, prepended with:

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)

And run it, a report will be produced:

We can then copy that plan, along with the original generated SQL query, and input it at https://explain.dalibo.com to get an incredibly helpful visualization of it:

From the results, one thing we notice immediately is that the vast majority of the operation’s processing time appears to be spent on a Seq Scan for allowed books.

The reason behind this (in this demonstration/example) is almost certainly due to the row count being too low to utilize any indices (unless we run set enable_seqscan=false on our DB to force this). It is easy to see how this kind of visualization can be tremendously helpful in real-world situations.

Database Walkthrough, Pt. 2 (Cloud Metrics dashboard)

Detailed timing information for the Postgres portion of an operation can be found by navigating to that operation’s details modal.

To get there, first decide on an operation you want to examine. Let’s say it’s something like this:

query QueryForExplaining { ... }

Assuming this query has been run sometime relatively recently (if it hasn’t you can run it from the GraphiQL tab), navigate to the “Metrics” tab, and then select the “Operations” panel from the left-hand menu:

Then, find the operation you’re looking to inspect, and click on the “Magnifying Lens” icon under the Actions column. This icon is circled in red in the image above.

This will open a modal containing detailed information about that particular operation.

One of the pieces of info available to us here is the waterfall-style request timeline. Using this, we can see where the operation spent each portion of its time, relative to the total processing time:

In the image above, circled in red is the TIMING portion of the modal. Here the breakdown of request timing is available.

You can click on segments of the span to zoom in or out, or hover your mouse over them to view a popup containing the precise millisecond timing and the percentage of the total that the span accounted for (pictured).

This plot shows its usefulness particularly when:

  • A) Trying to get a brief overview of where an operation is spending its time, so you can decide which part ought to be investigated
  • B) Visualizing the execution of queries that contain one or more calls to Actions/Remote Schemas/Remote Joins

In the case of B, the reasoning may be more apparent when visualized.

Here’s what the trace looks like for an operation that calls out to a Remote Schema and joins on the return value:


Database Walkthrough, Pt. 3 (Query Tags)

The most recent addition to the Hasura observability/debugging arsenal is a feature called “Query Tags”.

Query Tags are SQL comments which are made up of key=value pairs that are appended to generated SQL statements.

When the following query is sent to Hasura

query GetChild { child { name } }

The SQL generated is something like:

SELECT * FROM child;

But when we are looking at Postgres logs and trying to correlate a generated SQL statement with the actual GraphQL request that produced it, this quickly becomes difficult.
If we had more information alongside the SQL statement, such as an operation ID or tracing info, the mapping would be much easier.

The way we can provide such information alongside the generated SQL statement is by appending a comment, and this is exactly what Query Tags are.

With Query Tags, whenever we issue a query or mutation, the generated SQL carries some extra information.

In the case of the above example, with query tags the generated SQL would look like:

SELECT * FROM child /* request_id=487c2ed5-08a4-429a-b0e0-4666a82e3cc6, field_name=child, operation_name=GetChild */

The metadata present in these comments can be used by some tools to analyze DB load and track/monitor performance.
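When post-processing Postgres logs yourself, these tag comments are straightforward to parse back out. A small illustrative sketch (the parser is my own, not part of Hasura or any tool):

```python
import re

def parse_query_tags(sql):
    """Extract the key=value pairs from a trailing /* ... */ query-tag comment."""
    match = re.search(r"/\*\s*(.*?)\s*\*/\s*$", sql)
    if not match:
        return {}
    pairs = (item.strip() for item in match.group(1).split(","))
    return dict(item.split("=", 1) for item in pairs if "=" in item)

# A logged statement like the one shown above...
logged_sql = (
    "SELECT * FROM child "
    "/* request_id=487c2ed5-08a4-429a-b0e0-4666a82e3cc6, "
    "field_name=child, operation_name=GetChild */"
)
# ...parses back into a dict whose request_id can be matched
# against the Metrics dashboard:
tags = parse_query_tags(logged_sql)
```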

One such tool is https://pganalyze.com

In the guide below, we’ll walk through how to integrate pganalyze with Hasura using the new Query Tags feature and show the process of tracking an actual query operation.

Note: The below process assumes you are using a Hasura Cloud app connected to a Heroku database. If this isn’t the case, you will need to select one of the other provider instructions from the pganalyze dashboard to suit your scenario.

  1. Create an account at pganalyze and log in to the dashboard (the 14-day free trial is sufficient for this purpose).
  2. From the “Getting Started” page, select the Heroku Postgres environment:

Congratulations! You’ve now integrated Hasura with pganalyze.

As a concluding task, let’s make a query from our Cloud app, and then look it up in pganalyze to view more information about it.

Here’s a nonsensically named query for a sample table that I’ve created in my Cloud app:

query asdadf { sample_table(where: { id: { _eq: 1} }) { id text created_at updated_at } }

If we run this query from GraphiQL on our Cloud app console, it will eventually become viewable in the pganalyze dashboard.

Note: This process can take a significant amount of time (10–20 minutes). Don’t be alarmed if nothing shows up immediately.

Here on the dashboard, all of the operations are visible:

Selecting that operation, we can see the details of the query, as well as a lot of other information:

To quickly locate operations, you can use the Request ID that can be found from the Metrics dashboard on Cloud.

Here is an example workflow for back-tracking from a known operation in the Metrics dashboard to its analytics inside pganalyze:

  • Grab the Request ID from the Operations tab on the Metrics dashboard:
  • Search the pganalyze query log for that Request ID
  • Open the matching query to view its analytics and other relevant data

Let us know if you have any questions while trying these out! You can hit us up on GitHub Discussions or Discord if you need any help!

Originally published at https://hasura.io on July 29, 2021.
