
Using GCP and Lighthouse to Build a System for Continuous Website Performance Monitoring

This article is the Day 1 post for the YAMAP Engineer Advent Calendar 2020.

Motivations for Measuring Performance

At YAMAP, we have many screens that display maps. These maps need to show numerous elements, such as landmark pins and hiking trails, and it was evident that some pages felt heavy to load. However, we couldn’t determine exactly how heavy they were. To improve performance, we needed specific metrics. Therefore, I decided to establish measurable indicators and start monitoring.

Performance Metrics

While researching performance metrics, I found that there are two perspectives for observing a website’s user experience.

Synthetic Monitoring

Synthetic monitoring involves periodically measuring performance in the same simulated environment. This approach reduces measurement variability since the environment is controlled. Unlike Real User Monitoring, which is discussed below, this method does not account for variations in user environments.

Real User Monitoring

Real User Monitoring measures actual user experiences. However, because the measurement environment depends on users, the results often have high variability.

Our Chosen Approach: Synthetic Monitoring

Since we aimed for continuous improvement, we decided to use synthetic monitoring, as it minimizes variability caused by environmental factors. Using Lighthouse to measure page experience, we performed periodic monitoring and used its results as our benchmark.

Ideally, it would be best to use both methods. Real User Monitoring provides actual metrics, which can be compared with synthetic monitoring results to fine-tune the synthetic environment settings and better understand user impact.

Structure of lighthouse-observer

Overview

lighthouse-observer is a Cloud Function that runs Lighthouse against a set of target pages. The function is executed periodically, and the results are recorded in a spreadsheet. By connecting this spreadsheet to a data portal, we visualize the measurement results as graphs, making trends easier to understand.
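To make the flow concrete, here is a hypothetical sketch (not the actual lighthouse-observer source) of running Lighthouse programmatically and boiling the report down to one spreadsheet row. `lighthouse` and `chrome-launcher` are the official npm packages; the field names we pull out of the Lighthouse result (`lhr`) are real, but `summarize` and `measure` are illustrative names of our own.

```javascript
// Pure helper: pick the fields worth charting out of a Lighthouse result (lhr).
function summarize(lhr) {
  return {
    url: lhr.finalUrl,
    performance: lhr.categories.performance.score, // 0..1 (the score graphed later)
    firstContentfulPaintMs: lhr.audits['first-contentful-paint'].numericValue,
    timeToInteractiveMs: lhr.audits['interactive'].numericValue,
  };
}

// Run one audit in headless Chrome (requires `npm i lighthouse chrome-launcher`).
async function measure(url) {
  const lighthouse = require('lighthouse');
  const chromeLauncher = require('chrome-launcher');
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
  try {
    const result = await lighthouse(url, { port: chrome.port, output: 'json' });
    return summarize(result.lhr);
  } finally {
    await chrome.kill();
  }
}
```

Keeping `summarize` separate from the audit itself makes the row format easy to test and extend without launching Chrome.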

Details

Periodic Execution of Cloud Functions

We use Cloud Scheduler and Cloud Pub/Sub to periodically execute Cloud Functions.
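The wiring can be set up from the CLI. The commands below are an illustrative sketch rather than our actual deployment script: the topic, job, and function names are placeholders, and the schedule and runtime are examples.

```shell
# 1. A Pub/Sub topic connecting the scheduler to the function (name is a placeholder).
gcloud pubsub topics create lighthouse-observer

# 2. A Cloud Scheduler job that publishes the measurement payload daily at 06:00 JST.
gcloud scheduler jobs create pubsub lighthouse-observer-daily \
  --schedule="0 6 * * *" \
  --time-zone="Asia/Tokyo" \
  --topic=lighthouse-observer \
  --message-body-from-file=payload.json

# 3. The function itself, triggered by the topic.
gcloud functions deploy lighthouse-observer \
  --runtime=nodejs12 \
  --trigger-topic=lighthouse-observer \
  --memory=2048MB \
  --timeout=540s
```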

The payload for the Cloud Scheduler is configured as follows. When the scheduler triggers, the payload is sent to Cloud Functions via Cloud Pub/Sub:

{
  "spreadsheetId": "${your_spreadsheet_id}",
  "timezone": "Asia/Tokyo",
  "targets": [
    {
      "url": "https://yamap.com",
      "sheetName": "top"
    },
    {
      "url": "https://yamap.com/maps",
      "sheetName": "maps"
    }
  ]
}

By modifying the scheduler's payload, or adding new scheduler jobs, we can change or add measurement targets without touching the function itself. Since Cloud Functions has an execution time limit (540 s) and a memory limit (2 GB), we divided the measurement targets appropriately across the registered schedules.

Visualizing Measurement Results in the Data Portal

Data Portal (Google Data Studio) is a Google service for visualizing data as tables and graphs. It can use data sources such as Google Analytics, MySQL databases, and spreadsheets, and its GUI makes filtering data by date simple. We found it very helpful.

Here’s an example of the visualization. On 11/26, there was a significant drop in the performance score (max=1) 😣.

Insights from Visualizing Performance

We realized once again that quantifying potential problems is essential. The results confirmed our initial impression that pages with maps loaded slowly. It also highlighted areas in the client-side implementation that could be improved.

YAMAP’s frontend team holds weekly team meetings where we share performance summaries and discuss possible actions. Although we haven’t fully utilized these metrics yet, we plan to grow our efforts and involve more people.

Quotes About Performance

"Don't guess. Measure."

This famous quote emphasizes that blindly attempting improvements without understanding the bottleneck is unproductive. Upon further research, we found that the saying comes from Rob Pike's Notes on Programming in C.

"If the site slows down by xxx seconds..." Series

A 0.1-second delay in site load time decreases sales by 1%, while a 1-second improvement increases sales by 10%. — Web experiments generate insights and promote innovation.

Increasing load time from 1 second to 3 seconds raises the bounce rate by 32%. — Find out how you stack up to new industry benchmarks for mobile page speed.

These insights highlight the significant impact of site speed on business. They’re also useful for persuading decision-makers 😌.
