Data-Driven: How I Built My Website Content Analysis Dashboard

For a long time, I’ve wanted to examine my website content in a more intuitive, data-driven way. I wanted to know how often I publish each year, which categories and tags are most common, and how much of the content is available in both Chinese and English. To achieve this, I spent a weekend building a brand new page for my website: the Content Analysis Dashboard.

Content Analysis Report (Partial)

This article will share the main features, build process, and technical implementation of this page, hoping to provide you with some interesting ideas.

Key Features

This analysis page is a completely client-side rendered single-page application that provides rich interactive charts and data filtering capabilities:

  • Multi-dimensional Statistics: Displays core metrics such as total content count, language distribution, and average annual publications.
  • Interactive Charts: Uses Chart.js to generate various charts, including annual publication trends, monthly distribution, and category and tag proportions.
  • Publication Heatmap: Visually shows the creative activity for each month over the past few years.
  • Data Filtering: Supports dynamic filtering by language, content type, year range, and categories, with all charts and data updating in real time (see the sketch after this list).
  • Bilingual Content Analysis: Automatically detects and analyzes the correspondence and coverage between Chinese and English content.
  • Tag Cloud and Content List: Makes it easy to quickly browse popular topics and recently published content.
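
As a rough illustration of how the filtering and charting pieces fit together, here is a minimal sketch. The field names and the updateYearlyChart helper are hypothetical, not the actual code from content-analysis-upload.js:

```javascript
// Minimal sketch: filter the parsed post metadata, then refresh an existing Chart.js chart.
// `items` is assumed to be the decompressed array of post metadata; the field names
// (lang, type, year, categories) are illustrative.
function applyFilters(items, { lang, type, yearFrom, yearTo, category }) {
  return items.filter((item) =>
    (!lang || item.lang === lang) &&
    (!type || item.type === type) &&
    (!yearFrom || item.year >= yearFrom) &&
    (!yearTo || item.year <= yearTo) &&
    (!category || item.categories.includes(category))
  );
}

// Recompute per-year counts and push them into a live Chart.js instance.
function updateYearlyChart(chart, filteredItems) {
  const counts = {};
  for (const item of filteredItems) {
    counts[item.year] = (counts[item.year] || 0) + 1;
  }
  const years = Object.keys(counts).sort();
  chart.data.labels = years;
  chart.data.datasets[0].data = years.map((y) => counts[y]);
  chart.update(); // Chart.js redraws in place, which is what makes the filters feel instant
}
```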

Architecture and Build Process

Considering the volume of website content, directly using Hugo’s jsonify to process all of the data would make the build extremely slow and could even fail due to insufficient memory. I therefore designed an automated workflow that separates data generation, which happens once at build time, from rendering, which happens in the browser.

Build Process Diagram

This workflow can be broken down into the following key steps:

  1. Data Extraction: I wrote a Node.js script (scripts/generate-analysis-data.js) that runs during the website build. The script uses glob to scan all Markdown files in the content/ directory and gray-matter to parse each file’s Front Matter, extracting metadata such as title, date, categories, and tags (a simplified sketch follows this list).

  2. Data Compression: The script consolidates all content metadata into a large JSON file. To optimize loading speed, the build.js script calls the pako library to compress this JSON file into Gzip format (content_analysis_data.json.gz), reducing file size by approximately 70-80%.

  3. Frontend Loading and Rendering: When the analysis page (/analysis) loads, its JavaScript (assets/js/content-analysis-upload.js) asynchronously requests this .gz data file. In the browser, it uses the pako library again to decompress the data, then passes the parsed JSON to Chart.js to dynamically render the various charts (also sketched below).
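
To make steps 1 and 2 concrete, here is a minimal sketch of what such a build-time script can look like. It uses the libraries named above (glob, gray-matter, pako), but the exact fields, the language detection, and the output path are assumptions rather than the real generate-analysis-data.js:

```javascript
// scripts/generate-analysis-data.js (simplified sketch, not the actual script)
const fs = require('fs');
const path = require('path');
const { globSync } = require('glob');   // scan Markdown files
const matter = require('gray-matter');  // parse Front Matter
const pako = require('pako');           // gzip-compress the output

// Step 1 - data extraction: collect Front Matter metadata from content/
const posts = globSync('content/**/*.md').map((file) => {
  const { data } = matter(fs.readFileSync(file, 'utf8'));
  return {
    title: data.title || '',
    date: data.date || null,
    categories: data.categories || [],
    tags: data.tags || [],
    lang: file.includes('.en.') ? 'en' : 'zh', // illustrative language detection
  };
});

// Step 2 - data compression: one big JSON file, gzipped to keep the download small
const gzipped = pako.gzip(JSON.stringify(posts));
fs.writeFileSync(
  path.join('static', 'content_analysis_data.json.gz'), // output path is an assumption
  Buffer.from(gzipped)
);
console.log(`Wrote ${posts.length} entries, ${gzipped.length} bytes gzipped`);
```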

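On the browser side (step 3), the loading, decompression, and rendering can be sketched roughly like this; the canvas ID and chart configuration are placeholders rather than the actual content-analysis-upload.js:

```javascript
// Browser-side sketch: fetch the gzipped JSON, inflate it with pako, render with Chart.js.
async function loadAnalysisData() {
  const response = await fetch('/content_analysis_data.json.gz');
  const buffer = await response.arrayBuffer();
  // pako.ungzip with { to: 'string' } returns the decompressed text directly
  const json = pako.ungzip(new Uint8Array(buffer), { to: 'string' });
  return JSON.parse(json);
}

loadAnalysisData().then((posts) => {
  // Count posts per year for the annual publication trend chart
  const counts = {};
  for (const post of posts) {
    const year = new Date(post.date).getFullYear();
    counts[year] = (counts[year] || 0) + 1;
  }
  const years = Object.keys(counts).sort();

  new Chart(document.getElementById('yearly-chart'), { // canvas ID is a placeholder
    type: 'bar',
    data: {
      labels: years,
      datasets: [{ label: 'Posts per year', data: years.map((y) => counts[y]) }],
    },
  });
});
```

Fetching the .gz file and inflating it manually only makes sense when the server treats it as an opaque binary; if the server instead sent the JSON with Content-Encoding: gzip, the browser would decompress it transparently and pako would not be needed on this side.
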
This workflow sidesteps Hugo’s build-time limitations: the heavy metadata processing happens once in a Node.js script, chart rendering is handled entirely by the client, and Gzip compression keeps the transferred data small enough for good loading performance.

Conclusion

This content analysis page not only satisfies my curiosity about my own creative output but also turned out to be an interesting technical exercise. Through data visualization, I can plan future creative directions more clearly. I invite you to visit the content analysis page for a hands-on experience, and I hope the implementation of this project provides you with some inspiration.
