Recent Projects

2025

Social Media Data Analysis Platform

A distributed platform for collecting, processing, and analyzing social media data from Mastodon, Reddit, and Bluesky, built on Kubernetes and deployed on the MRC/NeCTAR Research Cloud.

  • Deployed on MRC/NeCTAR Research Cloud using OpenStack CLI to provision the Kubernetes cluster.
  • Designed an event-driven architecture using Fission to manage modular functions triggered by time-based or tag-based tasks.
  • Leveraged Redis queues to decouple processing stages and enable asynchronous function execution.
  • Performed sentiment analysis with VADER and keyword extraction using YAKE during post-processing.
  • Validated data structure and uniqueness before writing to Elasticsearch to ensure quality and consistency.
  • Exposed a RESTful API endpoint to support filtered queries based on keywords, time range, and platform.
  • Defined a unified index schema (`socialplatform`) to integrate cross-platform data into a single searchable format.
  • Visualized trends using Kibana and Jupyter Notebook to support scenario-driven analysis.
  • Covered multiple use cases, including AFL team popularity, cost-of-living sentiment, and election support trends.
  • Used GitLab for team collaboration, version control, and branch-based development workflow.
Kubernetes, Fission, Redis, Elasticsearch, OpenStack, MRC/Nectar, Python, VADER, YAKE, GitLab, Jupyter
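
The validation and uniqueness step described above can be illustrated with a minimal sketch. Only the index name `socialplatform` and the three platforms come from the project description; the field names (`platform`, `post_id`, `content`, `created_at`), the helper names, and the hashing scheme are assumptions made for illustration, not the project's actual schema.

```python
import hashlib
from datetime import datetime

INDEX_NAME = "socialplatform"  # index name taken from the project description
# Hypothetical required fields; the real schema is not documented here.
REQUIRED_FIELDS = ("platform", "post_id", "content", "created_at")
SUPPORTED_PLATFORMS = {"mastodon", "reddit", "bluesky"}


def make_doc_id(platform: str, post_id: str) -> str:
    """Derive a deterministic ID so the same post always maps to the same document."""
    return hashlib.sha256(f"{platform}:{post_id}".encode("utf-8")).hexdigest()


def validate(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document is acceptable."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not doc.get(name)]
    if doc.get("platform") and doc["platform"] not in SUPPORTED_PLATFORMS:
        problems.append(f"unknown platform: {doc['platform']}")
    if doc.get("created_at"):
        try:
            datetime.fromisoformat(doc["created_at"])
        except (ValueError, TypeError):
            problems.append("created_at is not an ISO 8601 string")
    return problems


def prepare_batch(docs: list[dict]) -> tuple[list[tuple[str, dict]], list[dict]]:
    """Split docs into (doc_id, doc) pairs ready for indexing and rejected docs."""
    accepted, rejected, seen = [], [], set()
    for doc in docs:
        if validate(doc):
            rejected.append(doc)
            continue
        doc_id = make_doc_id(doc["platform"], str(doc["post_id"]))
        if doc_id in seen:  # duplicate within this batch: keep the first copy
            continue
        seen.add(doc_id)
        accepted.append((doc_id, doc))
    return accepted, rejected
```

Writing each accepted document under its deterministic ID would mean that re-ingesting the same post overwrites the existing entry instead of creating a duplicate, which is one plausible way to enforce uniqueness at the index level.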

MPI-based Parallel Processing of Mastodon Data

A high-performance parallel data processing pipeline using MPI to analyze large-scale Mastodon datasets (over 144 GB of .ndjson). Designed to test and compare performance in a distributed environment using varying node and core configurations.

  • Parsed and processed Mastodon post data line-by-line using MPI with Python’s `mpi4py`, distributing tasks across nodes and cores.
  • Implemented modular functions for sentiment analysis and data field extraction, integrated with custom filtering and tagging logic.
  • Configured SLURM scripts to run jobs on a high-performance computing cluster, testing 1n1c, 1n8c, and 2n8c configurations.
  • Measured and compared execution time and CPU utilization across configurations using the Linux `time` command and `/proc` metrics.
  • Achieved a 22.85× speedup when scaling from 1 core to 2 nodes × 4 cores with no code changes, demonstrating strong scalability.
  • Handled errors and malformed entries in Mastodon `.ndjson` with robust exception handling and per-line validation.
  • Logged results and statistics to output and error files for post-run performance evaluation and debugging.
Python, MPI, SLURM, HPC Cluster, Mastodon, VADER
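
The line-by-line distribution and per-line validation described above can be sketched without an MPI runtime by passing the rank and world size in as plain integers. The round-robin assignment (line index modulo the number of ranks), the `content` field check, and the function names are assumptions for illustration; the project's actual partitioning scheme is not documented here.

```python
import json
from collections import Counter
from typing import Iterable


def process_partition(lines: Iterable[str], rank: int, size: int) -> Counter:
    """Process only the lines assigned to `rank` (line index modulo `size`)."""
    stats = Counter()
    for index, line in enumerate(lines):
        if index % size != rank:
            continue  # this line belongs to another rank
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            stats["malformed"] += 1  # count and skip lines that are not valid JSON
            continue
        # Hypothetical validation: require a JSON object with a "content" field.
        if not isinstance(record, dict) or "content" not in record:
            stats["skipped"] += 1
            continue
        stats["processed"] += 1
    return stats


def merge(partials: Iterable[Counter]) -> Counter:
    """Combine per-rank counters into one total, as the root rank would."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

In an actual run, `rank` and `size` would presumably come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()` in `mpi4py`, and the per-rank counters could be collected on rank 0 with `comm.gather` before merging. Because every rank applies the same rule to the same file, the code needs no changes when the node or core count changes.
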
2024

DataBricks System

A comprehensive web platform enabling dynamic control and monitoring of the DataBricks installation. The system manages and visualizes 226 LED bricks via synchronized video playback and real-time grid layout.

  • Built by a 7-member Agile team with weekly sprint reviews and Git-based code collaboration.
  • Designed and implemented frontend logic for visualizing brick status in a responsive grid.
  • Converted images, GIFs, and videos to standardized MP4 (854×480) format using FFmpeg automation.
  • Ensured synchronized, frame-accurate playback across all bricks to support collective visual effects.
  • Generated unique, traceable filenames for media using show name, artist name, and timestamp.
  • Implemented quick-setting tools to assign videos to 226 bricks efficiently, bypassing drag-and-drop limitations.
  • Secured access and role authentication via AWS Cognito and OTP verification flow.
  • Automated email delivery of access links and codes using AWS SES integration.
  • Managed media assets and metadata with AWS S3 and a Prisma-backed relational database.
Remix, TypeScript, AWS S3, AWS SES, AWS Cognito, Chakra UI, Prisma, SQLite, FFmpeg
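
The filename generation and FFmpeg conversion steps above can be illustrated with a short sketch. It is written in Python purely for illustration (the project itself lists Remix and TypeScript), and the filename layout, the `slugify` helper, and the choice of the `libx264` encoder are assumptions; only the 854×480 MP4 target and the show name, artist name, and timestamp components come from the description.

```python
import re
from datetime import datetime, timezone


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of other characters into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def media_filename(show: str, artist: str, when: datetime) -> str:
    """Build a traceable name from show, artist, and a UTC timestamp (hypothetical layout)."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{slugify(show)}_{slugify(artist)}_{stamp}.mp4"


def ffmpeg_args(src: str, dst: str) -> list[str]:
    """Return an FFmpeg command line that rescales the input to 854x480 H.264 MP4."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", src,               # input image, GIF, or video
        "-vf", "scale=854:480",  # target resolution from the project description
        "-c:v", "libx264",       # assumed encoder
        "-pix_fmt", "yuv420p",   # widely supported pixel format
        dst,
    ]
```

The argument list could be passed to `subprocess.run` to perform the conversion. Still images would likely need extra options (for example a loop and a duration) that are left out here, and a timestamp with one-second resolution is only unique if two uploads for the same show and artist never land in the same second.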

Distributed Shared Whiteboard

A distributed Java-based whiteboard system supporting real-time collaboration among multiple clients. The project focused on server-client synchronization, remote method invocation, and GUI usability.

  • Built a centralized server architecture supporting one manager and multiple clients.
  • Managed real-time synchronization and data integrity through a shared server interface.
  • Utilized Java RMI to enable communication between remote clients and server components.
  • Implemented remote objects for canvas updates, client actions, and server coordination.
  • Developed GUI components using Java Swing, including ToolBar, DrawPanel, and ChatBox.
  • Packaged and deployed client/server as CreateWhiteBoard.jar and JoinWhiteBoard.jar executables.
  • Added UI features such as drawing tools, color picker, chat messaging, and user list panel.
  • Enabled collaborative whiteboard editing with synchronized multi-user operations.
Java, Java RMI, Java Swing

Multi-Threaded Dictionary Server

A TCP-based client-server dictionary application that allows concurrent users to query, add, delete, and update word definitions. Built in Java with Swing GUI and JSON-based data storage, it demonstrates multithreaded programming, error handling, and real-time visual interaction.

  • Implemented thread-per-request architecture to handle multiple clients concurrently while maintaining dictionary consistency.
  • Used Java Swing to create intuitive GUI interfaces for both the client (tab-based) and server (monitor panel and dictionary viewer).
  • Established a structured JSON format and used Jackson for serialization/deserialization of dictionary entries.
  • Supported four main operations (Query, Add, Delete, and Update), each sent over a new TCP connection.
  • Ensured synchronization of shared dictionary resource across threads to prevent data conflicts.
  • Implemented robust error handling for invalid requests, missing parameters, and unavailable servers.
  • Validated user input on the client side and provided clear feedback messages for failed operations.
  • Tracked incoming requests in the server GUI, updated dictionary state in real time, and logged operational history.
Java, Java Swing, TCP, JSON, Jackson
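
The synchronization of the shared dictionary across request threads can be illustrated with a small sketch. It is written in Python rather than the project's Java, and the `DictionaryStore` class and its method names are hypothetical; the idea shown is simply that every operation holds one lock while it reads or modifies the shared map.

```python
import threading
from typing import Optional


class DictionaryStore:
    """A word-to-meaning map guarded by a single lock (illustrative only)."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}
        self._lock = threading.Lock()

    def query(self, word: str) -> Optional[str]:
        with self._lock:
            return self._entries.get(word)

    def add(self, word: str, meaning: str) -> bool:
        """Add a new word; return False if it already exists."""
        with self._lock:
            if word in self._entries:
                return False
            self._entries[word] = meaning
            return True

    def update(self, word: str, meaning: str) -> bool:
        """Replace the meaning of an existing word; return False if it is absent."""
        with self._lock:
            if word not in self._entries:
                return False
            self._entries[word] = meaning
            return True

    def delete(self, word: str) -> bool:
        """Remove a word; return False if it is absent."""
        with self._lock:
            return self._entries.pop(word, None) is not None
```

Because the existence check and the write happen under the same lock, two request threads that try to add the same word at once cannot both succeed. In Java, a similar effect could be obtained with `synchronized` methods or a `ConcurrentHashMap`.
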
2023

Recording Web Platform (C-LARA)

A full-stack web application designed to streamline the creation and management of recording entries for C-LARA. The project focused on delivering responsive user interfaces and secure, scalable backend services.

  • Initiated development of a React-based frontend to enhance the recording creation experience.
  • Collaborated in a 5-person Agile team (2 frontend, 3 backend), conducting weekly sprints and code reviews.
  • Built reusable and dynamic UI components using React + TypeScript for maintainability and scalability.
  • Integrated Material-UI and designed 5+ custom components to improve user interaction and design consistency.
  • Led secure user authentication implementation via Django REST Framework and JSON Web Tokens (JWT).
  • Utilized Axios for frontend-to-backend communication with clear interface separation.
  • Worked across React, TypeScript, Django, and PostgreSQL to deliver features on both the frontend and backend layers.
React, TypeScript, Django, PostgreSQL, Axios, Material UI, JWT