GoodData, my final year project, and where it actually stands

GoodData is a web analytics tool. Not in the Google Analytics sense of "here's how many people visited." More in the sense of "here's what they actually did while they were there."

You drop a script tag on your site and it starts picking up scroll depth, rage clicks (someone hitting the same spot three times in under a second because something's broken), hesitation around buttons, text selection, form interaction. All the stuff that tells you a user is confused or annoyed or about to buy something, instead of just "they loaded /pricing at 3:41pm." It's built on three pieces: a tracker SDK that sits on the site, an API that ingests the events, and a dashboard that shows sessions, visitor history, traffic sources, goals, and a live event feed.

The other thing it does, which matters more than it sounds like it should, is throw away anything that looks like raw personal data on the way in. No cookies, no stored IPs, no third-party trackers riding along. The IP gets used once for geolocation, then it's hashed, and the hash rotates daily so nothing stable is kept. That part wasn't an afterthought; it's baked into how events get written to the database.
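To make "hashed and rotated daily" concrete, here is one way that idea can be sketched. This is not GoodData's actual code. It assumes the rotation is done by hashing with a random key that gets replaced each calendar day, and the names `DailyHasher`, `visitor_hash` and `site_id` are made up for illustration.

```python
import hashlib
import hmac
import secrets
from datetime import date


class DailyHasher:
    """Hashes IPs with a random key that is replaced each calendar day.

    Assumption: rotation is implemented by swapping the HMAC key daily.
    The key lives only in memory here, so a real multi-process API would
    need to share it somewhere short-lived instead.
    """

    def __init__(self) -> None:
        self._key_day = None
        self._key = b""

    def _key_for(self, day: date) -> bytes:
        # A new day means a new random key. The old key is overwritten,
        # so yesterday's hashes can no longer be reproduced or linked.
        if day != self._key_day:
            self._key = secrets.token_bytes(32)
            self._key_day = day
        return self._key

    def visitor_hash(self, ip: str, site_id: str, day: date) -> str:
        # Only this hex digest would be written to the database,
        # never the raw IP itself.
        message = f"{site_id}|{ip}".encode()
        return hmac.new(self._key_for(day), message, hashlib.sha256).hexdigest()
```

The same visitor gets the same hash all day, which is enough to group events into sessions, but the value changes the next day, so it can't be used to follow someone across days.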

Why this one

I had other ideas on the table. A CRUD app with a nicer UI, a dashboard wrapping someone else's API, a chatbot with a slightly different prompt. Fine projects. Not what I wanted to spend my final year on.

I wanted something where I'd actually hit problems I didn't know the shape of yet. Analytics tooling does that quietly. Ingesting events at real volume without losing or duplicating them, telling a bot from a person without paying a service to ask, fingerprinting a visitor without storing anything that counts as personal data. None of that is hard in the "hard algorithm" sense. It's hard in the "twelve small decisions and any one of them being wrong quietly corrupts your data" sense, which is its own kind of hard and the kind I find more interesting.

So I picked the one where I knew I'd be wrong about things along the way.

Where it stands now

The core is built and it works. The tracker SDK is live, sending batched events with a fallback so you don't lose the last few seconds before someone closes the tab. It handles SPA navigation properly, debounces duplicate page views, and does rage click detection with burst suppression so one frustrated click-fest doesn't register as ten separate events. The API takes the events, hashes and filters what needs filtering, and writes to Postgres. The dashboard shows sessions, a visitor timeline, traffic sources, top pages, live visitor count, and a goals system for marking conversions.
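The rage click logic is easier to see in code than in prose. The sketch below is in Python purely to show the logic; the real tracker runs in the browser, and this is not its source. The three-clicks-in-under-a-second rule comes from the description above. The 24 px radius, the 2 second cooldown and the `RageClickDetector` name are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RageClickDetector:
    min_clicks: int = 3       # clicks needed to count as a rage click
    window_ms: int = 1000     # all of them must land within this window
    radius_px: int = 24       # assumed tolerance for "the same spot"
    cooldown_ms: int = 2000   # assumed quiet period after a detection
    _recent: List[Tuple[int, int, int]] = field(default_factory=list)
    _quiet_until: int = -1

    def on_click(self, t_ms: int, x: int, y: int) -> bool:
        """Return True when this click should emit one rage-click event."""
        # Burst suppression: while a burst is ongoing, each further click
        # pushes the quiet period forward, so the burst stays one event.
        if t_ms < self._quiet_until:
            self._quiet_until = t_ms + self.cooldown_ms
            return False

        # Drop earlier clicks that are too old or too far from this one.
        self._recent = [
            (t, cx, cy)
            for (t, cx, cy) in self._recent
            if t_ms - t < self.window_ms
            and abs(cx - x) <= self.radius_px
            and abs(cy - y) <= self.radius_px
        ]
        self._recent.append((t_ms, x, y))

        if len(self._recent) >= self.min_clicks:
            self._recent.clear()
            self._quiet_until = t_ms + self.cooldown_ms
            return True
        return False
```

Feeding it ten clicks on the same spot 100 ms apart produces a single event on the third click and nothing after that, which is the "one click-fest, one event" behavior.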

Once the core was working, I went back and audited the whole pipeline like an outside reviewer would, rather than waiting for something to break in front of a user. That turned up a few real issues: a goal-tracking bug that meant conversions weren't being counted correctly, an access control gap on one endpoint, and missing rate limiting on the ingestion API. Fixed all of it in one pass once I went looking properly.
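For the rate limiting piece, a token bucket is one common way to do it. The write-up doesn't say which approach the ingestion API actually uses, so treat this as an illustrative sketch only. The `RateLimiter` name, the per-key bucket (keyed by something like a site ID) and the default numbers are all assumptions.

```python
from typing import Dict, Tuple


class RateLimiter:
    """Token bucket per key: allows bursts up to `capacity`, then a
    steady `refill_per_sec` requests per second."""

    def __init__(self, capacity: float = 20.0, refill_per_sec: float = 5.0) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        # key -> (tokens left, time of last update in seconds)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str, now: float) -> bool:
        # A key seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(key, (self.capacity, now))

        # Refill for the time that has passed, capped at capacity.
        elapsed = max(0.0, now - last)
        tokens = min(self.capacity, tokens + elapsed * self.refill_per_sec)

        # Spend one token per request if one is available.
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[key] = (tokens, now)
        return allowed
```

A caller would pass the current time (for example `time.monotonic()`) and reject the batch, typically with an HTTP 429, whenever `allow` returns False.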

I also put my own tracker side by side with a more established open-source one, just to see what I'd missed. It turned out I had more behavioral depth than it did, and it had more of the user-identity and compliance plumbing I haven't built yet. So now I know what the next round of work looks like.

Next up is an ML layer that scores a session for purchase intent based on its behavior. That's planned for after summer break, once the core is fully stable and I've had more real traffic to validate against.

Real tool, real pipeline, and a handful of things I'd rather catch myself than have someone else find for me. Still building.