FilmFan — 6-Month Update (Nov 2025)

For people who care what they watch.


Here’s a practical, behind-the-scenes look at everything I’ve added to FilmFan over the last six months—across the app experience, data pipeline, ML features, and cloud/dev-ops. It’s been a busy stretch: faster UI, smarter reviews, sturdier data, leaner infrastructure, and a sharper model…

What FilmFan is: a personalised film-recommender where each user gets their own model. You typically review ~50–100 films (or as many as you like), and a machine learning job re-trains/refines your recs (in < 15 minutes). This isn’t a generic “top charts” feed—it’s tuned to you.

Also see past posts:

  • Previous “under the hood” summary
  • Product summary

What’s new (Product / UI)

Review Films — “Smart Sort”

The review queue is now smart-sorted to keep the flow fast and make each review count. It blends:

  • Easy wins you’ll likely recognise (newer titles and widely-rated crowd pleasers) so you can move quickly (and discover more).
  • Your recommended picks that add the most learning signal.

Result: less clicking, more signal, better recs sooner.
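One way to picture the blend is a simple interleave of two ranked queues. This is a minimal sketch, not FilmFan's actual sorter: the function name `smart_sort`, the `easy_every` ratio, and the placeholder titles are all assumptions made for illustration.

```python
def smart_sort(easy_wins, high_signal, easy_every=2):
    """Interleave two ranked lists: `easy_every` easy wins, then one
    high-signal pick, repeating until both lists are exhausted."""
    queue, seen = [], set()
    easy, signal = iter(easy_wins), iter(high_signal)

    def take(source, count):
        # Pull up to `count` titles, appending only the ones not queued yet.
        # Returns False once the source has run dry.
        for _ in range(count):
            title = next(source, None)
            if title is None:
                return False
            if title not in seen:
                seen.add(title)
                queue.append(title)
        return True

    easy_left = signal_left = True
    while easy_left or signal_left:
        easy_left = take(easy, easy_every)
        signal_left = take(signal, 1)
    return queue


# Placeholder titles; "B" appears in both lists and is only queued once.
review_queue = smart_sort(["A", "B", "C"], ["X", "B", "Y"])
```

A fixed ratio is the simplest possible blend; a real queue could just as well score every title and sort once.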

Feels faster (and is)

I reduced unnecessary server round-trips and made the client snappier:

  • Local pagination, lightweight logging, and UI updates.
  • More polish across auth and UX (email login, Google/Facebook sign-in), cleaner navigation, and better tab-to-tab responsiveness.

Net effect: browsing, searching, and reviewing are noticeably quicker and more pleasant.

Recommendations — fresh in less than 15 minutes

After you review, a background job picks up your latest input. It runs every 15 minutes, so your list refreshes in under 15 minutes.

Search

Cleaner and faster, with multi-facet queries:

  • “What films have Anne Hathaway and Meryl Streep been in together?”
  • “Which film did I see Laura Dern in recently?” (Answer: Marriage Story.)
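A query like the first one boils down to a set intersection over a cast index. The sketch below assumes a tiny in-memory index; `build_cast_index` and `films_with_all` are hypothetical names, and the real search backend may work quite differently.

```python
from collections import defaultdict


def build_cast_index(films):
    """Map each (lower-cased) person to the set of titles they appear in."""
    index = defaultdict(set)
    for title, cast in films.items():
        for person in cast:
            index[person.lower()].add(title)
    return index


def films_with_all(index, *people):
    """Return titles that feature every named person, sorted by title."""
    if not people:
        return []
    title_sets = [index.get(person.lower(), set()) for person in people]
    return sorted(set.intersection(*title_sets))


# A small sample catalogue for illustration.
films = {
    "The Devil Wears Prada": ["Meryl Streep", "Anne Hathaway", "Emily Blunt"],
    "Interstellar": ["Matthew McConaughey", "Anne Hathaway"],
    "Marriage Story": ["Scarlett Johansson", "Adam Driver", "Laura Dern"],
}
index = build_cast_index(films)
together = films_with_all(index, "Anne Hathaway", "Meryl Streep")
```

The second example query would add one more facet on top: intersecting the result with the titles you reviewed recently.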

My Reviews

A dedicated space to see, filter, search and tweak your past reviews. It’s also ideal for quick corrections if you misremembered a film.

Fun question: Have you seen more “Awesome” films than me?

Friends

Lightweight friends features let you peek at overlaps, so starting movie night is painless and you skip the usual “let’s find a movie” time-sink.

Save Films

Save = “I’m curious.” A save is distinct from a positive rating: it means you plan to check the film out and maybe watch it. Saving matters because recommendations update constantly, so it’s a handy way to keep track.

Data

Film-Universe enhancement

Defining which titles belong in FilmFan’s “universe” matters for both UX and model quality. Key lessons:

  • Target audience is English-speaking users, so language and region rules matter.
  • Foreign films absolutely belong—when they clear a quality bar (e.g., Parasite).
  • Older films can be noisy or detached unless they’re well-established (e.g., Vertigo).
  • Poorly rated films aren’t always poor training signals (e.g., Spread)—context matters.

These learnings turned into enhanced inclusion rules (vote counts by source, minimum rating by cohort, language/country normalization), enforced in the caching + processing pipeline.
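To make the idea of cohort-based inclusion rules concrete, here is a minimal sketch. The cohort names, the thresholds, and the 1980 cut-off are invented for illustration and are not FilmFan's real values; languages are assumed to be normalised ISO 639-1 codes already.

```python
from dataclasses import dataclass


@dataclass
class Title:
    name: str
    year: int
    language: str  # assumed to be a normalised ISO 639-1 code
    votes: int
    rating: float


# Hypothetical thresholds: foreign and older films must clear a higher bar,
# while recent English-language films have no minimum rating (a poorly rated
# film can still be a useful training signal).
RULES = {
    "english_recent": {"min_votes": 1_000, "min_rating": 0.0},
    "foreign": {"min_votes": 20_000, "min_rating": 7.5},
    "older": {"min_votes": 50_000, "min_rating": 7.5},
}


def cohort(title, older_before=1980):
    """Assign a title to exactly one cohort; age takes precedence."""
    if title.year < older_before:
        return "older"
    if title.language != "en":
        return "foreign"
    return "english_recent"


def in_universe(title):
    """True when the title clears the vote and rating bar for its cohort."""
    rule = RULES[cohort(title)]
    return title.votes >= rule["min_votes"] and title.rating >= rule["min_rating"]


# Made-up numbers, not real provider data.
acclaimed_foreign = Title("Example foreign hit", 2019, "ko", 500_000, 8.5)
obscure_foreign = Title("Example obscure import", 2019, "ko", 500, 6.0)
```

Because the check is a pure function of the cached data, a title that fails today can pass later once its vote count or rating changes.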

Providers

  • TMDb and OMDb are now first-class, with IMDb-derived metadata in the warehouse.
  • Source quality varies per field; I added selection rules to pick the best value per attribute across all providers.
  • Oscars data is now ingested to enrich awards-related features.
  • Better schema normalisation across titles/years/genres/country/language codes.
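Per-attribute selection can be sketched as a priority list per field. Everything here is an assumption for illustration: the provider keys, the field names, the priority order, and the choice to treat the string "N/A" as a missing value.

```python
# Hypothetical per-field provider priority; fields not listed use the default.
FIELD_PRIORITY = {"rated": ["omdb", "tmdb"]}
DEFAULT_PRIORITY = ["tmdb", "omdb"]


def is_missing(value):
    """Treat None, empty strings/lists, and the placeholder 'N/A' as missing."""
    return value in (None, "", "N/A", [])


def best_values(records):
    """Merge {provider: {field: value}} into one dict, taking each field
    from the highest-priority provider that actually has a value."""
    merged = {}
    fields = {field for record in records.values() for field in record}
    for field in sorted(fields):
        for provider in FIELD_PRIORITY.get(field, DEFAULT_PRIORITY):
            value = records.get(provider, {}).get(field)
            if not is_missing(value):
                merged[field] = value
                break
    return merged


# Made-up records for a single film.
records = {
    "tmdb": {"title": "Example", "runtime": 137, "rated": None},
    "omdb": {"title": "Example (OMDb)", "runtime": "N/A", "rated": "R"},
}
merged = best_values(records)
```

Keeping the priority table as data makes it easy to re-rank a provider for one attribute without touching the merge logic.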

Cleaning & consistency

  • Hardened parsing for genres, runtimes, multi-country/multi-language productions.
  • Languages/countries normalised (they drive both universe filters and ML features).
  • Genres clustered to a stable main-genre set to reduce sparsity and noise.
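As a rough illustration of this kind of cleaning, the sketch below parses runtimes, splits multi-value fields, and folds raw genres into a smaller set. The accepted input formats and the `GENRE_CLUSTERS` mapping are assumptions, not the actual rules in the pipeline.

```python
import re

# Hypothetical mapping from raw genre labels to a stable main-genre set.
GENRE_CLUSTERS = {
    "sci-fi": "Science Fiction",
    "science fiction": "Science Fiction",
    "film-noir": "Crime",
}


def parse_runtime(raw):
    """Return minutes as an int for inputs like '142 min' or '2h 22m', else None."""
    if raw is None:
        return None
    text = str(raw).strip().lower()
    match = re.fullmatch(r"(\d+)\s*h(?:\s*(\d+)\s*m(?:in)?)?", text)
    if match:
        return int(match.group(1)) * 60 + int(match.group(2) or 0)
    match = re.fullmatch(r"(\d+)\s*(?:minutes|mins|min|m)?", text)
    return int(match.group(1)) if match else None


def split_multi(raw):
    """Split 'USA, South Korea' style fields, dropping blanks and duplicates."""
    seen, parts = set(), []
    for part in re.split(r"[,/|]", raw or ""):
        part = part.strip()
        if part and part.lower() not in seen:
            seen.add(part.lower())
            parts.append(part)
    return parts


def cluster_genres(raw):
    """Map raw genres onto the main-genre set and return them sorted."""
    return sorted({GENRE_CLUSTERS.get(g.lower(), g) for g in split_multi(raw)})
```

For example, `cluster_genres("Sci-Fi, Science Fiction, Drama")` collapses the two science-fiction spellings into one label, which is the sparsity reduction the last bullet describes.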

Caching & re-caching

  • Re-cache policy tuned by staleness + change likelihood (popular/new titles refresh more often than deep back-catalogue).
  • Daily backfills to refresh fast-changing signals e.g. ratings for new releases.
  • Titles can enter the universe later as their data improves.
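A staleness-plus-change-likelihood policy can be expressed as a refresh interval that shrinks for new and popular titles. The intervals, the 0 to 1 `popularity` scale, and the function names below are illustrative assumptions only.

```python
from datetime import datetime, timedelta


def refresh_interval(release_year, popularity, current_year):
    """Pick how long a cached title may go without a refresh.
    `popularity` is assumed to be a score between 0 and 1."""
    age = current_year - release_year
    if age <= 1:
        interval = timedelta(days=1)   # new releases: ratings still moving
    elif age <= 5:
        interval = timedelta(days=7)
    else:
        interval = timedelta(days=30)  # deep back-catalogue changes slowly
    if popularity >= 0.8:
        interval = interval / 2        # popular titles refresh twice as often
    return interval


def needs_recache(last_cached, release_year, popularity, now):
    """True when the cached copy is older than its allowed interval."""
    return now - last_cached >= refresh_interval(release_year, popularity, now.year)


now = datetime(2025, 11, 1)
two_days_ago = now - timedelta(days=2)
new_release_stale = needs_recache(two_days_ago, 2025, 0.5, now)
classic_stale = needs_recache(two_days_ago, 1958, 0.5, now)
```

With these made-up numbers, a two-day-old cache entry is stale for a new release but still fresh for a 1958 classic.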

Machine Learning

Pruned “bad” features

  • Removed/down-weighted brittle signals (e.g., prompts that leaked era/genre or popularity proxies rather than taste).
  • Converted some fields into binary presence features when “exists vs. not-exists” mattered more than the raw value.
  • Fixed data-type edge-cases that introduced noise.
  • Sometimes what matters isn’t the feature itself but how often it has appeared in films rated “Good” or “Awesome” (e.g. directors, actors). Hence the new weighted counting features.
  • Added a training & testing harness to evaluate feature changes on real production data before going live.
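The binary-presence and weighted-counting ideas can be sketched in a few lines. The weights, the field names, and the "Poor" label are assumptions for illustration; only the "Good" and "Awesome" labels come from the app itself.

```python
from collections import defaultdict

# Hypothetical weights: an "Awesome" review counts double a "Good" one,
# and any other rating contributes nothing.
RATING_WEIGHTS = {"Awesome": 2.0, "Good": 1.0}


def weighted_counts(reviews, field):
    """Sum rating weights per person (e.g. per director) across liked films."""
    totals = defaultdict(float)
    for review in reviews:
        weight = RATING_WEIGHTS.get(review["rating"], 0.0)
        if weight:
            for person in review.get(field, []):
                totals[person] += weight
    return dict(totals)


def film_features(film, director_counts):
    """Build two example features for a candidate film."""
    return {
        # Binary presence: only whether awards data exists, not its raw value.
        "has_awards": int(bool(film.get("awards"))),
        # Weighted count: how strongly this film's directors were liked before.
        "director_liked_score": sum(
            director_counts.get(d, 0.0) for d in film.get("directors", [])
        ),
    }


reviews = [
    {"rating": "Awesome", "directors": ["Director A"]},
    {"rating": "Good", "directors": ["Director A", "Director B"]},
    {"rating": "Poor", "directors": ["Director C"]},
]
director_counts = weighted_counts(reviews, "directors")
features = film_features(
    {"directors": ["Director A", "Director C"], "awards": "2 wins"}, director_counts
)
```

In a real harness the counts would need to be computed from training reviews only, otherwise the feature leaks the label being predicted.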

Gen-AI features (vibe signals)

  • Added a prompt pipeline to label “vibes”, e.g. psychological depth, hidden truths, shifting perceptions, realism vs. speculative, modern_realistic_setting, narrative cohesion vs. complexity, etc.
  • These run on cost-effective models (e.g., GPT-4o-mini / Claude variants) with guardrails.
  • Outputs stored with rich metadata for reproducibility and analysis.
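A guardrail can be as simple as validating the model's reply before anything is stored. The sketch below makes no model calls; it assumes the reply is a JSON object of vibe scores between 0 and 1, and the label set, record fields, and function names are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical allow-list of vibe labels.
ALLOWED_VIBES = {
    "psychological_depth",
    "hidden_truths",
    "shifting_perceptions",
    "modern_realistic_setting",
}


def parse_vibes(raw_response):
    """Keep only allowed labels with numeric scores in [0, 1]; None if nothing valid."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    clean = {}
    for label, score in data.items():
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if label in ALLOWED_VIBES and is_number and 0 <= score <= 1:
            clean[label] = float(score)
    return clean or None


def make_record(film_id, model, prompt, raw_response):
    """Bundle the parsed vibes with metadata useful for reproducing the run."""
    vibes = parse_vibes(raw_response)
    return {
        "film_id": film_id,
        "model": model,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "valid": vibes is not None,
        "vibes": vibes or {},
    }


record = make_record("film-1", "example-model", "Label the vibes...", '{"hidden_truths": 0.5}')
```

Hashing the prompt rather than storing it inline keeps records small while still showing which prompt version produced each label.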

Model notes

  • Still using an XGBoost (One-vs-Rest) classifier with feature-importance checks and periodic audits.
  • The combination of classic film metadata (director/cast/country/year/ratings etc) + stable vibe features generalises well across decades and genres.
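In a One-vs-Rest setup, one binary model is trained per rating class and the class with the highest score wins. The sketch below shows only that final decision step, with plain functions standing in for the trained XGBoost models; the "Pass" label, the `vibe_match` feature, and the scoring formulas are invented for illustration.

```python
def one_vs_rest_predict(scorers, features):
    """Score the film with every per-class model and return the winning
    label together with all scores."""
    scores = {label: scorer(features) for label, scorer in scorers.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Stand-in scorers: each plays the role of one "this class vs. the rest" model.
scorers = {
    "Awesome": lambda f: 0.7 * f["vibe_match"],
    "Good": lambda f: 0.5,
    "Pass": lambda f: 1.0 - f["vibe_match"],
}

label, scores = one_vs_rest_predict(scorers, {"vibe_match": 0.9})
```

Keeping the per-class scores around, rather than only the winning label, is also what makes ranking a recommendation list possible.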

Cloud / DevOps

Python upgrade

Standardised on Python 3.11 across app and jobs in Google Cloud.

Artifact Registry + one-command deploy

  • Containers live in Google Artifact Registry.
  • A single build-and-deploy script creates the Docker images, pushes them, and rolls out to App Engine / Cloud Run.

Leaner, lower-cost runtime

  • App Engine F1 with a warm instance for predictable latency.
  • Backend jobs on Cloud Run, scheduled via Cloud Scheduler.
  • More proactive caching and re-caching to keep the data fresh.
  • Costs dropped after moving down from an F4 instance, and responsiveness improved.

Mobile polish (ongoing)

  • Small-screen ergonomics enhancements for reviewing/search; touch-target and keyboard-nav tweaks.
  • Testing and refinement on iOS for new UI pieces.

The experience today

  1. Review ~50–100 films (the smart sort keeps it quick).
  2. Wait < 15 minutes for the recommendations to refresh.
  3. See fresh recs that reflect you!
  4. Nudge the model (re-rate, save, skip) to keep learning.

No generic feed. No infinite-scroll anxiety. Just films you’re likely to love, surfaced faster.

What’s next

  • Explainability you can trust: “Why this film?” tied to feature importances.
  • Sharing with friends: don’t just share recs, share all films you’ve reviewed with low friction.
  • More mobile fit-and-finish.

If you’ve tried FilmFan recently—thank you! If not, kick the tyres at https://filmfan.ai and tell me what feels great vs. what still feels rough. I’m building this for people who really care what they watch 🙂

Copyright © 2024 Data Karate all rights reserved.