2025 Christmas Music in Chicago: What the Data Says
The annual Chicago holiday music data project returns for 2025 — with better scraping, more venues, and some genuinely surprising findings.
Year two of the Chicago Christmas music data project. The scraper got smarter, the dataset got bigger, and the findings got weirder.
What Changed This Year
Last year's scraper handled 47 venues with about a 73% success rate (some sites just won't cooperate with automated access). This year:
- 68 venues in the dataset
- 89% scrape success rate, improved by switching from Playwright alone to a hybrid approach (httpx for static pages, Playwright for dynamic ones)
- Event categorization using a small local LLM (Ollama + Mistral 7B) to tag genres automatically
- Historical comparison with 2024 data
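For anyone curious what a hybrid fetch can look like, here is a minimal sketch. It is not the repo's actual code: the fallback rule (try httpx first, use Playwright if the request fails or the HTML looks empty) and the names `fetch_hybrid` and `looks_rendered` are assumptions for illustration. The real scraper may just route venues from a config list.

```python
def fetch_static(url: str) -> str:
    """Fetch raw HTML with httpx (no JavaScript is executed)."""
    import httpx  # lazy import so the routing logic below runs without it installed

    response = httpx.get(url, timeout=15.0, follow_redirects=True)
    response.raise_for_status()
    return response.text


def fetch_dynamic(url: str) -> str:
    """Render the page in headless Chromium via Playwright and return the DOM."""
    from playwright.sync_api import sync_playwright  # lazy import, same reason

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()


def looks_rendered(html: str, marker: str = "event") -> bool:
    """Hypothetical heuristic: treat static HTML as usable if it mentions events."""
    return marker in html.lower()


def fetch_hybrid(url, static=fetch_static, dynamic=fetch_dynamic, check=looks_rendered):
    """Try the cheap static fetch first; fall back to a real browser otherwise.

    Returns (html, method) so the success rate can be tracked per method.
    """
    try:
        html = static(url)
        if check(html):
            return html, "static"
    except Exception:
        # Network errors, HTTP errors, blocks: let the browser have a go.
        pass
    return dynamic(url), "dynamic"
```

Passing the fetchers in as arguments keeps the routing testable without touching the network, and the returned method name makes it easy to see which venues still need a browser.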
The 2025 Numbers
- 412 unique holiday music events in December
- Average ticket price up 23% from 2024
- Jazz still dominant, but classical programming up 18% year-over-year
- New entrant: K-pop holiday shows at 3 venues — genuinely not on my radar
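The year-over-year figures above are plain percentage changes. A small sketch of the arithmetic, with made-up placeholder prices (these are not values from the dataset) and a hypothetical helper name:

```python
from statistics import mean


def pct_change(previous: float, current: float) -> float:
    """Percentage change from previous to current."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) * 100.0 / previous


# Placeholder numbers for illustration only, NOT the project's data.
prices_2024 = [20.0, 30.0, 40.0]
prices_2025 = [25.0, 35.0, 48.0]

change = pct_change(mean(prices_2024), mean(prices_2025))
print(f"Average ticket price change: {change:+.1f}%")
```

This only means something if the two years cover comparable venues. With 47 venues in 2024 and 68 in 2025, restricting the comparison to venues present in both years is one way to avoid measuring the change in venue mix instead.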
The LLM Categorization Experiment
This year's big addition: using Mistral 7B locally to classify event descriptions into genre categories. The labels were about 84% accurate on a sample I verified by hand.
The model struggled with:
- Tribute bands (is "The Beatles Christmas Show" classical? Jazz? Pop? It chose "rock")
- Multi-genre events ("Jazz/Blues/Soul Holiday Spectacular" got classified as whatever came first)
- Deliberately vague descriptions (some venues really don't want to be categorized)
But for the core jazz/blues/classical/rock split, it worked well enough to be useful.
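Here is a rough sketch of how this kind of classification can be wired up against a local Ollama server. The prompt wording, the label set (including the "other" bucket), and the function names are assumptions rather than the repo's actual code. The HTTP call uses Ollama's `/api/generate` endpoint on its default port.

```python
import json
import urllib.request

# Assumed label set: the core genres from the post plus a catch-all.
GENRES = ("jazz", "blues", "classical", "rock", "pop", "other")


def build_prompt(description: str) -> str:
    """Ask for exactly one label so the reply is easy to parse."""
    return (
        "Classify this holiday music event into exactly one genre from: "
        + ", ".join(GENRES)
        + ". Reply with the genre only.\n\nEvent: "
        + description
    )


def normalize_label(raw: str) -> str:
    """Map free-form model output onto the fixed label set.

    Picks whichever known genre appears earliest in the reply and
    falls back to "other" when none is mentioned.
    """
    text = raw.strip().lower()
    hits = [(text.find(genre), genre) for genre in GENRES if genre in text]
    return min(hits)[1] if hits else "other"


def classify(description: str, model: str = "mistral",
             host: str = "http://localhost:11434") -> str:
    """Send one description to a locally running Ollama instance."""
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(description), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        host + "/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return normalize_label(body["response"])
```

Note that `normalize_label` has the same first-match bias described above for multi-genre events. Asking the model for a list of genres instead of a single label would be one way around that.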
Running It Yourself
The full project repo is public. To run locally:
git clone https://github.com/mfitzgerald2/chicago-music-data
cd chicago-music-data
pip install -r requirements.txt
ollama pull mistral
python scrape.py --year=2025
python analyze.py --output=html
Data updates automatically via GitHub Actions every night in December. The 2025 results are in the /data/2025 directory.
Next Year
I want to add sentiment analysis on reviews and attendance proxies (using social media mentions). And maybe finally make the ticket price tracking reliable: this year's data had too many "$20-40" ranges, and I handled them inconsistently.
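One possible way to make range handling consistent is to parse every price string into a (low, high) pair and derive a single number from it by a fixed rule. This is a sketch, not something the project does today. The helper names are hypothetical, and it assumes the input is a dedicated price field, since any other numbers in the string (a door time, say) would be picked up too.

```python
import re
from typing import Optional, Tuple

# Matches dollar amounts such as "$20", "40", or "15.50".
_AMOUNT = re.compile(r"\$?\s*(\d+(?:\.\d+)?)")


def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Return (low, high) in dollars, or None if no amount is found."""
    amounts = [float(match) for match in _AMOUNT.findall(text)]
    if not amounts:
        return None
    return (min(amounts), max(amounts))


def midpoint(price_range: Tuple[float, float]) -> float:
    """Collapse a range to one number; a single price maps to itself."""
    low, high = price_range
    return (low + high) / 2


for raw in ["$20-40", "$25", "TBA"]:
    parsed = parse_price(raw)
    print(raw, "->", parsed, "->", midpoint(parsed) if parsed else None)
```

Keeping both ends of the range means the averaging rule (midpoint, low end, or high end) can be changed later without re-scraping.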