commit 6271ea7576e3345ddfd4ce81faea87c83661caa6
Author: JL Kruger
Date:   Wed Apr 1 17:29:47 2026 +0200

    First commit. Again. Yo ho. Again

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6f760c3
--- /dev/null
+++ b/README.md
@@ -0,0 +1,119 @@
+# ☠ PLAYLIST PIRATE
+
+**Break free from the stream. Pay the artist. Own your music.**
+
+---
+
+## What is this?
+
+Playlist Pirate is a free, open-source tool that takes your Spotify playlist exports and turns them into something actually useful: static HTML pages with embedded YouTube players, and optionally — MP3s on your own machine.
+
+No subscription. No algorithm. No ads. No data harvest. Just your music, on your terms.
+
+---
+
+## Why does this exist?
+
+Because Spotify is a bad deal. For everyone. But mostly for artists.
+
+Here's the math: an independent artist earns somewhere between **$0.003 and $0.005 per stream** on Spotify. That means roughly **250–500 streams to earn $1.00** — and if they're signed to a label, that label takes 80–85% of even that, pushing the number closer to **1,500–2,000 streams per dollar** that reaches the artist.
+
+Put it this way: **you could hand an indie artist a dollar, download their album, listen to it 400 times, and still have treated them more fairly than Spotify does.**
+
+Meanwhile:
+- **Bandcamp** gives artists ~85% of every sale. A $1 purchase puts ~$0.85 directly in the artist's pocket.
+- **YouTube** pays less per view than Spotify does per stream (~$0.001–$0.002), but at least you can link people directly to the artist's channel, which builds their audience.
+- **Direct support** — buying from the artist's own site, Bandcamp, Patreon, a merch table — is the only model that actually works for working musicians.
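The stream math above is easy to sanity-check yourself. A minimal sketch — the per-stream rate and label share are illustrative values from the ranges quoted, not authoritative figures:

```python
# Back-of-the-envelope: how many streams until $1.00 reaches the artist.
# Rates are illustrative, taken from the ranges quoted above.

def streams_per_dollar(rate_per_stream: float, artist_share: float = 1.0) -> int:
    """Streams needed for the artist's cut to total $1.00."""
    return round(1.0 / (rate_per_stream * artist_share))

# Independent artist paid the full per-stream rate:
print(streams_per_dollar(0.004))        # → 250

# Signed artist whose label keeps ~80% of royalties:
print(streams_per_dollar(0.004, 0.2))   # → 1250
```

At $0.003/stream the same calculation lands at ~333 streams, and at $0.005 at 200 — hence the 250–500 range quoted above.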
+
+Spotify's 2023 policy change made this worse: they raised the minimum stream threshold for royalty eligibility to **1,000 streams per year**, cutting the bottom tier of artists — the independent, the emerging, the weird and wonderful — out of any payment at all.
+
+The streaming model concentrates money at the top. It always has. It's by design.
+
+> **$1 direct to an artist = the same as ~250–500 streams on Spotify for an indie act, or 1,500–2,000+ streams if they're on a label deal.**
+> *[Sources: Trichordist Streaming Price Bible; UK DCMS Committee 2021; Spotify's own disclosed rate ranges; RIAA data. Rates vary and shift — always look at current figures.]*
+
+Playlist Pirate exists to help you walk away from that model. Keep your playlists. Ditch the platform.
+
+---
+
+## How it works
+
+Four discrete, opt-in steps. Nothing runs automatically.
+
+### Pipeline (CLI)
+
+```
+CSV → resolve → search → build → download
+```
+
+| Step | Command | What it does |
+|------|---------|--------------|
+| **resolve** | `playlist.py resolve *.csv` | Parses your Spotify CSV exports into `*-playlist.md` tracking files. Captures track title, artists, ISRC, Spotify ID. |
+| **search** | `playlist.py search *-playlist.md` | Uses `yt-dlp` to find YouTube URLs for each track. No API key. Resumable — re-run it and it picks up where it left off. ~3–7 seconds per track. |
+| **build** | `playlist.py build *-playlist.md --out ` | Hits MusicBrainz (no API key, polite 1 req/sec) to fetch recording data and artist URLs — homepage, Bandcamp, SoundCloud, Patreon, in that priority order. Generates static HTML pages with embedded YouTube players, fire-spectrum colors, and links out to the artist directly. |
+| **download** | `playlist.py download *-playlist.md --output ` | Downloads MP3s via `yt-dlp` + `ffmpeg`. Embeds ID3 tags via `mutagen`. Marks tracks done in the `.md` file so you never double-download. |
+
+### GUI
+
+There's a browser-based GUI (`gui.py`) wrapping the same pipeline via Flask. It runs locally, opens in your browser, and gives you accordion-style controls for each step, a live log panel, and per-track download selection. There's also a PyInstaller-built single binary (`dist/playlist-gui`) for running without Python.
+
+### Tracking file format
+
+```
+# Playlist Name
+
+
+- [ ] Track Title | Artist Name | ISRC:XXXXXX | SP:SpotifyID | https://youtu.be/XXXXX
+- [x] Downloaded Track | Artist | ISRC:... | SP:... | https://...
+- [-] Not Found | Artist | ISRC:- | SP:- | NOT_FOUND
+```
+
+`[ ]` = pending · `[x]` = done · `[-]` = not found on YouTube
+
+---
+
+## Dependencies
+
+```
+pip install yt-dlp rich mutagen
+```
+
+- **[yt-dlp](https://github.com/yt-dlp/yt-dlp)** — does the heavy lifting: YouTube search and download. No API key required.
+- **[rich](https://github.com/Textualize/rich)** — terminal output that doesn't look like 1993.
+- **[mutagen](https://github.com/quodlibet/mutagen)** — ID3 tag writing.
+- **[MusicBrainz](https://musicbrainz.org/)** — open music encyclopedia. ISRC lookup → artist URLs. Free, no key, rate-limited politely.
+- **ffmpeg** — for MP3 conversion (system install, not a Python package).
+
+### The yt-dlp caveat
+
+This tool depends on `yt-dlp`. If `yt-dlp` stops working — because YouTube changes its API, or because a court somewhere decides the sky is the ceiling — **this tool breaks in the search and download steps**. The resolve step (CSV parsing) and the build step (HTML generation) are unaffected.
+
+`yt-dlp` is a community-maintained FOSS project. Keep it updated. And if it ever goes away, something else will take its place. It always does.
+
+---
+
+## Ethos
+
+Everything this tool does uses **publicly available tools and data, accessed respectfully**:
+
+- YouTube search via `yt-dlp` with polite delays
+- MusicBrainz at 1 request per second (their documented rate limit)
+- No scraping. No API key abuse. No spoofing.
+- No data sent anywhere except to the services being queried
+- The generated HTML includes **links back to artists' own sites** — Bandcamp, SoundCloud, Patreon, their homepage. Backlinks improve search engine rankings. A link from a real page to an artist's real site is a small act of support that compounds over time.
+
+This is not a tool for stealing from artists. It is a tool for **owning your own library** and **finding the artists you love** so you can support them directly.
+
+---
+
+## License
+
+FOSS. Use it, fork it, improve it. If you make money off it, buy an artist's album.
+
+---
+
+## Find the artists. Pay the artists. Own your music.
+
+> Bandcamp: [bandcamp.com](https://bandcamp.com)
+> MusicBrainz: [musicbrainz.org](https://musicbrainz.org)
+> yt-dlp: [github.com/yt-dlp/yt-dlp](https://github.com/yt-dlp/yt-dlp)

diff --git a/_playlist-template.html b/_playlist-template.html
new file mode 100644
index 0000000..5b3ef0a
--- /dev/null
+++ b/_playlist-template.html
@@ -0,0 +1,577 @@
+
+
+
+
+
+ Eclectica Experimenti | PLAYLISTS
+
+
+
+
+
+
+

Eclectica Experimenti

+ 92 TRACKS • 6:42:15 +
+ + + + +
+ +
+ + + + + +
+
+
+ 02 + Album Art +
+
+
Nightcall
+
Kavinsky
+
OutRun • 2013
+ + +
+
+
+
+ +
+
+
+ + + + + + + + + + + + + +
+ + + + + diff --git a/build.py b/build.py new file mode 100644 index 0000000..6a31459 --- /dev/null +++ b/build.py @@ -0,0 +1,840 @@ +#!/usr/bin/env python3 +""" +Playlists build script — Singular Particular Space +spaces.exopraxist.org + +Data sources: + MusicBrainz (no key) — recording link + artist URL (official site, Bandcamp, etc.) + Odesli / song.link — YouTube video ID for embeds (10 req/min without key) + +First run: ~3 hours (cached). Subsequent runs: seconds. +Script resumes from cache if interrupted — safe to run overnight and re-run. + +Usage: + python3 build.py # full build + python3 build.py --hub # regenerate hub only (instant) + python3 build.py --playlist # single playlist test + python3 build.py --force-odesli # re-fetch Odesli data only + python3 build.py --force-mb # re-fetch MusicBrainz data only + +Optional env vars (no keys required — just speeds things up): + ODESLI_API_KEY — higher rate limit from Odesli (email developers@song.link) +""" + +import csv +import json +import os +import re +import sys +import time +import urllib.parse +import urllib.request +from pathlib import Path + +# ─── Config ─────────────────────────────────────────────────────────────────── + +SCRIPT_DIR = Path(__file__).parent +CACHE_FILE = SCRIPT_DIR / "cache.json" + +ODESLI_KEY = os.environ.get("ODESLI_API_KEY", "") +FORCE_ODESLI = "--force-odesli" in sys.argv +FORCE_MB = "--force-mb" in sys.argv +HUB_ONLY = "--hub" in sys.argv + +_playlist_arg = None +if "--playlist" in sys.argv: + i = sys.argv.index("--playlist") + if i + 1 < len(sys.argv): + _playlist_arg = sys.argv[i + 1] + +# ─── Rate limiters ──────────────────────────────────────────────────────────── + +_last_mb_call = 0.0 +_last_odesli_call = 0.0 + +MB_INTERVAL = 1.1 # 1 req/sec free tier +ODESLI_INTERVAL = 8.0 # 10 req/min without key — 8s gives safe margin + +# Sentinel: call failed with rate limit or error — do not cache +FETCH_FAILED = object() + +def _wait(last: float, interval: float) -> float: + 
elapsed = time.time() - last + if elapsed < interval: + time.sleep(interval - elapsed) + return time.time() + +# ─── HTTP helpers ───────────────────────────────────────────────────────────── + +MB_HEADERS = {"User-Agent": "SingularParticularSpace/1.0 (spaces.exopraxist.org)"} + +def http_get(url: str, headers: dict = None): + """ + Returns parsed JSON dict on success or 404. + Returns FETCH_FAILED sentinel on 429 / 5xx / network error (do not cache). + """ + try: + req = urllib.request.Request(url, headers=headers or {}) + with urllib.request.urlopen(req, timeout=15) as resp: + return json.loads(resp.read().decode("utf-8")) + except urllib.error.HTTPError as e: + if e.code == 404: + return {} # Not found — cache as empty, won't change + if e.code == 429: + print(f" 429 rate limit — backing off 30s", file=sys.stderr) + time.sleep(30) + return FETCH_FAILED + print(f" HTTP {e.code}: {url}", file=sys.stderr) + return FETCH_FAILED + except Exception as e: + print(f" error: {e}", file=sys.stderr) + return FETCH_FAILED + +def mb_get(url: str): + global _last_mb_call + _last_mb_call = _wait(_last_mb_call, MB_INTERVAL) + return http_get(url, MB_HEADERS) + +def odesli_get(url: str): + global _last_odesli_call + _last_odesli_call = _wait(_last_odesli_call, ODESLI_INTERVAL) + return http_get(url) + +# ─── Cache ──────────────────────────────────────────────────────────────────── +# Flat dict with namespaced keys: +# "mb:isrc:{ISRC}" → { mb_recording_url, mb_artist_id } +# "mb:artist:{MB_ID}" → { artist_url, artist_url_type } +# "odesli:{SPOTIFY_ID}" → { youtube_video_id, odesli_page_url } + +def load_cache() -> dict: + if CACHE_FILE.exists(): + try: + return json.loads(CACHE_FILE.read_text("utf-8")) + except Exception: + return {} + return {} + +def save_cache(cache: dict): + CACHE_FILE.write_text(json.dumps(cache, indent=2, ensure_ascii=False), "utf-8") + +# ─── MusicBrainz ────────────────────────────────────────────────────────────── + +def mb_isrc_lookup(isrc: str): + 
"""ISRC → { mb_recording_url, mb_artist_id } or FETCH_FAILED""" + url = f"https://musicbrainz.org/ws/2/isrc/{isrc}?inc=artist-credits&fmt=json" + data = mb_get(url) + if data is FETCH_FAILED: + return FETCH_FAILED + result = {"mb_recording_url": "", "mb_artist_id": ""} + recs = data.get("recordings", []) + if not recs: + return result + rec = recs[0] + rec_id = rec.get("id", "") + if rec_id: + result["mb_recording_url"] = f"https://musicbrainz.org/recording/{rec_id}" + credits = rec.get("artist-credit", []) + for credit in credits: + if isinstance(credit, dict) and "artist" in credit: + result["mb_artist_id"] = credit["artist"].get("id", "") + break + return result + +# Artist URL type priority — ordered best to worst +ARTIST_URL_PRIORITY = [ + "official homepage", + "bandcamp", + "soundcloud", + "patreon", + "linktree", + "youtube", + "myspace", + "instagram", + "twitter", + "facebook", + "last.fm", + "discogs", + "wikidata", + "wikipedia", +] + +def mb_artist_url_lookup(mb_artist_id: str): + """MB artist ID → { artist_url, artist_url_type } or FETCH_FAILED""" + url = f"https://musicbrainz.org/ws/2/artist/{mb_artist_id}?inc=url-rels&fmt=json" + data = mb_get(url) + if data is FETCH_FAILED: + return FETCH_FAILED + result = {"artist_url": "", "artist_url_type": ""} + best_rank = len(ARTIST_URL_PRIORITY) + 1 + for rel in data.get("relations", []): + rel_type = rel.get("type", "").lower() + href = rel.get("url", {}).get("resource", "") + if not href: + continue + for i, ptype in enumerate(ARTIST_URL_PRIORITY): + if ptype in rel_type or ptype in href: + if i < best_rank: + best_rank = i + result["artist_url"] = href + result["artist_url_type"] = rel_type + break + return result + +# ─── Odesli ─────────────────────────────────────────────────────────────────── + +def odesli_lookup(spotify_track_id: str): + """Spotify track ID → { youtube_video_id, odesli_page_url } or FETCH_FAILED""" + spotify_uri = f"spotify:track:{spotify_track_id}" + params = 
f"url={urllib.parse.quote(spotify_uri)}&platform=spotify&type=song" + if ODESLI_KEY: + params += f"&key={ODESLI_KEY}" + url = f"https://api.song.link/v1-alpha.1/links?{params}" + data = odesli_get(url) + if data is FETCH_FAILED: + return FETCH_FAILED + result = {"youtube_video_id": "", "odesli_page_url": ""} + if not data: + return result + result["odesli_page_url"] = data.get("pageUrl", "") + yt_url = data.get("linksByPlatform", {}).get("youtube", {}).get("url", "") + if yt_url: + result["youtube_video_id"] = extract_youtube_id(yt_url) + return result + +def extract_youtube_id(url: str) -> str: + m = re.search(r"youtu\.be/([A-Za-z0-9_\-]{11})", url) + if m: + return m.group(1) + m = re.search(r"[?&]v=([A-Za-z0-9_\-]{11})", url) + if m: + return m.group(1) + return "" + +# ─── CSV / slug helpers ─────────────────────────────────────────────────────── + +def parse_csv(path: Path) -> list: + with open(path, newline="", encoding="utf-8") as f: + return list(csv.DictReader(f)) + +def make_slug(csv_filename: str) -> str: + name = Path(csv_filename).stem + name = name.replace("_", "-").lower() + name = re.sub(r"[^a-z0-9\-]", "", name) + name = re.sub(r"-{2,}", "-", name) + return name.strip("-") + +def make_display_name(csv_filename: str) -> str: + name = Path(csv_filename).stem.strip("_").replace("_", " ") + return name.title() + +def spotify_track_id(uri: str) -> str: + parts = uri.split(":") + return parts[2] if len(parts) == 3 and parts[1] == "track" else "" + +def ms_to_mmss(ms) -> str: + try: + s = int(ms) // 1000 + return f"{s // 60}:{s % 60:02d}" + except Exception: + return "—" + +def ms_to_hhmmss(ms: int) -> str: + s = ms // 1000 + h, m, s = s // 3600, (s % 3600) // 60, s % 60 + return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}" + +def get_year(date: str) -> str: + return date[:4] if date else "" + +def esc(s: str) -> str: + return (str(s) + .replace("&", "&").replace("<", "<") + .replace(">", ">").replace('"', """)) + +# ─── Fetch pipeline 
─────────────────────────────────────────────────────────── + +def fetch_all(playlists: list, cache: dict): + """ + playlists: list of (slug, display_name, tracks, csv_path) + Fills cache in-place. Saves to disk every 50 calls. + """ + # Collect unique ISRCs and track IDs + isrc_map = {} # isrc (upper) → (artist_name, track_name) + trackid_map = {} # spotify_track_id → True + + for slug, display, tracks, _ in playlists: + for t in tracks: + isrc = t.get("ISRC", "").strip().upper() + tid = spotify_track_id(t.get("Track URI", "")) + if isrc and isrc not in isrc_map: + artist = t.get("Artist Name(s)", "").split(",")[0].strip() + title = t.get("Track Name", "").strip() + isrc_map[isrc] = (artist, title) + if tid: + trackid_map[tid] = True + + # ── MusicBrainz ISRC lookups ────────────────────────────────────────────── + mb_key = lambda isrc: f"mb:isrc:{isrc}" + uncached_isrcs = [ + i for i in isrc_map + if FORCE_MB or mb_key(i) not in cache + ] + total = len(uncached_isrcs) + print(f"MusicBrainz: {len(isrc_map)} ISRCs total, {total} to fetch") + + for n, isrc in enumerate(uncached_isrcs, 1): + result = mb_isrc_lookup(isrc) + if result is not FETCH_FAILED: + cache[mb_key(isrc)] = result + if n % 50 == 0: + save_cache(cache) + print(f" MB ISRC {n}/{total}") + save_cache(cache) + + # ── MusicBrainz artist URL lookups ──────────────────────────────────────── + # Collect unique MB artist IDs from ISRC results + artist_ids = set() + for isrc in isrc_map: + mb_data = cache.get(mb_key(isrc), {}) + aid = mb_data.get("mb_artist_id", "") + if aid: + artist_ids.add(aid) + + art_key = lambda aid: f"mb:artist:{aid}" + uncached_artists = [ + a for a in artist_ids + if FORCE_MB or art_key(a) not in cache + ] + total = len(uncached_artists) + print(f"MusicBrainz: {len(artist_ids)} artists total, {total} to fetch") + + for n, aid in enumerate(uncached_artists, 1): + result = mb_artist_url_lookup(aid) + if result is not FETCH_FAILED: + cache[art_key(aid)] = result + if n % 50 == 0: + 
save_cache(cache) + print(f" MB artist {n}/{total}") + save_cache(cache) + + # ── Odesli track lookups ────────────────────────────────────────────────── + od_key = lambda tid: f"odesli:{tid}" + uncached_tracks = [ + tid for tid in trackid_map + if FORCE_ODESLI or od_key(tid) not in cache + ] + total = len(uncached_tracks) + mins = round(total * ODESLI_INTERVAL / 60) + print(f"Odesli: {len(trackid_map)} tracks total, {total} to fetch (~{mins} min)") + + for n, tid in enumerate(uncached_tracks, 1): + result = odesli_lookup(tid) + if result is not FETCH_FAILED: + cache[od_key(tid)] = result + if n % 20 == 0: + save_cache(cache) + print(f" Odesli {n}/{total}") + save_cache(cache) + print("Fetch complete.") + +# ─── HTML ───────────────────────────────────────────────────────────────────── + +GOOGLE_FONTS = '' + +SHARED_CSS = """ + :root { + --bg-void: #04060b; + --text-warm: #e8d5b8; + --text-muted: #7a6f5e; + --ff-primary: #a855f7; + --ff-bright: #c084fc; + --ff-deep: #6d28d9; + --ff-glow: rgba(168, 85, 247, 0.18); + } + * { box-sizing: border-box; margin: 0; padding: 0; } + body { + background-color: var(--bg-void); + color: var(--text-warm); + font-family: 'Rambla', sans-serif; + line-height: 1.5; + min-height: 100vh; + } +""" + + +def build_hub(playlists: list) -> str: + """playlists: sorted list of {slug, display_name, track_count}""" + total = sum(p["track_count"] for p in playlists) + n = len(playlists) + cards = "\n".join( + f' \n' + f'
{esc(p["display_name"])}
\n' + f'
{p["track_count"]} tracks
\n' + f'
' + for p in playlists + ) + return f""" + + + + + PLAYLISTS | Singular Particular + {GOOGLE_FONTS} + + + + ← Space +
+

PLAYLISTS

+

{n} playlists • {total:,} tracks

+
+
+{cards} +
+
— SINGULAR PARTICULAR SPACE —
+ + +""" + + +def build_track_card(track: dict, idx: int, cache: dict) -> str: + num = f"{idx:02d}" + name = esc(track.get("Track Name", "")) + artists_raw = track.get("Artist Name(s)", "") + artists = esc(artists_raw) + album = esc(track.get("Album Name", "")) + year = esc(get_year(track.get("Album Release Date", ""))) + duration = ms_to_mmss(track.get("Track Duration (ms)", 0)) + art_url = esc(track.get("Album Image URL", "")) + isrc = track.get("ISRC", "").strip().upper() + tid = spotify_track_id(track.get("Track URI", "")) + + # Pull cached data + mb_data = cache.get(f"mb:isrc:{isrc}", {}) + mb_rec_url = esc(mb_data.get("mb_recording_url", "")) + mb_art_id = mb_data.get("mb_artist_id", "") + art_data = cache.get(f"mb:artist:{mb_art_id}", {}) if mb_art_id else {} + artist_url = esc(art_data.get("artist_url", "")) + od_data = cache.get(f"odesli:{tid}", {}) if tid else {} + yt_id = od_data.get("youtube_video_id", "") + spotify_url = esc(f"https://open.spotify.com/track/{tid}") if tid else "" + + # Spotify link + spotify_link = ( + f'[SPOTIFY]' + if spotify_url else + '[SPOTIFY]' + ) + + # MusicBrainz link + mb_link = ( + f'[MUSICBRAINZ]' + if mb_rec_url else + '[MUSICBRAINZ]' + ) + + # YouTube — embed toggle or search link + yt_search = f"https://www.youtube.com/results?search_query={urllib.parse.quote(artists_raw + ' ' + track.get('Track Name', ''))}" + if yt_id: + yt_link = f'' + embed_html = f""" +
+
+ +
+
""" + else: + yt_link = f'[YOUTUBE]' + embed_html = "" + + # Artist link + artist_link = ( + f'[ARTIST]' + if artist_url else + '[ARTIST]' + ) + + return f""" +
+
+
+ {num} + + {duration} +
+
+
{name}
+
{artists}
+
{album} • {year}
+ +
+
{embed_html} +
""" + + +def build_playlist_page(display: str, slug: str, tracks: list, cache: dict) -> str: + total_ms = sum(int(t.get("Track Duration (ms)", 0) or 0) for t in tracks) + total_time = ms_to_hhmmss(total_ms) + n = len(tracks) + cards = "".join(build_track_card(t, i + 1, cache) for i, t in enumerate(tracks)) + + return f""" + + + + + {esc(display)} | PLAYLISTS + {GOOGLE_FONTS} + + + +
+
+

{esc(display)}

+ {n} TRACKS • {total_time} +
+ + +
+ +
+{cards} +
+ +
+ Music data via MusicBrainz + • Links via Odesli + • — SINGULAR PARTICULAR SPACE — +
+ + + + +""" + +# ─── Main ───────────────────────────────────────────────────────────────────── + +def main(): + csv_files = sorted(SCRIPT_DIR.glob("*.csv")) + if not csv_files: + print("No CSV files found.", file=sys.stderr); sys.exit(1) + + # Parse all CSVs + all_playlists = [] + for csv_path in csv_files: + slug = make_slug(csv_path.name) + display = make_display_name(csv_path.name) + tracks = parse_csv(csv_path) + all_playlists.append((slug, display, tracks, csv_path)) + + playlists_meta = sorted([ + {"slug": slug, "display_name": display, "track_count": len(tracks)} + for slug, display, tracks, _ in all_playlists + ], key=lambda p: p["display_name"].lower()) + + # Single playlist test mode + if _playlist_arg: + match = next( + ((s, d, t, p) for s, d, t, p in all_playlists + if s == _playlist_arg or p.stem == _playlist_arg), + None + ) + if not match: + slugs = ", ".join(s for s, *_ in all_playlists) + print(f"Not found: '{_playlist_arg}'\nAvailable: {slugs}", file=sys.stderr) + sys.exit(1) + slug, display, tracks, _ = match + print(f"Test: '{display}' ({len(tracks)} tracks)") + cache = load_cache() + fetch_all([(slug, display, tracks, None)], cache) + out = SCRIPT_DIR / f"{slug}.html" + out.write_text(build_playlist_page(display, slug, tracks, cache), "utf-8") + print(f"Written → {slug}.html") + return + + # Hub only + if HUB_ONLY: + hub = build_hub(playlists_meta) + (SCRIPT_DIR / "playlists.html").write_text(hub, "utf-8") + print("Hub written → playlists.html") + return + + # Full build + cache = load_cache() + fetch_all(all_playlists, cache) + + (SCRIPT_DIR / "playlists.html").write_text(build_hub(playlists_meta), "utf-8") + print("Hub written → playlists.html") + + for slug, display, tracks, _ in all_playlists: + out = SCRIPT_DIR / f"{slug}.html" + out.write_text(build_playlist_page(display, slug, tracks, cache), "utf-8") + print(f" → {slug}.html ({len(tracks)} tracks)") + + total = sum(p["track_count"] for p in playlists_meta) + print(f"\nDone. 
{len(playlists_meta)} playlists, {total:,} tracks.") + + +if __name__ == "__main__": + main() diff --git a/playlistpirate/build.sh b/playlistpirate/build.sh new file mode 100644 index 0000000..b61c6ea --- /dev/null +++ b/playlistpirate/build.sh @@ -0,0 +1,43 @@ +#!/usr/bin/env bash +# Build standalone playlist binary with PyInstaller +# Run from the playlist/ directory + +pip install pyinstaller yt-dlp rich mutagen + +pyinstaller \ + --onefile \ + --name playlist \ + --hidden-import yt_dlp \ + --hidden-import mutagen.id3 \ + --hidden-import mutagen.mp3 \ + playlist.py + +echo "" +echo "Binary at: dist/playlist" +echo "Usage:" +echo " ./dist/playlist resolve my-export.csv" +echo " ./dist/playlist search my-export-playlist.md" +echo " ./dist/playlist download my-export-playlist.md" + +# GUI binary (Flask + system browser, no Qt/pywebview dependency) +pip install flask + +pyinstaller \ + --onefile \ + --name playlist-gui \ + --collect-all yt_dlp \ + --hidden-import mutagen.id3 \ + --hidden-import mutagen.mp3 \ + --hidden-import mutagen.easyid3 \ + --hidden-import flask \ + --hidden-import rich \ + --hidden-import rich.console \ + --hidden-import rich.theme \ + --hidden-import tkinter \ + --hidden-import tkinter.filedialog \ + gui.py + +echo "GUI binary at: dist/playlist-gui" +echo "Usage: ./dist/playlist-gui" +echo " Opens in your default browser at http://localhost:" +echo " Ctrl-C to quit." diff --git a/playlistpirate/gui.py b/playlistpirate/gui.py new file mode 100644 index 0000000..7587d54 --- /dev/null +++ b/playlistpirate/gui.py @@ -0,0 +1,1019 @@ +import os +import sys + +# --- Frozen binary dispatch (must be before all other imports) --- +# When the frozen binary spawns itself as a subprocess, it passes one of these +# markers as argv[1] so the child process runs in the correct mode and exits. 
+if len(sys.argv) >= 2: + if sys.argv[1] == '__cli__': + # Pipeline CLI mode: strip marker, hand off to playlist.main() + del sys.argv[1] + from playlist import main as _cli_main + _cli_main() + sys.exit(0) + elif sys.argv[1] == '__picker__': + # Folder picker mode: open native dialog, print path, exit + import tkinter as tk + from tkinter import filedialog + _root = tk.Tk() + _root.withdraw() + _root.attributes('-topmost', True) + print(filedialog.askdirectory() or '', end='') + sys.exit(0) + +import json +import time +import subprocess +import threading +import socket +import re +from pathlib import Path +from flask import Flask, render_template_string, request, Response, jsonify + +# --- Configuration & Palette --- +APP_NAME = "PLAYLIST PIRATE" +VERSION = "v2.0" + +# Fire Orange Palette +PALETTE = { + "bg_void": "#04060b", + "text_warm": "#e8d5b8", + "text_muted": "#7a6f5e", + "fp": "#ff6600", # fire orange + "fb": "#ff8833", # fire bright + "fd": "#cc4400", # fire deep + "fg": "rgba(255,102,0,0.12)", # fire glow +} + +# --- Flask App Setup --- +app = Flask(__name__) +current_proc = None +current_step = None +output_queue = [] + +def strip_ansi(text): + ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])') + return ansi_escape.sub('', text) + +PLAYLIST_PY = str(Path(__file__).parent / "playlist.py") + +def get_free_port(): + s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) + s.bind(('', 0)) + port = s.getsockname()[1] + s.close() + return port + +# --- File/Folder Picker --- +def pick_folder_native(): + """Open a native folder picker via tkinter subprocess.""" + if getattr(sys, 'frozen', False): + # Frozen: re-invoke this binary in __picker__ mode + cmd = [sys.executable, '__picker__'] + else: + # From source: re-invoke this script in __picker__ mode + cmd = [sys.executable, str(Path(__file__).resolve()), '__picker__'] + try: + result = subprocess.run(cmd, capture_output=True, text=True, timeout=60) + path = result.stdout.strip() + return path 
if path else None + except Exception as e: + print(f"Picker error: {e}") + return None + +# --- Routes --- +@app.route('/') +def index(): + return render_template_string(HTML_TEMPLATE, palette=PALETTE, version=VERSION) + +@app.route('/pick-folder', methods=['POST']) +def pick_folder(): + path = pick_folder_native() + if path: + return jsonify({"path": path}) + return jsonify({"path": None}), 400 + +@app.route('/scan-dir', methods=['POST']) +def scan_dir(): + data = request.json + path = data.get("path") + if not path or not os.path.exists(path): + return jsonify({"error": "Invalid path"}), 400 + + p = Path(path) + csvs = [f.name for f in p.glob("*.csv")] + mds = [f.name for f in p.glob("*-playlist.md")] + + return jsonify({ + "csvs": csvs, + "mds": mds + }) + +_TRACK_RE_NEW = re.compile(r'^- \[( |x|-)\] (.+?) \| (.+?) \| ISRC:[^ ]+ \| SP:[^ ]+ \| (.+)$') +_TRACK_RE_OLD = re.compile(r'^- \[( |x|-)\] (.+?) \| (.+?) \| ISRC:[^ ]+ \| (.+)$') +_pending_merges = [] # list of (temp_path, original_path) + + +@app.route('/read-tracks', methods=['POST']) +def read_tracks(): + data = request.json + md_path = os.path.join(data.get('work_dir', ''), data.get('filename', '')) + if not os.path.exists(md_path): + return jsonify({'error': 'File not found'}), 404 + tracks = [] + with open(md_path, encoding='utf-8') as f: + for line in f: + m = _TRACK_RE_NEW.match(line.strip()) or _TRACK_RE_OLD.match(line.strip()) + if m: + status, url = m.group(1), m.group(4) + tracks.append({ + 'title': m.group(2), + 'artist': m.group(3), + 'downloadable': status == ' ' and url not in ('?', 'NOT_FOUND', '-'), + }) + return jsonify({'tracks': tracks}) + + +def _create_combined_download_md(work_dir, track_filter): + """Build one combined temp .md from all selected tracks across playlists. + Deduplicates by YouTube URL. 
Returns (temp_path, list_of_original_paths).""" + seen_urls = set() + lines_out = ['# Combined Download\n', '\n', '\n'] + originals = [] + + for md_filename, selected_titles in track_filter.items(): + original = os.path.join(work_dir, md_filename) + if not os.path.exists(original): + continue + originals.append(original) + selected = set(selected_titles) + with open(original, encoding='utf-8') as f: + for line in f: + m = _TRACK_RE_NEW.match(line.strip()) or _TRACK_RE_OLD.match(line.strip()) + if not m: + continue + if m.group(1) != ' ': + continue # already done / not found + title, url = m.group(2), m.group(4) + if title not in selected: + continue + if url not in ('?', 'NOT_FOUND', '-') and url in seen_urls: + continue # duplicate URL — skip + if url not in ('?', 'NOT_FOUND', '-'): + seen_urls.add(url) + lines_out.append(line if line.endswith('\n') else line + '\n') + + if len(lines_out) <= 3: + return None, [] + + tmp = os.path.join(work_dir, '_download_queue.tmp.md') + with open(tmp, 'w', encoding='utf-8') as f: + f.writelines(lines_out) + return tmp, originals + + +def _merge_and_cleanup(): + """Copy DONE status from temp back to original .md files, then delete temp.""" + for item in _pending_merges: + temp_path, originals = item + try: + done = set() + with open(temp_path, encoding='utf-8') as f: + for line in f: + m = _TRACK_RE_NEW.match(line.strip()) or _TRACK_RE_OLD.match(line.strip()) + if m and m.group(1) == 'x': + done.add(m.group(2)) + for original_path in (originals if isinstance(originals, list) else [originals]): + if not done or not os.path.exists(original_path): + continue + updated = [] + with open(original_path, encoding='utf-8') as f: + for line in f: + m = _TRACK_RE_NEW.match(line.strip()) or _TRACK_RE_OLD.match(line.strip()) + if m and m.group(1) == ' ' and m.group(2) in done: + line = line.replace('- [ ]', '- [x]', 1) + updated.append(line) + with open(original_path, 'w', encoding='utf-8') as f: + f.writelines(updated) + except Exception as 
e: + print(f'Merge error: {e}') + finally: + try: os.remove(temp_path) + except: pass + _pending_merges.clear() + + +@app.route('/run', methods=['POST']) +def run_command(): + global current_proc, current_step, output_queue + if current_proc and current_proc.poll() is None: + return jsonify({"error": "A process is already running"}), 400 + + data = request.json + step = data.get("step") + args = data.get("args", []) + work_dir = data.get("work_dir", os.getcwd()) + track_filter = data.get("track_filter") # {md_filename: [titles]} or None + + # For download with per-track selection, build filtered temp files + if step == 'download' and track_filter: + # Build one combined temp file — deduped by URL, one track at a time + tmp_path, originals = _create_combined_download_md(work_dir, track_filter) + if not tmp_path: + return jsonify({"error": "No downloadable tracks selected"}), 400 + _pending_merges.append((tmp_path, originals)) + # Find --output and its value from args; keep everything after it + try: + out_idx = args.index('--output') + extra = args[out_idx:] + except ValueError: + extra = [] + args = [os.path.basename(tmp_path)] + extra + + if getattr(sys, 'frozen', False): + cmd = [sys.executable, '__cli__', step] + args + else: + cmd = [sys.executable, PLAYLIST_PY, step] + args + try: + current_proc = subprocess.Popen( + cmd, cwd=work_dir, + stdout=subprocess.PIPE, stderr=subprocess.STDOUT, + text=True, bufsize=1, env=os.environ.copy() + ) + current_step = step + output_queue = [] + return jsonify({"ok": True}) + except Exception as e: + return jsonify({"error": str(e)}), 500 + + +@app.route('/stream') +def stream(): + def generate(): + global current_proc, current_step + if not current_proc: + yield "data: __DONE__\n\n" + return + while True: + line = current_proc.stdout.readline() + if not line: + if current_proc.poll() is not None: + break + time.sleep(0.1) + continue + clean = strip_ansi(line).rstrip() + if clean: + yield f"data: {json.dumps(clean)}\n\n" + 
completed_step = current_step + current_proc = None + current_step = None + if completed_step == 'download': + _merge_and_cleanup() + else: + # Discard any stale pending merges from a dropped connection + for item in _pending_merges: + try: os.remove(item[0]) + except: pass + _pending_merges.clear() + yield "data: __DONE__\n\n" + + return Response(generate(), mimetype='text/event-stream') + +# --- Embedded Assets --- +HTML_TEMPLATE = """ + + + + + + PLAYLIST PIRATE ☠ + + + + + +
+
+
+
+

PLAYLIST PIRATE

+
{{ version }}  /  CHART YER COURSE BELOW
+
+
+
⚓ AT ANCHOR
+
+ +
+
+ + +
+ +
No port charted
+
+ + +
+
+
+
I
+ PARSE THE MANIFEST +
+ +
+
+

Reads your Spotify CSV exports and creates a tracking file for each playlist. Select the CSVs you want to process — you can always come back and run others later.

+
+
Chart a course to detect yer CSVs...
+
+
+
+ + + + + +
+
+
+
III
+ RAISE THE FLAG +
+ +
+
+

Generates a static HTML page for each playlist with embedded YouTube players, MusicBrainz recording links, artist pages, and Spotify links. Choose where to put them — they're ready to drop straight into a website.

+
+ + + +
+
+
+
+ + +
+
+
+
IV
+ PLUNDER THE HOLD +
+ +
+
+

Downloads each track as a 192kbps MP3 with title, artist, album and ISRC tags embedded. Requires ffmpeg. This step is always opt-in — nothing downloads unless you run it.

+
+ ⚠ PIRATE'S OATH
+ You are responsible for ensuring you have the right to download this content in your jurisdiction. +
+
+ + + +
+
+
+
📜 Playlists
+
+
+
+
🎵 Tracks
+
Select a playlist to browse its tracks
+
+
+
+
+ +
+ +
+
+
+ 🐦 CROW'S NEST + +
+
+
+
+ +
+ + + + +""" + +# --- Entry Point --- +if __name__ == '__main__': + import webbrowser + + port = get_free_port() + url = f"http://localhost:{port}" + + def run_flask(): + import logging + logging.getLogger('werkzeug').setLevel(logging.ERROR) + app.run(port=port, debug=False, use_reloader=False) + + threading.Thread(target=run_flask, daemon=True).start() + time.sleep(0.5) + webbrowser.open(url) + + try: + while True: + time.sleep(1) + except KeyboardInterrupt: + pass diff --git a/playlistpirate/playlist-gui.spec b/playlistpirate/playlist-gui.spec new file mode 100644 index 0000000..e3b771c --- /dev/null +++ b/playlistpirate/playlist-gui.spec @@ -0,0 +1,43 @@ +# -*- mode: python ; coding: utf-8 -*- + +a = Analysis( + ['gui.py'], + pathex=[], + binaries=[], + datas=[], + hiddenimports=[ + 'mutagen.id3', 'mutagen.mp3', 'mutagen.easyid3', + 'flask', 'werkzeug', 'jinja2', 'click', 'itsdangerous', 'markupsafe', + 'rich', 'rich.console', 'rich.theme', + 'tkinter', 'tkinter.filedialog', + ], + hookspath=[], + hooksconfig={}, + runtime_hooks=[], + excludes=[], + collect_all=['yt_dlp'], + noarchive=False, + optimize=0, +) +pyz = PYZ(a.pure) + +exe = EXE( + pyz, + a.scripts, + a.binaries, + a.datas, + [], + name='playlist-gui', + debug=False, + bootloader_ignore_signals=False, + strip=False, + upx=True, + upx_exclude=[], + runtime_tmpdir=None, + console=True, + disable_windowed_traceback=False, + argv_emulation=False, + target_arch=None, + codesign_identity=None, + entitlements_file=None, +) diff --git a/playlistpirate/playlist.py b/playlistpirate/playlist.py new file mode 100644 index 0000000..cecd747 --- /dev/null +++ b/playlistpirate/playlist.py @@ -0,0 +1,1027 @@ +#!/usr/bin/env python3 +""" +PLAYLIST PIRATE v2.0 +CSV → resolve → search → build → download + +Pipeline: + resolve Parse CSV(s) into *-playlist.md tracking files + search Find YouTube URLs via yt-dlp (resumable, no API key) + build Generate static HTML pages with embedded players + download Download tracks as 
MP3 (opt-in) + +Each step is discrete. Nothing runs automatically. +""" + +import csv +import json +import re +import time +import sys +import os +import random +import argparse +import urllib.parse +import urllib.request +from pathlib import Path +from datetime import datetime +from typing import List, Optional, Tuple + +# ─── Dependency Check ───────────────────────────────────────────────────────── + +missing = [] +try: + import yt_dlp +except ImportError: + missing.append("yt-dlp") +try: + from rich.console import Console + from rich.theme import Theme +except ImportError: + missing.append("rich") +try: + from mutagen.id3 import ID3, TIT2, TPE1, TSRC, TALB, ID3NoHeaderError +except ImportError: + missing.append("mutagen") + +if missing: + print(f"[FATAL] Missing: {', '.join(missing)}") + print("Install: pip install yt-dlp rich mutagen") + sys.exit(1) + + +# ─── Terminal ───────────────────────────────────────────────────────────────── + +console = Console( + theme=Theme({ + "ok": "green", + "accent": "bold bright_green", + "dim": "dim green", + "warn": "yellow", + "err": "bold red", + }), + style="green on black", + highlight=False, +) + +LOGO = """\ + ██████╗ ██╗ █████╗ ██╗ ██╗██╗ ██╗███████╗████████╗ + ██╔══██╗██║ ██╔══██╗╚██╗ ██╔╝██║ ██║██╔════╝╚══██╔══╝ + ██████╔╝██║ ███████║ ╚████╔╝ ██║ ██║███████╗ ██║ + ██╔═══╝ ██║ ██╔══██║ ╚██╔╝ ██║ ██║╚════██║ ██║ + ██║ ███████╗██║ ██║ ██║ ███████╗██║███████║ ██║ + ╚═╝ ╚══════╝╚═╝ ╚═╝ ╚═╝ ╚══════╝╚═╝╚══════╝ ╚═╝ + ██████╗ ██╗██████╗ █████╗ ████████╗███████╗ + ██╔══██╗██║██╔══██╗██╔══██╗╚══██╔══╝██╔════╝ + ██████╔╝██║██████╔╝███████║ ██║ █████╗ + ██╔═══╝ ██║██╔══██╗██╔══██║ ██║ ██╔══╝ + ██║ ██║██║ ██║██║ ██║ ██║ ███████╗ + ╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝ ╚═╝ ╚══════╝""" + +DIVIDER = "─" * 60 + +def boot(module: str): + console.print(f"\n[accent]{LOGO}[/accent]") + console.print(f"[dim]PLAYLIST PIRATE v2.0 // {module.upper()} MODULE[/dim]") + console.print(f"[dim]{DIVIDER}[/dim]\n") + time.sleep(0.2) + +def out(msg: str, style: str 
= "ok"): + console.print(msg, style=style) + + +# ─── Fire Spectrum ───────────────────────────────────────────────────────────── +# Each playlist gets a unique fire accent. Assigned by sorted alphabetical index. + +FIRE_SPECTRUM = [ + ("#ff3300", "#ff6633", "#cc2200", "rgba(255,51,0,0.18)"), # fire red + ("#ff6600", "#ff8833", "#cc4400", "rgba(255,102,0,0.18)"), # orange + ("#ff9900", "#ffbb44", "#cc7700", "rgba(255,153,0,0.18)"), # amber-orange + ("#ffcc00", "#ffdd55", "#cc9900", "rgba(255,204,0,0.18)"), # gold + ("#e8943a", "#f0ad60", "#b86820", "rgba(232,148,58,0.18)"), # fire amber + ("#d4654a", "#e07d64", "#a04030", "rgba(212,101,74,0.18)"), # coral + ("#cc3333", "#dd5555", "#992222", "rgba(204,51,51,0.18)"), # crimson + ("#ff4d6d", "#ff7090", "#cc2244", "rgba(255,77,109,0.18)"), # hot pink-red + ("#f472b6", "#f79ed0", "#c04080", "rgba(244,114,182,0.18)"), # fairy pink + ("#c558d9", "#d880e8", "#8830a0", "rgba(197,88,217,0.18)"), # orchid + ("#a855f7", "#c084fc", "#6d28d9", "rgba(168,85,247,0.18)"), # violet + ("#7c3aed", "#a06af0", "#4c1d95", "rgba(124,58,237,0.18)"), # indigo-violet + ("#3fbfaf", "#66d0c4", "#288070", "rgba(63,191,175,0.18)"), # waterfall + ("#2ac4b3", "#55d4c6", "#1a8077", "rgba(42,196,179,0.18)"), # teal + ("#00b4d8", "#33c8e8", "#007a99", "rgba(0,180,216,0.18)"), # sky blue + ("#32dc8c", "#66e8aa", "#1a9955", "rgba(50,220,140,0.18)"), # neon green + ("#00ff41", "#55ff77", "#00aa22", "rgba(0,255,65,0.18)"), # phosphor + ("#ff7f3f", "#ffa066", "#cc5500", "rgba(255,127,63,0.18)"), # paradise + ("#ffcf40", "#ffdd77", "#cc9900", "rgba(255,207,64,0.18)"), # toucan + ("#8b2020", "#bb4444", "#5a0f0f", "rgba(139,32,32,0.18)"), # deep red + ("#ff5500", "#ff7733", "#cc3300", "rgba(255,85,0,0.18)"), # orange-red +] + +def get_fire(idx: int) -> dict: + p, b, d, g = FIRE_SPECTRUM[idx % len(FIRE_SPECTRUM)] + return {"primary": p, "bright": b, "deep": d, "glow": g} + + +# ─── Data Model ─────────────────────────────────────────────────────────────── + 
+LINE_RE = re.compile( + r'^- \[( |x|-)\] (.+?) \| (.+?) \| ISRC:([A-Z0-9\-]{3,15}|-) \| SP:([A-Za-z0-9]+|-) \| (.+)$' +) +LINE_RE_LEGACY = re.compile( + r'^- \[( |x|-)\] (.+?) \| (.+?) \| ISRC:([A-Z0-9\-]{3,15}|-) \| (.+)$' +) +PENDING = " " +DONE = "x" +NOT_FOUND = "-" + + +class Track: + def __init__(self, status, title, artists, isrc, url, album="", spotify_id="-"): + self.status = status + self.title = title.strip() + self.artists = artists.strip() + self.isrc = isrc.strip() if isrc else "-" + self.url = url.strip() + self.album = album.strip() + self.spotify_id = spotify_id.strip() if spotify_id else "-" + + @property + def needs_search(self): + return self.url == "?" and self.status == PENDING + + @property + def needs_download(self): + return self.url not in ("?", "NOT_FOUND") and self.status == PENDING + + @property + def youtube_id(self): + if not self.url or self.url in ("?", "NOT_FOUND"): + return "" + m = re.search(r"youtu\.be/([A-Za-z0-9_\-]{11})", self.url) + if m: return m.group(1) + m = re.search(r"[?&]v=([A-Za-z0-9_\-]{11})", self.url) + if m: return m.group(1) + return "" + + @property + def search_query(self): + parts = [a.strip() for a in self.artists.split(",")][:2] + return f"{self.title} {', '.join(parts)}" + + def to_md(self): + return ( + f"- [{self.status}] {self.title} | {self.artists} " + f"| ISRC:{self.isrc} | SP:{self.spotify_id} | {self.url}" + ) + + +class Playlist: + def __init__(self, name, source, tracks, slug=""): + self.name = name + self.source = source + self.tracks = tracks + self.slug = slug or _make_slug(name) + + @classmethod + def from_md(cls, path: Path): + text = path.read_text(encoding="utf-8") + lines = text.splitlines() + name = path.stem.replace("-playlist", "").replace("-", " ").replace("_", " ").title() + if lines and lines[0].startswith("#"): + name = lines[0].lstrip("#").strip() + source = "" + if len(lines) > 1: + m = re.search(r"source:\s*([^|]+)", lines[1]) + if m: source = m.group(1).strip() + tracks = [] + 
for line in lines:
+            m = LINE_RE.match(line.strip())
+            if m:
+                tracks.append(Track(m.group(1), m.group(2), m.group(3), m.group(4), m.group(6), spotify_id=m.group(5)))
+            else:
+                m = LINE_RE_LEGACY.match(line.strip())
+                if m:
+                    tracks.append(Track(m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)))
+        slug = _make_slug(path.stem.replace("-playlist", ""))
+        return cls(name, source, tracks, slug)
+
+    def to_md(self):
+        ts = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
+        # Metadata line read back by from_md via the "source: ... |" pattern
+        body = [f"# {self.name}", f"<!-- source: {self.source} | updated: {ts} -->", ""]
+        body += [t.to_md() for t in self.tracks]
+        return "\n".join(body) + "\n"
+
+    def save(self, path: Path):
+        path.write_text(self.to_md(), encoding="utf-8")
+
+
+# ─── CSV Parser ───────────────────────────────────────────────────────────────
+
+TRACK_KEYS = ["track name", "title", "song name", "song", "name", "track"]
+ARTIST_KEYS = ["artist name(s)", "artist names", "artist name", "artists", "artist"]
+ISRC_KEYS = ["isrc"]
+ALBUM_KEYS = ["album name", "album title", "album"]
+SPOTIFY_KEYS = ["track uri", "spotify uri", "track id", "spotify id"]
+
+def _find_col(headers, keys):
+    lower = {h.lower(): h for h in headers}
+    return next((lower[k] for k in keys if k in lower), None)
+
+def _make_slug(name: str) -> str:
+    s = name.lower().replace(" ", "-")
+    s = re.sub(r"[^a-z0-9\-]", "", s)
+    s = re.sub(r"-{2,}", "-", s)
+    return s.strip("-")
+
+def _clean_artists(raw: str) -> str:
+    parts = [a.strip() for a in raw.split(",") if a.strip()]
+    return ", ".join(parts[:2])
+
+def parse_csv(path: Path) -> Playlist:
+    with path.open(encoding="utf-8-sig", newline="") as f:
+        reader = csv.DictReader(f)
+        headers = list(reader.fieldnames or [])
+        tc = _find_col(headers, TRACK_KEYS)
+        ac = _find_col(headers, ARTIST_KEYS)
+        ic = _find_col(headers, ISRC_KEYS)
+        lc = _find_col(headers, ALBUM_KEYS)
+        sc = _find_col(headers, SPOTIFY_KEYS)
+        if not tc or not ac:
+            raise ValueError(f"Cannot find track/artist columns.\nHeaders: {headers}")
+        tracks = []
+        for row in reader:
+            title 
= row[tc].strip() + artists = _clean_artists(row[ac]) + isrc = row.get(ic, "").strip().upper() if ic else "-" + album = row.get(lc, "").strip() if lc else "" + spotify_id = "-" + if sc: + raw = row.get(sc, "").strip() + # Accept full URI (spotify:track:ID) or bare ID + m = re.match(r"spotify:track:([A-Za-z0-9]+)", raw) + spotify_id = m.group(1) if m else (raw if re.match(r"^[A-Za-z0-9]{10,}$", raw) else "-") + if not isrc: isrc = "-" + if title: + t = Track(PENDING, title, artists, isrc, "?", album=album, spotify_id=spotify_id) + tracks.append(t) + name = path.stem.replace("-", " ").replace("_", " ").title() + return Playlist(name, path.name, tracks, _make_slug(path.stem)) + + +# ─── Batch helpers ──────────────────────────────────────────────────────────── + +def resolve_inputs(inputs: List[str], suffix: str) -> List[Path]: + """Expand inputs (files or directories) to a list of matching Path objects.""" + paths = [] + for inp in inputs: + p = Path(inp) + if p.is_dir(): + paths.extend(sorted(p.glob(f"*{suffix}"))) + elif p.exists(): + paths.append(p) + else: + out(f"> [WARN] Not found: {inp}", "warn") + return paths + + +# ─── MusicBrainz ────────────────────────────────────────────────────────────── + +MB_HEADERS = {"User-Agent": "PlaylistPirate/2.0 (spaces.exopraxist.org)"} +MB_INTERVAL = 1.2 +_last_mb = 0.0 +FETCH_FAILED = object() + +ARTIST_URL_PRIORITY = [ + "official homepage", "bandcamp", "soundcloud", "patreon", + "linktree", "youtube", "instagram", "twitter", "facebook", + "last.fm", "discogs", "wikidata", "wikipedia", +] + +def _mb_get(url: str): + global _last_mb + elapsed = time.time() - _last_mb + if elapsed < MB_INTERVAL: + time.sleep(MB_INTERVAL - elapsed) + _last_mb = time.time() + try: + req = urllib.request.Request(url, headers=MB_HEADERS) + with urllib.request.urlopen(req, timeout=15) as r: + return json.loads(r.read().decode("utf-8")) + except urllib.error.HTTPError as e: + if e.code == 404: return {} + if e.code == 429: + out(" MB rate limit — 
waiting 30s", "warn")
+            time.sleep(30)
+        return FETCH_FAILED
+    except Exception:
+        return FETCH_FAILED
+
+def mb_isrc_lookup(isrc: str):
+    url = f"https://musicbrainz.org/ws/2/isrc/{isrc}?inc=artist-credits&fmt=json"
+    data = _mb_get(url)
+    if data is FETCH_FAILED: return FETCH_FAILED
+    result = {"mb_recording_url": "", "mb_artist_id": ""}
+    recs = data.get("recordings", [])
+    if not recs: return result
+    rec = recs[0]
+    if rec.get("id"):
+        result["mb_recording_url"] = f"https://musicbrainz.org/recording/{rec['id']}"
+    for credit in rec.get("artist-credit", []):
+        if isinstance(credit, dict) and "artist" in credit:
+            result["mb_artist_id"] = credit["artist"].get("id", "")
+            break
+    return result
+
+def mb_artist_lookup(mb_artist_id: str):
+    url = f"https://musicbrainz.org/ws/2/artist/{mb_artist_id}?inc=url-rels&fmt=json"
+    data = _mb_get(url)
+    if data is FETCH_FAILED: return FETCH_FAILED
+    result = {"artist_url": "", "artist_url_type": ""}
+    best_rank = len(ARTIST_URL_PRIORITY) + 1
+    for rel in data.get("relations", []):
+        rel_type = rel.get("type", "").lower()
+        href = rel.get("url", {}).get("resource", "")
+        if not href: continue
+        for i, ptype in enumerate(ARTIST_URL_PRIORITY):
+            if ptype in rel_type or ptype in href:
+                if i < best_rank:
+                    best_rank = i
+                    result["artist_url"] = href
+                    result["artist_url_type"] = rel_type
+                break
+    return result
+
+
+# ─── Build cache ──────────────────────────────────────────────────────────────
+
+def load_build_cache(cache_path: Path) -> dict:
+    if cache_path.exists():
+        try: return json.loads(cache_path.read_text("utf-8"))
+        except Exception: pass
+    return {}
+
+def save_build_cache(cache: dict, cache_path: Path):
+    cache_path.write_text(json.dumps(cache, indent=2, ensure_ascii=False), "utf-8")
+
+
+# ─── HTML helpers ───────────────────────────────────────────────────────────────
+
+GOOGLE_FONTS = (
+    ''
+)
+
+def esc(s) -> str:
+    return (str(s)
+            .replace("&", "&amp;").replace("<", "&lt;")
+            .replace(">", "&gt;").replace('"', "&quot;"))
+
+def ms_to_mmss(ms) -> str: + try: + s = int(ms) // 1000 + return f"{s // 60}:{s % 60:02d}" + except Exception: return "" + +def ms_to_hhmmss(total_ms: int) -> str: + s = total_ms // 1000 + h, m, s = s // 3600, (s % 3600) // 60, s % 60 + return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}" + +SHARED_CSS = """ + :root {{ + --bg-void: #04060b; + --text-warm: #e8d5b8; + --text-muted: #7a6f5e; + --fp: {primary}; + --fb: {bright}; + --fd: {deep}; + --fg: {glow}; + }} + * {{ box-sizing: border-box; margin: 0; padding: 0; }} + body {{ + background: var(--bg-void); + color: var(--text-warm); + font-family: 'Rambla', sans-serif; + line-height: 1.5; min-height: 100vh; + }} +""" + + +def build_hub_html(playlists_meta: list) -> str: + """playlists_meta: list of {slug, name, track_count, fire_idx}""" + total = sum(p["track_count"] for p in playlists_meta) + n = len(playlists_meta) + + cards = [] + for p in playlists_meta: + f = get_fire(p["fire_idx"]) + cards.append( + f' \n' + f'
{esc(p["name"])}
\n' + f'
{p["track_count"]} tracks
\n' + f'
' + ) + + fire = get_fire(0) + css = SHARED_CSS.format(**fire) + + return f""" + + + + + PLAYLISTS | Singular Particular + {GOOGLE_FONTS} + + + + ← Space +
+

PLAYLISTS

+

{n} playlists • {total:,} tracks

+
+
+{chr(10).join(cards)} +
+
+ Music data via MusicBrainz + • — SINGULAR PARTICULAR SPACE — +
+ + +""" + + +def build_track_html(track: Track, idx: int, mb_data: dict, artist_url: str) -> str: + num = f"{idx:02d}" + name = esc(track.title) + artists = esc(track.artists) + album = esc(track.album) + yt_id = track.youtube_id + yt_search = f"https://www.youtube.com/results?search_query={urllib.parse.quote(track.artists + ' ' + track.title)}" + mb_rec_url = esc(mb_data.get("mb_recording_url", "")) + art_url = esc(artist_url) + + mb_link = ( + f'[MUSICBRAINZ]' + if mb_rec_url else '[MUSICBRAINZ]' + ) + artist_link = ( + f'[ARTIST]' + if art_url else '[ARTIST]' + ) + spotify_link = ( + f'[SPOTIFY]' + if track.spotify_id and track.spotify_id != "-" else "" + ) + + if yt_id: + yt_link = f'' + embed_html = f""" +
+
+ +
+
""" + else: + yt_link = f'[YOUTUBE]' + embed_html = "" + + return f""" +
+
+
+ {num} +
+
+
{name}
+
{artists}
+
{album}
+ +
+
{embed_html} +
""" + + +def build_playlist_html(playlist: Playlist, fire: dict, cache: dict) -> str: + css = SHARED_CSS.format(**fire) + n = len(playlist.tracks) + cards = "".join( + build_track_html(t, i + 1, + cache.get(f"mb:isrc:{t.isrc}", {}), + cache.get(f"mb:artist:{cache.get(f'mb:isrc:{t.isrc}', {}).get('mb_artist_id','')}", {}).get("artist_url","") + ) + for i, t in enumerate(playlist.tracks) + ) + has_yt = sum(1 for t in playlist.tracks if t.youtube_id) + + return f""" + + + + + {esc(playlist.name)} | PLAYLISTS + {GOOGLE_FONTS} + + + +
+
+

{esc(playlist.name)}

+ {n} TRACKS • {has_yt} EMBEDDED +
+ + +
+
+{cards} +
+
+ Playlists from Spotify + • Recording data via MusicBrainz + • — SINGULAR PARTICULAR SPACE — +
+ + + +""" + + +# ─── Commands ───────────────────────────────────────────────────────────────── + +def cmd_resolve(args): + boot("resolve") + paths = resolve_inputs(args.input, ".csv") + if not paths: + out("> [ERR] No CSV files found.", "err"); sys.exit(1) + + for src in paths: + out(f"> PARSING: {src.name}") + try: + playlist = parse_csv(src) + except Exception as e: + out(f" [ERR] {e}", "err"); continue + + out_path = src.with_name(src.stem + "-playlist.md") + playlist.save(out_path) + out(f" → {out_path.name} ({len(playlist.tracks)} tracks)", "accent") + + out(f"\n[dim]{DIVIDER}[/dim]") + out("> NEXT: playlist search ", "dim") + + +def cmd_search(args): + boot("search") + paths = resolve_inputs(args.input, "-playlist.md") + if not paths: + out("> [ERR] No playlist.md files found.", "err"); sys.exit(1) + + delay_min = float(args.delay_min) + delay_max = float(args.delay_max) + ydl_opts = {"quiet": True, "no_warnings": True, "extract_flat": "in_playlist"} + + for src in paths: + out(f"\n> PLAYLIST: {src.name}", "accent") + playlist = Playlist.from_md(src) + pending = [t for t in playlist.tracks if t.needs_search] + + out(f" TO SEARCH: {len(pending)} / {len(playlist.tracks)}") + if not pending: + out(" Nothing to search — all tracks have URLs.", "warn"); continue + + found = not_found = 0 + for i, track in enumerate(pending): + out(f" [{i+1}/{len(pending)}] {track.search_query}") + try: + with yt_dlp.YoutubeDL(ydl_opts) as ydl: + info = ydl.extract_info(f"ytsearch1:{track.search_query}", download=False) + entries = (info or {}).get("entries", []) + entry = entries[0] if entries else None + if entry: + vid_id = entry.get("id") or entry.get("url", "").split("v=")[-1] + track.url = f"https://www.youtube.com/watch?v={vid_id}" + out(f" ✓ {track.url}", "dim") + found += 1 + else: + track.url = track.status = NOT_FOUND + out(" NOT FOUND", "warn"); not_found += 1 + except Exception as e: + track.url = track.status = NOT_FOUND + out(f" ERROR: {e}", "err"); not_found += 1 
+ + playlist.save(src) + if i < len(pending) - 1: + d = random.uniform(delay_min, delay_max) + out(f" [dim]sleep {d:.1f}s[/dim]", "dim") + time.sleep(d) + + out(f" FOUND: {found} NOT FOUND: {not_found}", "accent") + + out(f"\n[dim]{DIVIDER}[/dim]") + out("> NEXT: playlist build --out ", "dim") + + +def cmd_build(args): + boot("build") + paths = resolve_inputs(args.input, "-playlist.md") + if not paths: + out("> [ERR] No playlist.md files found.", "err"); sys.exit(1) + + out_dir = Path(args.out) if args.out else paths[0].parent + out_dir.mkdir(parents=True, exist_ok=True) + + cache_path = out_dir / ".build-cache.json" + cache = load_build_cache(cache_path) + + # Load all playlists + playlists = [] + for p in paths: + pl = Playlist.from_md(p) + playlists.append(pl) + out(f"> LOADED: {pl.name} ({len(pl.tracks)} tracks)") + + # Sort for consistent fire color assignment + playlists.sort(key=lambda p: p.name.lower()) + + # ── MusicBrainz pass ────────────────────────────────────────────────────── + out(f"\n[dim]{DIVIDER}[/dim]") + out("> MUSICBRAINZ: fetching recording + artist data...") + + all_isrcs = { + t.isrc: t for pl in playlists for t in pl.tracks + if t.isrc and t.isrc != "-" + } + uncached_isrcs = [i for i in all_isrcs if f"mb:isrc:{i}" not in cache] + out(f" ISRCs: {len(all_isrcs)} total | {len(uncached_isrcs)} to fetch") + + for n, isrc in enumerate(uncached_isrcs, 1): + result = mb_isrc_lookup(isrc) + if result is not FETCH_FAILED: + cache[f"mb:isrc:{isrc}"] = result + if n % 50 == 0: + save_build_cache(cache, cache_path) + out(f" ISRC {n}/{len(uncached_isrcs)}", "dim") + save_build_cache(cache, cache_path) + + artist_ids = { + cache[f"mb:isrc:{i}"]["mb_artist_id"] + for i in all_isrcs + if f"mb:isrc:{i}" in cache and cache[f"mb:isrc:{i}"].get("mb_artist_id") + } + uncached_artists = [a for a in artist_ids if f"mb:artist:{a}" not in cache] + out(f" Artists: {len(artist_ids)} total | {len(uncached_artists)} to fetch") + + for n, aid in 
enumerate(uncached_artists, 1): + result = mb_artist_lookup(aid) + if result is not FETCH_FAILED: + cache[f"mb:artist:{aid}"] = result + if n % 50 == 0: + save_build_cache(cache, cache_path) + out(f" Artist {n}/{len(uncached_artists)}", "dim") + save_build_cache(cache, cache_path) + out(" MusicBrainz complete.", "accent") + + # ── Generate HTML ───────────────────────────────────────────────────────── + out(f"\n[dim]{DIVIDER}[/dim]") + out("> BUILDING HTML...") + + playlists_meta = [] + for idx, pl in enumerate(playlists): + fire = get_fire(idx) + html = build_playlist_html(pl, fire, cache) + out_path = out_dir / f"{pl.slug}.html" + out_path.write_text(html, "utf-8") + has_yt = sum(1 for t in pl.tracks if t.youtube_id) + out(f" → {pl.slug}.html ({len(pl.tracks)} tracks, {has_yt} embeds)", "accent") + playlists_meta.append({ + "slug": pl.slug, "name": pl.name, + "track_count": len(pl.tracks), "fire_idx": idx, + }) + + # Hub page (only if multiple playlists) + if len(playlists) > 1: + hub_path = out_dir / "playlists.html" + hub_path.write_text(build_hub_html(playlists_meta), "utf-8") + out(f" → playlists.html (hub, {len(playlists)} playlists)", "accent") + + total = sum(p["track_count"] for p in playlists_meta) + embeds = sum(1 for pl in playlists for t in pl.tracks if t.youtube_id) + out(f"\n> BUILD COMPLETE — {len(playlists)} playlists, {total:,} tracks, {embeds} embeds", "accent") + out(f"[dim]{DIVIDER}[/dim]") + out(f"> OUTPUT: {out_dir}", "dim") + out("> NEXT (opt-in): playlist download ", "dim") + + +def cmd_download(args): + boot("download") + paths = resolve_inputs(args.input, "-playlist.md") + if not paths: + out("> [ERR] No playlist.md files found.", "err"); sys.exit(1) + + for src in paths: + out(f"\n> PLAYLIST: {src.name}", "accent") + playlist = Playlist.from_md(src) + pending = [t for t in playlist.tracks if t.needs_download] + + out_dir = ( + Path(args.output) if args.output + else src.parent / src.stem.replace("-playlist", "") + ) + 
out_dir.mkdir(parents=True, exist_ok=True) + + out(f" TO DOWNLOAD: {len(pending)} / {len(playlist.tracks)}") + out(f" OUTPUT DIR: {out_dir}", "dim") + + if not pending: + out(" Nothing to download.", "warn"); continue + + for i, track in enumerate(pending): + safe = _safe_filename(track.title, track.artists) + target = out_dir / f"{safe}.mp3" + out(f" [{i+1}/{len(pending)}] {track.title}") + out(f" {track.url}", "dim") + + ydl_opts = { + "format": "bestaudio/best", + "outtmpl": str(out_dir / f"{safe}.%(ext)s"), + "quiet": True, "no_warnings": True, + "postprocessors": [{ + "key": "FFmpegExtractAudio", + "preferredcodec": "mp3", + "preferredquality": "192", + }], + } + try: + with yt_dlp.YoutubeDL(ydl_opts) as ydl: + ydl.download([track.url]) + if target.exists(): + _embed_tags(target, track) + out(f" ✓ {target.name}", "accent") + track.status = DONE + else: + out(f" WARN: not at expected path", "warn") + except Exception as e: + out(f" ERROR: {e}", "err") + + playlist.save(src) + + done = sum(1 for t in playlist.tracks if t.status == DONE) + out(f" {done}/{len(playlist.tracks)} tracks acquired.", "accent") + + out(f"\n[dim]{DIVIDER}[/dim]") + + +# ─── Helpers ────────────────────────────────────────────────────────────────── + +def _safe_filename(title: str, artists: str) -> str: + raw = f"{artists.split(',')[0].strip()} - {title}" + safe = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "", raw) + safe = re.sub(r'\s+', " ", safe).strip() + return safe[:180] + +def _embed_tags(path: Path, track: Track): + try: + try: tags = ID3(str(path)) + except ID3NoHeaderError: tags = ID3() + tags["TIT2"] = TIT2(encoding=3, text=track.title) + tags["TPE1"] = TPE1(encoding=3, text=track.artists) + if track.isrc and track.isrc != "-": + tags["TSRC"] = TSRC(encoding=3, text=track.isrc) + if track.album: + tags["TALB"] = TALB(encoding=3, text=track.album) + tags.save(str(path), v2_version=3) + except Exception as e: + out(f" WARN: tag write failed: {e}", "warn") + + +# ─── Entry Point 
────────────────────────────────────────────────────────────── + +def main(): + parser = argparse.ArgumentParser( + prog="playlist", + description="PLAYLIST PIRATE v2.0 — CSV to embedded web player to MP3", + ) + sub = parser.add_subparsers(dest="command", required=True) + + # resolve + r = sub.add_parser("resolve", help="Parse CSV(s) → *-playlist.md") + r.add_argument("input", nargs="+", help="CSV file(s) or directory") + + # search + s = sub.add_parser("search", help="Find YouTube URLs via yt-dlp (resumable)") + s.add_argument("input", nargs="+", help="*-playlist.md file(s) or directory") + s.add_argument("--delay-min", type=float, default=3.0, metavar="SEC") + s.add_argument("--delay-max", type=float, default=7.0, metavar="SEC") + + # build + b = sub.add_parser("build", help="Generate HTML pages with embedded YouTube players") + b.add_argument("input", nargs="+", help="*-playlist.md file(s) or directory") + b.add_argument("--out", metavar="DIR", help="Output directory for HTML (default: same as input)") + + # download + d = sub.add_parser("download", help="Download tracks as MP3 (opt-in)") + d.add_argument("input", nargs="+", help="*-playlist.md file(s) or directory") + d.add_argument("--output", "-o", metavar="DIR", help="Output directory for MP3s") + + args = parser.parse_args() + {"resolve": cmd_resolve, "search": cmd_search, + "build": cmd_build, "download": cmd_download}[args.command](args) + + +if __name__ == "__main__": + main() diff --git a/playlistpirate/requirements.txt b/playlistpirate/requirements.txt new file mode 100644 index 0000000..26f4eb5 --- /dev/null +++ b/playlistpirate/requirements.txt @@ -0,0 +1,3 @@ +yt-dlp>=2024.1.0 +rich>=13.0.0 +mutagen>=1.47.0