Creating a Terry Search Engine

masayuki
Posts: 19
Joined: Tue Nov 21, 2023 6:45 pm


On more than one occasion I've asked myself, "Man, what are Terry's thoughts on this?" So I'm setting out to compile every word spoken by Mr Terry and catalog it somewhere.

Maybe later, if I ever get the GPU power, I'll fine-tune a Mixtral-8x7b-Instruct-v0.1 model on it and we can talk to Terry once more.

But for now, we need text.

https://archive.org/details/TerryADavis ... OS_Archive
Here's the Terry Davis archive. It's mostly video, but it also contains material that's already text: website, emails, and mailing list posts. It's just under a terabyte of data.

One thing missing from the archive.org Terry archive is Terry's Reddit account:
https://old.reddit.com/user/TempleOS_Terry_Davis
Using https://redditcommentsearch.com/ it was trivial to get all of TD's comments. They live here now: https://git.wired.rehab/masayuki/terry- ... reddit.txt
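Once the comments sit in a plain text file, even a tiny inverted index gives you a first working "Terry search engine." Below is a minimal sketch, assuming (hypothetically) that the dump has one comment per line; the real reddit.txt layout may differ, so `load_comments` would need adjusting. The names `load_comments`, `build_index` and `search` are made up for illustration.

```python
import re
from collections import defaultdict


def load_comments(path):
    # Hypothetical layout: one comment per non-empty line
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def tokenize(text):
    # Lowercase, then keep runs of letters, digits and apostrophes as words
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(comments):
    # Inverted index: word -> set of comment numbers containing that word
    index = defaultdict(set)
    for i, comment in enumerate(comments):
        for word in tokenize(comment):
            index[word].add(i)
    return index


def search(index, comments, query):
    # AND search: return comments (in original order) containing every query word
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [comments[i] for i in sorted(hits)]
```

Usage would be roughly `comments = load_comments("reddit.txt")` followed by `search(build_index(comments), comments, "holyc")`. Once the video transcripts exist they can be appended to the same list, since the index doesn't care where a piece of text came from.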

VIDEO TRANSCRIBING SOFTWARE:
Ideally this should be something that lives on the CLI, so I can script it and let it transcribe ALL the videos. I don't want some GUI ware where I have to add, transcribe, add, transcribe.
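The add, transcribe, add, transcribe loop is exactly what a small driver script removes. A minimal sketch, assuming the videos sit under one local directory and that whatever transcriber gets picked is wrapped in a function passed in as `transcribe_fn` (hypothetical; no specific tool's API is assumed here). It skips any video that already has a `.txt` beside it, so a long run can be killed and restarted without redoing work.

```python
from pathlib import Path

# Extensions treated as video (an assumption; adjust to what the archive contains)
VIDEO_EXTS = {".mp4", ".webm", ".mkv", ".avi", ".mov", ".flv"}


def pending_videos(root):
    """Return video files under root that don't yet have a .txt transcript beside them."""
    todo = []
    for path in sorted(Path(root).rglob("*")):
        if (
            path.is_file()
            and path.suffix.lower() in VIDEO_EXTS
            and not path.with_suffix(".txt").exists()
        ):
            todo.append(path)
    return todo


def transcribe_all(root, transcribe_fn):
    """Run transcribe_fn (hypothetical: takes a Path, returns text) on each pending video.

    Returns the number of videos transcribed in this run.
    """
    done = 0
    for video in pending_videos(root):
        text = transcribe_fn(video)
        # Write to a temporary file, then rename, so a crash never leaves a half transcript
        tmp = video.with_name(video.stem + ".txt.part")
        tmp.write_text(text, encoding="utf-8")
        tmp.replace(video.with_suffix(".txt"))
        done += 1
    return done
```

Keeping the transcript next to its video doubles as the progress tracker, which avoids a separate state file on the VM.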
https://secure.scribebuddy.com/lifetime-transcription/
This seems pretty damn good and it's affordable, but it ain't free! I don't think Terry would approve of proprietary ware.
Free ware:
https://whisper.ggerganov.com/ In-browser ware
https://goodsnooze.gumroad.com/l/macwhisper macOS ware
https://github.com/HenestrosaDev/audiotext Python desktop ware
https://github.com/JSchmie/ScrAIbe Python framework

ScrAIbe seems to be the way forward. I'll script up a test with a shorter clip and see how it performs. Once testing looks good, I'll spin up a fat VM and start transcribing all the videos.
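For the short-clip test, one option is to cut a minute or so out of a video and convert it to 16 kHz mono WAV, the sample rate Whisper-family models work at. A minimal sketch that builds and runs an ffmpeg command; it assumes ffmpeg is installed and on PATH, and the file names and 60-second default are placeholders. Whether ScrAIbe wants WAV specifically or will take the video directly isn't confirmed here.

```python
import shutil
import subprocess


def clip_command(src, dst, start=0.0, duration=60.0):
    """Build an ffmpeg command that cuts `duration` seconds starting at `start`
    from src and writes them to dst as 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-ss", str(start),    # seek to start (before -i for fast input seeking)
        "-t", str(duration),  # keep this many seconds
        "-i", str(src),
        "-vn",                # drop the video stream
        "-ac", "1",           # mix down to mono
        "-ar", "16000",       # resample to 16 kHz
        str(dst),
    ]


def make_test_clip(src, dst, start=0.0, duration=60.0):
    # Assumes ffmpeg is installed and on PATH
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(clip_command(src, dst, start, duration), check=True)
    return dst
```

Something like `make_test_clip("talk.mp4", "talk_clip.wav", start=30, duration=45)` would give a small file to feed the transcriber and eyeball before committing to the full run.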