Several years ago, I started recording our Township meetings and posting them to YouTube. This was very helpful; even our government officials used the recordings to refresh their memory about what happened in a meeting. But it also led people to ask, "Why, exactly, are we relying on some random citizen to provide this service? What if they're busy? Or move?!" So the Township created its own channel and began posting its own meeting recordings. This is a great way to promote transparency, but the Township's channel is subject to document retention policies. Since we have absolutely been at meetings where it would have been very helpful to know what happened five, ten, or even forty years ago, my expectation is that these videos will be useful far beyond the allotted document retention period.
We decided to keep our channel around as a historic archive of government meeting recordings. There's no longer any time criticality; anyone who wants to see a current meeting can just use the Township's channel. The quick script below lists all of the videos published to the Township's channel and downloads the highest-quality MP4 file associated with each one. Once I complete back-filling our archive, I will modify the script to stop once it reaches a video we already have.
# API key for my Google Developer project
strAPIKey = '<CHANGEIT>'
# Youtube account channel ID
strChannelID = '<CHANGEIT>'
import os
from time import sleep
import urllib.request
import json
from pytube import YouTube
import datetime
from config import dateLastDownloaded
os.chdir(os.path.dirname(os.path.abspath(__file__)))
print(os.getcwd())
strBaseVideoURL = 'https://www.youtube.com/watch?v='
strSearchAPIv3URL = 'https://www.googleapis.com/youtube/v3/search?'
iStart = 0 # Not used -- included to allow skipping first N files when batch fails midway
iProcessed = 0 # Just a counter
strStartURL = f"{strSearchAPIv3URL}key={strAPIKey}&channelId={strChannelID}&part=snippet,id&order=date&maxResults=50"
strYoutubeURL = strStartURL
while True:
    inp = urllib.request.urlopen(strYoutubeURL)
    resp = json.load(inp)
    for i in resp['items']:
        if i['id']['kind'] == "youtube#video":
            iDaysSinceLastDownload = datetime.datetime.strptime(i['snippet']['publishTime'], "%Y-%m-%dT%H:%M:%SZ") - dateLastDownloaded
            # If the video was posted since the last run time, download it
            if iDaysSinceLastDownload.days >= 0:
                strFileName = (i['snippet']['title']).replace('/','-').replace(' ','_')
                print(f"{iProcessed}\tDownloading file {strFileName} from {strBaseVideoURL}{i['id']['videoId']}")
                # Need to retrieve a YouTube object and filter for the *highest* resolution, otherwise we get blurry videos
                if iProcessed >= iStart:
                    yt = YouTube(f"{strBaseVideoURL}{i['id']['videoId']}")
                    yt.streams.filter(progressive=True, file_extension='mp4').order_by('resolution').desc().first().download(filename=f"{strFileName}.mp4")
                    sleep(90)
                iProcessed = iProcessed + 1
    try:
        next_page_token = resp['nextPageToken']
        strYoutubeURL = strStartURL + '&pageToken={}'.format(next_page_token)
        print(f"Now getting next page from {strYoutubeURL}")
    except KeyError:
        # No nextPageToken means we have reached the last page of results
        break
# Update config.py with the last run date so the next run only fetches newer videos
today = datetime.datetime.now()
with open("config.py", "w") as f:
    f.write("import datetime\n")
    f.write(f"dateLastDownloaded = datetime.datetime({today.year},{today.month},{today.day},0,0,0)")
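The planned "stop once we reach a video we already have" change could be sketched along these lines. This is a hypothetical helper of my own devising, not part of the script above: it assumes the archive lives in the working directory and mirrors the same title sanitization the download loop applies, so a title can be checked against the MP4 files already on disk.

```python
import os

def sanitize_title(title):
    """Mirror the download loop's filename cleanup: slashes become dashes, spaces become underscores."""
    return title.replace('/', '-').replace(' ', '_')

def already_archived(title, archive_dir='.'):
    """True if an MP4 with the sanitized title already exists locally."""
    return os.path.exists(os.path.join(archive_dir, f"{sanitize_title(title)}.mp4"))
```

Inside the item loop, a check like `if already_archived(i['snippet']['title'])` could set a flag and break out of both loops, replacing the date comparison once the back-fill is done.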