Generating YouTube channel runtime totals
=========================================
First draft: 2024-11-27 Published: 2024-12-23
This is a Rust project aimed at extracting video content duration information through the YouTube API, then aggregating to produce a total runtime for a channel in a given date period.
TL;DR: Project GitHub repository can be found here.
Motivation
I do not know about you, but I can safely say that for some time now I have been watching significantly more YouTube than all other video sources combined, including other platforms, television and movies. I think the cause of this might be the social aspect of it, as they say "the _You_ in YouTube": anyone can upload anything, and through various mechanics it can potentially reach viewers who find it entertaining. Let's not go into the algorithms and other corporate trickery, or the business side of the thing, but it sure has shown that many people find it appealing to watch (and make!) a certain down-to-earth style of content (although in a way reality TV shows had already indicated this).
I would like to highlight one genre in particular: educational content. When I was young I liked to watch TV channels where scientific documentaries were shown all day long, but soon I got fed up with all the advertisements and with having to put up with topics I did not care about. Also, most of the time they only explained _why_ they were doing things and not exactly _how_ they were doing them: I figured one has to go and formally study most of the crafts. The advent of the Internet changed this view, as one could find educational material and even videos on any subject imaginable. Now with YT it is easy to find channels where learning can be even more entertaining than watching any sitcom (at least for me).
I was recently looking at some in-person educational courses where the net programme duration was given in study hours. I also once went through the course lists of my university degrees to find out how many of the courses involved any kind of programming, so that I could relate to these hour figures (I will share some numbers further below). Following this train of thought, I have a couple of hobbies that others can call their profession, and I am always wondering where the line between amateurism and professionalism is. Having someone give you money to do something sounds kind of vague to me. In situations like this it is always worth looking at things that require a formal license to carry out, like operating certain vehicles for example. Airplane pilots are serious about clocking their flight hours, and even when learning how to drive a car, there is a requirement to complete so-and-so many hours on the closed training course before being allowed out in public.
If you are like me and watch many educational videos, you might wonder if the total runtime of a channel could add up to the equivalent of a decent educational programme...
Implementation
At first I was looking at the most obvious way of achieving my goal, which is the simple but sometimes adequate method of inspecting the HTML source my browser gives me. For your interest, this would usually be followed by hammering the HTML in a text editor with Regex, until it resembles a CSV that LibreOffice Calc could swallow to produce the sum I am after. Not this time of course, because on YouTube pagination works similarly to the _death scroll_, where a new "page" is appended to the list when you go to the bottom. Sometimes this can be annoying even as a regular user, because you have to sit on the End key for some time if you are looking for the last video you have watched on a channel.
Viewing the HTML in these situations is not easy. I may be somewhat behind the times with my Web knowledge right now, especially on the frontend side, but obviously if the DOM is manipulated to add new items, I will not be seeing those on the regular source viewer. There may be more advanced tools for this exact thing, or else it would be a nightmare to debug these systems, but thanks to Google I had an alternative way of accessing the information I was looking for: the YouTube API. It is a REST interface with seemingly full CRUD (create, read, update, delete) access to YT data including videos, playlists, comments and so on. I am sure there is some kind of permission policy at work so write access is only allowed in relation to one's own channel, but I was only looking at listing operations and could not observe any difference in restrictions compared to the Web frontend. I had to generate an API key which was easy, and I have to say the service comes with a generous amount of usage quota free out of the box for basic development demo projects.
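To give an idea of what such a listing query can look like, here is a minimal sketch (not the project's actual code) that builds request URLs for the `videos` endpoint of the YouTube Data API v3, asking for the `contentDetails` part, which is where the duration field lives. The function name `video_batch_urls`, the constant names and the batch size of 50 IDs per request are assumptions made for illustration; the HTTP call and the JSON handling are left out entirely.

```rust
/// Base URL of the YouTube Data API v3.
const API_BASE: &str = "https://www.googleapis.com/youtube/v3";

/// Assumed upper bound on the number of video IDs sent in one request.
const MAX_IDS_PER_REQUEST: usize = 50;

/// Builds one `videos` listing URL per batch of video IDs.
/// IDs are joined with commas and passed through unescaped, which
/// assumes they only contain URL-safe characters.
fn video_batch_urls(video_ids: &[&str], api_key: &str) -> Vec<String> {
    video_ids
        .chunks(MAX_IDS_PER_REQUEST)
        .map(|chunk| {
            format!(
                "{}/videos?part=contentDetails&id={}&key={}",
                API_BASE,
                chunk.join(","),
                api_key
            )
        })
        .collect()
}
```

Batching the IDs like this keeps the number of requests low, which also helps with the usage quota: 120 video IDs would need three requests instead of 120.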
I imagined the project as a CLI application, at least for the first implementation. I figured I would only need it for a couple of queries, and I do not require an interface more complicated than that.
The `-h` help says the following, in the feature-complete 0.1.1 version:
> Description:
>     YouTube API tool for calculating the video runtime sum of a channel.
>
> Usage:
>     yt_api_videosum [-k api_key] [-s [start_date]] [-e [end_date]] [channel_name]
>
> Options:
>     -k  YT API key supplied in plain text.
>         If empty, the program will look for it in the 'config/key.txt' file.
>     -s
>     -e  Filter the videos by publish date, giving a start- and/or end date for
>         the active interval. Date is expected in RFC3339 format,
>         i.e. 'yyyy-mm-ddTHH:MM:SSZ' (note the UTC timezone).
>         If the timestamp is empty, it will be asked interactively.
>     -h  Display this help and exit.
>
> Parameters:
>     channel_name  Human-readable name of the channel, with or without the
>                   '@' prefix. If omitted, it will be asked interactively.
>
> Output:
>     Aggregated total of video duration is displayed interactively.
>     Also a full list of the videos are saved to 'output.txt' in CSV format, or in
>     case the process could not complete, it will contain the last intermediate
>     JSON response to help figuring out what went wrong.
>
> Created by Zoltan Kovari, 2024.
You can find an example of the program's output in the next section; for now, let's talk a bit more about implementation details.
Without doubt the most interesting aspect of the development was figuring out how to handle the time period format, both as input and output. The YouTube Data API reference documentation provides a nice summary of the format used for the duration field in a video resource representation: it is an ISO 8601 compliant duration. I must admit I was not particularly familiar with that standard before. I knew it was somehow similar to RFC3339 in the sense that the popular 'yyyy-mm-ddTHH:MM:SSZ' timestamp format is valid in both, but otherwise it always seemed too complicated to study further. At first glance the same impulse came to my mind again, but thanks to the guys at Google, who did a thorough job describing the basics of the format in two paragraphs, I set out without further thought to implement a parser for the period format, converting it to `chrono::TimeDelta`.
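To illustrate the idea, below is a minimal sketch of such a parser. It is not the project's code: it uses only the standard library and returns whole seconds as `u64` instead of a `chrono::TimeDelta`, and the names `parse_duration_secs` and `parse_number` are made up for this example. It assumes the day/time subset of the format (for example `PT15M33S` or `P1DT2H`) and rejects everything else, including months, years and fractional values.

```rust
/// Parses the day/time subset of an ISO 8601 duration, such as
/// "PT1H2M3S" or "P1DT30M", into whole seconds.
/// Returns None for anything outside that subset.
fn parse_duration_secs(input: &str) -> Option<u64> {
    // Every duration starts with the designator 'P'.
    let rest = input.strip_prefix('P')?;

    // 'T' separates the date part (days only, here) from the time part.
    let (date_part, time_part) = match rest.split_once('T') {
        Some((date, time)) => (date, Some(time)),
        None => (rest, None),
    };

    let mut total: u64 = 0;
    let mut seen_component = false;

    // Date part: only "<n>D" is accepted in this subset.
    if !date_part.is_empty() {
        let days = parse_number(date_part.strip_suffix('D')?)?;
        total = total.checked_add(days.checked_mul(86_400)?)?;
        seen_component = true;
    }

    // Time part: hours, minutes and seconds, in this order, each optional.
    if let Some(time) = time_part {
        let mut remaining = time;
        let mut seen_time_unit = false;
        for &(unit, factor) in [('H', 3_600u64), ('M', 60), ('S', 1)].iter() {
            if let Some((digits, tail)) = remaining.split_once(unit) {
                let value = parse_number(digits)?;
                total = total.checked_add(value.checked_mul(factor)?)?;
                remaining = tail;
                seen_time_unit = true;
            }
        }
        // A bare 'T' or any leftover characters make the input invalid.
        if !seen_time_unit || !remaining.is_empty() {
            return None;
        }
        seen_component = true;
    }

    if seen_component { Some(total) } else { None }
}

/// Accepts a non-empty run of ASCII digits and nothing else.
fn parse_number(digits: &str) -> Option<u64> {
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}
```

With this sketch, `parse_duration_secs("PT1H2M3S")` yields `Some(3723)`, while out-of-order units such as `"PT3S2M"` or a month-based period such as `"P1M"` yield `None`.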
I have always lived by the principle that there is no shame in admitting that you were doing something sub-optimal in the past, if you realize your weakness and try to improve on it (or if you can come up with an argument strong and convincing enough in your defense, but let's not go there). One of my weaknesses is test automation: I always knew of it but somehow never practiced it. Now that my aim is to improve my Rust programming skills, and at last this is a development environment that supports testing built-in without any hassle, I try to sneak in some TDD (test-driven development) here and there. In this project I used it for the ISO8601 period parser: I wrote the tests first and then the function satisfying them. I may even have over-engineered the tests a bit, with more than 2000 test cases that I multiplied further by mutating the patterns with invalid characters, ensuring those would not be accepted. Similarly, I wrote a human-readable formatting function for TimeDeltas (poorly tested on a mere 175 cases), which is used for the program output.
This period parsing business may have had an unanticipated influence on me though: realizing that the standard does not stop at days but includes months- and years-long periods as well (these are simply not used or specified by the YT API docs), my implementation suddenly started to feel like only a subset, and rightfully so. Then I started working on the CSV project, which potentially needs support for multiple time formats, so I am now very much aware of ISO8601 and I think I will have to look around for a crate with the widest support for the standard. This endeavor might result in another article here, as when I started checking just now, I had to stop after 10 or so crate candidates...
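The formatting direction is simpler. As a rough sketch (again not the project's code, with the hypothetical name `format_hms` and plain seconds instead of a `TimeDelta`), a total can be broken down into hours, minutes and seconds with integer division and remainders. Pluralization and the omission of zero-valued units are deliberately ignored here.

```rust
/// Formats a number of seconds as "<h> hours <m> minutes <s> seconds".
/// Hours are not capped at 24, so long totals stay in hours.
fn format_hms(total_secs: u64) -> String {
    let hours = total_secs / 3_600;
    let minutes = (total_secs % 3_600) / 60;
    let seconds = total_secs % 60;
    format!("{} hours {} minutes {} seconds", hours, minutes, seconds)
}
```

For the 2089734-second total quoted in the results, this gives "580 hours 28 minutes 54 seconds", since 580 × 3600 + 28 × 60 + 54 = 2089734.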
As for the future of this project, although I can use it as it is, it could be worthwhile to develop it further by adding some kind of web frontend to let others have a play with it. This would pose some interesting challenges, for example I would obviously need to handle the quota somehow, avoiding its overuse. Also with more than one type of client, the library interface could use some additional work as well, I think.
Results
In the following I would like to share the output of some queries I have made with the yt_api_videosum command line program.
Electronics engineering
I would like to start with EE, as this might be closer to the general theme of this blog (at least as I plan it at the moment). The first YouTube channel I followed and watched regularly is the EEVBlog by Dave Jones. It is always tricky (practically impossible) to find a definite starting date; I chose the start of 2014 as a good approximation based on my endeavors and interests at the time, and I can also recognize most of the videos from that time. Dave started very early and had lots of videos by then. I have probably watched quite a few of the older ones too, but that is okay, as these can balance out any with off-topic subjects.
Let's see the full command line invocation and output:
> ./yt_api_videosum -s "2014-01-01T00:00:00Z" "eevblog"
> Info: No API key supplied, trying 'config/key.txt' file...
> Successfully loaded API key.
> Querying channel info...
> Playlist ID extracted.
> Querying playlist...
> Video count: 1342
> Querying video info..........
> Success, output written to 'output.txt'.
> Sum total: 2089734 seconds, or 580 hours 28 minutes 54 seconds
I would like to emphasize that for this to work as intended, every single one of the videos in the given time period has to be watched from start to end. I do this with my favorite channels and also in many other areas of life; I am just wired in that sequential way somehow. This does not mean that I do not watch many other channels where I filter heavily. For example, there is no other EE channel where I have viewed every video, but I wanted to start with EE nevertheless.
This restriction renders the whole concept more like a thought experiment, but I think it is worthwhile, so let's go on: What if I watched my whole EEVBlog backlog, as well as my next three favorite channels from start to end?
Second on my list of most watched EE channels is still related to Dave: The Amp Hour is their regular podcast with Chris Gammell with fresh industry news and interviews. Big Clive is the next; he does teardown-style videos of consumer goods (usually originating from the Far East), and you can learn a ton about electrical safety concerns and reverse-engineering. Last but not least is Martin Lorton, whose videos I found while looking for multimeter reviews, and I quite liked his methodology.
I will give the results in short form from now on:
- EEVblog: 1967 videos, 830 hours
- TheAmpHour: 699 videos, 879 hours
- bigclivedotcom: 2437 videos, 463 hours
- mjlorton: 741 videos, 281 hours
Sum: 2453 hours
Machining
I have found that I have to be in a certain mindset to watch highly technical content, and most of the time I am not in that mood by the end of the day, I am afraid. This might be the reason why I generally do not like to watch software-related videos, and I even watch much less electronics these days than I used to.
The topic I find particularly relaxing and entertaining to watch, though, is machine shop work. I am subscribed to more than 20 channels on the subject, although I am not able to follow even that many. I also have a list with many more recommendations, because you never know, I might find some day that I have much more free time to watch videos (although I doubt it somehow).
Three channels I follow sequentially, which means I am going systematically through their videos, always looking up the last one viewed and continuing from there. The first one I successfully "synced up" with (and am finally able to view fresh videos whenever available) is OxTools, a.k.a. Tom Lipton, who introduced me to concepts of the trade I had never imagined. Now I am in the process of watching the great Mr. Peterson, a.k.a. Tubalcain, with enthusiasm, but he is quite prolific (fortunately), so I have to be more diligent to keep up! Even though I learned the basics through first-hand experience, shown to me by a professional, I still feel I owe at least half of what I know about this trade to MrPete. I have also included Joe Pieczynski, as I really like his professionalism and approach to sharing his knowledge.
- mrpete222: 1540 videos, 420 hours
- oxtoolco: 618 videos, 277 hours
- joepie221: 224 videos, 56 hours
Sum: 753 hours
Note on intervals used: I am sure I have watched every OxTools video. For MrPete, I had to set a starting date (2009-10-01) because there are many off-topic videos early on, and I always know precisely which video I am currently at in the sequence. Similarly for Joe Pie, I remember that I stopped watching regularly around the time he started his model-making content (2020-09-22), but I have every intention of continuing soon.
Popular science
I will address the results above in a minute, but for a little fun, I have made a couple more queries that might be interesting to share. We are going gradually towards the more entertaining, less educational, but still somewhat useful territories here.
In pop science my interests mostly lie around two themes: aerospace and skepticism of mainstream reporting. In the first topic I would like to give a shout-out to Scott Manley, and Paul Shillito a.k.a. The Curious Droid, for their comprehensive coverage of aeronautical and space technology, both past and ongoing. Rocket science is such a hard thing to have a solid grasp on, to understand even the basics, that it is no surprise it is one of the subjects nowadays with many misconceptions floating around. Also there are many more down-to-earth engineering problems today with scifi-esque solutions that turn out to be leaning more towards the fantastical than the scientific. With the help of people like Thunderf00t and the Common Sense Skeptic, one can learn not to accept the marketing narrative blindly, and try applying simple back-of-the-envelope type calculations to check if the claims seem to be at least in the right order of magnitude. As a side note I have to mention the EEVblog here as well of course, for his excellent debunking videos (but obviously those have been counted as EE content above).
- scottmanley: 766 videos, 187 hours
- CuriousDroid: 224 videos, 48 hours
- Thunderf00t: 412 videos, 145 hours
- commonsenseskeptic: 115 videos, 60 hours
Sum: 440 hours
Note on intervals used: Regarding Scott Manley, I excluded the earlier videos (before 2018-08-24) on Kerbal and other videogame-type content as off-topic (although I got to know about him through those). With CuriousDroid I simply started where the aerospace content begins (2016-08-01). Thunderf00t's science content was always interspersed with other controversial topics, and though I have been watching every video of his for some time now, I did not go back to many of the earlier ones, so I had to draw the line somewhere (2016-07-19). The commonsenseskeptic channel being a younger one, I can be sure that I have watched every video, but there are quite a few re-runs with improved quality, so I tried to manually filter out those (I guess this could be a feature request).
Comparison
University
I promised some numbers derived from my BSc. and MSc. studies, to help put the figures above into some perspective. I studied Business IT at the University of West Hungary in Sopron, starting in 2006. First let's see the unfiltered data table:
+-----+---------+----------------------+---------------+-------------------+
|     | Credits | Credit-derived hours | Nominal hours | Number of courses |
+-----+---------+----------------------+---------------+-------------------+
| BSc | 210     | 6300                 | 2366          | 50                |
| MSc | 125     | 3750                 | 1274          | 25                |
| Sum | 335     | 10050                | 3640          | 75                |
+-----+---------+----------------------+---------------+-------------------+
Hungary being an EU member state, its "kredit" system is in accordance with ECTS, where one credit embodies roughly 25-30 hours worth of study. According to the relevant Wikipedia page, we can consider this to be closer to 30 in Hungary, so I used that value for deriving hours, as a first estimation.
There were also nominal values for the weekly number of hours of lecture, practice and laboratory work for every course. Adding up the three numbers and multiplying by 13 (which was a typical course length in weeks if I remember correctly), I got the nominal study hours for every course. Summing for all courses gives a dramatically smaller total compared to the credit-derived figure, but we have to consider that this only includes time meant to be spent on campus in lecture halls and computer rooms, without any additional study at home either throughout the course or later during the examination period.
Of course the Business IT programme included a wide range of subjects, and to some extent every student had the opportunity to fine-tune what courses they wanted to participate in. Naturally, I almost made it a sport to try and include every course I could which involved some sort of programming.
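The two derivations can be written down as a short Rust sketch. The function and constant names are made up for illustration; the 30 hours per credit and the 13-week course length are the values assumed in the text above.

```rust
/// Assumed study hours represented by one ECTS credit.
const HOURS_PER_CREDIT: u32 = 30;

/// Assumed typical course length in weeks.
const WEEKS_PER_COURSE: u32 = 13;

/// Study hours implied by a number of credits.
fn credit_derived_hours(credits: u32) -> u32 {
    credits * HOURS_PER_CREDIT
}

/// On-campus hours of one course, from its weekly lecture,
/// practice and laboratory hours.
fn nominal_hours(lecture: u32, practice: u32, lab: u32) -> u32 {
    (lecture + practice + lab) * WEEKS_PER_COURSE
}
```

For example, `credit_derived_hours(210)` gives the 6300 hours shown in the BSc row, while a hypothetical course with 2 weekly hours of lecture and 2 of practice would come out at 52 nominal hours.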
Following is a table with the same metrics as above filtered to software development related courses:
+-----+---------+----------------------+---------------+-------------------+
|     | Credits | Credit-derived hours | Nominal hours | Number of courses |
+-----+---------+----------------------+---------------+-------------------+
| BSc | 89      | 2670                 | 949           | 19                |
| MSc | 69      | 2070                 | 702           | 11                |
| Sum | 158     | 4740                 | 1651          | 30                |
+-----+---------+----------------------+---------------+-------------------+
Vocational training
Being interested in machining but in a situation where I would probably not get the medical papers needed for a proper machinist or CNC operator training, I was looking for other related education programmes recently here in Hungary. One example I am particularly interested in right now, is a clock/watchmaker course which gives a state-recognized certification upon successful completion.
Let's hear the numbers: The course is advertised to include 360 hours worth of education in total. There are 112 hours of theoretical material, supplied in some kind of online e-learning format I think. Then the main body of knowledge is communicated during 31 contact days (Saturdays), each 8 hours long, with hands-on training in the workshop. That is 248 workshop hours, which together with the 112 theoretical hours adds up to the advertised 360.
Now this is obviously not the same level as a university degree, or to be more on-topic, not really comparable to the knowledge a top-notch Swiss clockmaker has to acquire. On the other hand, according to the organizers, this is very much on par with vocational training of the past, where young students could learn clockmaking as a profession along with their high school education. Nowadays this only exists for other, higher-demand professions, and niche crafts are taught separately with severely limited availability.
Analysis
Sure I could have gathered more data points, but I think we have enough for this article to make our conclusions.
We saw that a roughly 1 year long vocational programme incorporates 360 hours of education. A 7 semester BSc. programme, if we regard the courses related to the core profession only, stands around 950. Similarly, a 4 semester MSc. is at roughly 700 hours. If we plot this along with the linear regression line, it looks something like the following.
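For reference, the regression line through these three points can be recomputed with a few lines of Rust. This is a minimal sketch under stated assumptions: the horizontal axis is taken to be programme length in semesters (counting the roughly one-year vocational course as 2), the vertical axis is hours, and the names `linear_fit` and `STUDY_POINTS` are made up for this example. The original plot may have used different axes.

```rust
/// (programme length in semesters, hours of education); the semester
/// value for the vocational course is an assumption (1 year = 2).
const STUDY_POINTS: [(f64, f64); 3] = [(2.0, 360.0), (4.0, 700.0), (7.0, 950.0)];

/// Ordinary least-squares fit of y = slope * x + intercept.
/// Returns None if there are fewer than two points or all x are equal.
fn linear_fit(points: &[(f64, f64)]) -> Option<(f64, f64)> {
    if points.len() < 2 {
        return None;
    }
    let n = points.len() as f64;
    let mean_x = points.iter().map(|p| p.0).sum::<f64>() / n;
    let mean_y = points.iter().map(|p| p.1).sum::<f64>() / n;

    // Sum of squared deviations in x, and of the x-y cross products.
    let sxx: f64 = points.iter().map(|p| (p.0 - mean_x).powi(2)).sum();
    let sxy: f64 = points.iter().map(|p| (p.0 - mean_x) * (p.1 - mean_y)).sum();
    if sxx == 0.0 {
        return None;
    }

    let slope = sxy / sxx;
    Some((slope, mean_y - slope * mean_x))
}
```

Under these assumptions, `linear_fit(&STUDY_POINTS)` comes out at roughly 115 hours per semester with an intercept of about 171 hours.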
How can I relate these numbers to the YouTube runtime statistics I gathered? If I treated them the same, as in 1 hour to 1 hour, the 580 hours worth of EEVBlog would get me somewhere between technician and engineer level in EE, and quite similarly the 750 hours of watching machinists would make me at least a journeyman, I think. But seriously now, the question arises: Who am I fooling here? Am I fooling anyone at all?
For me the most important lesson of my studies at the University was making me aware that if I wanted to, I could learn almost anything by myself. The institutions are there only to provide proof of my knowledge in case I wanted to work on anything serious. Do I have the necessary knowledge to work in EE or as a machinist, by watching others either in-person or through video, and then tinkering in my own lab or workshop? Maybe. Do I have the necessary knowledge to work on medical, aerospace, high performance projects and the like? Probably not, and absolutely should not. Unless it were software related of course...
The last thing I wanted to emphasize is the difference between theoretical and hands-on experience. We can see that the ratio of credit-derived to nominal hours is about 3 to 1 (10050 to 3640 overall). This is an important lesson: knowledge needs to sink in, and we need to spend about twice as much time practicing as we spend listening to lectures. I need to spend more time sitting before the electronics breadboard, more time standing next to the lathe, and probably even need to program some more. I cannot wait!