by Y-my-R » Sat Feb 03, 2024 8:15 pm
The general rule is:
- Dedicated Word Clock BNC connections need to be terminated via a 75 Ohm Terminator on both ends. If daisy-chaining devices, only devices at the very ends of the chain should be terminated with a 75 Ohm Terminator. Devices that aren't at the end of a chain (i.e. a device in the "middle" if daisy chaining 3 devices) should NOT be terminated.
- Some devices have termination built-in, on their BNC connectors. Some don't. Check the manual, and if it doesn't say (which they VERY often do not), ask the manufacturer whether the device is internally terminated or not.
- Some devices allow you to switch the built-in termination on or off as needed. For example, the Mackie HDR/MDR has a button for this on the rear panel.
- Some other devices do NOT allow you to turn termination on or off and always have it on. The Apogee Clock card, for example, is such a device. You would have to physically modify the card if you wanted to use it at any position in a daisy-chain other than the very ends, because if it were positioned somewhere "in the middle" the termination should not be active, and will likely cause problems if it is.
- MANY other devices are NOT internally terminated but also do NOT allow you to turn the termination on. This makes it easy to use such devices "in the middle" of a word clock daisy chain, since you don't have to do anything to use them that way. However, if you want to use a non-terminated device with a BNC word clock connector at the very end of a word clock daisy chain, you need to add a T-Piece and a terminator.
- Digital audio connections that also contain the audio data (and not ONLY word clock) do NOT give any termination options. So, none of the above applies to ADAT, S/PDIF, TDIF or AES/EBU connections.
- As mentioned in an earlier post above, most digital audio devices that connect to a computer-host (e.g. most computer audio devices), allow you to synchronize to an external, incoming embedded word clock signal that is part of the ADAT, S/PDIF, TDIF or AES/EBU data stream. Many other non-computer devices also allow this - such as your HD24.
(That the Apogee clock card can't do this is more of an exception than the rule... or rather, it's SO FRICKEN OLD that it didn't yet do all the stuff that became pretty commonplace for most newer devices. At least that's my take on it).
- Syncing off of the embedded word clock signal that is embedded in the ADAT or S/PDIF etc. data stream is perfectly fine, for devices that allow this. However, I do think (this is an assumption, not actual knowledge) that the word clock "pulse" is simply derived from the frequency of when the "audio data packages" arrive at the receiving device. These "data packages" can have different sizes. Typically 16-bit or 24-bit of "size" for each such "pulse" that arrives at a frequency of 44,100 such pulses per second, or 48,000 pulses per second (for devices that only support 44.1 or 48 kHz sample rates, like the old devices we usually talk about on this forum).
So, it's the incoming "data packages" that create the sort of "square wave" that a dedicated word clock connection via BNC usually sends for "pulse on" and "pulse off." For devices carrying such "data packages" it would look more like "actively receiving data = pulse on" vs "not receiving data/transmission-silence = pulse off"... and that would alternate at the frequency of the sample rate (or double, if you want to count the no-data/data-silent parts separately).
This concludes the "dedicated word clock" vs "embedded word clock in audio data streams" bit I wanted to share.
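To make the termination rules from the list above a bit more concrete, here's a rough Python sketch. It only looks at the receiving devices downstream of the clock master (that's my simplification), and the `Device` class, the `plan_chain` function and the three termination labels are made up for this example, not anything from a real tool.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    # Hypothetical labels for the three cases described above:
    #   "switchable" - termination can be turned on or off
    #   "fixed_on"   - always terminated internally
    #   "none"       - no internal termination
    termination: str


def plan_chain(receivers):
    """Return one line of advice per receiving device, in chain order."""
    advice = []
    for i, dev in enumerate(receivers):
        at_end = i == len(receivers) - 1
        if dev.termination == "switchable":
            state = "ON" if at_end else "OFF"
            advice.append(f"{dev.name}: switch termination {state}")
        elif dev.termination == "fixed_on":
            if at_end:
                advice.append(f"{dev.name}: OK, built-in termination is at the end")
            else:
                advice.append(f"{dev.name}: PROBLEM, always terminated but not at the end")
        elif dev.termination == "none":
            if at_end:
                advice.append(f"{dev.name}: add a T-piece and a 75 Ohm terminator")
            else:
                advice.append(f"{dev.name}: OK, nothing to do")
        else:
            raise ValueError(f"unknown termination type: {dev.termination}")
    return advice


# Example chain with placeholder device names.
chain = [
    Device("Device A", "none"),
    Device("Device B", "switchable"),
    Device("Device C", "none"),
]
for line in plan_chain(chain):
    print(line)
```

The order of the list is the order of the devices along the cable, so only the last entry counts as "the end" here.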
Since we're already "almost" there, let's talk about what the frequency, or "sample frequency" actually does, to give a more complete picture of what's happening in a digital audio transfer:
For digital audio signals (with embedded word clock), such as ADAT, S/PDIF, etc., each such "pulse" as described above, consists of a single "data package" or "sample" of a certain size (16 or 24 bit, usually), that are sent at a certain speed (i.e. 44,100 such "samples" or "data packages" sent per second... or 48,000 for the 48 kHz sample rate).
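To put rough numbers on that, here's a small Python sketch of the arithmetic. It only counts the raw audio bits of one channel; real formats like ADAT or S/PDIF add framing and status bits on top, so the actual line rate is higher. The function names are just made up for this example.

```python
def sample_period_us(sample_rate_hz):
    """Time between two consecutive samples, in microseconds."""
    return 1_000_000 / sample_rate_hz


def payload_bits_per_second(sample_rate_hz, bit_depth, channels=1):
    """Raw audio bits per second, ignoring any framing overhead."""
    return sample_rate_hz * bit_depth * channels


for rate in (44_100, 48_000):
    for bits in (16, 24):
        print(
            f"{rate} Hz, {bits}-bit: one sample every "
            f"{sample_period_us(rate):.2f} us, "
            f"{payload_bits_per_second(rate, bits)} bits/s per channel"
        )
```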
The data package size, determines the dynamic range, between the loudest, and the quietest signal that can be reproduced, because of the amount of data that is available to represent this (16 bit or 24 bit).
Now let's think about what we're actually trying to "describe" with this type of data. For simplicity, let's think about a sine wave in the "real world" that comes out of your speakers, while examining what the speaker cone actually does, when playing back that (analog at that point) sine-wave. At the "zero" point, the speaker cone is in a resting position - so, it isn't pushed out, nor "sucked in" by the voice-coil/magnet that is attached to it.
Now while playing the sine wave and looking at it in VERY SLOW motion, the speaker pushes OUTWARD while the slope of the sine-wave rises... then travels back INWARDS after the sine wave reaches its top and then returns to the "zero/resting" point, but continues to essentially get "sucked in" further than the resting position, as the sine wave starts reaching BELOW the zero/resting point of the speaker.
So, a sine wave makes the speaker cone travel outward from the zero point, and inward from the zero point, along with the position of the signal at the sine-wave (above/below the zero aka "resting point" line).
Now, we want to "digitize" such a sine wave to make it usable in computers or digital devices. Because of how "digital" (aka 1s and 0s) works to describe data, we can't reproduce a fully "smooth" sine wave as it would be created by an analog oscillator. What is being done instead, is to describe many small sections of the sine-wave in a row, to more or less give the "position" above or below the "resting point/zero-line" in our speaker example, above.
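Here's a minimal Python sketch of that "describe the position many times in a row" idea. It takes a sine wave and writes down its position as a whole number at regular intervals. The `sample_sine` function and its parameters are my own illustration, not how any particular converter does it.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, bit_depth, num_samples, amplitude=1.0):
    """Return integer sample values of a sine wave.

    amplitude is relative to digital full scale (1.0 = as loud as possible).
    """
    full_scale = 2 ** (bit_depth - 1) - 1  # e.g. 32767 for 16-bit
    return [
        round(amplitude * full_scale * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
        for n in range(num_samples)
    ]


# First few positions of a 1 kHz sine at 48 kHz / 16-bit.
# Positive = cone pushed outward, negative = cone pulled inward.
print(sample_sine(1_000, 48_000, 16, 13))
```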
If you think of a sine wave like a nice and even hill in the landscape, you'd essentially have to build "steps" into the hill that are all the same height, to go up the hill on one side, and down the hill on the other side (...and technically, also down into a deep valley that is as deep as the hill is high... but for simplicity, let's just look at the "above the zero/resting-point" for this "hill" example).
The "height" of the "stair-steps" represents our "sample rate." If you carve 44,100 steps into our hill, and look at it from a certain distance, the "steps" have a certain visible size - the hill, at least where our "steps" are, is no longer "smooth" but there are 44,100 steps that "describe" how quickly the hill rises (or falls on the other side).
Now, if you'd carve twice as many but "half as high" steps into the same hill of the same physical height, you'd have to carve 88,200 steps into the hill. The result is a much "smoother" looking representation of the hill via those steps. That's the difference between different sample rates (and the reason why 44,100 steps vs 48,000 steps don't make a big difference in terms of quality).
I don't want to move too far away from this example, but the more steps, the higher the (analog) audio frequencies you can capture or represent. So, even though 44.1 kHz "should" in theory cover the roughly 20 kHz max the human ear can hear on the high end, a SIGNIFICANTLY higher sample rate could theoretically reproduce the "hill" much smoother in the digital realm and capture higher frequencies, clearer harmonic overtones and create less disharmonic distortion.
(Personally, I think that recording at 44.1/48 kHz is perfectly fine, though... most people (...or ALL in my example) who claim to hear a difference, still fail a 44.1 vs 88.2 / 48 vs 96 blind test every single time... we tried that at a company I used to work for... was pretty funny... the arrogance about that dropped by a lot, afterwards, hahaha).
So, the more steps we got to describe "our hill" or our sine wave, the smoother our digital "waveform" representation of it will be, and the higher the audio frequency and clearer the high-frequency overtones we can capture.
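One way to put numbers on "more steps = higher frequencies": per the sampling theorem, a given sample rate can represent audio frequencies up to (just under) half that rate. A quick Python sketch, with function names I made up for this example:

```python
def nyquist_hz(sample_rate_hz):
    """Upper limit of the audio frequencies a sample rate can represent."""
    return sample_rate_hz / 2


def samples_per_cycle(freq_hz, sample_rate_hz):
    """How many samples ("steps") describe one full cycle of a tone."""
    return sample_rate_hz / freq_hz


for rate in (44_100, 48_000, 88_200, 96_000):
    print(
        f"{rate} Hz: limit {nyquist_hz(rate):.0f} Hz, "
        f"{samples_per_cycle(10_000, rate):.2f} samples per cycle of a 10 kHz tone"
    )
```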
So at the maximum travel outward or inward, that our speaker can go at the furthest before it's overextended and no longer sounds as it should (i.e. distortion/break-up, etc.), we have reached our maximum "dynamic range" - or the "loudest point" at the peak of our sine-wave/top-of-the-hill/maximum-speaker-extension.
Now, you might ask... are all the hills the same size? They certainly aren't in the real world. Good point!
Well, no matter how high the hill is, you still only have either 44,100 or 48,000 steps to represent and describe it "digitally."
What if you want to describe a GIANT mountain instead of a hill... well, you ALWAYS want to describe the peak of the mountain... otherwise, you'd have to "cut" our representation off at the top, when running out of "steps" available to describe it. So, what you have to do, is to start describing the mountain from a bit up (aka turn down the gain in the real world)... rather than from the "flat land" on the bottom, so you can still reach the top.
...and that's where the "package size" or "bit depth" of each transmitted sample comes in. Each sample can essentially say "I'm so-and-so far away from the bottom of the hill - or rather from the starting point of our description of the hill."
A larger bit-depth/package size can describe a greater distance without running out of available data-space to describe it, while a lower bit-depth can't cover as much "height."
That is our dynamic range. The more data that is contained in each of the samples/data-packages (16 or 24 bit) that get transmitted 44,100 or 48,000 times a second, the more "dynamic range" can be reproduced.
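The usual rule of thumb is roughly 6 dB of dynamic range per bit. Here's a small Python sketch of where that comes from (the ratio between the largest value and the smallest step, expressed in dB). The function name is just for this example.

```python
import math


def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of a given bit depth, in dB."""
    return 20 * math.log10(2 ** bit_depth)


for bits in (16, 24):
    print(f"{bits}-bit: about {dynamic_range_db(bits):.1f} dB")
```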
Now, a common misconception is that a higher bit-depth allows you to capture LOUDER signals. But that's not the case. In digital audio, there's an absolute "maximum loud" ceiling that you can't go above. And if you hit that ceiling, the top of the hill would appear "flat" while it isn't. And in digital audio, that means that the speaker would essentially "stand still" at its maximum extension for a period of time, instead of moving out and in... and that results in at least ugly "digital distortion" or, if you keep doing it for extended periods of time, speaker damage (the voice coil would overheat from continuous DC voltage).
So, what do you do in digital audio to avoid this? You don't start measuring from the "flat land" below (i.e. absolute silence), but you go a bit up the hill/mountain (aka turn down your gain or "trim" the signal), so you can reach and describe the peak of the mountain, without cutting it "flat" on the top (via hitting the bad and absolute digital ceiling).
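Here's a rough Python sketch of both halves of that: what the "flat top" looks like when a signal is too hot, and how many dB of trim would make it fit. The `clip` and `trim_db_needed` helpers and the "1.5x too loud" test signal are my own made-up example.

```python
import math


def clip(samples, bit_depth):
    """Clamp samples to the range the bit depth can hold (the digital ceiling)."""
    hi = 2 ** (bit_depth - 1) - 1
    lo = -(2 ** (bit_depth - 1))
    return [max(lo, min(hi, s)) for s in samples]


def trim_db_needed(peak, bit_depth):
    """dB of gain reduction needed so that peak fits under the ceiling."""
    full_scale = 2 ** (bit_depth - 1) - 1
    if peak <= full_scale:
        return 0.0
    return 20 * math.log10(peak / full_scale)


# One cycle of a sine that is 1.5x louder than 16-bit full scale.
hot = [round(1.5 * 32767 * math.sin(2 * math.pi * n / 48)) for n in range(48)]
clipped = clip(hot, 16)

print("samples stuck at the ceiling:", clipped.count(32767))
print(f"trim needed: {trim_db_needed(max(hot), 16):.2f} dB")
```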
So, the bit depth is essentially how much "distance" you have available to describe your hill/mountain/sine-wave. And that "distance", i.e. how far your sine-wave can travel up and down while still being describable digitally, is your dynamic range.
A higher bit depth means that you don't have to climb as "far up the hill" (aka into audible and wanted audio territory) to start describing the way up the hill and the top, but you can start further down the hill and cover more distance. Ideally, you'd be able to start all the way down from flat land (i.e. total silence).
Any time you have to "climb up the hill" a bit where there's already audio present, you essentially have quieter audio information disappear in background noise (e.g. as is created by the number of steps available we talked about earlier... the fewer steps, the more audible "quantization noise" that creeps into the picture from the "flat land" side of this description. Not counting the noise floor created by devices like microphones, etc., which often is higher/louder than the digital noise floor and available dynamic range... at least at 24-bit).
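To illustrate how a quiet signal can vanish, here's a small Python sketch that asks: what's the largest whole-number sample value a tone at a given level would get? It ignores dither and any analog noise (that's my simplification), and `quantized_peak` is a made-up helper.

```python
def quantized_peak(level_dbfs, bit_depth):
    """Peak sample value of a tone at level_dbfs (0 = digital full scale)."""
    full_scale = 2 ** (bit_depth - 1) - 1
    return round(full_scale * 10 ** (level_dbfs / 20))


# A very quiet tone, 100 dB below full scale.
for bits in (16, 24):
    print(f"{bits}-bit: peak sample value {quantized_peak(-100, bits)}")
```

At 16-bit the tone rounds down to nothing, while at 24-bit there are still some steps left to describe it.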
In other words, a higher bit depth of 24-bit allows you to capture QUIETER signals before they disappear in the (digital) noise floor (after adjusting your gain to make sure you never-ever-ever-ever have the hill be higher/louder than the dynamic range you have available, since it would cut it "flat" on the top, and make your "speaker" stand still at maximum extension, which is deadly/causes clicks/ugly noise. That's why you should NEVER allow the clip LED to turn on on digital meters. It usually means that at least 2 samples in a row were hitting maximum and created a "flat top" - and that's even worse than the hair style, hahaha).
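A clip indicator along those lines could be sketched in Python like this. The "2 samples in a row" threshold is just the figure mentioned above; actual meters differ from device to device, and the function names are made up for this example.

```python
def longest_full_scale_run(samples, bit_depth):
    """Length of the longest run of consecutive samples at the digital ceiling."""
    hi = 2 ** (bit_depth - 1) - 1
    lo = -(2 ** (bit_depth - 1))
    longest = run = 0
    for s in samples:
        if s >= hi or s <= lo:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


def clip_led_on(samples, bit_depth, min_run=2):
    """True if at least min_run samples in a row sit at the ceiling."""
    return longest_full_scale_run(samples, bit_depth) >= min_run


print(clip_led_on([0, 20000, 32767, 20000, 0], 16))         # single touch of the ceiling
print(clip_led_on([0, 32767, 32767, 32767, 20000, 0], 16))  # flat top
```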
OK... that was quite the ramble, and I think I repeated myself a bunch of times in the process. But quite honestly, I don't want to go back and re-read what I just typed to clean it up. I hope that analogy is still kind of useful, and maybe makes it easier to picture how digital audio works, by comparing a simple analog sine wave, with speaker movements and hills/mountains and steps up and down (sample rate), while considering the maximum available distance (bit depth) you can "record" digitally, as an analogy.
And to come back to the original question... in ADAT or S/PDIF or other digital signals that contain audio data, the word clock is derived from the frequency of the incoming samples (44.1 kHz/48 kHz) instead of being a "separate" square wave, as transmitted over a BNC word clock cable.
Only BNC word clock connections have to deal with 75 Ohm termination in that sense. When syncing off the word clock signal that is derived from digital audio connections that also carry the audio information, no such separate termination is available/necessary.
So... sure... if you connect a BNC cable from the D8B/Apogee Clock card (BNC WC-Master) directly to the BNC on the HD24 (BNC-WC Slave), and IF the HD24 is internally terminated or allows you to switch that on, that's how to set that up right.
If you want to set the HD24 to sync as word clock slave from the signal coming into it via the ADAT optical ports, you can do that, too. In that case, you don't need to connect the BNC word clock cables at all and don't have to think about termination.
Again, sorry about the ramble. I hope it's still kinda useful.