
The consumption time and transmission time are identical: one second of data is still one second of data regardless of the sampling rate. However, if the transmitter and receiver are not synchronized, buffering will ultimately be needed (as further detailed at the end of this post).

The greatest common divisor of the two rates is 300, so resampling exactly from 44.1 kHz to 48 kHz requires the ratio $160/147$ (and its inverse for the other direction).

The following demonstrates one approach to resampling from 44.1 kHz to 48 kHz, where care has been taken not to reduce the sampling rate below 44.1 kHz at any stage (if that matters for fidelity concerns) and the multiple stages simplify the filtering needed: interpolate by 4, decimate by 3; interpolate by 8, decimate by 7; interpolate by 5, decimate by 7. In this structure, each interpolator block inserts $I-1$ zeros after every input sample, and each decimation block keeps every $D$th sample and throws away the rest. The intermediate blocks can run at any arbitrarily higher clock rate to keep up with the throughput, while the input and output blocks are rate matched (consuming samples at 44.1 kSps and providing output samples at 48 kSps).

To do this with 20 kHz audio bandwidth and 80 dB of resampling image rejection, I estimate that 171 taps would be needed for FIR1, 95 taps for FIR2 and 25 taps for FIR3 (as linear-phase filters, so one multiplier for every 2 taps). For real-time application, the expected delay through the resampler would be 7.9 ms.
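The arithmetic behind the $160/147$ ratio and the three-stage split can be checked with a short script. This is only a sketch: the function names (`resampling_ratio`, `stage_rates`, `upsample`, `downsample`, `rate_change_only`) are made up for illustration, and the FIR filters are deliberately left out, so it verifies the rates and sample counts rather than producing usable audio.

```python
from math import gcd

F_IN = 44_100   # input sample rate in Hz
F_OUT = 48_000  # output sample rate in Hz
STAGES = [(4, 3), (8, 7), (5, 7)]  # (interpolate by I, decimate by D) per stage


def resampling_ratio(f_in, f_out):
    """Return (up, down) in lowest terms so that f_out = f_in * up / down."""
    g = gcd(f_in, f_out)
    return f_out // g, f_in // g


def stage_rates(f_in, stages):
    """Return (rate after interpolation, rate after decimation) for each stage."""
    rates = []
    rate = f_in
    for interp, decim in stages:
        high = rate * interp
        if high % decim:
            raise ValueError("stage does not produce an integer rate")
        rate = high // decim
        rates.append((high, rate))
    return rates


def upsample(samples, factor):
    """Zero-stuff: follow every sample with factor - 1 zeros."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0] * (factor - 1))
    return out


def downsample(samples, factor):
    """Keep every factor-th sample and discard the rest."""
    return samples[::factor]


def rate_change_only(samples, stages):
    """Apply the up/down blocks only (assumption: FIR filters omitted)."""
    for interp, decim in stages:
        samples = downsample(upsample(samples, interp), decim)
    return samples


if __name__ == "__main__":
    print(gcd(F_IN, F_OUT))                  # 300
    print(resampling_ratio(F_IN, F_OUT))     # (160, 147)
    for high, low in stage_rates(F_IN, STAGES):
        print(high, low)                     # 176400 58800 / 470400 67200 / 336000 48000
    print(len(rate_change_only(list(range(147)), STAGES)))  # 160
```

The rate after each decimator is 58.8 kHz, 67.2 kHz and 48 kHz, so no stage drops below 44.1 kHz, and 147 input samples come out as 160 output samples. In a real resampler each zero-stuffing step would be followed by its low-pass filter (FIR1 to FIR3) before decimating.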
