5G Networked Music Performances - Will It Work?
5G is the new paradigm of telecommunication and wireless networking. But what is so great about 5G? For the average user, the biggest difference from 4G to 5G will be a massive increase in speed (bandwidth). In reality, 5G is an entirely new networking infrastructure. This is one reason why expectations for 5G are so high. We imagine remote-controlled surgery, distributed sporting events, and even stable networked music performances (NMP).
At MCT, we have been experimenting with NMPs for several years. One of the biggest challenges with playing music over the network is the exceptionally high demand for low-latency communication [2] [3] [4]. In fact, research tells us that the ideal roundtrip latency (back and forth between two locations) for synchronous music performances is around 25-30ms, with the maximum we can tolerate at around 40-50ms [5]. For some perspective:
Demo - 30ms audio delay from left to right ear:
In late 2021, we got in touch with Telenor to explore the feasibility of conducting NMPs over 5G. Telenor is currently involved in multiple EU-funded projects that, in part, explore the application of 5G to music technology, including Fudge5G and 5GMediaHub. During a two-week period in late March 2022, we did a series of experiments on a commercial and private 5G network in collaboration with Telenor Research. In this post, we present these experiments in detail, explaining the technical setup, methods, and preliminary results.
Setup
We conducted two experiments in two separate locations on two different 5G networks.
Commercial 5G Experiment
The first experiment was carried out at the Musicology Department at the University of Oslo, on March 30th 2022, on a first-generation commercial 5G network. Most commercial 5G networks today rely on a so-called Non-Stand-Alone (NSA) architecture, which reuses the 4G core network. However, Telenor pre-configured our routers to access specific Access Point Names (APNs). This configuration enabled Peer-to-Peer (P2P) connectivity between our machines with faster packet routing.
Private 5G Experiment
The second experiment was carried out in Elverum, close to Terningmoen Army Base, on April 4th 2022, on a 5G Network-on-Wheels (5GNoW) solution. The 5GNoW is a private 5G network, or Non-Public Network (NPN), that relies on a Stand-Alone (SA) core network. As we understand it, these kinds of systems are mostly used for experimental field testing of various 5G applications.
Hardware and Software
For both experiments, we used a pair of Huawei H138-380 CPE Pro 3 5G Routers to connect to the network.
To send audio and video back and forth, we used our own portable and custom-built NMP systems. These racks are essentially bundles of high-end software, audio/video peripherals and networking tools that can provide the lowest possible latency for audio/video transmission over the network, provided that all the other pieces of the puzzle are in place. Full documentation and more detailed info about these systems are available here.
For NMPs, our go-to AV transmission software is LoLa (Low Latency AV Streaming System). This high-end application was developed at the Trieste Conservatory (Italy) in collaboration with GARR, the Italian Research and Academic Network. To provide ultra-low latency, LoLa requires high-end GPU-equipped PCs, soundcards with very stable ASIO drivers (that support buffer sizes of 32 and 64 samples), and specialized Ximea video cameras.
In addition to LoLa, we used the JACK2 and JackTrip bundle as our secondary software. JackTrip is another popular audio transmission application developed by CCRMA at Stanford University (USA). JackTrip is audio-only and accommodates a wider range of soundcards and buffer sizes. This is “bad” for latency optimization but essential for tolerating less stable connections and network jitter.
Experiments
To explore to what extent 5G can accommodate NMPs, we measured the stability, quality, and latency of roundtrip audio and video signals using LoLa and JackTrip. In any real NMP scenario, we only care about technical configurations that render stable AV transfer over time with minimal dropouts and other unwanted artifacts. Therefore, we only measured signal latency once we had found the best possible tradeoff between stability and quality. We did three tests in each experiment:
1) Measuring the Network Coverage and Bandwidth
By using the iPerf networking utilities, the Huawei routers’ own location-optimizing software, Telenor’s online coverage map, and Ookla’s online speedtester, we were able to make network bandwidth and coverage estimates throughout the experiments, ensuring that our load did not exceed the capacity of the network.
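To give a sense of the load involved, the payload bitrate of an uncompressed audio stream can be estimated from its format and compared with the measured link capacity. The snippet below is only a rough sketch: the 16-bit sample depth, the 50% headroom factor, the example link capacity, and the function names are assumptions made for illustration, and packet header overhead is ignored.

```python
def pcm_bitrate_mbps(channels: int, sample_rate_hz: int, bit_depth: int) -> float:
    """Raw payload bitrate of uncompressed PCM audio in Mbps (no packet overhead)."""
    return channels * sample_rate_hz * bit_depth / 1_000_000


def fits_link(stream_mbps: float, link_mbps: float, headroom: float = 0.5) -> bool:
    """True if the stream uses no more than `headroom` of the measured capacity."""
    return stream_mbps <= link_mbps * headroom


# Assumed format: stereo, 48 kHz, 16-bit (the bit depth is a guess for illustration).
stereo_mbps = pcm_bitrate_mbps(channels=2, sample_rate_hz=48_000, bit_depth=16)
print(f"stereo stream: {stereo_mbps:.3f} Mbps")

# Hypothetical link capacity in Mbps, standing in for an iPerf or speedtest reading.
print("fits with headroom:", fits_link(stereo_mbps, link_mbps=13.0))
```

Uncompressed stereo audio is a light load by this estimate. Video is the part that makes the bandwidth check matter.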
2) Finding the Sweet-spots
To find the best tradeoff between stability and quality, we sent a constant stream of audio and video over the network and looped the signals back to their source, as depicted in Figure 1. With this, we were able to monitor the AV quality of our connections in real-time.
To fine-tune the audio, we adjusted software and hardware buffer sizes to locate the lowest possible configuration that ensured a stable audio transmission over a significant period (roughly 10 minutes in total). For the video, we used a similar approach, adjusting the framerate, the amount of compression (M-JPEG), and the video resolution to find the sweet-spot.
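The audio part of this search can be thought of as walking up through candidate buffer sizes until one holds up. The sketch below illustrates that idea only. In practice we judged stability by ear and by watching the connection, so `is_stable` is a hypothetical callback and `fake_listening_test` is a made-up stand-in, not a measurement.

```python
from typing import Callable, Iterable, Optional


def smallest_stable_buffer(
    candidates: Iterable[int],
    is_stable: Callable[[int], bool],
) -> Optional[int]:
    """Return the smallest buffer size (in samples) that passes the stability check."""
    for size in sorted(candidates):
        if is_stable(size):
            return size
    return None  # no candidate was stable


# Hypothetical stand-in for a real observation window: pretend that anything
# below 512 samples produced dropouts.
def fake_listening_test(buffer_size: int) -> bool:
    return buffer_size >= 512


print(smallest_stable_buffer([32, 64, 128, 256, 512, 1024], fake_listening_test))
```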
3) Measuring the Latency
With the software and hardware parameters fine-tuned, we measured the audio and video latency with a similar loopback system:
We measured the audio latency in two steps:
- Digital roundtrip time (digital RTT)
With digital RTT, we refer to the measurement of audio latency from software to software (or PC to PC), and back again. With this method, we bypassed the latency induced by our external soundcards and mixers. For the measurements, we used JackTrip in P2P mode. By passing the -x1 argument on the client side, we were able to record and monitor the digital RTT in real-time.
- Analog roundtrip time (analog RTT)
With analog RTT, we refer to the measurement of audio latency through the entire chain depicted in Figure 2. To make these measurements, we used another laptop with a designated audio interface. From this secondary laptop/soundcard, we sent audio impulses from output 2 to the NMP kits and received the signal back again on input 2. For reference, we looped output 1 directly back to input 1 on the soundcard and sent identical audio impulses to output 1. Then, in software, we measured the analog RTT by looking at the temporal offset between inputs 1 and 2.
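The offset between the two recorded inputs can be read off by locating the impulse on each channel. The following is a minimal sketch of that idea using simple threshold detection on synthetic data. It is not the tool we used, and the threshold value, function names, and test signal are all assumptions for illustration.

```python
from typing import Optional, Sequence


def first_onset(samples: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first sample whose absolute value reaches the threshold."""
    for index, value in enumerate(samples):
        if abs(value) >= threshold:
            return index
    return None


def offset_ms(
    reference: Sequence[float],
    delayed: Sequence[float],
    sample_rate_hz: int,
    threshold: float = 0.5,
) -> float:
    """Delay of `delayed` relative to `reference`, in milliseconds."""
    ref_index = first_onset(reference, threshold)
    delayed_index = first_onset(delayed, threshold)
    if ref_index is None or delayed_index is None:
        raise ValueError("impulse not found on both channels")
    return (delayed_index - ref_index) * 1000.0 / sample_rate_hz


# Synthetic example: input 1 (direct loop) sees the impulse at sample 100,
# input 2 (through the network chain) sees it 480 samples later.
sample_rate_hz = 48_000
input_1 = [0.0] * 1000
input_2 = [0.0] * 1000
input_1[100] = 1.0
input_2[580] = 1.0
print(offset_ms(input_1, input_2, sample_rate_hz), "ms")
```

With real recordings, noise and the shape of the impulse make a fixed threshold fragile, so a cross-correlation between the two channels would be a more robust choice.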
For video, we took advantage of the fact that our two NMP kits were in the same room. To measure the latency, we sent a Ximea video feed of me doing some claps 👏 from one NMP kit to the other. While displaying the video feeds in full-screen on both computer monitors, we filmed the monitors with a secondary camera. Then, we used the footage from the secondary camera to determine the video latency by counting the offset in frames between the two monitors.
Results
Commercial 5G Experiment
Inside the Musicology building at UiO, the 5G reception was poor. After inspecting Telenor’s coverage map of our location, we decided to place the routers outside and pre-configured them to be in Bridge mode, hoping it would generate better coverage, create a more stable connection between our routers, and boost overall performance. According to the routers’ location-optimizing software, we achieved a stable 75% 5G coverage at this location. From here, we measured a stable 60Mbps bandwidth.
The transmission sweet-spot for audio was achieved using JackTrip with a buffer size of 512 samples. Unfortunately, experimenting with lower buffer sizes only resulted in heavily jittery audio and dropouts. We found the optimal stereo audio settings to be the following:
Using the above configuration, we measured a 110ms digital RTT and a 165ms analog RTT on the commercial 5G network at UiO. Considering that buffering 512 samples at 48kHz takes roughly 10.7ms, and that during our analog RTT measurement the audio had to pass through our soundcards a total of 4 times, this kind of delta between digital and analog RTT was expected.
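The buffer arithmetic behind this reasoning is easy to check. The sketch below assumes, as a simplification, that each pass through a soundcard costs exactly one buffer of delay. Converter and driver latency are not modelled, which is one plausible reason the measured delta is larger than the buffer estimate.

```python
def buffer_ms(buffer_size: int, sample_rate_hz: int) -> float:
    """Time needed to fill one audio buffer, in milliseconds."""
    return buffer_size * 1000.0 / sample_rate_hz


per_pass_ms = buffer_ms(512, 48_000)   # one buffer of 512 samples at 48 kHz
buffered_ms = 4 * per_pass_ms          # four soundcard passes in the analog chain
measured_delta_ms = 165 - 110          # analog RTT minus digital RTT

print(f"one buffer:     {per_pass_ms:.2f} ms")
print(f"four passes:    {buffered_ms:.2f} ms")
print(f"measured delta: {measured_delta_ms} ms")
```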
Demo - 165ms audio delay from left to right ear:
For video, we were able to use the Ximea low-latency cameras with LoLa, as the software uses a different buffering strategy for video transfer. After experimenting with various settings and buffering tools, we achieved a stable transmission with some drops (mostly not visible to the human eye) using the following settings:
With this configuration, we measured the one-way video latency between our machines to be approximately 7 frames. At 60FPS, this corresponds to roughly 116ms one-way, or about 232ms RTT.
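Converting a frame count into time is a one-line calculation, sketched below. The helper name is made up for illustration. Note that at 60FPS one frame lasts about 16.7ms, so a count made by eye is only accurate to within roughly one frame.

```python
def frames_to_ms(frames: float, fps: float) -> float:
    """Duration of a number of video frames at a given framerate, in milliseconds."""
    return frames * 1000.0 / fps


one_way_ms = frames_to_ms(7, 60)
print(f"one-way:    {one_way_ms:.1f} ms")
print(f"round trip: {2 * one_way_ms:.1f} ms")
print(f"one frame:  {frames_to_ms(1, 60):.1f} ms")
```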
Private 5GNoW Experiment
At Terningmoen, Elverum, time was of the essence. We only had about 3-4 hours to set up our equipment, configure the network, and do our tests. From the start, we ran into unexpected issues in the 5G modem setup with the 5GNoW van. Also, establishing UDP and TCP/UDP connections over the network was unusually slow, which pointed to further network issues. Therefore, we had to settle for a limited bandwidth of approximately 13-14Mbps, which was enough to experiment with audio.
The transmission sweet-spot for audio was achieved using JackTrip with buffer sizes of 256 and 512 samples. At 256, we got acceptable audio quality, but the stream was unstable over time with noticeable dropouts. At 512, the audio was clear and stable over a significant period. We found the optimal stereo audio settings to be the following:
Using the above configurations, we measured a 55-60ms digital RTT and a 74ms analog RTT when using a buffer size of 256. Although impressive, this configuration rendered borderline audio quality that would be unpleasant in the long run. Using a buffer size of 512, we measured a 90-100ms digital RTT (slightly faster than in the first experiment) and an analog RTT of about 140ms.
Demo - 74ms audio delay from left to right ear:
Summary and Concluding Thoughts
During a two-week period in late March 2022, we investigated the feasibility of conducting NMPs over commercial and private 5G networks, in collaboration with Telenor Research. On two separate occasions, we measured the stability, quality, and latency of transferring uncompressed audio and compressed video over the networks with high-end hardware and software utilities. When accepting borderline conditions on a private 5G network (5GNoW), we managed to push the analog audio RTT down to 74ms. However, in more realistic conditions on a first-generation commercial 5G network, we achieved an analog audio RTT of 165ms and a one-way video latency of about 116ms.
Compared with the audio RTT benchmarks mentioned in the introduction (25-50ms), our 5G test results were not particularly promising. However, I believe we could conduct successful NMPs over 5G if we could get the analog audio RTT down to 50-70ms. There are many documented strategies for coping with latency-rich environments [1]. In fact, MCT students have recently explored some of these strategies in detail (read more here). Also, considering that we had unfortunate testing conditions in Elverum, there is reason to be optimistic about achieving better audio and video RTT results if we manage to resolve these issues at a later stage.
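To make the gap concrete, the measured analog audio RTTs can be set against the two limits discussed here. This is just an illustrative sketch: the constant and function names are my own, and the value for the private network at buffer size 512 is the approximate figure reported above.

```python
TOLERABLE_MAX_MS = 50   # upper tolerance quoted in the introduction
HOPED_FOR_MAX_MS = 70   # upper end of the 50-70ms range suggested above

# Analog audio RTTs reported in this post, in milliseconds.
measured_analog_rtt_ms = {
    "commercial 5G, buffer 512": 165,
    "private 5GNoW, buffer 256": 74,
    "private 5GNoW, buffer 512": 140,  # approximate
}


def excess_ms(rtt_ms: float, limit_ms: float) -> float:
    """How far a measured RTT overshoots a limit (0 if it is within the limit)."""
    return max(0.0, rtt_ms - limit_ms)


for label, rtt in measured_analog_rtt_ms.items():
    print(
        f"{label}: {rtt} ms, "
        f"{excess_ms(rtt, HOPED_FOR_MAX_MS):.0f} ms above the hoped-for range, "
        f"{excess_ms(rtt, TOLERABLE_MAX_MS):.0f} ms above the tolerance ceiling"
    )
```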
On the other hand, although we can do more testing, there are some things we cannot mitigate. For instance, we cannot change how the 5G protocol is written, we have only limited control over how our routers/modems choose to buffer, and we cannot remove the inherent instability of wireless networks. Because of this, we have to be practical and make the best of what we get. Moreover, lower latency should be possible when the Ultra-Reliable Low Latency Communications (URLLC) feature of 5G is implemented on future Telenor networks.
Going Forward
Our plans are now to do more testing at Telenor’s Oslo Hub at Fornebu in the summer of 2022. As discussed, we hope that we can resolve the private 5G networking issues and improve on our results from Elverum. We look forward to doing more tests and will keep you posted on the results.
References
[1] Carôt, A., & Werner, C. (2009). Fundamentals and principles of musical telepresence. Journal of Science and Technology of the Arts, 1(1), 26-37. https://doi.org/10.7559/citarj.v1i1.6
[2] Chafe, C., Gurevich, M., Leslie, G., & Tyan, S. (2004). Effect of Time Delay on Ensemble Accuracy. ISMA. Center for Computer Research in Music and Acoustics, Stanford University.
[3] Rofe, M., & Reuben, F. (2017). Telematic performance and the challenge of latency. Journal of Music, Technology and Education, 10(2–3), 167–183.
[4] Rottondi, C., Chafe, C., Allocchio, C., & Sarti, A. (2016). An overview on networked music performance technologies. IEEE Access, 4, 8823-8843.
[5] Schuett, N. (2002). The Effect of Latency on Ensemble Performance. Technical Report, CCRMA, Department of Music, Stanford University, Stanford, USA.