Encrypting sound: why there would be no music on the Internet without cryptography
In the twentieth century, cryptography, the science of protecting information, was one of the most secret areas of research, alongside atomic and space development. But while achievements in space and atomic research are widely known, the role of cryptography in scientific progress and in everyday life is not. Some of the most important technologies and scientific theories of the 21st century originated within the walls of closed institutes and enterprises. They later entered our daily lives, although many archives that hold the history of their early development remain classified.
The connection between cryptographic research and modern life is far from obvious. For example, who would have thought that this research laid the foundation for the technologies that now let us listen to music?
Defending speech
With the development of telephone networks, providing secure government and military voice communications became an important issue. The first patents in this area appeared in the 19th century, but working devices were not developed until the 1930s. By the beginning of World War II, an arms race of sorts had begun.
Zeros and ones
Sound is an analog wave, that is, a continuous function. Before it can be securely encrypted, it must be digitized, that is, encoded as a discrete sequence.
Discretization is the representation of a continuous function by a series of discrete values. Applied to sound, it is the process of converting an analog sound wave into a digital data stream by measuring (sampling) the signal level at regular intervals, that is, at a fixed sampling frequency.
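The sampling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 440 Hz test tone, the 8,000 Hz sampling frequency and the 8-bit resolution are all chosen for the example and do not come from any historical device.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz, bits):
    """Measure signal(t) at regular intervals and round each value to an integer code."""
    levels = 2 ** bits                          # e.g. 256 levels for 8 bits
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                  # time of the n-th measurement
        x = max(-1.0, min(1.0, signal(t)))      # clip to the range [-1, 1]
        codes.append(round((x + 1.0) / 2.0 * (levels - 1)))  # map to 0 .. levels-1
    return codes

def tone(t):
    """A 440 Hz sine wave standing in for the analog sound (hypothetical input)."""
    return math.sin(2 * math.pi * 440.0 * t)

# Ten milliseconds of sound become a short list of whole numbers.
codes = sample_and_quantize(tone, duration_s=0.01, sample_rate_hz=8000, bits=8)
print(len(codes), codes[:8])
```

The continuous wave is replaced by a finite list of integers, and it is this list, not the wave itself, that a digital cipher can then process.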
Diagram of the principle of sound sampling from the thesis of Avraam Mendelevich Trakhtman, “Narrowing the frequency band during telephone transmission by using the features of the structure of speech”, 1941. Trakhtman was a developer in Kotelnikov’s laboratory and, in the 1950s, a leading engineer at the Scientific Research Institute of Communications (Marfinskaya Sharashka). From the collection of the Museum of Cryptography
Without sampling technologies, digital music recording would be impossible, and not only that. Today, analog-to-digital converters are used in computer sound cards, mobile phones and wireless headphones, as well as in many non-audio applications such as sensors, gyroscopes and servos.
The theoretical basis for the discretization of continuous functions is the Kotelnikov theorem (in English-language literature, the Nyquist-Shannon theorem), which determines the minimum sampling frequency at which a continuous function can be captured without loss: at least twice the highest frequency present in the signal. For music, the standard sampling frequency is 44,100 hertz, slightly more than twice the upper limit of human hearing (about 20,000 hertz).
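A small numerical experiment shows why the theorem matters. The sketch below assumes an illustrative sampling frequency of 8,000 Hz, so only tones below 4,000 Hz can be captured faithfully. A 5,000 Hz tone sampled at that rate yields exactly the same numbers as a 3,000 Hz tone, so the two can no longer be told apart. The function names and frequencies are assumptions made for the example.

```python
import math

def sample_cosine(freq_hz, sample_rate_hz, n_samples):
    """Return n_samples measurements of a cosine wave of the given frequency."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def min_sample_rate(max_freq_hz):
    """Lower bound from the sampling theorem: twice the highest frequency."""
    return 2 * max_freq_hz

SAMPLE_RATE = 8000                       # illustrative value, limit is 4,000 Hz
low = sample_cosine(3000, SAMPLE_RATE, 16)
high = sample_cosine(5000, SAMPLE_RATE, 16)

# The tone above the limit produces the same samples as the tone below it.
aliased = all(abs(a - b) < 1e-9 for a, b in zip(low, high))
print(aliased, min_sample_rate(20000))
```

This is the reason a music format aimed at the full range of human hearing needs a sampling frequency above 40,000 Hz.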
Claude Shannon
Working in parallel, the American scientist Claude Shannon and the Soviet scientist Vladimir Kotelnikov carried out fundamental work on the problem of discretization. Because of extreme wartime secrecy, the two scientists, both working on fundamental problems of communication theory, information theory and cryptography, arrived at the same results independently, unaware of each other’s work.
Because Shannon’s works were cleared for publication much earlier than Kotelnikov’s, his name became better known in the world scientific community. Kotelnikov’s priority in many questions of information theory was recognized by the international scientific community only in the 1990s.
Vladimir Kotelnikov. Photo from the collection of the Museum of Cryptography
The first practical application of audio sampling was the SIGSALY digital speech encryption system. It was developed at the American company Bell Telephone Laboratories in 1941-1942 for secret telephone conversations at the highest level, in particular between British Prime Minister Winston Churchill and US President Franklin Roosevelt. The sound quality was extremely low: the voice came out “like Donald Duck.” Moreover, a single SIGSALY terminal weighed 50 tons and occupied a separate room. Only 12 SIGSALY terminals were ever installed worldwide.
Installing SIGSALY. Photos from the archives of the NSA, 1940s
During this period, the USSR successfully protected government communications with the Sobol-P analog device, developed in Kotelnikov’s laboratory, which retained acceptable sound quality. From a technical point of view, Sobol-P was not an encryptor but a scrambler: a device that performs analog transformations of the signal, “mixing” its frequency and time characteristics. In parallel with the development of Sobol-P, work was also underway on digital devices (the so-called clipped speech technology, which in modern terminology is something between pulse-width modulation and delta modulation).
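The idea of “mixing” frequency and time characteristics can be imitated digitally. The sketch below is not the Sobol-P scheme, whose details are not given here. It is a minimal illustration of two classic scrambling steps: spectrum inversion (low tones become high and vice versa) and a keyed reordering of time segments. All function names and the sample key are assumptions, and the signal length is assumed to be a multiple of the key length.

```python
import math

def invert_spectrum(samples):
    """Flip the spectrum by negating every other sample (f maps to fs/2 - f)."""
    return [x if n % 2 == 0 else -x for n, x in enumerate(samples)]

def permute_blocks(samples, order):
    """Cut the signal into len(order) equal time blocks and reorder them."""
    block_len = len(samples) // len(order)
    blocks = [samples[i * block_len:(i + 1) * block_len] for i in range(len(order))]
    return [x for k in order for x in blocks[k]]

def inverse(order):
    """Return the permutation that undoes `order`."""
    inv = [0] * len(order)
    for position, k in enumerate(order):
        inv[k] = position
    return inv

key = [2, 0, 3, 1]                                   # hypothetical secret block order
samples = [math.sin(2 * math.pi * 500 * n / 8000) for n in range(32)]

scrambled = permute_blocks(invert_spectrum(samples), key)
restored = invert_spectrum(permute_blocks(scrambled, inverse(key)))
print(restored == samples)
```

Because the signal is only rearranged, never replaced by an unpredictable sequence, a scheme of this kind hides speech from a casual listener but offers far weaker protection than true encryption.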
In the late 1940s, employees of the Marfinsky Laboratory developed the M-803 device, the first Soviet digital speech encryption device that guaranteed the secrecy of telephone conversations on government communication lines.
Testing of the M-803 system. From the photo album “On the history of the laboratory of secrecy of the Government V.Ch. communication. 1948-1949”, USSR, Moscow. Collection of the Museum of Cryptography
The difficulty with reliable speech encryption is that speech is a large stream of information that must be encrypted and transmitted with minimal delay, almost in real time. During the quarantine, all of us saw that even modern technology does not always cope with this task when we make calls via Zoom, WhatsApp or Telegram. What, then, can be said of the first half of the twentieth century!
Vocoder
To reduce the amount of information that needs to be encrypted, it can be compressed first. The vocoder was one of the first ways to compress sound.
A vocoder (from the English “voice encoder”) is a device for encoding, digitizing and reproducing speech. The vocoder’s analyzer extracts and transmits the main characteristics of speech: pitch, noise level and formants. From these characteristics, the recipient’s side synthesizes a sound that reproduces the speech quite intelligibly.
Back in the 1930s, Homer Dudley, an acoustic engineer at the American company Bell Labs, tried to solve the problem of compressing audio information. Dudley’s idea was to transmit, during a telephone conversation, not the sound wave itself but its encoded characteristics, keeping only the elements important for understanding the meaning, and then to re-synthesize the sound on the recipient’s side.
In 1935, Dudley applied for a patent for a Signal Transmitter, which set out the basic principles of the vocoder, and proposed using the vocoder to keep telephone conversations secret. Dudley managed to build a speech synthesizer, but the analyzer proved much harder to implement.
Soviet developers knew about Dudley’s work, but took a more fundamental approach to building their own device. Kotelnikov recalled: “… a reference to an article by Homer Dudley, published in October 1940, caught my eye. It said that he had made a speech transformer, the ‘Vocoder’. I rushed to look it up, but it turned out that nothing concrete was written there. All the same, it was very useful: he had the same idea, which meant we were on the right path. So we started to make our own ‘vocoder’. And just before the war, we already had a working prototype. True, it still ‘spoke’ badly, ‘in a trembling voice’.”
Much later, when wartime documents were declassified and Dudley’s patents expired, the vocoder found a new use in music. In 1959 Siemens used a vocoder in one of its music synthesizers, and in 1968 the legendary synthesizer developer Robert Moog built his own version of the vocoder. The vocoder has been used by Pink Floyd, Kraftwerk, Jean-Michel Jarre, Michael Jackson, Red Hot Chili Peppers and Moby. In Soviet animation and cinema, vocoders were used to create the voices of robots and fantastic creatures. However, if you listen to all these works, you will find that the speech in them is not very intelligible…
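The analyzer side of this idea can be sketched as follows. This is a simplified illustration, not Dudley’s or any Soviet design: it only measures how much energy falls into a few frequency bands in each short frame of sound, and it leaves out pitch tracking and noise detection entirely. The band edges, frame length and function names are assumptions chosen for the example.

```python
import cmath
import math

def band_energies(frame, sample_rate_hz, band_edges_hz):
    """Sum the spectral power of one frame inside each frequency band."""
    n = len(frame)
    # Naive discrete Fourier transform: power at each frequency bin up to fs/2.
    power = [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                     for i, x in enumerate(frame))) ** 2
             for k in range(n // 2 + 1)]
    energies = []
    for lo, hi in zip(band_edges_hz, band_edges_hz[1:]):
        energies.append(sum(p for k, p in enumerate(power)
                            if lo <= k * sample_rate_hz / n < hi))
    return energies

def analyze(samples, sample_rate_hz, frame_len, band_edges_hz):
    """Split the signal into frames and describe each frame by its band energies."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [band_energies(f, sample_rate_hz, band_edges_hz) for f in frames]

RATE = 8000
BANDS = [0, 500, 1500, 2500, 4000]       # four illustrative bands, in hertz
samples = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(320)]

params = analyze(samples, RATE, frame_len=160, band_edges_hz=BANDS)
loudest = [max(range(len(e)), key=e.__getitem__) for e in params]
print(len(params), loudest)
```

Each frame of 160 samples is reduced to four numbers, which is the source of the compression: only these slowly changing parameters need to be encrypted and sent, and the receiver rebuilds an approximation of the sound from them.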
Psychoacoustics
Dudley’s simple band-pass vocoder distorted the voice considerably. From a modern point of view, the way Soviet developers solved this problem can be called psychoacoustic. The scientists began by analyzing which elements of the audio signal are key to understanding speech and which can be neglected. Specialists from various fields were brought in for this work: not only mathematicians and engineers, but also philologists and linguists.
In the Soviet Union, work on improving the vocoder was carried out in several laboratories, including the famous Marfinsky sharashka, where German prisoners of war and political prisoners were put to work. Among the prisoners working in Marfino were Alexander Solzhenitsyn and Lev Kopelev, who studied the phonetics of the Russian language. The tasks they worked on included not only identifying a speaker by voice (the plot of the novel “The First Circle” is built around this), but also analyzing how we distinguish between different sounds. How does our speech change when we speak faster than usual? What is common and what is different between female and male voices, in a tongue twister, in different intonations of the same phrase? Which elements of speech are key to distinguishing meaning and intonation, and which can be dispensed with? Which frequencies of the spectrum of human speech can be cut without losing intelligibility?
This approach, which takes into account the peculiarities of human sound perception, is called psychoacoustic, or perceptual. It lies at the heart of the most widely used audio compression method, the MP3 format. In MP3, audio is compressed by reducing the accuracy of those parts of the audio stream that matter least to the perception of the average listener.
Thanks to research carried out in Marfino, Soviet vocoders were finely calibrated to those frequencies that are key to distinguishing the meaning of speech. For example, the M-803 vocoder, developed in the late 1940s, contained a separate block for capturing individual speech intonation, as well as a number of filters that capture different phonemes. In 1953, the M-803-5 modification was used on the Moscow-Wünsdorf communication line, and a year later the equipment provided secure communication during the Berlin meeting of the foreign ministers of the USSR, the USA, Great Britain and France.
M-803 is the first Soviet digital speech encryption device. Photo album “On the history of the laboratory of secrecy of the Government V.Ch. communication. 1948-1949”, USSR, Moscow. Collection of the Museum of Cryptography
In the digital age, digital speech analysis algorithms based on the vocoder principle have been developed, allowing much more accurate speech coding. In 1966, Japanese engineers proposed the linear predictive coding (LPC) algorithm, which is still used to code speech in many cellular and Internet communication standards and in popular applications such as Discord, WhatsApp and Skype.
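The principle of linear prediction is that each sample of sound is approximated by a weighted sum of the few samples before it, so only the weights and a small correction need to be transmitted. The sketch below uses one common textbook way of finding the weights, the autocorrelation method with the Levinson-Durbin recursion. It is not taken from the 1966 work or from any particular codec, and the function names, the test tone and the prediction order of 2 are assumptions.

```python
import math

def autocorrelation(frame, max_lag):
    """Correlation of the frame with itself at lags 0 .. max_lag."""
    return [sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
            for lag in range(max_lag + 1)]

def lpc_coefficients(frame, order):
    """Levinson-Durbin recursion.

    Returns weights a[0..order-1] such that x[n] is approximated by
    a[0]*x[n-1] + a[1]*x[n-2] + ..., plus the remaining prediction error energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)              # a[0] is unused padding
    error = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k             # the error shrinks at every step
    return a[1:], error

# A pure 1,000 Hz tone sampled at 8,000 Hz is almost perfectly predictable.
frame = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(400)]
weights, error = lpc_coefficients(frame, order=2)
print(weights, error)
```

For a pure tone, two weights are enough to predict the signal almost exactly (they come out close to 2·cos(ω) and -1, where ω is the tone’s angular frequency per sample). Real speech needs more weights, typically updated many times per second.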
Uncompressed voice recording spectrogram.
Russia’s first Museum of Cryptography, which will open in Moscow in the fall of 2021, will tell visitors more about speech encryption technologies, encryption techniques and much more.