The folded space of machine listening
Napolitano D.
2021-01-01
Abstract
The paper investigates new machine listening technologies through a comparison of phenomenological and empirical/media-archeological approaches. While phenomenology associates listening with subjectivity, empiricism takes into account the technical operations involved in listening processes in both human and non-human apparatuses. Based on this theoretical framework, the paper undertakes a media-archeological investigation of two algorithms employed in copyright detection: “acoustic fingerprinting” and “audio watermarking”. In the technical operations of sound recognition algorithms, empirical analysis suggests the coexistence of a multiplicity of spatialities: from the “sound event”, which occurs in three-dimensional physical space, to its mathematical representation in vector space, and to the one-dimensional informational space of data processing and machine-to-machine communication. Recalling Deleuze’s definition of “the fold”, we define these coexistent spatial dimensions in techno-culturally mediated sound as “the folded space” of machine listening. We go on to argue that the issue of space in machine listening lies in the virtually infinite variability of the sound event subjected to automatic recognition. The difficulty is to reconcile the theoretically enduring information transmitted by sound with the contingent manifestation of sound as affected by space. To enable machines to deal with the site-specificity of sound, recognition algorithms need to reconstruct the three-dimensional space at the signal processing level, in a sort of reverse-engineering of the sound phenomenon that recalls the concept of “implicit sonicity” defined by Wolfgang Ernst.
While the metaphors and social representations adopted to describe machine listening are often anthropomorphic – and the very term “listening”, when referring to numerical operations, can be seen as a metaphor in itself – we argue that both human listening and machine listening are co-defined in a socio-technical network, in which the listening space no longer coincides with the position of the listening subject, but is negotiated between human and nonhuman agencies.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.