The MuTant team-project is a joint project between Inria, CNRS, UPMC and IRCAM, hosted by IRCAM-Centre Pompidou in the heart of Paris. The MuTant team is a scientific research project at the intersection of the sciences and the arts, specialized in music.
The research conducted in MuTant is devoted both to leveraging the capabilities of musical interaction between humans and computers, and to developing tools that foster the authoring of interaction and time in computer music. Our research program departs from the interactive music systems for computer music composition and performance introduced in the mid-1980s at Ircam. Within this paradigm, the computer is brought into the cycle of musical creation as an intelligent performer, equipped with a listening machine capable of analyzing, coordinating and anticipating its own and other musicians' actions within a musically coherent and synchronous context. Figure  illustrates this paradigm. The use of interactive music systems has become widespread ever since, and their practice has not ceased to nourish multidisciplinary research. From a research perspective, an interactive music system deals with two problems: realtime machine listening, or music information retrieval from musicians on stage, and music programming paradigms reactive to the realtime recognition and extraction. Whereas each field has generated a substantial literature, few attempts have been made to address the global problem by putting the two domains in direct interaction.
In modern practices, the computer's role goes beyond rendering pre-recorded accompaniments: these are replaced by concurrent, synchronous and realtime programs defined during the compositional phase by artists and programmers. This context is commonly referred to as Machine Musicianship, where the computer does not blindly follow the human but instead has a high degree of musical autonomy and competence. In this project, we aim at developing computer systems and languages to support real-time intelligent behavior for such interactions.
MuTant's research program lies at the intersection and union of two themes, often considered disjoint but inseparable within a musical context:
Realtime music information retrieval and processing
Synchronous and realtime programming for computer music
When human listeners are confronted with musical sounds, they rapidly and automatically find their way in the music. Even musically untrained listeners have an exceptional ability to make rapid judgments about music from short examples, such as determining style, performer, or beat, and detecting specific events such as instruments or pitches. Endowing computer systems with similar capabilities requires advances both in music cognition and in analysis and retrieval systems employing signal processing and machine learning.
In a panel session at the 13th National Conference on Artificial Intelligence in 1996, Rodney Brooks (a noted figure in robotics) remarked that while automatic speech recognition was a highly researched domain, there had been few attempts to build machines able to understand “non-speech sound”. He went further and named this one of the biggest challenges faced by Artificial Intelligence . More than 15 years have passed. Systems now exist that are able to analyze the contents of music and audio signals, and communities such as the International Symposium on Music Information Retrieval (ISMIR) and Sound and Music Computing (SMC) have formed. But we still lack reliable real-time machine listening systems.
The first thorough study of machine listening appeared in Eric Scheirer's PhD thesis at the MIT Media Lab in 2001, with a focus on low-level listening such as pitch and musical tempo, paving the way for a decade of research. Since the work of Scheirer, the literature has focused on task-dependent methods for machine listening such as pitch estimation, beat detection, structure discovery and more. Unfortunately, the majority of existing approaches are designed for information retrieval on large databases or as off-line methods. Whereas the very act of listening is real-time, very little literature exists on supporting real-time machine listening. This argument becomes clearer when looking at the yearly Music Information Retrieval Evaluation eXchange (MIREX), with its different retrieval tasks and systems submitted by international institutions, where almost no emphasis is put on real-time machine listening. Most MIR contributions focus on off-line approaches to information retrieval (where the system has access to future data), with less focus on on-line and realtime approaches to information decoding.
On another front, most MIR algorithms suffer from poor modeling of the temporal structures and temporal dynamics specific to music (most algorithms have roots in speech or biological sequence processing, without correct adaptation to temporal streams such as music). Despite tremendous progress using modern signal processing and statistical learning, much remains to be done to achieve on music data the same level of abstract understanding obtained, for example, in text and image analysis. On the other hand, it is important to notice that even untrained listeners are easily able to capture many aspects of formal and symbolic structure from an audio stream in realtime. Realtime machine listening is thus still a major challenge for artificial intelligence, to be addressed on both applied and theoretical fronts.
In the MuTant project, we focus on realtime and online methods of music information retrieval from audio signals. One of the primary goals of such systems is to fill the gap between the signal representation and the symbolic information (such as pitch, tempo, expressivity, etc.) contained in music signals. MuTant's current activities focus on two main applications: score following, or realtime audio-to-score alignment , and realtime transcription of music signals, with impact both on signal processing using machine learning techniques and on their application in real-world scenarios.
The second aspect of an interactive music system is to react to the extracted high-level and low-level music information with pre-defined actions. The simplest scenario is automatic accompaniment, delegating the interpretation of one or several musical voices to a computer, in interaction with live solo or ensemble musicians. The most popular form of such systems is the automatic accompaniment of an orchestral recording following a soloist in the classical music repertoire (concertos, for example). In the larger context of interactive music systems, the “notes” or musical elements of the accompaniment are replaced by “programs” that are written during the composition phase and evaluated in realtime, in reaction and relative to the musicians' performance. The programs in question can range from sound playback to realtime sound synthesis by simulating physical models, and realtime transformation of the musician's audio and gesture.
Such musical practice is commonly referred to as the realtime school in computer music. It developed naturally with the invention of the first score following systems, and led to the first prototypes of realtime digital signal processors and their successors , and the realtime graphical programming environment Max for their control at Ircam. With the advent and availability of DSPs in personal computers, the integrated realtime event and signal processing graphical language MaxMSP was developed at Ircam; it is today the worldwide standard platform for realtime interactive arts programming. This approach to music making was first formalized by composers such as Philippe Manoury and Pierre Boulez, in collaboration with researchers at Ircam, and soon became a standard in musical composition with computers.
Besides realtime performance and implementation issues, little work has addressed the formal aspects of such practices in realtime music programming, in keeping with the long and quite rich tradition of musical notation. Recent progress has convinced both the research and artistic communities that this programming paradigm is close to synchronous reactive programming languages, with concrete analogies between the two: parallel synchrony and concurrency correspond to musical polyphony, periodic sampling to rhythmic patterns, and hierarchical structures to micro-polyphonies, along with demands for novel hybrid models of time, among others. Antescofo is therefore an early response to such demands, one that calls for further exploration and study.
Within the MuTant project, we propose to tackle this aspect of the research within two consecutive lines:
Development of a Timed and Synchronous DSL for Real Time Musician-Computer Interaction: The design of relevant time models and dedicated temporal interaction mechanisms is integrated in the ongoing and continuous development of the Antescofo language. The new tools are validated in the production of new musical pieces and other musical applications. This work is performed in strong coupling with composers and performers. The PhD theses of José Echeveste (computer science) and Julia Blondeau (composition) take place in this context.
Formal Methods: Failures during an artistic performance must be avoided. This naturally leads to the use of formal methods, such as static analysis, verification or test generation, to ensure formally that Antescofo programs will behave as expected on stage. The checked properties may also provide assistance to the composer, especially in the context of “non-deterministic scores” in an interactive framework. The PhD of Clément Poncelet is devoted to model-based testing methods in this context.
While an operating system shields the computer hardware from all other software, it also provides a comfortable environment for program execution and prevents offensive use of the hardware by providing various services related to essential tasks. However, integrating discrete and continuous multimedia data demands additional services, especially for real-time processing of continuous media such as audio and video. To this end, interactive systems are sometimes referred to as off-the-shelf operating systems for real-time audio. The difficulty in providing correct real-time services has much to do with human perception: correctness for real-time audio is more stringent than for video, because the human ear is more sensitive to audio gaps and glitches than the human eye is to video jitter . Here we expose the foundations of existing sound and music operating systems and focus on their major drawbacks with regard to today's practices.
An important aspect of any real-time operating system is fault-tolerance with regard to short-time failures of continuous-media computation, delivery delays or missed deadlines. Existing multimedia operating systems are soft real-time, where missing a deadline does not necessarily lead to system failure, and have their roots in pioneering work in . Soft real-time is acceptable in simple applications such as video-on-demand delivery, where an initial delay in delivery does not directly lead to critical consequences and can be compensated (the general scheme used for audio-video synchronization), but it has considerable consequences for interactive systems: timing failures in interactive systems heavily affect the inter-operability of models of computation, where incorrect ordering can lead to unpredictable and unreliable results. Moreover, the interaction between computing and listening machines (both dynamic with respect to internal computation and the physical environment) requires tighter and explicit temporal semantics, since interaction between the physical environment and the system can be continuous and not demand-driven.
Fulfilling the timing requirements of continuous media demands explicit use of scheduling techniques. As shown earlier, existing interactive music systems rely on combined event/signal processing. In real-time, scheduling techniques aim at gluing the two engines together with the aim of timely delivery of computations between agents and components, from the physical environment, as well as to hardware components. The first remark when studying existing systems is that they all employ static scheduling, whereas interactive computing demands ever more time-aware and context-aware dynamic methods. The scheduling mechanisms are aware neither of time nor of the nature and semantics of the computations at stake. Computational elements are considered in a purely functional manner, and reaction and execution requirements are simply ignored. For example, Max's scheduling mechanism can delay message delivery when many time-critical tasks are requested within one cycle . SuperCollider uses Earliest-Deadline-First (EDF) scheduling, and cycles can simply be missed . This leads to non-deterministic behavior built out of deterministic components and poses great difficulties for the preservation of the underlying techniques, art pieces, and algorithms. The situation has become worse with the demand for nomad physical computing, where individual programs and modules are available but no action coordination or orchestration is proposed to design integrated systems. System designers are penalized in the expressivity, predictability and reliability of their designs despite potentially reliable components.
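The Earliest-Deadline-First policy mentioned above can be illustrated with a minimal, single-core, non-preemptive dispatcher; task names and timings below are hypothetical, and real systems such as SuperCollider implement the policy preemptively.

```python
import heapq

def edf_run(tasks, start=0.0):
    """Run tasks in Earliest-Deadline-First order and report missed deadlines.

    Each task is (release_time, deadline, duration, name). This is a toy
    sketch of the EDF policy, not a model of any particular system.
    """
    pending = sorted(tasks)                    # ordered by release time
    ready, missed, log = [], [], []
    clock, i = start, 0
    while i < len(pending) or ready:
        # move released tasks into the ready queue, keyed by deadline
        while i < len(pending) and pending[i][0] <= clock:
            r, d, dur, name = pending[i]
            heapq.heappush(ready, (d, dur, name))
            i += 1
        if not ready:                          # idle until the next release
            clock = pending[i][0]
            continue
        d, dur, name = heapq.heappop(ready)
        clock += dur                           # execute the most urgent task
        log.append(name)
        if clock > d:
            missed.append(name)                # a cycle was missed, as in the text
    return log, missed

order, missed = edf_run([(0.0, 5.0, 2.0, "audio"), (0.0, 2.5, 1.0, "midi"),
                         (1.0, 3.0, 1.5, "osc")])
# the urgent "midi" and "osc" tasks run before the longer "audio" task
```

A dynamic, time-aware scheduler in the sense advocated above would additionally consult the semantics of each computation, not only its deadline.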
Existing systems have been successful in programming and executing small systems comprised of a few programs. However, severe problems arise when scaling from program to system level for moderate or complex programs, leading to unpredictable behavior. System designers have uniformly chosen to hide timing properties from higher abstractions and, despite its utmost importance in multimedia computing, timing becomes an accident of implementation. This situation, confusing for both artists and system designers, is quite similar to the one described in Edward Lee's seminal paper “Computing needs time”: “general-purpose computers are increasingly asked to interact with physical processes through integrated media such as audio. [...] and they don't always do it well. The technological basis that engineers have chosen for general-purpose computing [...] does not support these applications well. Changes that ensure this support could improve them and enable many others” .
Despite all these shortcomings, one of the main advantages of environments such as Max and PureData over other available systems, and probably the key to their success, is their ability to handle synchronous processes (such as audio or video delivery and processing) within an asynchronous environment (user and environmental interactions). Beyond this, multimedia service scheduling at large tends more and more towards computing rather than mere on-time delivery. This raises the important question of hybrid scheduling of heterogeneous time and computing models in such environments, a subject that has received few studies in multimedia processing but has been studied in areas such as simulation. We address this issue scientifically, first by an explicit study of the current challenges in the domain, and second by proposing appropriate methods for such systems. This research was inscribed in the three-year ANR project INEDIT, coordinated by the team leader (ended in October 2015).
The combination of realtime machine listening systems and reactive programming paradigms has enabled the authoring of interactive music systems, as well as their realtime performance, within a coherent synchronous framework called Antescofo (see also ). The module, developed since 2008 by the team members, has gained increasing attention within the user community worldwide, with more than 50 prestigious public performances yearly. The outcomes of the team's research will enhance the interactive and reactive aspects of this emerging paradigm, as well as create novel authoring tools for such purposes.
The AscoGraph authoring environment, started in 2013 and shown in Figure , is the first step towards such an authoring environment; it was specifically extended in 2015 as reported in , .
The outcome of the ANR project INEDIT (with LABRI and GRAME, coordinated by the team leader) has further extended the applications of Antescofo to other domains where temporal scenarios are necessary, such as robotics and automatic scene understanding.
The Antescofo user community counts more than 150 active members as of early 2016, from prestigious institutions and music ensembles from all around the world.
Realtime music information retrieval is used as a front-end for various applications requiring sonic interaction between software/hardware and the physical world. MuTant has focused on realtime machine listening since its inception and holds state-of-the-art algorithms for realtime alignment of audio to a symbolic score, realtime tempo detection, and realtime multiple-pitch extraction. Recent results have pushed our applications towards more generalized listening schemes beyond music signals, as reported in . The Masters thesis of M. Sibru provides a benchmark for possible extensions of this paradigm to general sound scenes and is currently being pursued as a PhD project.
In 2015, MuTant acquired a Poppy robot and started developing tools for real-time sound scene understanding. We hope to publish preliminary stable results in 2016.
Technologies developed by MuTant can find their way to the general public (besides professional musicians) and into the entertainment industry. Recent trends in the music industry show tendencies towards more intelligent and interactive interfaces for music applications. We tested the singing accompaniment version of Antescofo during Ircam's Open House in June 2015, with more than 100 participants in our Open Mic session (see video).
The MuTant team leader was awarded the French Ministry of Research's iLab Award (17th edition) in the emergence category, which should culminate in the creation of a spin-off from this technology.
Best Student Paper Award, IEEE 2015 International Conference on Acoustics, Speech and Signal Processing (ICASSP), in Machine Learning for Signal Processing Category
Best Student Paper award, International Symposium on Computer Music Interdisciplinary Research 2015 (CMMR)
Public Antescofo Open Mic session, Ircam Open House in June 2015 (with 2000+ participants).
Numerous public concerts worldwide, including performances (2015 highlights) with the Berlin Philharmonic (March), the Barbican Centre in London (May), the Warsaw Autumn Festival, and more.
Functional Description
Antescofo is a modular polyphonic score following system as well as a synchronous programming language for musical composition. The module allows automatic recognition of the position in the music score and the tempo from a realtime audio stream coming from performer(s), making it possible to synchronize an instrumental performance with computer-realized elements. The synchronous language within Antescofo allows flexible writing of time and interaction in computer music.
Antescofo v0.9 was released in November 2015. It contains major additions to the language (see Sections and ) as well as to the machine listening, especially for singing voice and highly polyphonic instruments (see Release Notes). The Antescofo Reference Guide is a collaborative document referencing the language and its usage, showcasing the software's latest developments.
Participants: Arshia Cont, Jean-Louis Giavitto, Philippe Cuvillier and José Echeveste
Contact: Arshia Cont
Functional Description
AscoGraph, the Antescofo graphical score editor released in 2013, provides an autonomous Integrated Development Environment (IDE) for the authoring of Antescofo scores. The Antescofo listening machine, as it moves forward in the score during recognition, uses the message passing paradigm to perform tasks such as automatic accompaniment, spatialization, etc. The Antescofo score is a text file containing the notes (chords, notes, trills, ...) to follow, synchronization strategies on how to trigger actions, and electronic actions (the reactive language). This editor shares its score parsing routines with the Antescofo core, so the validity of the score is checked on saving while editing in AscoGraph, with proper handling of parsing errors. Graphically, the application is divided into two parts. On the left side, a graphical representation of the score, using a timeline with a tracks view. On the right side, a text editor displaying the score with syntax coloring. Both views can be edited and are synchronized on saving. Special objects such as “curves” are graphically editable: they provide high-level variable automation facilities like breakpoint functions (BPF), with more than 30 possible interpolation types between points.
In 2015, AscoGraph's user interaction was redesigned, as reported in , and a new score import procedure was developed and released in v0.25 (see Release Notes). See also .
Contact: Arshia Cont
The frequent use of Antescofo in live and public performances with human musicians implies strong requirements of temporal reliability and robustness to unforeseen errors in the input. To address these requirements, and to help the development of the system and the authoring of pieces by users, we are developing a platform for automating the testing of Antescofo's behavior on a given score, with a focus on timed behavior. It is based on state-of-the-art techniques and tools for model-based testing of embedded systems , and makes it possible to automate the following main tasks:
offline and on-the-fly generation of relevant input data for testing (i.e. fake performances by musicians, including timing values), aiming at exhaustiveness,
computation of the corresponding expected output, according to a formal specification of the expected behavior of the system on a given mixed score,
black-box execution of the input test data on the System Under Test,
comparison of expected and real output and production of a test verdict.
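Tasks (3) and (4) amount to comparing two timed traces, i.e. sequences of discrete events with inter-event durations. A minimal sketch of such a verdict could look like the following; the tolerance value is an illustrative assumption, not the one used by the actual framework.

```python
def verdict(expected, observed, tol=0.05):
    """Compare two timed traces given as lists of (event, delay-since-previous).

    Returns 'pass' when the event sequences match and every inter-event
    duration is within the tolerance `tol` (seconds); `tol` is a toy value.
    """
    if len(expected) != len(observed):
        return "fail: length mismatch"
    for k, ((ev_e, dt_e), (ev_o, dt_o)) in enumerate(zip(expected, observed)):
        if ev_e != ev_o:
            return f"fail: event {k}: expected {ev_e}, got {ev_o}"
        if abs(dt_e - dt_o) > tol:
            return f"fail: event {k}: timing off by {abs(dt_e - dt_o):.3f}s"
    return "pass"

# a small observed deviation (0.02 s) is within tolerance
result = verdict([("note1", 0.0), ("action1", 0.5)],
                 [("note1", 0.0), ("action1", 0.52)])
```

The real framework distinguishes several time units (beats vs. seconds) and several black-box boundaries, which this sketch omits.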
The input and output data are timed traces (sequences of discrete events together with inter-event durations). Our method is based on formal models (specifications) in an ad hoc medium-level intermediate representation (IR). We have developed a compiler that automatically produces such IR models from Antescofo's high-level mixed scores.
Then, in the offline approach, the IR is converted to Timed Automata and passed to the Uppaal model checker, to which task (1) above is delegated, following coverage criteria, as well as task (2), by simulation. In the online approach, tasks (1) and (2) are realized during the execution of the IR by a purpose-built Virtual Machine. Moreover, we have implemented several tools for tasks (3) and (4), corresponding to different boundaries for the implementation under test (black box): e.g. the interpreter of Antescofo's synchronous language alone, the interpreter with tempo detection, or the whole system.
Our fully automatic framework has been applied to real mixed scores used in concerts, and the results obtained have made it possible to identify bugs in Antescofo.
We are developing a new system for rhythm transcription, i.e. the conversion of sequences of timestamped discrete events into common western music notation. The input events may come, for example, from a performance on a MIDI keyboard, or may be the result of a computation. Our system privileges user interaction in order to search for a satisfying balance between different criteria, in particular the precision of the transcription and the readability of the resulting music score. It is integrated into OpenMusic, the graphical environment for computer-assisted music composition, and will be released publicly as a library on Ircam's Forum.
We have developed a uniform approach to transcription, based on hierarchical representations of duration notation as rhythm trees, and efficient algorithms for the lazy enumeration of solutions. It has been implemented via a dedicated interface making possible the interactive exploration of the space of solutions, their visualization and their editing, with a particular focus on the processing of grace notes and rests.
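The rhythm-tree idea can be sketched with nested lists, where each level subdivides its parent's duration equally; this toy flattening is an illustration of hierarchical duration notation, not OpenMusic's actual encoding.

```python
from fractions import Fraction

def flatten(tree, duration=Fraction(1)):
    """Flatten a rhythm tree into note durations, as fractions of a beat.

    A leaf stands for a note; a list subdivides the current duration
    equally among its children, mirroring hierarchical duration notation.
    """
    if not isinstance(tree, list):
        return [duration]
    part = duration / len(tree)
    out = []
    for child in tree:
        out.extend(flatten(child, part))
    return out

# one beat split into a triplet whose last element is itself split in two
durations = flatten(["n", "n", ["n", "n"]])
# durations == [1/3, 1/3, 1/6, 1/6]
```

Transcription then amounts to searching, among many such trees, one whose flattened onsets best match the input timestamps while remaining readable.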
We consider a new discriminative approach to the problems of segmentation and audio-to-score alignment. For each musical event, templates have to be built or learnt before performing any alignment. Because annotating a large database of music files would be a tedious task, we develop an original approach that learns templates without annotations, using only the knowledge of the music scores associated with the music files. We consider the two distinct pieces of information provided by the music scores: (i) an exact ordered list of musical events, and (ii) approximate prior information about the relative durations of events. We extend the celebrated Dynamic Time Warping (DTW) algorithm to a convex problem that learns optimal classifiers for all events while jointly aligning the files, using this weak supervision only. We show that the relative durations between events can easily be used as a penalization of our cost function and allow us to drastically improve the performance of our approach. We describe our approach in detail, with preliminary results obtained on a large-scale database, in .
This work was done in collaboration with the SIERRA project-team at Inria Paris.
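The classical DTW recursion that the convex formulation above extends can be sketched as follows; the local cost here is a simple absolute difference, standing in for the jointly learned event classifiers.

```python
def dtw(a, b):
    """Dynamic Time Warping distance between two feature sequences.

    Standard O(len(a)*len(b)) recursion with unit steps; the discriminative
    approach described above replaces the fixed local cost |a_i - b_j|
    with classifiers learned jointly with the alignment.
    """
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# a repeated value in the second sequence is absorbed at no cost
d = dtw([1, 2, 3], [1, 2, 2, 3])   # → 0.0
```

The duration penalization mentioned in the text would add, to each warping path, a term discouraging event lengths far from the score's prior durations.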
We develop a new stochastic model of symbolic (MIDI) performances of polyphonic scores, based on semi-Markov models, to align MIDI performances with music scores. In our approach, the evolution of the music performer and the production of performed notes are modeled with a hierarchical extension of hidden semi-Markov models (HSMMs). By comparing with a previously studied model based on hidden Markov models (HMMs), we give theoretical reasons why the present model is better suited to complex music events such as trills, tremolos, arpeggios, and other ornaments. This is also confirmed empirically by comparing score-following accuracy and analyzing the errors. We also develop a hybrid of this HSMM-based model and the HMM-based model, which is computationally more efficient and retains the advantages of the former. The present model yields one of the state-of-the-art score following algorithms for symbolic performances and may be applicable to other music recognition problems. Details and results are published in .
This work was done in collaboration with Eita Nakamura from the National Institute of Informatics of Tokyo, Japan.
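One theoretical reason semi-Markov models help can be sketched numerically: an HMM's self-transition imposes a geometrically decreasing duration distribution, while an HSMM attaches an explicit, possibly peaked, distribution to each state. The numbers below are illustrative, not taken from the paper.

```python
def geometric_duration(p_stay, d):
    """P(state lasts exactly d steps) in an HMM with self-transition p_stay.

    This is geometric in d, hence monotonically decreasing: the HMM cannot
    favor an intermediate duration, unlike an explicit HSMM distribution.
    """
    return (p_stay ** (d - 1)) * (1 - p_stay)

# hypothetical explicit HSMM duration distribution for an ornament
# of known approximate length (peaked at 3 steps)
explicit = {1: 0.05, 2: 0.15, 3: 0.6, 4: 0.15, 5: 0.05}

hmm_probs = [geometric_duration(0.7, d) for d in range(1, 6)]
# hmm_probs decreases with d, while `explicit` peaks at d = 3: only the
# HSMM can express that a trill or tremolo has a preferred length
```

This is exactly the kind of duration flexibility exploited for trills, tremolos and arpeggios in the model above.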
Singing voice is specific in music: a vocal performance conveys both musical (melody/pitch) and lyric (text/phoneme) content. We develop an original approach that aims at exploiting the advantages of melody and lyric information for real-time audio-to-score alignment of singing voice. First, lyrics are added as a separate observation stream in a template-based hidden semi-Markov model (HSMM), whose observation model is based on the construction of vowel templates. Second, early and late fusion of melody and lyric information are performed during real-time audio-to-score alignment. An experiment conducted with two professional singers (male/female) shows that the performance of a lyrics-based system is comparable to that of melody-based score following systems. Furthermore, late fusion of melody and lyric information substantially improves the alignment performance. Finally, maximum a posteriori (MAP) adaptation of the vowel templates from one singer to the other suggests that lyric information can be used efficiently for any singer. Preliminary results are published in .
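Late fusion of the two observation streams can be sketched as a weighted combination of per-event log-likelihoods; the weight and the numbers below are illustrative assumptions, not the rule used in the paper.

```python
def late_fusion(melody_ll, lyric_ll, w=0.5):
    """Combine per-event melody and lyric log-likelihoods.

    `w` weights the melody stream; this linear log-domain combination is a
    generic sketch of late fusion.
    """
    return [w * m + (1 - w) * l for m, l in zip(melody_ll, lyric_ll)]

def best_event(scores):
    return max(range(len(scores)), key=scores.__getitem__)

melody = [-2.0, -1.1, -1.0]   # melody alone is ambiguous between events 1 and 2
lyrics = [-3.0, -0.4, -2.5]   # the vowel template clearly favors event 1
fused = late_fusion(melody, lyrics)
# the fused scores follow the lyric evidence and pick event 1
```

Early fusion would instead merge the two feature streams before likelihood computation, inside a single observation model.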
Audio segmentation, an essential problem in many audio signal processing tasks, aims to cut an audio signal into homogeneous chunks. Rather than separately finding change points and computing similarities between segments, we focus on joint segmentation and clustering, using the framework of hidden Markov and semi-Markov models. We introduced a new incremental EM algorithm for hidden Markov models (HMMs) and showed that it compares favorably to existing online EM algorithms for HMMs. Early experimental results on musical note segmentation and environmental sound clustering are promising and will be pursued further.
Theoretical results were published in , in collaboration with the SIERRA project-team, and experimental results were further extended in . Early experimental setups show that our algorithms outperform state-of-the-art supervised methods for percussion sound classification. In collaboration with IRCCyN (Nantes), we are currently studying algorithmic extensions to complex environmental sounds.
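The flavor of incremental EM can be illustrated on running sufficient statistics: each new observation moves the statistics by a step size, rather than waiting for a full pass over the data. This is a one-state stochastic-approximation sketch, not the actual HMM algorithm, whose statistics also involve smoothed state posteriors.

```python
def incremental_stats(xs, step=lambda n: 1.0 / (n + 1)):
    """Incrementally update sufficient statistics (mean and second moment).

    Update rule s <- s + gamma_n * (stat(x_n) - s), the stochastic-
    approximation form underlying online/incremental EM.
    """
    s1, s2 = 0.0, 0.0
    for n, x in enumerate(xs):
        g = step(n)
        s1 += g * (x - s1)          # running mean
        s2 += g * (x * x - s2)      # running second moment
    return s1, s2 - s1 * s1          # mean, variance

mean, var = incremental_stats([1.0, 3.0])
# with step 1/(n+1) this reproduces the batch statistics: mean 2.0, var 1.0
```

In the HMM setting the same rule is applied to expected transition and emission counts, which is what makes the algorithm usable on realtime audio streams.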
José Echeveste developed several synchronization strategies in the framework of his PhD thesis. Their formalization is based on a dynamic real-time extension of the time map formalism, going beyond the state of the art, where the largest body of literature on time maps is devoted to static functions, defined and known at all times before any manipulation is done. Only the recent work of Liang and Dannenberg (2011) has considered dynamic time maps in the synchronization problem. However, their approach suffers from a consistency drawback: the convergence of the tempo depends on the events occurring during the catching trajectory. In our approach, we have developed a lag-dependent formulation of the catching trajectory which is insensitive to the actual events. This adaptive strategy considers only the deviations in tempo and position and is otherwise context-independent; it ensures convergence both in position and in tempo; and it is efficient: there is no need for a fine sampling clock to discretize the time evolution, since as long as the predicted time map does not change, delays are computed only once using the accompaniment time map. Our approach is general enough to handle various important issues in automatic accompaniment: latency management, integration of non-constant tempo specifications in the score (accelerando, ritardando, rubato...), handling of missing events, etc. The synchronization strategies have been fully formalized in the PhD report of José Echeveste, together with a complete Antescofo core including other dynamic constructions.
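A time map in this sense relates score positions (beats) to wall-clock time through the tempo; the delay before the next accompaniment action can then be computed once, with no fine sampling clock, as long as the predicted map does not change. The values below are hypothetical.

```python
def beats_to_seconds(delta_beats, tempo_bpm):
    """Convert a distance in beats to seconds under a locally constant tempo."""
    return delta_beats * 60.0 / tempo_bpm

def delay_to_action(current_pos, action_pos, estimated_tempo):
    """Delay, in seconds, before an accompaniment action at `action_pos` beats.

    Computed once from the current time map; a tempo re-estimation (i.e. a
    change of the predicted map) triggers a single recomputation.
    """
    return beats_to_seconds(action_pos - current_pos, estimated_tempo)

# 1.5 beats ahead at an estimated 90 BPM
d = delay_to_action(current_pos=4.0, action_pos=5.5, estimated_tempo=90.0)
# d == 1.0 second
```

The lag-dependent catching trajectory described above additionally bends this map so that position and tempo both converge after a deviation, independently of the intervening events.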
Composers develop their own idiosyncratic compositional language through their pieces. In addition, composers and sound engineers have to face drastically different performance setups for the same piece. This situation advocates for new generic mechanisms that simplify the development of generic yet dedicated libraries in Antescofo. In cooperation with various composers (Marco Stroppa, Julia Blondeau, Jason Freeman, Jose Miguel Fernandez, Yann Marez) we have introduced several new mechanisms in Antescofo to ease the building of dedicated yet reusable compositional libraries: extension of the functional language with new control structures, introduction of continuation combinators making it possible to start actions at the end of other durative actions, marshalling of Antescofo values, etc. The most notable additions are actor-based features for implementing temporal objects. Object templates are specified and then instantiated at will. A temporal object encapsulates a local state; it can react to logical conditions; it offers instantaneous as well as durative methods; reactions to synchronous broadcasts can be defined, as well as exceptional condition handlers. These new features are currently being tested in the development of new pieces and are expected to evolve following the feedback from these applications.
DSP processing in Antescofo is an experimental extension of the language, started in 2014, aimed at driving various DSP capabilities directly within Antescofo. DSP processors are defined directly in an Antescofo score, harnessing various signal processing libraries. These DSP processors are then dynamically connected together using Antescofo audio links. Input and output channels link these processors with the host environment, while internal channels connect DSPs among themselves. The connections are specified with a new kind of Antescofo action, the patch. Thus, the connections can be changed dynamically in response to the events detected by the listening machine, and can be synchronized using the expressive repertoire of synchronization strategies available in Antescofo. Ordinary Antescofo variables can be used to control the DSP computations, which adds a further level of dynamicity. Currently, FAUST and a few specific signal processors (notably FFT) can be defined. Several benefits result from this tight integration. The network of signal processors is heterogeneous, mixing DSP nodes specified with different tools. The network of signal processors can change dynamically in time following the result of a computation. This approach answers the shortcomings of the fixed (static) dataflow models of the Max or PureData host environments. Signal processing is controlled at a symbolic level and can be guided, e.g., by information available in the augmented score (like position, expected tempo, etc.). The tight integration makes it possible to specify, concisely and more effectively, a finer and more precise control of the signal processing, at a lower computational cost. One example is the use of symbolic curve specifications to drive control parameters at sample rate. It also makes it possible to embed sound analysis inside Antescofo. Last but not least, signal processing can be done more efficiently.
For example, in the remaking of Boulez's piece Antheme 2, there is a 45% improvement in execution time compared to the original version with the audio effects managed in Max.
The current work focuses on the development of a dedicated type system enabling a finer control of scheduling and audio buffer sizes, refining results previously developed in the cyclo-static scheduling of synchronous dataflow. Early results are published in .
This work applies an information visualisation perspective to a set of revisions in the timeline-based representation of action items in AscoGraph, the dedicated user interface to Antescofo. Our contribution is twofold: (a) a design study of the proposed new model, and (b) a technical, algorithmic component. In the former, we show how our model relates to principles of information coherence and clarity, facility of seeking and navigation, hierarchical distinction and explicit linking. In the latter, we frame the problem of arranging action rectangles in a 2D space as a strip packing problem, with the additional constraint that the (horizontal) time coordinates of each block are fixed. We introduce three algorithms of increasing complexity for automatic arrangement, estimate their packing performance and analyse their strengths and weaknesses. We evaluate the systemic improvements achieved and their applicability to other time-based datasets. Furthermore, algorithms for efficient automatic stacking of time-overlapping action blocks are developed, as well as mathematical proofs of their time-coherency during dynamic visualizations.
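As a point of comparison, the constrained stacking problem admits a simple first-fit baseline: each block, whose time extent is fixed, goes to the lowest row whose last block it does not overlap. This is an illustrative sketch, not one of the three algorithms of the paper.

```python
# Illustrative first-fit stacking of action blocks with fixed horizontal
# (time) extents: a block is placed in the lowest row it does not
# overlap in time, opening a new row otherwise. A baseline sketch only,
# not the specific algorithms developed for AscoGraph.
def stack_blocks(blocks):
    """blocks: list of (start, end); returns a row index per block
    (in order of sorted start times)."""
    rows = []          # rows[i] = end time of the last block in row i
    assignment = []
    for start, end in sorted(blocks):
        for i, row_end in enumerate(rows):
            if start >= row_end:        # fits after this row's last block
                rows[i] = end
                assignment.append(i)
                break
        else:                           # overlaps every row: open a new one
            rows.append(end)
            assignment.append(len(rows) - 1)
    return assignment

# Three blocks, the middle one overlapping both neighbours:
print(stack_blocks([(0, 4), (2, 6), (4, 8)]))   # [0, 1, 0]
```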
We have been pursuing our studies on the application of model-based timed testing techniques to the interactive music system (IMS) Antescofo, in the context of the PhD of Clément Poncelet and in relation to the developments presented in Section .
Several formal methods have been developed for automatic conformance testing of critical embedded software, based on the execution of a real implementation under test (IUT, or black-box) in a testing framework, where carefully selected inputs are sent to the IUT and the outputs are then observed and analyzed. In conformance model-based testing (MBT), the inputs and corresponding expected outputs are generated according to formal models of the IUT and the environment. The case of IMS presents important original features compared to other applications of MBT to realtime systems. On the one hand, the time model of IMS comprises several time units, including the wall clock time, measured in seconds, and the time of music scores, measured in numbers of beats relative to a tempo. This situation raises several new problems for the generation of test suites and their execution. On the other hand, we can reasonably assume that a given mixed score of Antescofo completely specifies the expected timed behavior of the IMS, and automatically compile the given score into a formal model of the IUT's expected behavior, using an intermediate representation. This gives a fully automatic test method, in contrast with other approaches, which generally require experts to write the specification manually.
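The coexistence of the two time units can be illustrated by the basic conversion a test harness must perform: a delay written in beats only becomes a wall-clock duration relative to a tempo, so the same score delay yields different physical durations when the tempo changes (the helper below is a hypothetical illustration, not part of the actual test framework):

```python
# Sketch of the two time units at play in testing an IMS: a score delay
# in beats maps to wall-clock seconds only relative to a tempo, so test
# verdicts must track tempo changes during execution.
def beats_to_seconds(delay_beats, tempo_bpm):
    """One beat lasts 60/tempo seconds at a tempo given in BPM."""
    return delay_beats * 60.0 / tempo_bpm

# The same 2-beat delay in the score yields different durations:
d1 = beats_to_seconds(2, tempo_bpm=120)   # 1.0 s
d2 = beats_to_seconds(2, tempo_bpm=60)    # 2.0 s
```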
We have developed online and offline approaches to MBT for Antescofo. The offline approach relies on tools of the Uppaal suite , , using a translation of our models into timed automata. These results have been presented at the 30th ACM/SIGAPP Symposium On Applied Computing, track Software Verification and Testing, and an article describing this approach has been accepted for publication in the Journal of New Music Research. The online approach is based on a new virtual machine executing the models of scores in intermediate representation (see Section ).
Rhythmic data are commonly represented by tree structures (rhythm trees) in assisted music composition environments, such as OpenMusic, due to the theoretical proximity of such structures with traditional musical notation. We are studying the application in this context of techniques and tools for processing tree structures, which were originally developed for other areas such as natural language processing, automatic deduction, and Web data processing. We are particularly interested in two well-established formalisms with solid theoretical foundations: tree automata and term rewriting.
Our first main contribution in that context is the development of a new framework for rhythm transcription: the generation, from a sequence of timestamped notes (e.g. a file in MIDI format), of a score in traditional music notation – see Section . This problem has no unequivocal solution: the system must be calibrated to fit the musical context, balancing constraints of precision against the simplicity and readability of the generated scores. We are developing, in collaboration with Jean Bresson (Ircam) and Slawek Staworko (LINKS, currently on leave at the University of Edinburgh), an approach based on algorithms for the enumeration of large sets of weighted trees (tree series), representing possible solutions to a transcription problem. The implementation work is performed by Adrien Ycart, under a research engineer contract with Ircam. This work has been presented in .
Our second contribution, in collaboration with Prof. Masahiko Sakai (Nagoya University), is a structural theory (an equational system on rhythm trees) defining equivalence of rhythm notations , . This approach can be used, for example, to generate by transformation the different possible notations of the same rhythm, with the ability to select among them according to certain constraints. We have also conducted related work on the theory of term rewriting .
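The underlying encoding can be sketched briefly: an inner node of a rhythm tree divides its duration equally among its children, and two trees denote the same rhythm when they flatten to the same sequence of durations (a simplified Python illustration, ignoring ties and tuplet labels):

```python
# Sketch of the rhythm-tree encoding used in environments like
# OpenMusic: an inner node divides its duration equally among its
# children, a leaf ('n') is a note. Two trees are equivalent in the
# sense above when they flatten to the same duration sequence.
def flatten(tree, duration=1.0):
    """tree: a leaf 'n' or a list of subtrees; returns note durations."""
    if tree == 'n':
        return [duration]
    share = duration / len(tree)
    out = []
    for child in tree:
        out.extend(flatten(child, share))
    return out

# Two notations of the same rhythm (a half note then two quarters,
# within a whole-note span); the second has redundant unary groupings:
t1 = ['n', ['n', 'n']]
t2 = [['n'], [['n'], ['n']]]
print(flatten(t1))   # [0.5, 0.25, 0.25]
print(flatten(t2))   # [0.5, 0.25, 0.25]
```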
Title: Interactivity in the Authoring of Time and Interactions
Project acronym: INEDIT
Type: ANR Contenu et Interaction 2012 (CONTINT)
Instrument: ANR Grant
Duration: September 2012 - November 2015
Coordinator: IRCAM (France)
Other partners: Grame (Lyon, France), LaBRI (Bordeaux, France).
Abstract: The INEDIT project aims to provide a scientific view of the interoperability between common tools for music and audio production, in order to open new creative dimensions coupling the authoring of time and the authoring of interaction. This coupling allows the development of novel dimensions in interacting with new media. Our approach lies within a formal language paradigm: an interactive piece can be seen as a virtual interpreter articulating locally synchronous temporal flows (audio signals) within globally asynchronous event sequences (discrete timed actions in interactive composition). Process evaluation then amounts to responding reactively to signals and events from an environment, with heterogeneous actions coordinated in time and space by the interpreter. This coordination is specified by the composer, who should be able to express and visualize time constraints and complex interactive scenarios between mediums. To achieve this, the project focuses on the development of novel technologies: dedicated multimedia schedulers, runtime compilation, innovative visualization and tangible interfaces based on augmented paper, allowing the specification and realtime control of authored processes. Among the scientific challenges posed by the INEDIT project is the formalization of temporal relations within a musical context, and in particular the development of a GALS (Globally Asynchronous, Locally Synchronous) approach to computing that would bridge the gap between synchronous and asynchronous constraints with multiple scales of time, a common challenge for existing multimedia frameworks.
Florent Jacquemard participates actively in the Efficace ANR Project. This project explores the relations between computation, time and interactions in computer-aided music composition, using OpenMusic and other technologies developed at IRCAM and at CNMAT (UC Berkeley). The participants consider computer-aided composition outside of its traditional "offline" paradigm, and try to integrate compositional processes in structured interactions with their external context. These interactions can take place during executions or performances, or at the early compositional stages (in the processes that lead to the creation of musical material). There is a particular focus on a number of specific directions, such as reactive approaches for computer-aided composition, the notion of dynamic time structures in computation and music, rhythmic and symbolic time structures, and the interactive control, visualisation and execution of sound synthesis and spatialization processes .
Jean-Louis Giavitto participates in the SynBioTIC ANR Blanc project (with IBISC, University of Evry, LAC University of Paris-Est, ISC - Ecole Polytechnique).
The MuTant team is also an active member of the ANR CHRONOS Network (Gérard Berry, Collège de France).
Program: PHC Amadeus
Project acronym: LETITBE
Project title: Logical Execution Time for Interactive And Composition Assistance Music Systems
Duration: 01/2015 - 12/2016
Coordinator: Florent Jacquemard, Christoph Kirsch
Other partners: Department of Computer Sciences University of Salzburg, Austria
Abstract: The objective of this project is to contribute to the development of computer music systems supporting advanced temporal structure in music and advanced dynamics in interactivity. For this purpose we are proposing to re-design and re-engineer computer music systems (from IRCAM at Paris) using advanced notions of time and their software counterparts developed for safety-critical embedded systems (from University of Salzburg). In particular, we are applying the so-called logical execution time paradigm as well as its accompanying time safety analysis, real-time code generation, and portable code execution to computer music systems. Timing in music is obviously very important. Advanced treatment of time in safety-critical embedded systems has helped address extremely challenging problems such as predictability and portability of real-time code. We believe similar progress can be made in computer music systems potentially enabling new application areas. The objective of the project is ideally suited for a collaboration of partners with complementary expertise in computer music and real-time systems.
The MuTant team hosted a Master-level student from the Inria Chile Center in partnership with the Pontificia Universidad Catolica de Chile. The project, undertaken by Nicolas Schmidt Gubbins and supervised by Arshia Cont and Jean-Louis Giavitto, resulted in the first prototype of an embedded Antescofo engine (see ) with internal audio processing on Raspberry Pi and UDOO mini-computers (see Presentation Video). A publication of preliminary results is underway, and early results are reported in .
We are pursuing a long term collaboration with Masahiko Sakai (U. Nagoya) on term rewriting techniques and applications (in particular applications related to rhythm notation) , .
We are collaborating with Slawek Staworko (LINKS, currently on leave at U. Edinburgh), and more generally the Algomus group at Lille, in the context of our projects on rhythm transcription described in Sections and .
The MuTant team collaborates with Bucharest Polytechnic University, in the framework of Grig Burloiu's PhD thesis on AscoGraph UI/UX design, which has resulted in the new design of AscoGraph (see ) and two publications , .
The MuTant team collaborated with researchers at the National Institute of Informatics of Tokyo on real-time symbolic alignment of music data, resulting in the publication .
Masahiko Sakai (Professor at the University of Nagoya) visited MuTant for two weeks in September 2015, for collaboration on term rewriting techniques applied to tree-structured symbolic representations of rhythm.
Slawek Staworko (LINKS, on leave at U. of Edinburgh) visited MuTant for two weeks in September and December 2015, for collaborations on the problem of automatic rhythm transcription.
Professor Miller Puckette (UCSD) visited MuTant for two weeks in May 2015, participating in the PhD defense of José Echeveste and collaborating with the team on the new Audio Processing engine for embedded mini-computers.
The MuTant team hosted an international intern from the Pontificia Universidad Catolica de Chile, Nicolas Schmidt, working on the first instances of the embedded Antescofo Audio Engine (see Presentation Video; see also ).
MuTant team organized the First Antescofo user Symposium during Ircam Forum Workshops in November 2015 with 6 contributions and over 30 participants.
Florent Jacquemard has participated in the program committee of the first International Conference on Technologies for Music Notation and Representation (TENOR 2015).
Jean-Louis Giavitto has participated in the program committees of the 41st International Computer Music Conference (ICMC), the ninth IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO), the International Workshop on Nature Inspired Cooperative Strategies for Optimization (NICSO), the 13th European Conference on Artificial Life (ECAL), the Digital Entertainment Technologies and Arts (DETA) track at GECCO 2015, the 10th International Conference on Information Processing in Cells and Tissues (IPCAT), and TENOR 2015. He organized jointly with Gérard Berry the Réunion de l'inter-section des applications des sciences on “Informatique et Musique”.
The members of the team participated as reviewers for IEEE ICASSP, ACM Multimedia Conferences, Sound and Music Computing, the International Computer Music Conference (ICMC), the Digital Audio Effects Conference (DAFx), the IEEE Symposium on Logic In Computer Science (LICS), the International Conference on Automated Deduction (CADE), the Conference on Concurrency Theory (Concur), Computer Science Logic (CSL), and the Conference on Rewriting Techniques and Applications (RTA), among others.
Jean-Louis Giavitto is the editor-in-chief of TSI (Technique et Science Informatiques), published by Lavoisier. He has co-organized with Antoine Spicher (Univ. Paris Est), Stefan Dulman (Univ. Twente) and Mirko Viroli (Univ. of Milano) a special issue of The Knowledge Engineering Review on Spatial Computing. This issue is finalized and will be printed in 2016.
The members of the team participated as reviewers for the journals Information and Computation, ACM TOPLAS, IEEE Transactions on Multimedia, IEEE Transactions on Audio and Speech Signal Processing, and ACM Transactions on Intelligent Systems.
Jean-Louis Giavitto was invited to give the annual seminar of the Décanat des Sciences, Université de Namur (May 2015), as well as a talk at the international workshop Mathemusical Conversations: mathematics and computation in performance and composition, jointly hosted by the Yong Siew Toh Conservatory of Music and the Institute for Mathematical Sciences in Singapore, in collaboration with the Center for Digital Music, Queen Mary University of London.
Arshia Cont is an elected board member of International Computer Music Association (ICMA) in charge of organizing the annual ICMC Conference and promoting research in the field.
Jean-Louis Giavitto is in the management team of the GDR GPL (Génie de la programmation et du logiciel), responsible with Étienne Moreau for the “Languages and Verification” pole of the GDR. He is also an expert for the ANR DEFI program and a reviewer of FET projects for the European Commission.
PhD defended: José Echeveste, Accorder le temps de la machine et celui du musicien, started in October 2011, supervisors: Arshia Cont and Jean-Louis Giavitto.
PhD in progress: Clément Poncelet, Formal methods for analyzing human-machine interaction in complex timed scenarios. Started in October 2013, supervisor: Florent Jacquemard.
PhD in progress: Philippe Cuvillier, Probabilistic Decoding of strongly-timed events in realtime, supervisor: Arshia Cont.
PhD in progress: Julia Blondeau, Espaces compositionnels et temps multiples : de la relation forme/matériau (thèse en art), supervisor: Jean-Louis Giavitto, co-director: Dominique Pradelle (Philosophy, Sorbonne). Started October 2015.
PhD in progress: Maxim Sirbu, Online Interaction via Machine Listening. Supervisors: Arshia Cont (MuTant) and Mathieu Lagrange (IRCCyN). Started October 2015.
Jean-Louis Giavitto was a reviewer of the Habilitation of René Douence (École des Mines de Nantes, Composition non modulaire modulaire). He was an examiner of the PhD thesis of Sergiu Ivanov (University of Paris Est, Étude de la puissance d'expression et de l'universalité des modèles de calcul inspirés par la biologie) and of the PhD of Simon Martiel (University of Nice, Approches informatique et mathématique des dynamiques causales de graphes).
Arshia Cont appeared as an expert in a BFM Business TV special live event on Music and New Technology.