Transcribing Music From Audio

Published 2024-04-07. Last modified 2024-04-09.

This page is part of the av_studio collection.

For me, the most laborious part of transcribing a song played on guitar or saxophone is capturing the nuance in the timing of the notes.

These are my notes from exploring the software options available as of the date this page was last updated. As always, if you know of procedures or other technology that would yield better results, please let me know and I will update this page.

There is no fully automatic transcription program;
the reviewed software merely provides a head start for manual transcription.

Overview

Test Audio File

The audio file I am using as a test, Sanchez_05.flac, is a recording of a classical guitar in which I play some fast single notes (trills), intervals and chords. I played along with a click track when making the recording, but the click track is not part of the recording.

The timing is subtle and complex, with the melody using slides, hammer-ons, and ringing bass notes. This is a challenging piece to transcribe accurately. You can listen to the test audio file:

Two Steps

Two steps are required for current technology to convert a nuanced audio file into a playable musical score:

Converting audio to MIDI or MusicXML.
Transcribing the MIDI or MusicXML into a digital score that can be viewed, edited, and printed.

This article discusses both steps and compares alternative software packages for each. The best output MIDI file from the winner of step 1 was provided to each of the competing notation programs in step 2.

Human input is required for both steps to achieve usable results. These tests demonstrated that today’s software tools are quite far from being good enough to do their jobs unassisted.

I believe the status quo is likely to improve dramatically over the next few years.

1) Audio to MIDI

I evaluated how the available audio-to-MIDI conversion programs compared for this step. Most of these programs use neural networks; this is the kind of task that neural networks excel at.

🎶

The clear winner for step 1 was NeuralNote. If you do not care about the details of the other contestants, jump to the section on NeuralNote.

💩

I was surprised when Ableton Live 12 ranked dead last. I had expected the big dog to be the winner of step 1, or at least one of the top performers. Live was the only software product that I tested that did not use a neural network for converting audio to MIDI. As of 2024, an algorithmic or heuristic approach is unlikely to be able to compete with neural networks for this type of task.

2) MIDI Transcription

Step 2 uses the MIDI or MusicXML that was converted from the original audio. For this step, music notation programs like Dorico, Finale, Guitar Pro, LilyPond, MuseScore and Sibelius have several tasks:

Import from MIDI and/or MusicXML
Edit scores
Preview scores
Render scores

The first task, import, is challenging when working with a nuanced recording. Although the notation programs that I tested work well for manually entered scores, interactive parameter tuning is required when importing nuanced MIDI files. Programs that could not provide immediate feedback while I dialed in the import parameters produced unusable junk that was not worth the time it would take to correct manually.

😒

Guitar Pro did a better job of importing MIDI than the other contestants because it allows some parameters to be tuned while importing. Considerable work would be required to clean up the transcribed score, but this is much better than annotating the score without the head start provided by NeuralNote in tandem with Guitar Pro.

💩

MuseScore does not offer any control over the import process, so its results were unusable.

Perhaps a similar program that uses a neural network will soon emerge and push the state of the art dramatically forward. I do not know of such a program today, but I expect one to enter the fray soon.

Step 1 - Audio to MIDI

Here are the detailed test results for step 1 (Audio-to-MIDI conversion).

AI-MIDI

ai-midi.com was simple to use. The website offers no controls.

Immediately after submitting the audio, the tempo and time signature were displayed. A nice touch!

After several minutes, a .mid file with the same base name as the audio file that I had uploaded became available. I renamed it ai_midi.mid so I could keep track of all the MIDI files created while writing this article.

AI-MIDI produced a recognizable result, but it needs significant cleanup. This represents a considerable savings of time and energy over an entirely manual process, but is far from ideal. This is the generated MIDI file:

💩 The MIDI file generated by AI-MIDI did not capture the trills, and they are essential for the song’s riff. This meant that AI-MIDI was unsuitable for my purposes.

Spotify Basic Pitch

Spotify released Basic Pitch, an open-source audio-to-MIDI converter, 22 months ago. Neural networks have come a long way since then, but Basic Pitch demonstrates that bigger models may not always be better. Spotify wrote an excellent article about Basic Pitch.

The liberal Apache-2.0 license is used for the GitHub project. The project has 3,000 stars and 222 forks, so this is a very popular project. The GitHub project is packaged as a Python library and a web page. The research paper has the technical details: A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation. Here is a direct link to the PDF.

The Sanchez_05.flac audio file was used as before for testing:

After conversion, the web page had more information appended:

😏

I did not make any adjustments and pressed PLAY. The results were markedly better than the results from AI-MIDI. Spotify’s program may be older, but it works much better!

Opening the SHOW MIDI ADJUSTMENTS section revealed these additional controls, which I did not modify for the test. Note that the default minimum note length is shown as 11 ms, while the shortest possible minimum note length is shown as 3 ms.

I downloaded the MIDI file that was converted from the FLAC file, basic_pitch_transcription.mid. You can listen to it:

I found that attempting a second conversion would fail unless I refreshed the web page first.

NeuralNote

NeuralNote is an open-source project on GitHub that provides AU and VST plugins, as well as a standalone program for performing audio-to-MIDI conversion. The project uses the very common, and quite liberal, Apache-2.0 license. It has 1,000 stars and 53 forks, so this is a popular project.

Apart from two git submodules, which provide the popular JUCE framework and RTNeural, this GitHub project has no dependencies, which is unusual for a project with this much technology in it. It is written in C++. I found code from the Basic Pitch project in Lib/Model/BasicPitchCNN.cpp; rather than merely building on Basic Pitch, this project is essentially a fork of that project, combined with JUCE boilerplate, plus a little magic pixie dust.

NeuralNote uses internally the model from Spotify’s basic-pitch. See their blogpost and paper for more information.

In NeuralNote, basic-pitch is run using RTNeural for the CNN part and ONNXRuntime for the feature part (Constant-Q transform calculation + Harmonic Stacking). As part of this project, we contributed to RTNeural to add 2D convolution support.

Installation

I downloaded the Windows standalone zip file and uncompressed it. There is no installation procedure; simply drag the executable file from the zip container file to the directory that you want the program to reside in. Here is how I added it to the Windows Start Menu:

Dragged the executable file from the zip container file to C:\Program Files.
Right-clicked on C:\Program Files\NeuralNote.exe and selected Pin to Start.
Pressed the Windows key and waited for the Windows menu to appear.
Right-clicked on the NeuralNote icon and selected Resize / Small. (Why Microsoft has such trouble settling on reasonable user interface defaults is difficult to understand.)

This is how NeuralNote looks when it first launches:

Setup

If you want to preview the conversion, which I strongly recommend, you should configure the audio device and the sample rate. This setting is not used for, and has no effect on, the audio-to-MIDI conversion.

Clicking on the small Options button at the top left of the user interface displays a small menu window. Selecting Audio/MIDI Settings displays the Audio/MIDI Settings window, which we will see shortly.

Another way to display the Audio/MIDI Settings window is to click on the Settings… button at the top right of the window.

Having two paths to the Audio/MIDI Settings window is redundant, which might be confusing to some users. This is what the Audio/MIDI Settings window looks like:

For a Windows computer, the default Audio device type is Windows Audio, which for me only displayed one sample rate. Selecting ASIO for the Audio device type caused the window to display an additional option, Device, which had a default value of ASIO4ALLv2.

ASIO4ALL is a well-known audio device driver. I had previously installed ASIO4ALL; it was only displayed because all ASIO drivers on my system were displayed, and it was selected because it was the alphabetically first driver of that type. It is quite functional, but very slow.

Instead of using ASIO4ALL, I selected ASIO MADIface USB, which RME provides for the UFX III audio interface that the test computer was connected to.

ASIO devices support many sample rates. I kept the pre-selected rate.

There is no Save button, which is fine. I simply closed the Audio/MIDI Settings window, and the modified settings were permanently saved.

Testing

I then dragged and dropped the test audio file (Sanchez_05.flac) onto the NeuralNote area labeled LOAD OR DROP AN AUDIO FILE. This is what NeuralNote looked like after loading the test audio file using default settings:

The results were similar to the results for Basic Pitch, which is to say pretty good. However, the trills were not being picked up. I decreased the MIN NOTE DURATION from 125 ms to the minimum possible value, 35 ms. This is a much longer duration than the 3 ms minimum possible with the parent project, Basic Pitch; however, it was a short enough duration to pick up the trills.
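As a rough illustration of why lowering MIN NOTE DURATION matters for trills, the sketch below applies a simple duration threshold to a list of notes. It is not NeuralNote's actual note-detection logic, which is more involved; the Note class, the drop_short_notes function, and the 80 ms trill notes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int        # MIDI note number
    start_ms: float   # onset time in milliseconds
    end_ms: float     # release time in milliseconds


def drop_short_notes(notes, min_duration_ms):
    """Keep only the notes that last at least min_duration_ms."""
    return [n for n in notes if n.end_ms - n.start_ms >= min_duration_ms]


# Hypothetical data: a four-note trill of 80 ms notes, then one held bass note.
trill = [Note(64 + (i % 2), i * 80.0, (i + 1) * 80.0) for i in range(4)]
held = Note(52, 320.0, 1320.0)
notes = trill + [held]

print(len(drop_short_notes(notes, 125)))  # 1: only the held note survives
print(len(drop_short_notes(notes, 35)))   # 5: the trill notes survive too
```

With a 125 ms threshold, every note in this made-up trill is discarded, which matches the symptom described above; at 35 ms they are all kept.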

I think the value of 3 ms in BasicPitch is incorrect. The unit is not ms but frames. A basic pitch frame is 11.6 ms (256 samples at 22050 Hz). So the 3 ms on the basic pitch website corresponds to the 35 ms in NeuralNote.

I discovered that issue while going through their code to implement NeuralNote, but I forgot to report it to them.

– Damien Ronssin, primary NeuralNote author
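The arithmetic in the quote above can be checked with a few lines of Python. The frame size and sample rate are the values given in the quote; the function name frames_to_ms is only an illustrative choice.

```python
HOP_SAMPLES = 256       # samples per frame, per the quote above
SAMPLE_RATE_HZ = 22050  # sample rate in Hz, per the quote above


def frames_to_ms(frames):
    """Convert a number of frames to milliseconds."""
    return frames * HOP_SAMPLES / SAMPLE_RATE_HZ * 1000


print(round(frames_to_ms(1), 1))   # 11.6
print(round(frames_to_ms(3), 1))   # 34.8
print(round(frames_to_ms(11), 1))  # 127.7
```

Three frames come to about 35 ms, matching NeuralNote's minimum. If the default of 11 shown on the Basic Pitch web page is also a frame count, it works out to roughly 128 ms, which is close to NeuralNote's 125 ms default.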

I highlighted some text in the above image by enclosing it in a red rectangle. The user interface for saving a converted MIDI file is unusual; click on the highlighted text and drag to the desktop or a file folder. The name of the saved file cannot be modified when saved; it was Sanchez_05_NNTranscription.mid.

The preview demonstrated that the trills were captured. NeuralNote is the only program tested that was able to do this well:

🎶

This file (Sanchez_05_NNTranscription.mid) was used to test the contestants for Step 2: MIDI Transcription.

There is no way to type in numeric values or use the up and down arrows to adjust the values; instead, the little knobs must be turned by clicking on them, then dragging vertically, which is awkward.

Ableton Live 12

Ableton Live can do many things, including converting audio to MIDI. If you have a copy, you should try it; however, I believe NeuralNote is by far the best option.

These are the steps that I followed:

I launched Ableton Live Suite 12, which presented me with a new project, as always.
I then dropped the audio file onto one of the clip slots of an audio track. The clip and the track were automagically labeled Sanchez 05.
I right-clicked on the clip and selected Convert Melody to New MIDI Track.

Conversion took over 5 minutes, much longer than the other methods discussed above. This created a new clip called MIDI Sanchez_05 on a new track called # Melody to MIDI.
I set the Stretch parameter to zero, so no warping would be applied.
I exported the MIDI by clicking on the new clip, then pressing CTRL-Shift-E.
The file was saved as MIDI Sanchez_05.mid.
💩 As you can hear, the results were truly awful.

Step 2 - MIDI Transcription

Here are the detailed test results for step 2 (MIDI transcription).

Guitar Pro 8

Guitar Pro is a popular and very capable commercial program. I previously wrote about Guitar Pro. When importing MIDI, Guitar Pro previews results according to user-specified import parameters; modifying the controls causes the preview to update accordingly. The ability to control import parameters and preview the results is why Guitar Pro was judged to be the top contender.

To create a new Guitar Pro score consisting of a track for guitar with both standard notation and tablature, I used the File / Import / MIDI… menu item to import the MIDI produced by NeuralNote (Sanchez_05_NNTranscription.mid) with the following settings:

The effect of changing each of the following parameters was immediately visible in the preview panel:

Enabled Allow multivoice so successive notes did not silence the preceding notes.
Enabled Allow triplets; the original audio contained quintuplets, but enabling triplets helped.
1/16 note Quantization.
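To see why a 1/16-note grid struggles with quintuplets, here is a minimal sketch of grid quantization. How Guitar Pro actually quantizes is not documented here, so this is only the generic snap-to-nearest-grid-line idea; the quantize function and the 120 bpm tempo are assumptions made for illustration.

```python
def quantize(onset_s, bpm=120.0, divisions_per_beat=4):
    """Snap an onset time in seconds to the nearest grid line.

    divisions_per_beat=4 gives a 1/16-note grid when the beat is a quarter note.
    """
    grid_s = 60.0 / bpm / divisions_per_beat  # seconds per grid step
    return round(onset_s / grid_s) * grid_s


# Hypothetical quintuplet: five evenly spaced notes in one beat at 120 bpm.
quintuplet = [0.0, 0.1, 0.2, 0.3, 0.4]
print([round(quantize(t), 3) for t in quintuplet])
# [0.0, 0.125, 0.25, 0.25, 0.375]
```

Two of the five onsets land on the same grid line, so a quintuplet cannot survive a 1/16-note grid intact. That is consistent with the import needing manual cleanup even with the best available settings.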

Guitar Pro cannot generate an MP3 file from a score that was created from MIDI unless the MIDI tracks are first converted to RSE (Realistic Sound Engine) format. I converted the Nylon Guitar track from MIDI to RSE format by:

Enabling Global View by pressing F8.
Selecting the track in the global view. This opened the Inspector panel on the right of the Guitar Pro window.
Clicking on the TRACK tab at the top of the Inspector panel.
Clicking on RSE, next to the MIDI button.

I set the following additional parameters:

Selected Playing style: Fingers.
Enabled Auto let ring.

While in the Track Inspector, I indicated that the original had been played with a partial capo on strings 3, 4 and 5. This causes Guitar Pro to autogenerate better tablature. Clicking on the Tuning: area (highlighted in bright red) opened the Tuning window, shown below, where I specified:

Alteration: ♯
Partial Capo: Fret 2 on strings 3, 4 and 5

I then pressed the Keep the Fingering button.
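The effect of the partial capo on the open strings can be worked out with a short sketch. It assumes standard tuning (E A D G B E), which the settings above do not state explicitly, and the helper names are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Assumed standard tuning as MIDI note numbers, keyed by string (6 = low E).
STANDARD_TUNING = {6: 40, 5: 45, 4: 50, 3: 55, 2: 59, 1: 64}


def note_name(midi_note):
    """Return a name such as 'E2', using the convention that MIDI 60 is C4."""
    return f"{NOTE_NAMES[midi_note % 12]}{midi_note // 12 - 1}"


def apply_partial_capo(tuning, fret, strings):
    """Raise the open pitch of each capoed string by `fret` semitones."""
    return {s: p + (fret if s in strings else 0) for s, p in tuning.items()}


capoed = apply_partial_capo(STANDARD_TUNING, fret=2, strings={3, 4, 5})
print([note_name(capoed[s]) for s in (6, 5, 4, 3, 2, 1)])
# ['E2', 'B2', 'E3', 'A3', 'B3', 'E4']
```

Under that assumption, the open strings sound E B E A B E, so any tablature generated without the capo information would place many notes on the wrong frets.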

Here is an image showing a portion of the results, as displayed in Guitar Pro 8 when View / Screen - Horizontal was chosen. Four voices were created; voice 1 is active and the other 3 voices are shown in light gray.

This is how the rendered MIDI from NeuralNote sounds after processing by Guitar Pro 8 using the above settings:

😒

Even though this was the best result from any contestant, a lot of manual editing will be required to fix the transcription errors. If for some reason you prefer another music notation program, export the score from Guitar Pro to MusicXML, then import it into your preferred notation program.

It can be helpful to add an audio track containing the original audio file. Use the Track / Add Audio File... menu item to do that.

When I tried to add the audio file by dragging it from File Manager and dropping it onto the audio file panel, a message saying "Failed to locate audio file!" appeared. Using the menu works, however.

The file I added, Sanchez_05.flac, was 16 MB. Adding it to the score increased the size of the Guitar Pro file (with a .gp extension) from 144 KB to 25 MB, so I did not save this.

MuseScore Studio 4.3

MuseScore Studio 4.3 is a F/OSS project on GitHub that is licensed under GPL v3.0. It has 2,500 forks and 11,600 stars, which makes it the most popular project that I looked at for this article. It has a very high level of activity and is well funded. I previously wrote about MuseScore 3 and 4, and this article builds on that information.

To transcribe the MIDI generated from NeuralNote (Sanchez_05_NNTranscription.mid), I opened the MIDI file with MuseScore, which is how the importation process works. There are no adjustable parameters.

The tempo was set incorrectly at 189 bpm instead of 120 bpm. To correct the tempo to 120 bpm I:

Opened the Properties panel on the left side of the MuseScore window.
Expanded the Tempo accordion menu.
Enabled the Follow written tempo option (this should be the default!)
Double-clicked on the written tempo and changed 189 to 120.
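For a sense of how far off 189 bpm is, the sketch below compares beat lengths at the two tempos. It says nothing about how MuseScore arrived at 189 bpm; the function name seconds_per_beat is only illustrative.

```python
def seconds_per_beat(bpm):
    """Length of one quarter-note beat in seconds at the given tempo."""
    return 60.0 / bpm


print(round(seconds_per_beat(120), 3))  # 0.5
print(round(seconds_per_beat(189), 3))  # 0.317
print(round(189 / 120, 3))              # 1.575
```

At 189 bpm, each beat lasts about 0.317 seconds instead of 0.5 seconds, so playback would run roughly 1.6 times too fast.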

I saved the file as Sanchez_05_NNTranscription.mscz so readers could examine it.

This is how MuseScore presents the score using my favorite layout:

MuseScore sounded like it stuttered badly when playing back the score. I exported the score as an MP3 and learned that the problem was not stuttering; MuseScore had made a mess of the MIDI import. I saved the file as Sanchez_05_NNTranscription-Piano.mp3 so you can listen to it:

💩

MuseScore introduced so many transcription errors that it was unusable for this task. Notice how different the displayed score looks from the version produced by Guitar Pro 8.