
My favourite open source library for analyzing music files

In my previous article, I created a framework for analyzing the directories and subdirectories of music files, using the methods Groovy adds to the java.io.File class, which streamline and simplify its use. In this article, I use the open source JAudiotagger library to analyze the tags of the music files in the music directory and subdirectories. Be sure to read the first article in this series if you intend to follow along.

Install Java and Groovy

Groovy is based on Java and requires a Java installation. A recent and decent version of both Java and Groovy might be in your Linux distribution’s repositories. Groovy can also be installed directly from the Apache Foundation website. A nice alternative for Linux users is SDKMan, which can be used to get multiple versions of Java, Groovy, and many other related tools. For this article, I use SDK’s releases of:

  • Java: version 11.0.12-open of OpenJDK 11
  • Groovy: version 3.0.8

Back to the problem

In the 15 or so years that I’ve been carefully ripping my CD collection and increasingly buying digital downloads, I’ve found that ripping programs and digital music download vendors are all over the map when it comes to tagging music files. Sometimes, my files are missing tags that would be useful to music players, such as ALBUMSORT. Other times, my files are full of tags I don’t care about, such as MUSICBRAINZ_DISCID, that cause some music players to change the order of presentation in obscure ways, so that one album appears to be many, or sorts in a strange order.

Given that I have nearly 10,000 tracks in nearly 700 albums, it’s quite nice when my music player manages to display my collection in a reasonably understandable order. Therefore, the ultimate goal of this series is to create a few useful scripts to help identify missing or unusual tags and facilitate the creation of a work plan to fix tagging problems. This particular script analyzes the tags of music files and creates a CSV file that I can load into LibreOffice or OnlyOffice to look for problems. It won’t look at missing cover.jpg files nor show album sub-subdirectories that contain other files, because this isn’t relevant at the music file level.

My Groovy framework plus JAudiotagger

Once again, start with the code. As before, I’ve incorporated comments in the script that reflect the (relatively abbreviated) “comment notes” that I typically leave for myself:

     1  @Grab('net.jthink:jaudiotagger:3.0.1')
     2  import org.jaudiotagger.audio.*
       
     3  def logger = java.util.logging.Logger.getLogger('org.jaudiotagger');
     4  logger.setLevel(java.util.logging.Level.OFF);
       
     5  // Define the music library directory
       
     6  def musicLibraryDirName = '/var/lib/mpd/music'
       
     7  // These are the music file tags we are happy to see
     8  // Some tags can occur more than once in a given file
       
     9  def wantedFieldIdSet = ['ALBUM', 'ALBUMARTIST',
    10      'ALBUMARTISTSORT', 'ARTIST', 'ARTISTSORT',
    11      'COMPOSER', 'COMPOSERSORT', 'COVERART', 'DATE',
    12      'GENRE', 'TITLE', 'TITLESORT', 'TRACKNUMBER',
    13      'TRACKTOTAL', 'VENDOR', 'YEAR'] as LinkedHashSet
       
    14  // Print the CSV file header
       
    15  print "artistDir|albumDir|contentFile"
    16  print "|${wantedFieldIdSet*.toLowerCase().join('|')}"
    17  println "|other tags"
       
    18  // Iterate over each directory in the music library directory
    19  // These are assumed to be artist directories
       
    20  new File(musicLibraryDirName).eachDir { artistDir ->
       
    21      // Iterate over each directory in the artist directory
    22      // These are assumed to be album directories
       
    23      artistDir.eachDir { albumDir ->
       
    24          // Iterate over each file in the album directory
    25          // These are assumed to be content or related
    26          // (cover.jpg, PDFs with liner notes etc)
       
    27          albumDir.eachFile { contentFile ->
       
    28              // Initialize the counter map for tags we like
    29              // and the list for unwanted tags
       
    30              def fieldKeyCounters = wantedFieldIdSet.collectEntries { e ->
    31                  [(e): 0]
    32              }
    33              def unwantedFieldIds = []
       
    34              // Analyze the file and print the analysis
       
    35              if (contentFile.name ==~ /.*\.(flac|mp3|ogg)/) {
    36                  def af = AudioFileIO.read(contentFile)
    37                  af.tag.fields.each { tagField ->
    38                      if (tagField.id in wantedFieldIdSet)
    39                          fieldKeyCounters[tagField.id]++
    40                      else
    41                          unwantedFieldIds << tagField.id
    42                  }
    43                  print "${artistDir.name}|${albumDir.name}|${contentFile.name}"
    44                  wantedFieldIdSet.each { fieldId ->
    45                      print "|${fieldKeyCounters[fieldId]}"
    46                  }
    47                  println "|${unwantedFieldIds.join(',')}"
    48              }
       
    49          }
    50      }
    51  }

 

Line 1 is one of those awesomely lovely Groovy facilities that simplify life enormously. It turns out that the kind developer of JAudiotagger makes a compiled version available on the Maven Central repository. In Java, this requires some XML ceremony and configuration. Using Groovy, I just use the @Grab annotation, and Groovy handles the rest behind the scenes.

Line 2 imports the relevant class files from the JAudiotagger library.

Lines 3-4 configure the JAudiotagger library to turn off logging. In my own experiments, the default level is quite verbose and the output of any script using JAudiotagger is full of logging information. Configuring the logger at the top of the script works well because Groovy builds the script into a static main class. I’m sure I’m not the only one who has configured the logger in some instance method only to see the configuration garbage collected after the instance method returns.
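Here is a minimal sketch of the logging behavior described above, using only java.util.logging and no JAudiotagger classes. The child logger name is an assumption chosen for illustration. The point is that a level set on a parent logger is inherited by the loggers beneath it, as long as something keeps a strong reference to that parent.

```groovy
import java.util.logging.Level
import java.util.logging.Logger

// Keep a strong reference: the LogManager only holds loggers weakly,
// so an unreferenced logger (and its level setting) can be collected.
def parentLogger = Logger.getLogger('org.jaudiotagger')
parentLogger.level = Level.OFF

// Hypothetical child logger name, used only for illustration
def childLogger = Logger.getLogger('org.jaudiotagger.audio')

// The child has no level of its own, so it inherits OFF from the parent
def severeEnabled = childLogger.isLoggable(Level.SEVERE)
println "SEVERE enabled on child: ${severeEnabled}"
```

In a script, the variable holding the logger lives for the whole run, which is presumably why the configuration sticks.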

Lines 5-6 are from the framework introduced in Part 1.

Lines 7-13 create a LinkedHashSet containing the list of tags that I hope will be in each file (or, at least, I’m OK with having in each file). I use a LinkedHashSet here so that the tags are ordered.
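As a quick illustration of why a LinkedHashSet suits this job, the sketch below uses a made-up, shorter list of field IDs. It keeps insertion order, so the CSV columns come out in a predictable sequence, and it drops duplicates.

```groovy
// A LinkedHashSet keeps insertion order and silently drops duplicates
def orderedIds = ['TITLE', 'ARTIST', 'ALBUM', 'ARTIST'] as LinkedHashSet

// Membership tests work as for any set
def hasAlbum = 'ALBUM' in orderedIds
def hasBpm = 'BPM' in orderedIds

println orderedIds.join('|')
```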

This is a good time to point out a discrepancy between the terminology I’ve been using up until now and the class definitions in the JAudiotagger library. What I’ve been calling “tags” are what JAudiotagger calls org.jaudiotagger.tag.TagField instances. These instances live inside an instance of org.jaudiotagger.tag.Tag. So the “tag” from JAudiotagger’s point of view is the collection of “tag fields”. I’m going to follow their naming convention for the rest of this article.

This collection of strings reflects a bit of prior digging with metaflac. Finally, it’s worth mentioning that JAudiotagger’s org.jaudiotagger.tag.FieldKey uses “_” to separate words in the field keys, which seems incompatible with the strings returned by org.jaudiotagger.tag.Tag.getFields(), so I do not use FieldKey.

Lines 14-17 print the CSV file header. Note the use of Groovy’s *. spread operator to apply toLowerCase() to each (upper case) string element of wantedFieldIdSet.
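The header construction can be tried in isolation. This sketch uses a shortened, illustrative set of field IDs rather than the full list from the script:

```groovy
// Illustrative subset of the wanted field IDs
def wantedIds = ['ALBUM', 'ALBUMARTIST', 'ARTIST'] as LinkedHashSet

// *. calls toLowerCase() on every element and collects the results in a list
def lowerCased = wantedIds*.toLowerCase()

// Assemble the pipe-delimited header the same way the script prints it
def header = "artistDir|albumDir|contentFile|${lowerCased.join('|')}|other tags".toString()
println header
```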

Lines 18-27 are from the framework introduced in Part 1, descending into the sub-sub-directories where the music files are found.
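For readers who want to try the three-level descent without pointing it at a real music library, here is a small sketch that builds a throwaway directory tree first. The directory and file names are invented for the example.

```groovy
import java.nio.file.Files

// Build a throwaway library: one artist, one album, two files
def libraryDir = Files.createTempDirectory('music-demo').toFile()
def demoAlbumDir = new File(libraryDir, 'Some Artist/Some Album')
demoAlbumDir.mkdirs()
new File(demoAlbumDir, '01 - Track.flac').text = ''
new File(demoAlbumDir, 'cover.jpg').text = ''

// Same three-level descent as the script: artist, album, then files
def visited = []
libraryDir.eachDir { artistDir ->
    artistDir.eachDir { albumDir ->
        albumDir.eachFile { contentFile ->
            visited << "${artistDir.name}|${albumDir.name}|${contentFile.name}".toString()
        }
    }
}
visited.sort()   // eachFile makes no ordering promise, so sort for display
visited.each { println it }

// Clean up the temporary tree
libraryDir.deleteDir()
```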

Lines 28-32 initialize a map of counters for the desired fields. I use counters here because some tag fields can occur more than once in a given file. Note the use of wantedFieldIdSet.collectEntries to build a map using the set elements as keys (the key value e is in parentheses, as it must be evaluated). I explain this in more detail in this article about maps in Groovy.
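The parenthesized key is easy to get wrong, so here is a short sketch with an illustrative two-element set. It shows the difference between (e), which uses the variable’s value as the key, and a bare e, which Groovy treats as the literal string 'e'.

```groovy
// Illustrative subset of the wanted field IDs
def wantedIds = ['ARTIST', 'TITLE'] as LinkedHashSet

// (e) is parenthesized so the variable's value becomes the key
def counters = wantedIds.collectEntries { e -> [(e): 0] }

// Without parentheses the key is the literal string 'e'
def literalKey = [e: 0]

// A field that occurs twice in one file is simply counted twice
counters['ARTIST']++
counters['ARTIST']++

println counters
```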

Line 33 initializes a list for accumulating unwanted tag field IDs.

Lines 34-48 analyze any FLAC, MP3 or OGG music files found:

  • Line 35 uses the Groovy match operator ==~ and a “slashy” regular expression to check file name patterns;
  • Line 36 reads the music file metadata using org.jaudiotagger.audio.AudioFileIO.read() into the variable af
  • Lines 37-48 loop over the tag fields found in the metadata and print the results:
    • Line 37 uses Groovy’s each() method to iterate over the list of tag fields returned by af.tag.getFields(), which in Groovy can be abbreviated to af.tag.fields
    • Lines 38-39 count any occurrence of a wanted tag field ID
    • Lines 40-41 append an occurrence of an unwanted tag field ID to the unwanted list
    • Lines 43-47 print out the counts and unwanted fields (if any)
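The steps in the list above can be sketched without JAudiotagger by substituting a plain list of strings for the tag field IDs. The names isMusic, fieldIds, and the sample values are assumptions made for this example, not part of the script or the library.

```groovy
// Illustrative subset of the wanted field IDs
def wantedIds = ['ARTIST', 'TITLE'] as LinkedHashSet

// ==~ requires the whole string to match; \. is a literal dot
def isMusic = { String name -> name ==~ /.*\.(flac|mp3|ogg)/ }

// Hypothetical field IDs standing in for the IDs found in af.tag.fields
def fieldIds = ['TITLE', 'ARTIST', 'ARTIST', 'BPM', 'ISRC']

// Count wanted IDs, collect the rest
def counters = wantedIds.collectEntries { e -> [(e): 0] }
def unwanted = []
fieldIds.each { id ->
    if (id in wantedIds)
        counters[id]++
    else
        unwanted << id
}

// One CSV row: file name, counts in set order, then the unwanted IDs
def counts = wantedIds.collect { counters[it] }
def row = "01 - Track.flac|${counts.join('|')}|${unwanted.join(',')}".toString()
println row
```

Escaping the dot in the pattern matters: without the backslash, the dot matches any character, so a name that merely ends in the letters “flac” would also pass the test.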

That’s it!

Typically, I would run this as follows:

$ groovy TagAnalyzer2.groovy > tagAnalysis2.csv
$

And then I load the resulting CSV into a spreadsheet. For example, with LibreOffice Calc, I go to the Sheet menu and select Insert sheet from file. I set the delimiter character to |. In my case, the results look like this:

(Chris Hermansen, CC BY-SA 4.0)

I like to have the ALBUMARTIST defined as well as the ARTIST for some music players so that the files in an album are grouped together when artists on individual tracks vary. This happens in compilation albums, but also in some albums with guest artists where the ARTIST field might say for example “Tony Bennett and Snoop Dogg” (I made that up. I think.) Lines 22 and onward in the spreadsheet shown above don’t specify the album artist, so I might want to fix that going forward.
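For anyone who prefers a script to a spreadsheet for this particular check, here is a rough sketch that lists the files whose albumartist count is zero. The sample rows are invented and cover only two of the wanted columns; a real run would read the generated CSV file instead.

```groovy
// Hypothetical sample of the script's output, for illustration only
def csvText = '''artistDir|albumDir|contentFile|albumartist|artist|other tags
Artist A|Album 1|01.flac|1|1|
Artist B|Album 2|01.mp3|0|1|BPM,ISRC'''

def lines = csvText.readLines()

// split takes a regex, so the pipe is escaped; -1 keeps trailing empty columns
def header = lines[0].split(/\|/, -1) as List
def albumArtistCol = header.indexOf('albumartist')

// Keep the data rows whose albumartist count is zero
def rows = lines.drop(1).collect { it.split(/\|/, -1) as List }
def rowsMissing = rows.findAll { it[albumArtistCol] == '0' }
def missing = rowsMissing.collect { "${it[0]}/${it[1]}/${it[2]}".toString() }

missing.each { println it }
```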

Here is what the last column, showing unwanted field IDs, looks like:

(Chris Hermansen, CC BY-SA 4.0)

Note that some of these tags may be of interest, in which case the “wanted” list could be modified to include them. I would set up some kind of script to delete field IDs BPM, ARTWORKGUID, CATALOGUENUMBER, ISRC and PUBLISHER.

Next steps

In the next article, I’ll step back from tracks and check for cover.jpg and other non-music files lying around in artist subdirectories and album sub-subdirectories.
