Janico Greifenberg's

Ibis and Baboon

Playing With Music and Overtone I

One of the more spectacular projects in Clojure is Overtone, a toolkit for programmable music written by Sam Aaron and others. Overtone is a front-end to the SuperCollider sound synthesis server giving you a Clojure library to use synthesis, sampling and event instrument construction.

This post is the beginning of a series, documenting my experiments with generating music with Overtone. To get started, in this post we discuss how to play a short melody.

Subject of Kunst der Fuge

The melody is the subject of “Die Kunst der Fuge” by J.S. Bach.

To follow along you need the Leiningen build tool. Create a project as a sandbox.

1
2
$ lein new jamsession
$ cd jamsession

Edit the project.clj file in that directory and add the following to the dependencies.

1
[overtone "LATEST"]

When you start a REPL, leiningen automatically downloads all dependencies.

1
$ lein repl

When you get the user=> prompt load overtone and see if it’s working.

1
2
3
user=> (use 'overtone.live)

user=> (demo (sin-osc))

You should hear a short and somewhat annoying sound. If it doesn’t work for you, check the Overtone wiki.

Now we need a proper instrument so that we get less annoying sounds. Let’s go with a piano.

1
2
3
user=> (use 'overtone.inst.sampled-piano)

user=> (sampled-piano)

As the name implies this piano uses samples to make sounds. When it’s used for the first time, it downloads the samples which can take some time. When the samples are downloaded, you should hear a single note. Without arguments, the sampled-piano function plays a C. You can pass a MIDI note value to specify which note you want.

1
user=> (sampled-piano 62)

Overtone provides the note function to resolve a symbolic name to the MIDI value.

1
user=> (sampled-piano (note :d4))

The next thing we need is timing, because we don’t want to make music by typing lines like these really fast into a REPL. Overtone gives us at and apply-at to schedule actions. at is a macro that takes a timestamp and a body of Clojure forms, executing the body at the given time.

1
user=> (at (+ 1000 (now)) (sampled-piano (note :c#4)))

This plays a note one second from now (the now function returns the current timestamp). apply-at lets us schedule a function application which is handy for recursively performing a function at specific times.

1
user=> (apply-at (+ 1000 (now)) sampled-piano (note :c#4) [])

This does the same thing as the at form above. The last argument is a sequence that gets applied to the scheduled function (sampled-piano in this case) as arguments, like the apply function from clojure.core. We don’t need this here, but it is necessary to have at least an empty vector in the arguments.

We could start playing melodies with these functions, but dealing with millisecond timestamps is cumbersome. A more appropriate level of abstraction is to work with counts and bars, as you would when making music without a computer. Fortunately, Overtone provides us with metronomes that can be used to keep track of an ongoing beat and resolve the timestamp of specific beats.

The metronome function creates a metronome with a given number of beats per minute that starts counting in the background immediately and returns a function to access that metronome. Called without arguments, the metronome access function returns the next beat number. When called with a beat number it returns the timestamp when that beat will come up.

1
2
3
4
5
user=> (def metro (metronome 120))
user=> (metro)
;=> 12
user=> (metro 42)
;=> 1.358189651961E12

(The return values depend on the timing when you execute them, so you will see different ones.)

Now that we have an instrument, and beat based scheduling, we can start to think about playing a melody. Starting from the outside, what should a playing function look like? The function should take a metronome, so that we can define the speed and use the same metronome across different sound making functions, the instrument to use, and the melody.

But how do we encode the melody? We only want to play a single voice for the moment, so a sequence of notes is enough. Each note can be encoded as a vector of the symbolic name of its pitch (which I also referred to as note above – music terminology is a bit ambiguous) and its duration in beats. So the subject of “Kunst der Fuge” shown above can be encoded as:

1
2
(def subject [[:d4 2] [:a4 2] [:f4 2] [:d4 2] [:c#4 2] [:d4 1] [:e4 1]
              [:f4 2.5] [:g4 0.5] [:f4 0.5] [:e4 0.5] [:d4 1]])

Knowing what the data look like, we can implement the functions to interpret them. The first step is a function that plays a single note. In order to be able to make the note part of the melody we have to schedule playing it using at.

1
user=> (at (metro (metro)) (sampled-piano (note :c4)))

Here we schedule the note for the next beat. But just scheduling when to play the note and then the next note afterwards is not enough, as you can hear the sound dragging on for a few seconds. We also need a way to stop it after the duration we encoded in our note vector.

To stop a note, we need the handle returned by the playing function and use the ctl function to set the gate to 0.

1
2
user=> (def id (at (metro (metro)) (sampled-piano (note :c4))))
user=> (ctl id :gate 0)

Tying this together into a function we get:

1
2
3
4
5
6
7
(defn play-one
  [metronome beat instrument [pitch dur]]
  (let [end (+ beat dur)]
    (if pitch
      (let [id (at (metronome beat) (instrument (note pitch)))]
        (at (metronome end) (ctl id :gate 0))))
    end))

We take the metronome, the target beat, and the instrument as arguments. In the final argument, we destructure the note vector into pitch and duration. Then we calculate the beat when the note needs to stop by adding the duration to the start beat. In order to encode rests with a pitch of nil, we only go into the playing code, if the pitch is not nil. After that we play the note and stop it as shown above. Finally, we return to beat when this note ends, so that we know the timing of the next note.

Finally, we can now define the play function that goes through the melody, calls play-one for each note and uses apply-at to schedule a recursive execution of itself at for the next note.

1
2
3
4
5
6
7
8
9
(defn play
  ([metronome inst score]
     (play metronome (metronome) inst score))
  ([metronome beat instrument score]
     (let [cur-note (first score)]
       (when cur-note
         (let [next-beat (play-one metronome beat instrument cur-note)]
           (apply-at (metronome next-beat) play metronome next-beat instrument
             (next score) []))))))

The recursive call has to get the beat, on which it should play its first note, but for a convenient API, we make that argument optional and ask the metronome for the next beat, so that it starts playing at once, when called. We cannot use the next beat from the metronome at each recursive call, as we need to be able to handle fractional beats for eighth or sixteenth notes.

We have reached our goal and can play the subject of “Kunst der Fuge”:

1
user=> (play (metronome 120) sampled-piano subject)

An Experiment About Static and Dynamic Type Systems

Thanks to Christoph Treude’s post on It will never work in theory I found this paper by Stefan Hanenberg empirically comparing the effects of using static and dynamic type systems.

In this paper we presented results of an experiment that explores the impact of static and dynamic type systems on the development of a piece of software (a parser) with 49 subjects. We measured two different points in the development: first, the development time until a minimal scanner has been implemented, and second the quality of the resulting software measured by the number successful test cases fulfilled by the parser. In none of these measured points the use of the static type system turned out to have a significant positive impact. In the first case, the use of the statically typed programming language had a significant negative impact, in the latter one, no significant difference could be measured.

These are interesting results (especially since it kind of validated my preference for dynamic languages), although the author cautions against considering this study as a conclusive verdict on the matter (he seems to prefer static typing). The methodology is really thorough – most notably, the author implemented new languages for use in the study, so that none of the subjects could have any prior experience to skews the results.

Renaissance Evil

Once upon a time (circa 1475) the whimsical Will that scripts the Great Scroll of the Cosmos woke up in the morning and decided: Some day centuries from now, when mankind has outgrown the dastardly moustaches of melodrama and moved on to a phase of complex antiheroes, sympathetic villains and moral ambiguity, I want history teachers to be able to stand at the front of the classroom and say, “Yes, he really did go around dressed all in black wearing a mask and killing people for fun.” Thus Cesare Borgia was conceived.

A great post about the papal dynasty of the Borgias. It is actually part of a series about Machiavelli: Machiavelli Part I: S.P.Q.F. , Part I addendum, and Part II: The Three Branches of Ethics.

Danah Boyd on Freedom of Offensive Speech

I’m deeply committed to the value of free speech. I understand its costs and I despise when it’s used as a tool to degrade and demean people or groups. I hate when it’s used to justify unhealthy behavior or reinforce norms that disgust me. But I tolerate these things because I believe that it’s one of the most critical tools of freedom. I firmly believe that censoring speech erodes a society more than allowing icky speech does. I also firmly believe that efforts to hamper free speech do a greater disservice to oppressed people than permitting disgusting speech. It’s a trade-off and it’s a trade-off that I accept. Yet, it’s also a trade-off that cannot be taken for granted, especially in a global society.

I wholeheartedly agree. This is a difficult trade-off, but freedom of speech is too essential to compromise on. And somebody somewhere is going to be offended by everything.

A Plausible Threat

Three years ago I wrote a post about the difficulty of convincing people of the value of privacy. I argued that targeted ads — creepy though they seem to privacy advocates — are not a good argument to convince others of the value of being parsimonious with personal data.

The recent discovery of the app Girls Around Me provides a concrete illustration of the dangers caused by promiscuous sharing.

Girls Around Me uses Foursquare, the location-based mobile service, to determine your location. It then scans for women in the area who have recently checked-in on the service. Once you identify a woman you’d like to talk to, one that inevitably has no idea you’re snooping on her, you can connect to her through Facebook, see her full name, profile photos and send her a message.

While the app is creepy in its own right, Charles Stross analyzes what allows this app to exist.

… the app is not the problem. The problem is the deployment by profit-oriented corporations of behavioural psychology techniques to induce people to over-share information which can then be aggregated and disclosed to third parties for targeted marketing purposes.

Furthermore, Stross describes dystopian visions of using the widely available data for purposes far more creepy than Girls Around Me:

Facebook encourages us to disclose a wide range of information about ourselves, including our religion and a photograph. Religion is obvious: “Yids Among Us” would obviously be one of the go-to tools of choice for Neo-Nazis. As for skin colour, ethnicity identification from face images is out there already. Want to go queer bashing? There’s an algorithm out there for guessing sexual orientation based on the network graph of the target’s facebook friends. It’s probably possible to apply this sort of data mining exercise to determine whether a woman has had an abortion or is pro-choice.

Chomsky, Norvig and Practical Machine Learning

Is it more important for science to describe how things behave or to explain why things behave the way they do? This seems to be the question behind statements Noam Chomsky made at a symposium regarding machine learning and an article by Peter Norvig discussing Chomsky’s opinions.

Chomsky is highly critical of the commonly used statistical models for machine learning that focus on the “how”-part. He discounts the practical success of these models as unimportant for the advancement of science. His main goal – as far as I understand it – is to find the principles on which language is based.

Norvig argues in favour of statistical models and purely descriptive research as something well worth pursuing and not as rare in the history of science as Chomsky claims it is. He cites several examples from Chomsky’s publications that show a lack of knowledge about the capabilities of machine learning algorithms. Norvig’s most important argument – to my understanding – is that our current statistical models perform better in describing the reality of language than our current explanatory models.

Regardless of the advantages of statistical models, explanatory models definitely are easier for humans to talk and think about. And however our brains really learn languages, rules are how we teach them. Beyond the scientific considerations Norvig’s article and Chomsky’s statements focus on, I find this aspect relevant for the practical use of machine learning as well. Many applications or services that rely on machine learning suffer in usability, because you cannot really understand why they do what they do.

Spam filters are a good example for this problem. Originally, programs like Spamassasin only used explicit rules (e.g. does the text contain the word “viagra”) to determine whether a mail was spam. For each matching rule a certain number of points is added to the mail’s spam score. And usually the program adds a header where all the matching rules and the number of points resulting from it are listed (e.g. DRUGS_ERECTILE=0.282). These scores are helpful when looking why some mails where not classified correctly.

Spam detection was, however, greatly improved by the introduction of Bayesian filters. These probabilistic filters are trained with corpora of mails marked as spam or not-spam and calculate the probability of new mails being spam. In Spamassassin, this results in a single rather opaque score BAYES_99=3.5. The other descriptive scores are still there, but from a brief look through my recent spam, the Bayesian classifier contributes the most significant numbers for the mails that were correctly filtered.

The good news is that statistical models don’t always mean that you won’t get a good explanation. Amazon, for example, shows you why you get a certain recommendation. Below each recommended item, there is a link to “fix this recommendation”.

Fix recommendations on Amazon.com

The page you get, shows items you bought or looked at that the recommender thought were similar to the one you now get as recommendation. The page also gives you ways to tell Amazon not to use these items for you in the future.

In applications with a machine learning component, giving some explanation or reasoning goes a long way to improve the usability.

On the News

Mandy Brown wrote a really good post about the evolution her news-reading behaviour. She started out reading one newspaper and than transitioned to reading many different online news sources. I can absolutely relate to that. I also share the transition in the behavioural pattern of reading news: with the newspaper I read it once in the morning during breakfast; now I’m checking the news all day.

An than Mandy talks about expectations from the news she reads:

I want a reading experience that defends the news from the circus that online advertising creates. I want good storytelling and analysis, not naked facts. I want news that admits and defends its point of view (and acknowledges that there is a truth to be uncovered), not news that parrots the party line while making claims to objectivity. I want long essays on the events at Fukushima and the consequences for nuclear power going forward, not shrieking dispatches of each new fire or setback. I want a history of American engagement in Libya, putting the events of the past few weeks in context. I want twenty thousand words on the recession and its effects on the middle class, not another lone statistic about the unemployment rate. I want thoughtful, investigative journalism that exposes the ways in which our government is failing us, so that we can make it better.

I can say ‘yes’ to each of the points, except for the second item. I do want good storytelling and analysis, but I also want links to the naked facts. When I read about a planned increase in taxes for Diesel fuel where one quoted expert claims that preferring Diesel is bad from an environmental point of view, while car manufactures claim that Diesel is good for the environment, I want links to studies about the different properties of Diesel vs. gasoline with respect to environmental issues. When I read an op-ed about health care, where the author claims that the largest part of the health care costs for a person are incurred in the final year of their lives no matter how long the person lived, I want a link to a statistic supporting the claim. Bonus points for additional links to stats that do not agree and an explanation why they are less credible.

I have another addition to this wish list while we’re at it: I do want news that admits and defends its point of view, and I also want pointers to articles with different points of view.

Link: Punk Rock Languages

Chris Adamson wrote a polemic about programming languages for the March 2011 issue of the PragPub Magazin that is both entertaining and thoughprovoking.

The natural appeal of the language is to write software with it, not to mess with the language itself—Solve your users’ problems rather than indulging your own programming fetishes.

The Functional Elegance of Ring Middleware

Higher-order functions are to me the most awe-inspiring feature of functional programming languages. However, like many incredibly elegant concepts, its greatness is not immediately obvious (recursion is another example that comes to mind). When I was exclusively working with imperative programming languages and came across the definition of higher-order functions, I thought the idea was in part trivial and in part irrelevant theoretical nonsense.

[…] Higher-order functions […] are functions which do at least one of the following:

  • take one or more functions as an input
  • output a function.

Wikipedia

The first part, I could relate to. This is like passing a function pointer in C or an anonymous inner class in Java as e.g. event handlers. In both cases the syntax is so cumbersome that is feels like honest hard work, not like an awesome technique based on theory.

The second part – output a function – why would I want that? The typical examples are as helpful as fibonacci numbers are for explaining recursion to someone who is biased towards a narrow notion of practicality. They often go look something like this:

1
2
3
4
5
6
7
(defn plus-x [x]
  (fn [y] (+ x y)))

(def plus2 (plus-x 2))

(plus2 3)
=> 5

The function plus-x takes a parameter x and returns a function that takes a parameter y and returns x + y.

(If you didn’t believe me about the cumbersome syntax, here is the above example translated into Java. Related: Execution in the Kingdom of Nouns)

I’m telling you this, because I want to highlight a particularly elegant and practical example of higher-order functions: Ring middleware. Ring is a Clojure library for writing web apps. It gives you an abstraction on top of HTTP similar to Rack for Ruby or WSGI for Python. Higher-level frameworks/libraries such as Compojure, Moustache, or Sandbar are built on top of Ring.

A simple “Hello, World” with Ring looks like this (from the README):

1
2
3
4
5
6
7
8
(use 'ring.adapter.jetty)

(defn app [req]
  {:status  200
   :headers {"Content-Type" "text/html"}
   :body    "Hello World from Ring"})

(run-jetty app {:port 8080})

A ring handler (app in the example) is simply a function that takes a map representing the incoming request and returns a map representing the response. Such a handler can be given to an adapter (Jetty in this case) that deals with the actual HTTP connection and calls the handler function. Thus the adapter is a higher-order function of the kind I once thought to be trivial.

If we want to serve static files with this app, we can add middleware that wraps our app to look for requests to files in a given directory:

1
2
3
4
5
6
7
8
9
10
(use 'ring.middleware.file 'ring.adapter.jetty)

(defn hello [req]
  {:status 200
   :headers {"Content-Type" "text/html"}
   :body "Hello World"})

(def app (wrap-file hello "public"))

(run-jetty app {:port 8080})

Here, we have a higher-order function of the second kind: wrap-file. hello is a handler function in its own right – it is equivalent to app in the previous example – but we do not pass it the the adapter directly. The result of wrap-file is a new handler function, it has to be, otherwise we couldn’t pass it to the adapter.

The implementation of the wrapper is simple and clean:

1
2
3
4
5
6
7
8
9
10
11
(defn wrap-file
  [app #^String root-path]
  (ensure-dir root-path)
  (fn [req]
    (if-not (= :get (:request-method req))
      (app req)
      (let [path (.substring (codec/url-decode (:uri req)) 1)]
        (or
          (response/file-response path
            {:root root-path :index-files? true :html-files? true})
          (app req))))))

First the function checks if the the directory is actually there by calling ensure-dir. The rest of the code in wrap-file is building the resulting handler function. If the request method is not GET, the inner handler is called. Otherwise, we extract the path and try to make a response out of that using file-response. If that returns nil, the inner handler is called.

Now, this is all nice and well, but the awesome elegance of this approach to middleware becomes apparent, when we realize that wrappers can be wrapped around other wrappers in arbitrary numbers:

1
(def app (wrap-params (wrap-file-info (wrap-file hello "public"))))

The resulting handler has a wrapper that makes the request parameters easier to process and one that adds content-type headers to the response.

As the prefix notation is not the most readable way to express such a chain of call, you would rather use the threading macro to write this:

1
2
3
4
(def app (-> hello
             (wrap-file "public")
             (wrap-file-info)
             (wrap-params)))

But that is just syntactic sugar, the beauty of this solution is possible, because it uses higher-order functions to great effect.

IDSC v: Accessing and Modifying Vectors

In the previous post of the Immutable Data-Structure Canon we looked at vectors, their internal structure, how they are created, and how more elements are inserted. In this post we continue where we left off and examine the code used to access and remove values from a vector.

Accessing Elements

The elements of a vector are accessible by index. The way vectors are usually implemented, this is a constant time operation, as it only takes the calculation of the offset of a memory location. As we’ve seen, Clojure vectors are implemented as trees to allow for shared structures between persistent “modified” versions, so the access methods need to find their way around that structure.

Let’s say, for example, that we have a vector v with 1500 elements and we want to get the one at index 1101. There are three different ways to find an element in a vector by index: the nth function, the get function, and using the vector as a function. They differ in how they handle the vector being nil, the index being out of range, and whether they support a “not found” argument. For this discussion we’ll use nth; it returns nil if the vector is nil, throws an exception, if the index is out of range, but you can pass an optional argument that is returned, when the index is not found.

1
(nth v 1101)

Internally, this function is mapped to a call to the nth method in PersistentVector:

1
2
3
4
public Object nth(int i){
  Object[] node = arrayFor(i);
  return node[i & 0x01f];
}

The arrayFor method handles the lookup in the internal structure and returns on of the 32-element arrays where the elements are stored – either from one of the leaf nodes or from the tail (line 2). The index in that array is calculated by applying a bit-mask to the index passed into the method (line 3). For our example that local index is 13.

1
2
3
4
5
6
7
8
9
10
11
12
public Object[] arrayFor(int i){
  if(i >= 0 && i < cnt)
      {
      if(i >= tailoff())
          return tail;
      Node node = root;
      for(int level = shift; level > 0; level -= 5)
          node = (Node) node.array[(i >>> level) & 0x01f];
      return node.array;
      }
  throw new IndexOutOfBoundsException();
}

The arrayFor method checks that the index is valid and throws an exception if it is not (lines 2, 11). If the requested index is greater than the number of values in the tree, the tail array is returned (lines 4,5). In our example, this is not the case, we need to look into the tree.

As a tree with two layers (root and leaves) can hold 1024 elements, the tree in for a vector with 1500 elements needs three layers. The index we’re looking for is in the second subtree, as it’s greater than 1024. The field shift which holds a multiple of 5 proportional to the height of the tree is 10 in our case, so that we enter the loop in line 7 with 10 as level.

Inside the loop, we find the subtree holding our value by bit-shifting the index by the current level and applying a bit-mask to get it into the 32-element frame. The index for the subtree in our example is 1, as expected. In the second iteration of the loop, we look at the leaves, so that level is decremented to 5. This time, we find the node with index 2. The loop terminates here, as there is not level further down. The method returns the array attached to the node we found.

Summarizing all the index-offsets, we have:

  • the second subtree of the root (the first subtree contains 1024 elements),
  • the third leaf of that subtree (the leaves contain 32 elements each), and
  • the fourteenth element of the leaf (calculated in nth).

Thus we have the 1102nd element of the vector. Bingo!

Like the cons operation we saw last time, the number of steps necessary to find the nth element depends on the height of the tree, so the complexity here is once again O(log32 N).

Deleting Elements

To finish the discussion of vectors, let’s look at deleting elements. The only position where efficient deletions are possible is the end. This can be done using the pop function that is conveniently mapped to the pop method of PersistentVector:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
public PersistentVector pop(){
  if(cnt == 0)
      throw new IllegalStateException("Can't pop empty vector");
  if(cnt == 1)
      return EMPTY.withMeta(meta());
  if(cnt-tailoff() > 1)
      {
      Object[] newTail = new Object[tail.length - 1];
      System.arraycopy(tail, 0, newTail, 0, newTail.length);
      return new PersistentVector(meta(), cnt - 1, shift, root, newTail);
      }
  Object[] newtail = arrayFor(cnt - 2);
  Node newroot = popTail(shift, root);
  int newshift = shift;
  if(newroot == null)
      {
      newroot = EMPTY_NODE;
      }
  if(shift > 5 && newroot.array[1] == null)
      {
      newroot = (Node) newroot.array[0];
      newshift -= 5;
      }
  return new PersistentVector(meta(), cnt - 1, newshift, newroot, newtail);
}

First, the edge-cases are handled: pop on an empty vector does not work (lines 2,3), and pop on a one-element vector returns an empty vector (lines 4,5). Then we look at the easy case that the tail holds more than one element (line 6); we just copy all tail elements except for the last one into a new array and use it to create a new vector that shares the tree with the original (lines 8-10).

Why more than one? Why not one or more?

The condition for the simple case is phrased so that we don’t use it on a tail with only one element. This is because we need to change the tree when the tail run empty; the last node becomes the new tail in that case.

Shrinking the tree starts with getting the array from the last node by calling the arrayFor method (line 12). Next, we call popTail to get a new root node for a tree without the last node (line 13). This leaves us with three cases:

  1. The node we removed was the only node, so that our return vector gets an empty node as root (lines 15-18).
  2. The tree has intermediate levels between the root and the leaves and the removed node was the only leaf in the second subtree, so that we can remove one layer (lines 19-23).
  3. Otherwise the new tree still has the same height as the old tree.

Like the other vector operations we looked at, pop also has a complexity of O(log32 N), because the number of steps necessary is dependent on the height of the tree.

Summary

  • Vectors are implemented based on a tree and a tail array.
  • All arrays (both in the nodes of the tree and in the tail) have up to 32 elements.
  • The leaves always have exactly 32 elements in them.
  • Elements can added and removed efficiently at the end of the vector.
  • Addition, removal, and index lookup are O(log32 N), which is essentially constant time in practice.