Saturday, 22 February 2014

Mathematical notation at a glance

Notation... If you have come across it, whether it's mathematical, musical or otherwise, perhaps you thought you had just stumbled into The Matrix or an episode of The Fringe; with the alphanumeric characters and a plethora of symbols, it can look like an alien language or some sort of ASCII art abomination.

While that might be the case, notation is used to convey ideas through symbols and makes life easier, especially when conveying complex ideas to a diverse audience. How?

See what you make of this:

Levenshtein algorithm notation from wikipedia article
Levenshtein Distance/Optimal String Alignment Algorithm
Source: http://en.wikipedia.org/wiki/Levenshtein_distance
Welcome to the real world of notation. I came across this algorithm a while ago where it was used in a problem involving two sets of data with no direct relationship apart from peoples names. A common problem with data such as peoples names is misspellings which means direct matches aren't guaranteed; that's where this algorithm can be put to use. It can also be used in computational biology, but that is beyond the scope of this post (do an internet search for 'Levenshtein biology' if you are intrigued about its use in biology).

The algorithm gives you the distance between two sequences; in other words, the cost involved in changing something into something else. I'm not going to delve into great detail about the Levenshtein algorithm in this post, just how to understand the notation. Feel free to make comments about the algorithm though if you'd like to know more.

So where do you begin with that algorithm? How do we turn that into something more than a notation? First step is to understand the symbols, much like understanding any language. Some elements of that algorithm you probably understand such as the -+ and = symbols, min and max. You will no doubt understand the words if and otherwise, but perhaps not how they are applied. We do need one extra piece of information which is: the Levenshtein distance between two strings a, b  is given by leva,b(|a||b|) which I will explain first.

So, firstly, a string is a sequence of characters which could be numbers, letters or symbols. a, b are used to represent two values, in this case, two strings. Lets say a represents the name 'Jon' and b represents 'Johnny'. So the Levenshtein distance between 'Jon' and 'Johnny' is given by leva,b(|a||b|). What is leva,b(|a||b|)? That is a function which takes two inputs |a| and |b| and outputs something. The vertical lines either side mean the magnitude or length of a and b. What are the lengths of the names 'Jon' and 'Johnny'? They are 3 and 6. We can now rewrite that function like so: lev'Jon','Johnny'(3, 6).

Let's have another look at that algorithm. You can see something that looks similar to leva,b(|a|, |b|) but it isn't quite the same; |a| and |b| are now i and j, but all that means is that i and j represent |a| and |b| in the same way that a and b represent 'Jon' and 'Johnny'. At this point, we should have a better idea about what certain elements of the algorithm mean, or at the very least, what they represent.

There are still curly braces and the if and otherwise words which we may not understand though. Quite simply, the curly braces group things together. Reading from left to right, the far left curly brace groups together the if min(i, j) = 0, max(i, j) and min. The inner curly brace groups together the three function references. The if and otherwise are conditions, so if the minimum value of i and j is equal to 0 then we need the maximum value of i and j which means leva,b(ij) or the Levenshtein distance is the maximum value of i and j which represent the lengths of a and b.

So that is the if condition handled, what about the otherwise? Well, if min(i, j) does not equal 0 then we need the minimum of that inner group with the three function references which means leva,b(ij) or the Levenshtein distance is the minimum value of that inner group.

The inner group is probably the most complicated part of this whole algorithm. Each of those three references refers to the Levenshtein algorithm but with different inputs for i and j. Not only that, but for each of those references, except the last one, we add 1 to whatever leva,b(ij) is equal to which I explained above. So what about this last bit: + 1(aibj)? It means we add 1 if the character at position i of sequence a does not equal the character at position j of the sequence b. So if a and b are 'Jon' and 'Johnny', and position i and j are equal to 3 (or 2 in zero-based arrays), then we take the third letter of each name and compare them. In this instance, the third letters are 'n' and 'h' which are not equal, so we would add 1. You might then ask, what do we add if the two characters are equal? We don't add anything, so it's 0.

Head spinning? You might be happy to know that that is where the glance at notation ends and you can throw questions my way if you would like to know more.

One final thing... I may have shown you how to read elements of mathematical notation, but I never showed you a way of turning it into something more that a bunch of symbols, something that can be used. It may not surprise you that the one way I would show you how it can be used is through computer programming; it's what I do. If you don't understand computer programming I would still encourage you to have a look at this computer code I wrote using the Python programming language and see if you can spot the similarities between the code and the original algorithm.

def LevDist(s, s_len, t, t_len):
 if min(s_len, t_len) == 0:
  return max(s_len, t_len)
 else:
  cost = 0
  if s[s_len-1] != t[t_len-1]:
   cost = 1
  return min(LevDist(s, s_len-1, t, t_len)+1,
                       min(LevDist(s, s_len, t, t_len-1)+1, 
                       LevDist(s, s_len-1, t, t_len-1)+cost))

For the programmers among us, I know this is a very simple and poorly performing solution. The reason it has such poor performance is because of the recursion. The time it takes to determine the distance between strings increases exponentially in relation to the length of the two strings, resulting in poor performance. I won't be discussing better implementations in this post, but if you want to have a look at some potential solutions in one of many programming languages then check out this link to rosettacode.org website.

I also found this resource from stanford.edu regarding edit distances/Levenshtein algorithm which is worth a read: Edit Distance Notes

Tuesday, 18 February 2014

Natural ability - a misused and misunderstood term?

Had an interesting discussion about the term 'natural ability' (or 'naturally talented/gifted') with a friend of mine, Daniel Rose, after he made a comment regarding his 'natural abilities in systems penetration testing'. Now, I'm not highly educated when it comes to language, but I felt that in this instance, natural ability was the incorrect term to use. Why? Well, I felt that his abilities in pen testing were acquired, that is to say, they weren't an innate or inborn ability, he had to learn them.

"Natural ability: an ability that is inherited"
Source: Princeton dictionary
Consider the Princeton dictionary definition of natural ability which is: 'an ability that is inherited'; I find that definition is as broad as Dwayne Johnsons' shoulders and the Princeton dictionary is just one of several that use the same definition. To some degree I'd say the dictionary definition supports Dans use of the term more so than my suggestion that it was an incorrect use. However, when I think of inheriting something, I think of money, disease or perhaps physical traits from my parents, none of which I have had to work for to acquire.

Here's a CNN news article from a few years back about Shaun White, a world renowned professional snowboarder that seems to have an affinity towards board sports. The article is fittingly entitled: "Shaun White: a natural board talent". After watching Shaun in the 2014 Winter Olympics, there is no doubt he is a magician with a board, performing jaw-dropping tricks like the McTwists, Corkscrews and 1440s that seemingly defy the laws of gravity and physics.

But I wouldn't say that his snowboard abilities are inherited, he had to work for them and work damn hard. He may have acquired his snowboarding abilities much faster than his peers and excelled in comparison because of the environment and influences surrounding him, but he still had to learn to ride a snowboard, to get air, to perform a trick and to push the limits of his abilities.

My thinking ties in with some aspects of this article which was written with some conviction and naturally sparks a larger debate and may put a few noses out of joint in the process. I certainly don't believe natural ability is a myth, just a term that is used in the wrong context?

Perhaps use of the term 'natural ability' is a way of referring to our surroundings and the impact they have on an ability that we have acquired or learnt, rather than an ability or skill that we have seemingly acquired without any knowledge, understanding or practice thereof (aptitude). Perhaps I don't understand the use of language, in particular the English language; either way, it intrigues me, and I'd be interested in hearing more thoughts on this.