Archive for the ‘F#’ Category

Setting Client Credentials when calling an HTTPS Web Service with the WSDL Type Provider

February 10, 2015 1 comment

... just because it was so hard to find online. (This works with .NET 4.5 and F# 3.1.)

You will need to add references to:

  • System.Runtime.Serialization
  • System.ServiceModel
  • System.IdentityModel
  • FSharp.Data.TypeProviders

Here’s the code; I’m sure you can work out the important bit. It calls a WCF service called MyService, but your names may vary:

open System
open System.ServiceModel
open System.IdentityModel
open Microsoft.FSharp.Data.TypeProviders

type MyService = WsdlService<"https://someuri.com/MyService">

let someService = MyService.GetBasicHttpBinding_SomeService()

// important bit here!
someService.DataContext.ClientCredentials.UserName.UserName <- "MyUser"
someService.DataContext.ClientCredentials.UserName.Password <- "Password1"

someService.DoStuff()
// you should probably catch some exceptions here

Categories: F#, WCF

Digit recognition in F# with k-nearest neighbours

May 29, 2014 5 comments

In this post, I’ll step through using a simple Machine Learning algorithm called k-nearest neighbours (KNN) to perform handwritten digit recognition.  This is another of the Hello Worlds of ML – it’s how I got introduced to it (via Mathias Brandewinder’s Digit Recogniser Dojo), and also one I’ve run as a dojo a few times myself. It’s based on the Kaggle learning competition of the same name.

KNN is a lazy classification algorithm, which is used to determine what an unknown example of something is based on its similarity to known examples. Specifically, it finds the k most similar (nearest) examples, and classifies the unknown example according to what they are.

Here’s a visualisation, in which we’re trying to work out what the unknown green thing is (blue square or red triangle?)

KNN visualisation

 

Based on the three nearest examples, we would classify the unknown item as a red triangle.

(Interestingly, if we’d chosen a k of 5, we would say it’s a blue square – we’ll discuss how to approach that later).
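To make the effect of k concrete, here is a minimal sketch on made-up 2D points. The coordinates, the Shape type and the classify function are all hypothetical (they are not read off the picture above), chosen so that the vote flips between k = 3 and k = 5:

```fsharp
// Hypothetical toy data: labelled 2D points, not taken from the visualisation.
type Shape = Square | Triangle

let knownPoints =
    [ (1.0, 0.0), Triangle
      (0.0, 1.2), Triangle
      (1.5, 0.0), Square
      (0.0, 2.0), Square
      (2.0, 1.0), Square
      (4.0, 4.0), Square ]

// Squared Euclidean distance between two points.
let squaredDistance (x1: float, y1: float) (x2: float, y2: float) =
    (x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2)

// Take the k nearest known points and return the most common label among them.
let classify k unknown =
    knownPoints
    |> List.sortBy (fun (point, _) -> squaredDistance unknown point)
    |> Seq.take k
    |> Seq.countBy snd
    |> Seq.maxBy snd
    |> fst

let withThree = classify 3 (0.0, 0.0)   // two triangles and one square vote
let withFive = classify 5 (0.0, 0.0)    // two triangles and three squares vote
```

With these points the two nearest neighbours are triangles, so k = 3 votes Triangle, while widening to k = 5 pulls in enough squares to outvote them.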

The dataset we’ll be running this on is (a subset of) the MNIST handwritten digits, which consists of several thousand 28×28 pixel greyscale images that look a bit like this:

MNIST handwritten digits

So how do you determine the ‘distance’ between two such images? A common way is to use Euclidean distance, which when applied in this situation involves comparing each pixel and summing the differences between them – or, to be more precise, summing the squares of the differences.

Euclidean Distance

(The actual equation, which you can see above, uses the square root of the sum of the squares – but since the square root is monotonic, leaving it out doesn’t change which examples come out nearest, so it makes no difference in this case). So if two images were identical, the distance would be 0. If two of the pixels were different, by 20 and 50 respectively, the distance would be 20^2 + 50^2 = 400 + 2500 = 2900 (still very similar).
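For reference, this is the standard definition of Euclidean distance written out for two images p and q of n pixels each (n = 784 here), followed by the worked example from the paragraph above. The symbols p, q and n are just labels chosen for this sketch:

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}

% Two pixels differ, by 20 and 50; every other term is zero:
d(p, q)^2 = 20^2 + 50^2 = 2900
\qquad\Longrightarrow\qquad
d(p, q) = \sqrt{2900} \approx 53.85
```

Sorting by d or by d^2 gives the same order, which is why the code can work with 2900 directly.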

OK, lesson over, let’s get implementing!  I’ll be doing this in F#, but it’s pretty similar in C# if you use a lot of LINQ.

I’ve put the source code on github. (You can find a basic version of it in C#, as well as several other languages, in my Digit Recogniser Dojo repository). I’m using arrays for most of this, which, although they aren’t the most natural thing to use from F#, give better performance – which can make a fair difference when working with big datasets.

I’m going to be using 5000 digits, which is enough to get pretty good results without taking too long. They come formatted as a CSV, with each record having a label (i.e. which digit it represents) followed by each of the 784 pixels (28×28) represented by a number from 0-255:

digits csv

First of all, let’s read the records, and get rid of the header:

open System.IO

let dataLines =
    File.ReadAllLines(__SOURCE_DIRECTORY__ + """\trainingsample.csv""").[1..]

We can check our progress using F# Interactive, and see that we now have an array of strings containing a record each:

val dataLines : string [] =
  [|"1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"+[1669 chars];
    "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"+[1925 chars];

We can then split the lines up and parse the numbers:

let dataNumbers =
    dataLines
    |> Array.map (fun line -> line.Split(','))
    |> Array.map (Array.map (int))

Now we have an array of arrays of integers:

val dataNumbers : int [] [] =
  [|[|1; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0;
      0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0;
      0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0;
      0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0;
      0; 0; 0; 0; ...|];
    [|0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0;

Working with arrays of arrays doesn’t make things too clear, so let’s create a type to store the records in:

type DigitRecord = { Label:int; Pixels:int[] }

let dataRecords =
    dataNumbers
    |> Array.map (fun record -> {Label = record.[0]; Pixels = record.[1..]})

So now we have an array of DigitRecords:

val dataRecords : DigitRecord [] =
  [|{Label = 1;
     Pixels =
      [|0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0;
        0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0;
        0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0;
        0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0;
        0; 0; 0; 0; ...|];};

Thinking ahead a bit, we’re going to need some way of testing our algorithm to see how good it is (and also, as described shortly, to work out what value to use for ‘k’). The typical approach for this is to split the known dataset into three – one to train the algorithm, one to choose what options to use, and another to test the accuracy of the final system:

let trainingSet = dataRecords.[..3999]
let crossValidationSet = dataRecords.[4000..4499]
let testSet = dataRecords.[4500..]

We’re nearly at the fun part, so we’ll soon need a way of measuring the distance between two digits, as described earlier. F#’s map2 function comes in very handy for this, applying a function to two same-sized arrays at the same time:

let distanceTo (unknownDigit:int[]) (knownDigit:DigitRecord) =
    Array.map2 (
        fun unknown known ->
            let difference = unknown-known
            int64 (difference * difference)
        ) unknownDigit knownDigit.Pixels
    |> Array.sum

For the unknown parameter we’ll take the raw pixels, as if we ever come to use this on real data we won’t have a label for it (so it won’t be in a DigitRecord).

Let’s give it a test run:

> {Label=1;Pixels=[|120;150|]} |> distanceTo [|100;100|];;
val it : int64 = 2900L

Looks about right!

Now we get to the real meat of it, classifying an unknown digit based on the k nearest known examples. In this function, we compare the digit against the whole training set, take the k closest, and count which label occurs the most times:

let classifyByNearest k (unknownDigit:int[]) =
    trainingSet
    |> Array.sortBy (distanceTo unknownDigit)
    |> Seq.take k
    |> Seq.countBy (fun digit -> digit.Label )
    |> Seq.maxBy (fun (label,count) -> count)
    |> fun (label,count) -> label

F# is particularly good at this kind of code, although you can do something not too much more complicated in C# with a good helping of LINQ (which is basically a functional library).

And with that, we have everything we need to start making predictions! Let’s give it a blast. We need to choose a value for ‘k’ – let’s just use 1 for now:

testSet.[..4]
|> Array.iter (fun digit ->
    printfn "Actual: %d, Predicted: %d"
        digit.Label
        (digit.Pixels |> classifyByNearest 1))
Actual: 5, Predicted: 6
Actual: 3, Predicted: 3
Actual: 9, Predicted: 9
Actual: 8, Predicted: 8
Actual: 7, Predicted: 7

Four out of five, not too shabby! Let’s write a function to find out what the accuracy is for a whole dataset. It takes the dataset as a parameter because we’ll use it in two different ways, which you’ll see shortly.

let calculateAccuracyWithNearest k dataSet =
    dataSet
    |> Array.averageBy (fun digit ->
        if digit.Pixels |> classifyByNearest k = digit.Label then 1.0
        else 0.0)

So, let’s give it a try!  We still haven’t yet worked out what value to use for k, so let’s stick with 1 for now.

> testSet |> calculateAccuracyWithNearest 1;;
val it : float = 0.93

93%, not bad for 30-odd lines of code!

But what about k, how do we decide what value to use?  This is where the cross-validation set comes in.  The idea is that you use the cross-validation set to test the various options you have (in this case, different values of k) and find the best one(s).  You can then use the test set to get a final figure for the accuracy of your algorithm.  The reason we have separate validation and test sets, instead of just using the same test set for both, is to help get a final figure which is more representative of what our algorithm will get when used on real data.

If we just took the accuracy as measured by the validation set when finding the best options, we’d have an artificially high figure, as we’ve specifically chosen the options to maximise that number.  The chances are, when we run it on real data, the accuracy will be lower.

To begin with, we can try a few values for k and plot the result (with FSharp.Charting), to see if we’re in the right ballpark:

let predictionAccuracy =
    [1;3;9;27]
    |> List.map (fun k ->
        (k, crossValidationSet |> calculateAccuracyWithNearest k))

Chart.Line(predictionAccuracy)
val predictionAccuracy : (int * float) list =
  [(1, 0.93); (3, 0.936); (9, 0.94); (27, 0.91)]

accuracy chart

As you can hopefully see, the accuracy seems to peak somewhere around 10 then starts to drop off. So let’s try all of the values in this range to find the best one (this may take a while – time to get a coffee!):

let bestK =
    [1..20]
    |> List.maxBy (fun k ->
        crossValidationSet |> calculateAccuracyWithNearest k)

Turns out the best value for k, at least according to the validation set, is 6. On the validation set, that gets us a heady 94% accuracy:

> crossValidationSet |> calculateAccuracyWithNearest bestK;;
val it : float = 0.94

As previously described, that’s likely to be a little optimistic, so let’s get a final measure using the test set:

> testSet |> calculateAccuracyWithNearest bestK;;
val it : float = 0.926

Final result: 92.6%!

Not a bad result for a simple algorithm. In fact in this case, we actually get a better result on the test set using a k of 1. With so little in it, we might decide to just use a ‘1-nearest neighbours’ algorithm, which would be simpler still.

Like many Machine Learning systems, this also gets better the more data you use – I submitted code based on this to the Kaggle contest, which has a dataset of 50,000 digits, and got an accuracy of around 97%. More data would also help with the disparity between the validation and test sets, and mean the options we chose using the validation set would be more likely to be optimum for the test set as well.

KNN can be used in several situations – you just need some way of measuring similarity between two things. For example, you could use Levenshtein distance to compare strings and find the best match for somebody’s name, or physical distance to find the nearest post office for a given home.
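As a rough illustration of the name-matching idea, here is a sketch of a 1-nearest-neighbour lookup using Levenshtein distance. The levenshtein and closestName functions and the list of names are assumptions made up for this example; the distance is the textbook dynamic-programming version, not taken from any library:

```fsharp
// Textbook dynamic-programming Levenshtein distance, keeping only two rows.
let levenshtein (a: string) (b: string) =
    // previous.[j] holds the distance between the first i-1 chars of a
    // and the first j chars of b.
    let mutable previous = Array.init (b.Length + 1) id
    for i in 1 .. a.Length do
        let current = Array.zeroCreate<int> (b.Length + 1)
        current.[0] <- i
        for j in 1 .. b.Length do
            let cost = if a.[i - 1] = b.[j - 1] then 0 else 1
            current.[j] <-
                min (min (previous.[j] + 1) (current.[j - 1] + 1))
                    (previous.[j - 1] + cost)
        previous <- current
    previous.[b.Length]

// Hypothetical list of known names.
let knownNames = [ "Grant"; "Graham"; "Greta" ]

// 1-nearest neighbour: pick the known name with the smallest edit distance.
let closestName (unknown: string) =
    knownNames |> List.minBy (levenshtein unknown)

let kittenSitting = levenshtein "kitten" "sitting"
let bestMatch = closestName "Grnt"
```

The only thing that changes from the digit recogniser is the distance function; the "find the nearest known example" step is the same.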

An interesting alternative is the weighted nearest neighbour algorithm, which potentially measures the distance to all neighbours but weighs their influence based on their distance (e.g. close ones count for a lot when taking the vote, distant ones hardly at all).
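A minimal sketch of that weighted vote might look like the following. The Example type, the tiny dataset and the choice of weight (1 / squared distance, with a small epsilon to avoid dividing by zero) are all assumptions for illustration; other weighting functions are equally valid:

```fsharp
// Hypothetical record type and data for the sketch.
type Example = { Label: int; Features: float[] }

let squaredDistance (a: float[]) (b: float[]) =
    Array.map2 (fun x y -> (x - y) * (x - y)) a b |> Array.sum

// Every known example votes for its label, weighted by 1 / (distance + epsilon).
let classifyWeighted (examples: Example[]) (unknown: float[]) =
    examples
    |> Seq.groupBy (fun example -> example.Label)
    |> Seq.map (fun (label, group) ->
        let totalWeight =
            group
            |> Seq.sumBy (fun example ->
                1.0 / (squaredDistance unknown example.Features + 1e-9))
        label, totalWeight)
    |> Seq.maxBy snd
    |> fst

let examples =
    [| { Label = 0; Features = [| 0.0; 0.0 |] }
       { Label = 1; Features = [| 10.0; 10.0 |] }
       { Label = 1; Features = [| 11.0; 10.0 |] }
       { Label = 1; Features = [| 10.0; 11.0 |] } |]

let weightedPrediction = classifyWeighted examples [| 1.0; 1.0 |]
```

Here label 1 has three examples to label 0's one, so a plain vote over all four would pick 1, but the single close example carries far more weight and the prediction is 0.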

So there we have it, digit recognition with KNN.  I hope you found it interesting.  May your machines learn well!

Categories: F#, Machine Learning

Web requests in F# now easy! Introducing Http.fs

November 15, 2013 3 comments

TL;DR

I’ve made a module which makes HTTP calls (like downloading a web page) easy, available now on GitHub – Http.fs

Introduction

I had a project recently which involved making a lot of HTTP requests and dealing with the responses.  F# being my current language of choice, I was using that.  Unfortunately, .Net’s HttpWebRequest/Response aren’t that nice to use from F# (or C#, frankly).

For example, here’s how you might make an HTTP Post, from F# Snippets:

open System.Text
open System.IO
open System.Net

let url = "http://posttestserver.com/post.php"

let req = HttpWebRequest.Create(url) :?> HttpWebRequest
req.ProtocolVersion <- HttpVersion.Version10
req.Method <- "POST"

let postBytes = Encoding.ASCII.GetBytes("fname=Tomas&lname=Petricek")
req.ContentType <- "application/x-www-form-urlencoded";
req.ContentLength <- int64 postBytes.Length
let reqStream = req.GetRequestStream()
reqStream.Write(postBytes, 0, postBytes.Length);
reqStream.Close()

let resp = req.GetResponse()
let stream = resp.GetResponseStream()
let reader = new StreamReader(stream)
let html = reader.ReadToEnd()

There are a few things I don’t really like about doing this:

  • It’s a lot of code!
  • You have to mess around with streams
  • The types used are mutable, so not really idiomatic F#
  • You have to set things (e.g. ‘POST’) as strings, so not typesafe
  • It’s not unit testable
  • You have to cast things (e.g. req) to the correct type

In fact there are many other problems with HttpWebRequest/Response which aren’t demonstrated by this sample, including:

  • Some headers are defined, others aren’t (so you have to set them as strings)
  • You need to mess around with the cookie container to get cookies working
  • If the response code is anything but 200-level, you get an exception (!)
  • Getting headers and cookies from the response isn’t pretty

Since then I’ve discovered HttpClient, which does address some of these issues, but it’s still not great to use from F# (and only available in .Net 4.5).

So I started to write some wrapper functions around this stuff, and it eventually turned into:

Http.fs!

Http.fs is a module which contains a few types and functions for more easily working with Http requests and responses from F#. It uses HttpWebRequest/Response under the hood, although these aren’t exposed directly when you use it.

Downloading a single web page is as simple as:

let page = (createRequest Get "http://www.google.com" |> getResponseBody)

And if you want to do something more in-depth, like the example above, that would look like this:

open HttpClient

let response =
  createRequest Post "http://posttestserver.com/post.php"
  |> withBody "fname=Tomas&lname=Petricek"
  |> withHeader (ContentType "application/x-www-form-urlencoded")
  |> getResponse

Then you could access the response elements like so:

response.StatusCode
response.EntityBody.Value
response.Headers.[Server]

And of course, it has asynchronous functions to let you do things like download multiple pages in parallel:

["http://news.bbc.co.uk"
 "http://www.wikipedia.com"
 "http://www.stackoverflow.com"]
|> List.map (fun url -> createRequest Get url |> getResponseBodyAsync)
|> Async.Parallel
|> Async.RunSynchronously
|> Array.iter (printfn "%s")

There are more details on the GitHub page. The project also contains a sample application which shows how it can be used and tested.

So if you’re using F# and want to make a complex HTTP request – or just download a web page – check out Http.fs!

Update

This is now available on NuGet.  To install:

PM> install-package Http.fs  
Categories: F#, HTTP

Hello Neurons – ENCOG Neural Network XOR example in F#

November 14, 2013 1 comment

I’ve been playing with Machine Learning lately, starting with Abhishek Kumar’s Introduction to Machine Learning video on PluralSight.

This video guides you through using the ENCOG library (available on NuGet) to build a simple neural network for the XOR (eXclusive OR) logic table, which is the ‘Hello World’ of Neural Networks.

I’m not going to go into the details of ML or Neural Networks here (I don’t know them, for a start), but one thing I found was that the .Net ENCOG examples were all in C#. As such, I thought I’d post my F# version here. (See the C# version for comparison).

So, without further ado:

open Encog.ML.Data.Basic
open Encog.Engine.Network.Activation
open Encog.Neural.Networks
open Encog.Neural.Networks.Layers
open Encog.Neural.Networks.Training.Propagation.Resilient

let createNetwork() =
    let network = BasicNetwork()
    network.AddLayer( BasicLayer( null, true, 2 ))
    network.AddLayer( BasicLayer( ActivationSigmoid(), true, 2 ))
    network.AddLayer( BasicLayer( ActivationSigmoid(), false, 1 ))
    network.Structure.FinalizeStructure()
    network.Reset()
    network

let train trainingSet (network: BasicNetwork) =
    let trainedNetwork = network.Clone() :?> BasicNetwork
    let trainer = ResilientPropagation(trainedNetwork, trainingSet)

    let rec trainIteration epoch error =
        match error > 0.001 with
        | false -> ()
        | true -> trainer.Iteration()
                  printfn "Iteration no : %d, Error: %f" epoch error
                  trainIteration (epoch + 1) trainer.Error

    trainIteration 1 1.0
    trainedNetwork

[<EntryPoint>]
let main argv =

    let xor_input =
        [|
            [| 0.0 ; 0.0 |]
            [| 1.0 ; 0.0 |]
            [| 0.0 ; 1.0 |]
            [| 1.0 ; 1.0 |]
        |]

    let xor_ideal =
        [|
            [| 0.0 |]
            [| 1.0 |]
            [| 1.0 |]
            [| 0.0 |]
        |]

    let trainingSet = BasicMLDataSet(xor_input, xor_ideal)
    let network = createNetwork()

    let trainedNetwork = network |> train trainingSet

    trainingSet
    |> Seq.iter (
        fun item ->
            let output = trainedNetwork.Compute(item.Input)
            printfn "Input: %f, %f Ideal: %f Actual: %f"
                item.Input.[0]  item.Input.[1] item.Ideal.[0] output.[0])

    printfn "Press return to exit.."
    System.Console.Read() |> ignore

    0 // return an integer exit code

The main difference over the C# version is that the training iterations are done with recursion instead of looping, and the training returns a new network rather than updating the existing one. Nothing wrong with doing it that way per se, but it gave me a warm feeling inside to make it all ‘functional’.
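The recursion-instead-of-looping pattern can be seen in isolation with a stand-in for the trainer. In this sketch, step is a hypothetical function that simply halves the error each time (it is not part of ENCOG), and trainUntil plays the role of trainIteration:

```fsharp
// Hypothetical stand-in for trainer.Iteration(): each step halves the error.
let step error = error / 2.0

// Keep iterating while the error is above the threshold, carrying the epoch
// counter and current error as arguments rather than mutable variables.
let rec trainUntil threshold epoch error =
    if error > threshold then
        trainUntil threshold (epoch + 1) (step error)
    else
        epoch - 1, error   // iterations performed, final error

let iterations, finalError = trainUntil 0.001 1 1.0
```

Because the recursive call is the last thing the function does, the F# compiler turns it into a loop, so there is no stack growth however many iterations are needed.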

It may be a while before I create Skynet, but you’ve got to start somewhere..

Categories: F#, Machine Learning

Making a simple State Machine with F# Actors

July 30, 2013 3 comments

F# comes with a built-in Actor framework, using the MailboxProcessor class.  The main reason for using actors is to simplify concurrent processing, but they also hit another sweet spot – they’re pretty good for implementing Finite State Machines (FSM).

I won’t go into the detail of how to implement a standard actor, but here’s a simple example which maintains a sum of the numbers posted to it.  It uses an asynchronous workflow which runs it asynchronously in a different thread, and a recursive function to maintain the state (although as an actor’s state is not shared, mutable state would be acceptable):

let countingActor =
  MailboxProcessor.Start(fun inbox ->
    let rec loop num = async {
      do printfn "num = %i" num
      let! msg = inbox.Receive()
      return! loop(num+msg) }
    loop 0)

And here it is being called in FSI:

countingActor.Post 10;;
num = 10
val it : unit = ()

countingActor.Post 5;;
num = 15
val it : unit = ()

Perhaps this is not the most elegant-looking or understandable code – I hear actors in Erlang and Scala are much more succinct – but once you give it a try it starts to make sense. For what we’re about to consider, the important point to note is that the recursive ‘loop’ function is being called infinitely, each time waiting until a message has been received.
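If you want to read the state back rather than just print it, one option is a message that carries a reply channel. This is a sketch only; the CounterMessage type and replyingCounter name are made up for the example, while AsyncReplyChannel and PostAndReply are part of the standard MailboxProcessor API:

```fsharp
// Hypothetical message type: add a number, or ask for the running total.
type CounterMessage =
    | Add of int
    | GetTotal of AsyncReplyChannel<int>

let replyingCounter =
    MailboxProcessor.Start(fun inbox ->
        let rec loop total = async {
            let! msg = inbox.Receive()
            match msg with
            | Add n ->
                return! loop (total + n)
            | GetTotal channel ->
                // Send the current state back to the caller, then carry on.
                channel.Reply total
                return! loop total }
        loop 0)

replyingCounter.Post (Add 10)
replyingCounter.Post (Add 5)

// Blocks until the actor has worked through the queue and replied.
let total = replyingCounter.PostAndReply(fun channel -> GetTotal channel)
```

Messages are processed in the order they were posted, so the reply reflects both additions.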

So, how to make a state machine with it?  To keep this as simple as possible, let’s consider a hugely inefficient climate control system which has only two states – heating and cooling.  We can define this as a MailboxProcessor:

// define the messages which can be used to change the state,
// using a Discriminated Union
type message =
    | HeatUp
    | CoolDown

// define the actor
let climateControl = MailboxProcessor.Start( fun inbox ->

    // the 'heating' state
    let rec heating() = async {
        printfn "Heating"
        let! msg = inbox.Receive()
        match msg with
        | CoolDown -> return! cooling()
        | _ -> return! heating()}

    // the 'cooling' state
    and cooling() = async {
        printfn "Cooling"
        let! msg = inbox.Receive()
        match msg with
        | HeatUp -> return! heating()
        | _ -> return! cooling()}

    // the initial state
    heating()
    )

And post messages to it to change states:

climateControl.Post HeatUp;;
Heating
val it : unit = ()

climateControl.Post CoolDown;;
Cooling
val it : unit = ()

And that’s it! The main points of interest are the states, each defined by a separate function heating() and cooling() – note they are mutually recursive so have to be declared with ‘and’ – and the ‘match’ expression used to change states.
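One way to check the transitions without spinning up an actor is to write them as a plain function from (state, message) to the next state. The sketch below is an alternative formulation, not the actor above; the ClimateState and ClimateMessage types and the transition function are names invented for it:

```fsharp
// Hypothetical types for a pure version of the same state machine.
type ClimateState = Heating | Cooling
type ClimateMessage = HeatUp | CoolDown

// The whole machine as one function: current state + message -> next state.
let transition state msg =
    match state, msg with
    | Heating, CoolDown -> Cooling
    | Cooling, HeatUp -> Heating
    | current, _ -> current   // any other message leaves the state unchanged

// Replay a list of messages from the initial state.
let finalState =
    [ CoolDown; CoolDown; HeatUp; CoolDown ]
    |> List.fold transition Heating
```

A function like this is easy to unit test, and the actor's loop could call it to decide which state to move to next.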

Of course they can get more complicated, but you have the basic pattern. For a more in-depth look at F# actors in general, check out this post on F# for Fun and Profit.

Happy state transitions!

Categories: F#

F# Koans – Stock Example

March 31, 2013 10 comments

At Leeds Sharp the other day, we started doing the F# Koans as a way of trying out the language, with the aim of expanding our minds with some functional programming. A bit like coder’s LSD, if you will.

The koans are pretty good at introducing you to things, although initially it’s a bit too easy to get through them without really understanding what’s going on.

However, when you get to the stock example that all changes, as you have to actually use some of what you’ve learned to solve a more involved problem. The purpose of this post is to put up my solution  in the hope of getting into a bit of a discussion with others who’ve done it, at Leeds Sharp or otherwise. I have no doubt that my implementation could be improved, so any advice would be more than welcome!

module ``about the stock example`` =

  let stockData =
    [ "Date,Open,High,Low,Close,Volume,Adj Close";
      "2012-03-30,32.40,32.41,32.04,32.26,31749400,32.26";
      // ...
      "2012-02-29,31.89,32.00,31.61,31.74,59323600,31.74"; ]

  // function to split a comma-separated string into an array
  let splitOnCommas (dataAsString:string) =
    dataAsString.Split([|','|])

  // function to take the desired parts of the array as a tuple,
  // including opening-closing difference
  let createPriceDifferenceTuple (fullList:string[]) =
    ( fullList.[0],
      abs (Double.Parse(fullList.[1]) - Double.Parse(fullList.[4])) )

  let maximumDifferenceTuple =
    // split strings into arrays, ignoring header
    List.map splitOnCommas stockData.Tail
    // map arrays to tuples
    |> List.map createPriceDifferenceTuple
    // get maximum tuple based on second element (price difference)
    |> List.maxBy snd

  [<Koan>]
  let YouGotTheAnswerCorrect() =
    let result = fst maximumDifferenceTuple // get date, first item in tuple
    AssertEquality "2012-03-13" result
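One possible variation, offered as a sketch rather than the "right" answer: Double.Parse uses the current culture, so on a machine where the decimal separator is a comma, "32.40" may not parse as expected. Passing CultureInfo.InvariantCulture avoids that. The names parsePrice and dateOfLargestMove are made up here, and sampleData contains only the header and the two rows visible in the listing above (the elided rows are not reproduced), so its answer differs from the koan's:

```fsharp
open System
open System.Globalization

// Only the header and the two rows shown in the listing; the rest is elided.
let sampleData =
    [ "Date,Open,High,Low,Close,Volume,Adj Close"
      "2012-03-30,32.40,32.41,32.04,32.26,31749400,32.26"
      "2012-02-29,31.89,32.00,31.61,31.74,59323600,31.74" ]

// Parse with the invariant culture so '.' is always the decimal separator.
let parsePrice (text: string) =
    Double.Parse(text, CultureInfo.InvariantCulture)

// Skip the header, then find the row with the largest open/close difference.
let dateOfLargestMove (lines: string list) =
    lines
    |> List.tail
    |> List.map (fun line -> line.Split(','))
    |> List.maxBy (fun fields ->
        abs (parsePrice fields.[1] - parsePrice fields.[4]))
    |> fun fields -> fields.[0]

let sampleAnswer = dateOfLargestMove sampleData
```

On these two rows the differences are roughly 0.14 and 0.15, so the sample picks 2012-02-29; run against the full koan data, the same pipeline should be checked against the expected 2012-03-13.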

Categories: Development, F#