<h1><span style="font-family: arial;">ICFP 2022 (Eric Torreborre's Blog, 2022-09-17)</span></h1><span style="font-family: arial;">Here are notes for some of the presentations I attended at <a href="https://icfp22.sigplan.org/">ICFP 2022 in Ljubljana</a>, Slovenia.
Most of those notes will only be useful for myself but here is my:</span><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><b>TL;DR</b> </span><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Here are the topics I enjoyed the most (in no particular order):</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">Infinite traversals with the Predictable typeclass</span></li><li><span style="font-family: arial;">David Christiansen's keynote drawing a future for Haskell</span></li><li><span style="font-family: arial;">Modelling probabilistic models with the probfx library</span></li><li><span style="font-family: arial;">Open Transactional Actions to get IO + STM</span></li><li><span style="font-family: arial;">The advent of OCaml 5 with concurrency and associated tooling</span></li><li><span style="font-family: arial;">The ongoing work to make GHC more modular</span></li><li><span style="font-family: arial;">The Functional Architecture workshop (led by Mike Sperber)</span></li><li><span style="font-family: arial;">The rec-def library by Joachim Breitner</span></li></ul></div><div><span style="font-family: arial;">Now for the fuzzy feelings:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">it is really great to be able to see people in person again. Online conferences are just not my thing. Half the point is being able to have informal discussions</span></li><li><span style="font-family: arial;">it feels great to be part of this intersection of academia and industry. They (academia) give us solutions, we (industry) give them new problems</span></li><li><span style="font-family: arial;">I like the melting pot between the various languages like Haskell and OCaml, but also Agda, Coq and all the research languages. 
Being in the same place really benefits everyone </span></li><li><span style="font-family: arial;">I am fascinated to see that we are still finding new calculi on top of the lambda calculus with new and better ways to execute them</span></li><li><span style="font-family: arial;">Yes it's still tough to walk around with an imposter syndrome but hey I had a meaningful conversation about a language for quantum computing so it's not that bad :-)</span></li><li><span style="font-family: arial;">Funding is probably still a problem for Haskell, many ideas, not enough time. But there is definitely some momentum. The past months have been better than before and it seems that it's only improving</span></li></ul></div><div><span style="font-family: arial;"><b><br /></b></span></div><div><span style="font-family: arial;"><b>NOTES</b></span><h1><span style="font-family: arial; font-size: large;">Haskell Implementors workshop (Sunday)</span></h1><span style="font-family: arial;">
The Haskell Implementors workshop gathers GHC developers and users to discuss the evolution of the language and its tooling.
Here are my highlights for this year. </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">State of GHC </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"> The 6-month release cadence is bringing us useful production-oriented features like: </span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;"> profiling without -prof </span></li><li><span style="font-family: arial;"> stacktraces </span></li></ul></div><div><span style="font-family: arial;">Some notable community initiatives: </span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">the <a href="https://github.com/haskellfoundation/error-message-index" target="_blank">Haskell error message index</a> </span></li><li><span style="font-family: arial;"><u>proposal</u>: Hackage.X overlay to spread out changes faster (anyone can propose a patch to a library when a new GHC version is out) </span></li><li><span style="font-family: arial;"><u>proposal</u>: "tick-tock" release cadence -> long-term maintenance releases + frequent non-backported releases </span></li></ul></div><div><span style="font-family: arial; font-size: large;">Compiling Mu with GHC: Halfway Down the Rabbit Hole</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Mu is a non-lazy Haskell 98 language used at Standard Chartered with millions of lines of code.
It contains features not present in GHC to support relational algebra types and programs: </span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">open data kinds </span></li><li><span style="font-family: arial;">closed type families with fundeps, etc... </span></li></ul></div><div><span style="font-family: arial;">Cortex is the internal language used by Mu and SC is trying to port it to GHC in order to reduce the maintenance burden.
It is not trivial because the GHC API is neither well-defined nor stable.</span></div><div><span style="font-family: arial;">One library from Digital Asset helps. <span style="color: #2b00fe;">ghc-lib-generator</span> turns a GHC source tree into a standalone package (= turns GHC into a library). </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Some code now compiles with GHC but much more work remains to be done around specialization/monomorphisation etc... </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">A Termination Checker for Haskell Rewrite Rules</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Rewrite rules are generally not checked. We just hope that: </span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">they terminate </span></li><li><span style="font-family: arial;">they are consistent (produce the same result regardless of the order in which they are applied) </span></li></ul></div><div><span style="font-family: arial;">GSOL is a termination + confluence GHC plugin which has been used to check 177 Haskell packages (that tool won some termination and checking competitions in 2018 and 2022, yes there is such a thing). </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Good news</u>: all those rules are terminating. </span></div><div><span style="font-family: arial;">Maybe bad news: there are many non-confluent rules
(for example in </span><span style="font-family: courier;">Control.Arrow</span><span style="font-family: arial;">) </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><span style="font-size: large;">Annotating Deeply Embedded Languages</span> </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Accelerate is a Haskell library to run computations on different hardware like GPUs.
It uses a deeply embedded DSL where terms are eventually translated to other programs. </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Problem</u>: how to relate the code which is effectively executed, with arbitrary names, to
the names in the original Haskell code? </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Accelerate defined a new constraint which generates a compilation error when a </span><span style="font-family: courier;">HasCallStack</span><span style="font-family: arial;"> constraint is
missing in a sequence of function calls. </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: courier;">type SourceMapped = (?requiresSourceMapping :: ReadTheDocs, HasCallStack)
data ReadTheDocs = TakenCareOf
sourceMap :: HasCallStack => (SourceMapped => a) -> a</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">This is probably a good thing for users of free monads but the thing I really need is <a href="https://gitlab.haskell.org/ghc/ghc/-/issues/18159" target="_blank">exception stacktraces</a> :-). </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Modularizing GHC </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">This work is led by a small group at IOG who need a better backend for GHCJS in order to integrate off-chain JavaScript code in their execution pipeline. </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Problem</u>: GHC is a very old compiler which could benefit from a <i>lot</i> of refactorings. The details can be found in <a href="https://hsyl20.fr/home/files/papers/2022-ghc-modularity.pdf" target="_blank">ghc-modularity</a> but one obvious example is the sharing of a </span><span style="font-family: courier;">DynFlags</span><span style="font-family: arial;"> data type, containing dynamic configuration flags, which has more than 600 use sites. 
</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">We can expect from this work: </span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">easier experiments with GHC </span></li><li><span style="font-family: arial;">easier integration with other tools in the ecosystem </span></li><li><span style="font-family: arial;">maybe even better performance </span></li></ul></div><div><span style="font-family: arial; font-size: large;">Recursive definitions </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Typical example of an elegant recursive computation in Haskell: compute the transitive closure of a graph. </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Problem</u>: what if the graph is cyclic?</span></div><div><br /></div><div><span style="font-family: arial;"><a href="https://hackage.haskell.org/package/rec-def" target="_blank">rec-def</a> is a library with an </span><span style="font-family: courier;">R</span><span style="font-family: arial;"> datatype allowing the recursive definition of sets.
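</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">To see the problem this addresses, here is a sketch of my own in plain Haskell (not the rec-def API): the naive recursive definition of reachability loops forever on a cyclic graph, so without rec-def we have to fall back to an explicit fixpoint iteration:</span></div><div><span style="font-family: courier;">import qualified Data.Map as M</span></div><div><span style="font-family: courier;">import qualified Data.Set as S</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;">type Graph = M.Map Int [Int]</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;">-- the naive recursive definition diverges on a cyclic graph:</span></div><div><span style="font-family: courier;">-- reachable g v = S.insert v (S.unions [reachable g w | w &lt;- M.findWithDefault [] v g])</span></div><div><span style="font-family: courier;">-- terminating version: iterate the successor step until a fixed point is reached</span></div><div><span style="font-family: courier;">reachable :: Graph -> Int -> S.Set Int</span></div><div><span style="font-family: courier;">reachable g v = go (S.singleton v)</span></div><div><span style="font-family: courier;">  where</span></div><div><span style="font-family: courier;">    go seen = if seen' == seen then seen else go seen'</span></div><div><span style="font-family: courier;">      where seen' = S.union seen (S.fromList (concatMap successors (S.toList seen)))</span></div><div><span style="font-family: courier;">    successors w = M.findWithDefault [] w g</span></div><div><span style="font-family: arial;">On the cyclic graph 1 -> 2 -> 3 -> 1 this returns {1, 2, 3}; the point of rec-def is that it lets you keep the naive recursive style and still terminate.</span></div><div><span style="font-family: arial;">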
It uses </span><span style="font-family: courier;">unsafePerformIO</span><span style="font-family: arial;"> under the hood but offers a pure interface. </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Across the pond </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"> Let's compare the Haskell and the Racket ecosystems.
For example Hoogle is nice but </span></div><div><span style="font-family: courier;">hoogle --local</span><span style="font-family: arial;"> does not work out of the box because it needs information that only Cabal has.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">On the other hand, all the tools for Racket (documentation, build tool, compiler, etc.) are integrated as libraries usable directly from the REPL. </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Why is Racket better? </span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">the context is shared by all the libraries / tools </span></li><li><span style="font-family: arial;">the context API is very stable </span></li></ul></div><div><span style="font-family: arial;"><span style="font-size: large;">Haskell Playground </span></span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"> The <a href="https://play-haskell.tomsmeding.com" target="_blank">Haskell Playground</a> started as a pastebin for Haskell
but it now supports several versions of GHC and produces Core/ASM code.
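</span></div><div><span style="font-family: arial;">For instance, a tiny snippet of my own (not from the talk) whose Core is interesting to compare at -O0 and -O2, since with -O2 the intermediate list should be fused into a strict loop:</span></div><div><span style="font-family: courier;">-- my example: paste into the playground and compare the Core at -O0 and -O2</span></div><div><span style="font-family: courier;">main :: IO ()</span></div><div><span style="font-family: courier;">main = print (sum [1 .. 100 :: Int])</span></div><div><span style="font-family: arial;">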
Besides sharing and testing snippets, this can be very useful for investigating performance and optimisation issues IMO. </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><span style="font-size: large;">CSI: Haskell: Fault-Localization in Lazy Languages using Runtime Tracing</span></span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Idea</u>: use the Haskell coverage support to add trace information when runtime errors occur
- since Haskell is lazy it is likely that the producer of a faulty value is going to be close to its consumer
If we report this (summarized) trace information in the error report then we have a form of data flow analysis
and we can use it to track down the source of faulty data.</span></div></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><span style="font-size: x-large;">ICFP Day 1</span></span></div><div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Keynote: Programming the network</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Network programming is fairly static with a strong division between a "control plane" (what's the topology of the network) and a "data plane"</span></div><div><span style="font-family: arial;">(how data is routed from one node to another).</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">But it would be nice to make it more programmable:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">take an existing in-house network and move it to the cloud keeping the same topology</span></li><li><span style="font-family: arial;">do traffic engineering. Traditionally done with a protocol, today we'd like to run optimisers and change the config more dynamically</span></li><li><span style="font-family: arial;">debug the traffic: understand how a given packet flows</span></li><li><span style="font-family: arial;">support caching, coordination protocols, have failure detectors built into the network</span></li></ul></div><div><span style="font-family: arial;">PL techniques can be used to support this. 
The NetKAT DSL describes links and behaviours which can then be compiled to forwarding tables.</span></div><div><span style="font-family: arial;">We can also use techniques to make sure that some security policies are maintained when a network is reconfigured.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">The Theory of Call-by-Value Solvability</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">The lambda calculus is a minimal calculus, with no notion of "result".</span></div><div><span style="font-family: arial;">We need to be able to distinguish divergent terms from other terms.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">The proper way, the one that gives a consistent theory, is to define a notion of "solvability":</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">a term is solvable if there is a head context </span><span style="font-family: courier;">H</span><span style="font-family: arial;"> such that </span><span style="font-family: courier;">H&lt;t&gt;</span><span style="font-family: arial;"> reduces to the identity</span></li></ul></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">This is a way of saying that some non-terminating terms can still be subterms of a terminating term and it's ok. 
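</span></div><div><span style="font-family: arial;">A concrete illustration (my own example, not from the talk): the looping term Ω is unsolvable, but a term containing Ω can still be solvable:</span></div><div><span style="font-family: courier;">Ω = (λx. x x) (λx. x x)    -- diverges, unsolvable</span></div><div><span style="font-family: courier;">t = λx. x Ω                -- contains Ω, yet solvable:</span></div><div><span style="font-family: courier;">t (λz. λw. w)  →  (λz. λw. w) Ω  →  λw. w    -- the identity</span></div><div><span style="font-family: arial;">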
</span><span style="font-family: arial;">There are different characterizations of solvability and one of them is multitypes (multisets of intersected types).</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">A Simple and Efficient Implementation of Strong Call by Need by an Abstract Machine</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Reminder: call by need is like call by name, terms are only evaluated when required, but they are evaluated at most once. </span><span style="font-family: arial;">How to implement this calculus efficiently?</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"> 1. Start with an abstract machine (KN machine)</span></div><div><span style="font-family: arial;"> 2. deconstruct it into some functional code (definitional interpreter)</span></div><div><span style="font-family: arial;"> 3. optimise the functional code</span></div><div><span style="font-family: arial;"> 4. 
reconstruct an optimal machine (RKNL machine)</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">How can we make sure that RKNL implements strong call by need?</span><span style="font-family: arial;"> This is proven through a "Ghost Abstract Machine" which simplifies the rules.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">The implementation efficiency is proven via a potential function.</span></div><div><br /></div><div><span style="font-family: arial; font-size: large;">Multi-types and reasonable space</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">In a computational model time is modelled as the number of steps to compute something,</span></div><div><span style="font-family: arial;">and space is modelled as the maximum size of visited configurations.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">We can use multi-types to relate the type of a term to the size of the derivation of a subterm, hence give an idea of the time taken. </span><span style="font-family: arial;">What about space? 
If sharing is used in a machine (for time reasons) then space is hard to compute.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">The Space Krivine Abstract Machine accounts for space: sharing is limited to terms (to account for the size of the input), </span><span style="font-family: arial;">and garbage collection is eager.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">The Space KAM is a good ("reasonable" = linearly close to reality) space cost model (there's a translation from Turing Machines to the Space KAM).</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Unfortunately multi-types do not account for the size of closures.</span></div><div><span style="font-family: arial;"><u>Idea</u>: enrich multi-types with closure types. A multi-set is labelled with a natural number for the closure size.</span></div><div><span style="font-family: arial;">Then closure types are a complete and sound methodology for the Space KAM.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Future work</u>: type systems for space complexity analysis (this is not doable for time since type inference is an undecidable property)</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Denotational semantics as a foundation for cost recurrence extraction for functional languages</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Can we use denotational semantics to come up with the equations for calculating the algorithmic cost of a function? </span><span style="font-family: arial;">Several steps:</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"> 1. 
transform the program into a writer monad with usage costs</span></div><div><span style="font-family: arial;"> 2. interpret in a model, using naturals to interpret a tree for example</span></div><div><span style="font-family: arial;"> 3. have a theorem to prove that we really have a model of the recurrence language</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Random Testing of a Higher-Order Blockchain Language (experience report)</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">In smart contracts we need to deal with:</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"> 1. static semantic bugs: an exception is thrown and the contract function cannot be executed</span></div><div><span style="font-family: arial;"> 2. cost semantic bugs: not enough gas is charged for expensive computations => DDoS attacks</span></div><div><span style="font-family: arial;"> 3. compiler exploit: a compiler can be exploited by sending some code that blows up the size of the generated program</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Can we use property-based testing to avoid this? Scilla is a smart contract language based on System F + extensions and is not Turing complete (it has structural recursion).</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Scilla has an OCaml monadic interpreter. 
How to generate well-typed terms for System F?</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Use QuickChick (Coq library):</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">easy to define generators for ASTs</span></li><li><span style="font-family: arial;">can generate OCaml code</span></li><li><span style="font-family: arial;">has support for fuzzing</span></li></ul></div><div><span style="font-family: arial;">Generating type applications is a bit tricky. The trick is to use "unsubstitution".</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Bugs were found in:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">the interpreter: conversion bugs</span></li><li><span style="font-family: arial;">the interpreter: charged gas</span></li><li><span style="font-family: arial;">typeflow analysis</span></li></ul></div><div><span style="font-family: arial; font-size: large;">A Completely Unique Account of Enumeration</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Enumerators can enumerate all the values of a data type. 
This can be useful for testing</span></div><div><span style="font-family: arial;">in the way that <a href="https://github.com/Bodigrim/smallcheck" target="_blank">smallcheck</a> and <a href="https://hackage.haskell.org/package/leancheck" target="_blank">leancheck</a> do with the idea that</span></div><div><span style="font-family: arial;">"If a program fails to meet its specification in some cases, it almost always fails in some simple case".</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">This work has found a way to define enumerations that are provably:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">complete: they list all elements</span></li><li><span style="font-family: arial;">unique: they produce each element once</span></li><li><span style="font-family: arial;">fair: they interleave elements of lists</span></li></ul></div><div><span style="font-family: arial;">They have also shown that enumerators can be derived from Generics and keep the same properties. It also works with indexed type families. </span><span style="font-family: arial;">The paper mentions that Generics with true <a href="http://edsko.net/pubs/TrueSumsOfProducts.pdf" target="_blank">sums-of-products</a> would be better (I agree, I think that should be the default way to have generics).</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">After discussing with one of the authors and doing a bit of experimentation I realized</span></div><div><span style="font-family: arial;">that it was hard to do an enumeration of values by size because of the way recursion is handled in recursive data types. </span><span style="font-family: arial;">The way it is done, we get an enumeration per depth in a tree for example. 
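</span></div><div><span style="font-family: arial;">To make the depth/size distinction concrete, here is a minimal sketch of my own (plain Haskell, not the paper's code) with a fair interleaving combinator and a by-depth tree enumerator:</span></div><div><span style="font-family: courier;">data Tree = Leaf | Node Tree Tree deriving (Eq, Show)</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;">-- fair interleaving of two enumerations</span></div><div><span style="font-family: courier;">interleave :: [a] -> [a] -> [a]</span></div><div><span style="font-family: courier;">interleave [] ys = ys</span></div><div><span style="font-family: courier;">interleave (x:xs) ys = x : interleave ys xs</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;">-- all trees of depth at most n</span></div><div><span style="font-family: courier;">trees :: Int -> [Tree]</span></div><div><span style="font-family: courier;">trees 0 = [Leaf]</span></div><div><span style="font-family: courier;">trees n = Leaf : [Node l r | l &lt;- trees (n - 1), r &lt;- trees (n - 1)]</span></div><div><span style="font-family: arial;">Already <span style="font-family: courier;">trees 2</span> mixes values of sizes 1, 3, 5 and 7 at the same level, so "by depth" is not "by size".</span></div><div><span style="font-family: arial;">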
I think I need to come back to something like <a href="https://hackage.haskell.org/package/testing-feat" target="_blank">FEAT</a>.</span></div></div><div><br /></div><div><span style="font-family: arial; font-size: x-large;">ICFP Day 2</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Keynote: Call-by-Push-Value, Quantitatively</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">The Bang Calculus, introduced in 2016, is a call by push value calculus which encompasses call by name (CBN) and call by value (CBV). </span><span style="font-family: arial;">It is complete and coherent (it took 4 years and some iterations to prove this!)</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Concretely speaking, the bang calculus adds some constructs to the lambda calculus:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;"> </span><span style="font-family: courier;">!t</span><span style="font-family: arial;"> "bang", to make a thunk</span></li><li><span style="font-family: arial;"> </span><span style="font-family: courier;">der</span><span style="font-family: arial;"> "dereliction", to force a thunk</span></li><li><span style="font-family: arial;"> explicit substitution.</span></li></ul></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Then 3 rules are given for its execution:</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"> 1. Beta rule: introduces a new explicit substitution (let binding)</span></div><div><span style="font-family: arial;"> 2. substitution: substitutes a bang term</span></div><div><span style="font-family: arial;"> 3. 
value/computation "dereliction": </span><span style="font-family: courier;">der(!t) -> t</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">This calculus has a resource-aware semantics, it's complete and confluent, and all evaluation sequences to normal form have the same length. </span><span style="font-family: arial;">We have some translations from CBN/CBV to the Bang calculus (due to Girard in '76). This allows us to prove some qualitative properties: soundness, completeness.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">What about quantitative properties: how much time / space?</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Idea</u>: use non-idempotent multitypes: types are multisets (duplicated elements -> then you can count things!) </span><span style="font-family: arial;">Those types can type non-terminating terms so there's no decidable type inference.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">However it is possible to show some properties like time + space &lt;= the number of nodes in the type derivation.</span></div><div><span style="font-family: arial;">It is even possible to go further and find some properties for time and space independently.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Another interesting property is: decide if a type is inhabited. 
</span><span style="font-family: arial;">An algorithm has been defined to do this: it terminates, is sound, complete, and finds all the generators for those terms.</span></div><div><br /></div><div><span style="font-family: arial; font-size: large;">Datatype-Generic Programming Meets Elaborator Reflection</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">For each datatype defined in Agda, like 'List', we expect to have corresponding proofs</span></div><div><span style="font-family: arial;">like <span style="font-family: courier;">here</span> and <span style="font-family: courier;">there</span> to do some lookups:</span></div><div><br /></div><div><span style="font-family: courier;">data Any {p} (P : A → Set p) : List A → Set (a ⊔ p) where</span></div><div><span style="font-family: courier;"> here : ∀ {x xs} (px : P x) → Any P (x ∷ xs)</span></div><div><span style="font-family: courier;"> there : ∀ {x xs} (pxs : Any P xs) → Any P (x ∷ xs)</span></div><div><br /></div><div><span style="font-family: arial;"><u>Problem</u>: this is tedious and looks quite mechanical to derive</span></div><div><span style="font-family: arial;"><u>Idea</u>: use the elaborator reflection to define those data types. This would be a bit like using Generics / TemplateHaskell in Haskell.</span></div><div><span style="font-family: arial;">Main difficulty: writing generic code is difficult and error-prone</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Practical generic programming over a universe of datatypes</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">This is a variation of the previous talk. 
Can we derive properties like </span><span style="font-family: courier;">deriving Eq</span><span style="font-family: arial;"> in Agda?</span></div><div><span style="font-family: arial;">This team has defined some support for generics (a "universe of descriptions") plus some combinators to be able to do that. </span><span style="font-family: arial;">And now (among other things) decidable equality can be implemented by just deriving it from some data types.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Structural Versus Pipeline Composition of Higher-Order Functions (experience report)</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Context</u>: try to have students solve problems using higher-order functions</span></div><div><span style="font-family: arial;"><u>Problem</u>: most problems are either too easy or too hard</span></div><div><span style="font-family: arial;"><u>Insight</u>: there's a difference between structural composition and pipeline composition</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">structural</span></div><div><span style="font-family: courier;"> example_fun :: [a] -> b</span></div><div><span style="font-family: courier;"> example_fun as = hof1 (\inner -> hof2 arg inner) as</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">pipeline</span></div><div><span style="font-family: courier;"> example_fun :: [a] -> b</span></div><div><span style="font-family: courier;"> example_fun as = hof1 f (hof2 g as)</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">An experiment was run on students and it turns out, contrary to experts' intuition, that structural solutions are easier to find.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">For example "map elements in a list of lists and filter each list", which is <span style="font-family: courier;">map (filter condition)</span>.</span></div><div><span style="font-family: arial;">The reason for this might be related to program synthesis and the fact that structural composition imposes more constraints on the possible types.</span></div><div><br /></div><div><span style="font-family: arial; font-size: x-large;">ICFP Day 3</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Keynote: Retrofitting concurrency</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">How to add concurrency to OCaml 4.0?</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">In 2014, OCaml was used a lot but with a GIL like Python. </span><span style="font-family: arial;">The objective was to add concurrency without breaking anything:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">OCaml has a low (~10ms) latency tolerance. 
We want to keep an efficient GC</span></li><li><span style="font-family: arial;">we need to make sure that the maintenance burden stays low by not having 2 runtimes like GHC</span></li><li><span style="font-family: arial;">existing sequential programs should run the same</span></li></ul></div><div><span style="font-family: arial;">Main points:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">there were several iterations and even a full rewrite to implement a concurrent garbage collector (major heap -> mostly concurrent, minor heap -> stop the world parallel)</span></li><li><span style="font-family: arial;">working on data races showed the need for a new memory model with a "DRF-SC" guarantee (data-race-freedom sequential-consistency)</span></li><li><span style="font-family: arial;">instead of baking the scheduler in the runtime system like in GHC, extract it as a library (trying to extract the scheduler from GHC's RTS proved to be really hard)</span></li><li><span style="font-family: arial;">implement delimited continuations, they are easier to understand than shift/reset</span></li><li><span style="font-family: arial;">care for users, don't break their code. Most of the code is likely to stay sequential</span></li><li><span style="font-family: arial;">use build tools to ease the transition: OPAM health check checks the compatibility of packages on every 5.0-alpha release</span></li><li><span style="font-family: arial;">benchmark on _real_ programs (Coq, Irmin - database, for example). 
http://sandmark.tarides.com is a benchmarking site as a service</span></li><li><span style="font-family: arial;">invest in tooling</span></li><li><span style="font-family: arial;">it is hard to maintain a separate fork for 7 years, hard to keep up with main</span></li><li><span style="font-family: arial;">it is hard to get approval from the main developers</span></li><li><span style="font-family: arial;">peer-reviewed papers add credibility to such an effort</span></li><li><span style="font-family: arial;">tooling + benchmarks help</span></li><li><span style="font-family: arial;">the last 10% to finish OCaml 5 now requires lots of engineering effort, and less academic effort</span></li></ul></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Many more things to come with the vision of being as fast as Rust with a GC and having the type safety of ML.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><span style="font-size: large;">Beyond Relooper: Recursive Translation of Unstructured Control Flow to Structured Control Flow</span></span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Control flow in WASM is very simple:</span></div><div><ul style="text-align: left;"><li><span style="font-family: courier;">if ... then ... else ...</span><span style="font-family: arial;"> conditionals</span></li><li><span style="font-family: courier;">loop ... end</span><span style="font-family: arial;"> loop (no conditionals)</span></li><li><span style="font-family: courier;">block ... 
end</span><span style="font-family: arial;"> block of code</span></li><li><span style="font-family: courier;">br k</span><span style="font-family: arial;"> escape the current nesting(s)</span></li></ul></div><div><span style="font-family: arial;"><u>Problem</u>: translate structured programming with functions, variables, conditionals to the form above in a compiler. </span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">There's a heuristic algorithm for this kind of thing in the JavaScript world (Relooper) but a complete solution has been described since the '70s! </span><span style="font-family: arial;">That algorithm is hard to understand and has 3 passes. Can we do better?</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Yes! Use functional programming and techniques from static analysis:</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"> 1. build an ADT</span></div><div><span style="font-family: arial;"> 2. from the outside in</span></div><div><span style="font-family: arial;"> 3. by recursion over dominator trees (a dominator tree records which nodes must always be executed before a given node) in the control flow graph</span></div><div><span style="font-family: arial;"> 4. order children by reverse postorder</span></div><div><span style="font-family: arial;"> 5. 
use control context for branches</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Note</u>: within one day Joachim Breitner was able to use his rec-def library to make the code even nicer: https://www.joachim-breitner.de/blog/795-rec-def__Dominators_case_study</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Automatically Deriving Control-Flow Graph Generators From Operational Semantics</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Various control graphs can be produced from a given program depending on how much information we want to keep. </span><span style="font-family: arial;">Some analyzers / model checkers need different CFGs. </span><span style="font-family: arial;">But what is a CFG anyway? Is there a systematic way to derive a CFG from the semantics of a language?</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">The answer is yes by using abstract machines and abstract evaluation</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Analyzing binding extent in CPS</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">"we are the anti-MLton". 
MLton is a full program SML compiler for high-performance executables:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">by monomorphization</span></li><li><span style="font-family: arial;">by defunctionalization</span></li></ul></div><div><span style="font-family: arial;">3CPS is a totally different approach:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;"> flexible</span></li><li><span style="font-family: arial;"> separate compilation</span></li><li><span style="font-family: arial;"> keep polymorphic terms</span></li><li><span style="font-family: arial;"> aimed at making FP fast</span></li></ul></div><div><span style="font-family: arial;"><u>Problem</u>: </span></div><div><span style="font-family: arial;"> FP is all about keeping track of the environment.</span></div><div><span style="font-family: arial;"> when there are closures some bindings must move from the stack to the heap</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Idea</u>:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">define 3 "extents": heap / stack / register</span></li><li><span style="font-family: arial;">every variable has an extent describing its lifetime</span></li><li><span style="font-family: arial;">heap extent: every variable has it but it is the most heavyweight machine resource</span></li><li><span style="font-family: arial;">stack extent: the binding lifetime must be shorter than the stack frame around it. As a syntactic approximation: If there's no reference from x in a lambda it's ok (because we don't need a closure)</span></li><li><span style="font-family: arial;">register extent. 
As a syntactic approximation: there is no function call between the definition of a variable and its uses</span></li></ul></div><div><span style="font-family: arial;">The paper shows how to do a lot better than syntactic approximations and as a result rewrites some programs so that </span><span style="font-family: arial;">90% of their variables can be moved to the stack. And this analysis is fast to perform.</span></div><div><br /></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Question from SPJ: "I envy your work, how will it work for lazy evaluation?". </span></div><div><span style="font-family: arial;">Answer "it's always a good thing if SPJ is envying your work etc..." (I did not understand the answer :|)</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Note</u>: this work looks actually quite close to the work presented by Stephen Dolan on using global / local modalities to keep values on the stack.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">do Unchained</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Context</u>: reimplementing Lean 4 in Lean</span></div><div><span style="font-family: arial;"><u>Problem</u>: some imperative C++ was hard to translate, the do notation had to be used</span></div><div><span style="font-family: arial;"><u>Idea</u>: </span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">implement some "imperative lean" with a new syntax</span></li><li><span style="font-family: arial;">model effects as monad transformers: mutation becomes State, return becomes Except</span> <span style="font-family: courier;">monad, for in / break / continue become ExceptT + fold + ExceptT</span></li></ul></div><div><span style="font-family: arial;">It was first implemented 
as a macro in Lean.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Consequences:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">"invisible destructive updates"</span></li><li><span style="font-family: arial;">intensively used in Lean 4 but also in 31 out of 43 repositories, people are even starting to use it with the identity monad</span></li></ul></div><div><span style="font-family: arial; font-size: large;">Fusing industry and academia at GitHub</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Semantic is a tool which can understand 9 target languages (no Haskell yet because the syntax is a bit complex, and there are fewer users than Ruby :-)) and:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">helps make a table of contents in PRs</span></li><li><span style="font-family: arial;">provides code navigation. This is a high-traffic service, with no issues or outages</span></li></ul></div><div><span style="font-family: arial;">4 case studies in how academia helped:</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"> 1. <u>parsing</u>: there are many parsers, one for each language. Running native tooling is not maintainable.</span><span style="font-family: arial;"> </span><span style="font-family: courier;">tree-sitter</span><span style="font-family: arial;"> is an incremental, error-tolerant parser. The grammar is in a JavaScript DSL.</span><span style="font-family: arial;"> It gives a consistent API. It is based on GLR parsing, sufficiently expressive for Ruby and fast enough for GitHub</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"> 2. <u>syntax</u>: can we share syntax types? 
Yes with a data types a la carte approach, including the case where the code can't parse.</span><span style="font-family: arial;"> The team implemented </span><span style="font-family: courier;">fast-sum</span><span style="font-family: arial;">, an "open union" record to work with > 140 syntax nodes for TypeScript.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"> 3. <u>diffing</u>: can we have syntax-aware diffing to get better diffs? Using recursion schemes helped here because diffing is a recursive operation</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"> 4. <u>program analysis</u>: understand programs!</span></div><div><span style="font-family: arial;"> Implementing this necessitated a new effects library to mix several state effects + non-determinism => fused-effects</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Difficulties:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">a la carte syntax is a bit imprecise</span></li><li><span style="font-family: arial;">build times, editor tooling</span></li><li><span style="font-family: arial;">Generic programming is hard</span></li></ul></div><div><span style="font-family: arial;">Successes:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">30k req/min -> only one bug, one day, because of a crash in the GHC event loop (caused by one particular issue with Dell servers)</span></li><li><span style="font-family: arial;">tree-sitter became foundational</span></li><li><span style="font-family: arial;">recursion schemes work great</span></li><li><span style="font-family: arial;">algebraic effects are awesome</span></li></ul></div><div><span style="font-family: arial; font-size: large;">Modelling probabilistic models</span></div><div><span style="font-family: arial;"><br 
/></span></div><div><span style="font-family: arial;">A probabilistic model is a set of relationships between random variables. For example a linear regression. </span><span style="font-family: arial;">We can use it in 2 ways:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">simulation: to simulate some outcome</span></li><li><span style="font-family: arial;">inference: to compute the model parameters based on the observations</span></li></ul></div><div><span style="font-family: arial;">ProbFX is a Haskell library doing this elegantly. The models are:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">first-class citizens</span></li><li><span style="font-family: arial;">compositional</span></li><li><span style="font-family: arial;">typed</span></li></ul></div><div><span style="font-family: arial;">Example</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">HMM: Hidden Markov Model -> takes a transition model, an observable model for one node and replicates it to get a chain. This can be used to model </span><span style="font-family: arial;">an epidemic with the SIR model (susceptible / infected / recovered).</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Constraint-based type-inference for FreezeML</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">'70 Hindley-Milner type inference (algorithm W)</span></div><div><span style="font-family: arial;">'80 constraint-based type system and inference (more extensible than algo W). Practically used by GHC</span></div><div><span style="font-family: arial;">'90 ML with first-class polymorphism (supporting forall a. [a] -> [a], polymorphic instantiation forall b. 
b -> b/a) -> large design space</span></div><div><span style="font-family: arial;">'20 FreezeML tries to use algorithm W, focuses on simplicity</span></div><div><br /></div><div><span style="font-family: courier;">single :: forall a. a -> [a], id :: forall b. b -> b</span></div><div><br /></div><div><span style="font-family: arial;">what is </span><span style="font-family: courier;">single id</span><span style="font-family: arial;">?</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">FreezeML uses "freezing" to distinguish the 2 cases:</span></div><div><br /></div><div><span style="font-family: courier;"> single id : [a -> a]</span></div><div><span style="font-family: courier;"> single |id| : [forall b. b -> b]</span></div><div><br /></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Linearly Qualified Types: Generic inference for capabilities and uniqueness</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">A linear function '</span><span style="font-family: courier;">a %1 -> b</span><span style="font-family: arial;">' should be read as "if the function is consumed exactly once then the resource is consumed exactly once". 
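To make this concrete, here is a tiny sketch of my own (not from the paper), assuming GHC with the LinearTypes extension:

```haskell
{-# LANGUAGE LinearTypes #-}
module Main where

-- 'swapPair' has a linear arrow: each component of its argument
-- is consumed exactly once, so the type checker accepts it.
swapPair :: (a, b) %1 -> (b, a)
swapPair (x, y) = (y, x)

-- By contrast, a definition like 'dup x = (x, x)' would be rejected
-- at type 'a %1 -> (a, a)' because 'x' would be consumed twice.

main :: IO ()
main = print (swapPair ('a', True))
```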
</span><span style="font-family: arial;">When your code requires a linear function then you can pass a resource to that function and you know that it will be used safely</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Problem</u>: some simple cases are still quite ugly to implement</span></div><div><br /></div><div><span style="font-family: courier;">swap :: Int -> Int -> MArray a %1 -> MArray a</span></div><div><span style="font-family: courier;">swap i j as =</span></div><div><span style="font-family: courier;"> let</span></div><div><span style="font-family: courier;"> !(as', Ur ai) = get i as</span></div><div><span style="font-family: courier;"> !(as'', Ur aj) = get j as'</span></div><div><span style="font-family: courier;"> as''' = set i aj as''</span></div><div><span style="font-family: courier;"> as'''' = set j ai as'''</span></div><div><span style="font-family: courier;"> in</span></div><div><span style="font-family: courier;"> as''''</span></div><div><br /></div><div><span style="font-family: arial;"><u>Idea</u>: use GHC constraints (and define "linear constraints")</span></div><div><br /></div><div><span style="font-family: courier;">swap :: RW n %1 => Int -> Int -> MArray n a -> () < RW n</span></div><div><span style="font-family: courier;">swap i j as = let</span></div><div><span style="font-family: courier;"> !Ur ai = get i as</span></div><div><span style="font-family: courier;"> !Ur aj = get j as</span></div><div><span style="font-family: courier;"> !() = set i aj as</span></div><div><span style="font-family: courier;"> !() = set j ai as</span></div><div><span style="font-family: courier;"> in ()</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;">newMArray :: Int -> (MArray a %1 -> Ur b) %1 -> Ur b</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;">fromList :: Linearly 
%1 => [a] -> Array a</span></div><div><span style="font-family: courier;">fromList as = do</span></div><div><span style="font-family: courier;"> let arr = newArray (length as)</span></div><div><span style="font-family: courier;"> let arr' = foldr (uncurry set) arr (zip [1..] as)</span></div><div><span style="font-family: courier;"> freeze arr'</span></div><div><br /></div><div><span style="font-family: arial; font-size: x-large;">Haskell Symposium Day 1</span></div><div><div><span style="font-size: large;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Keynote: Cause and effects</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">This was mostly a talk about "Why I fell in love with Haskell and effects" and a Q&amp;A session around fused-effects.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Principle of least privilege: using Alternative instead of Maybe gives you additional power. Because then you can use List, NonDeterm etc... Just require what you need and not more.</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;"><u>Question</u>: checked exceptions can be painful. Did you feel that pain?</span></div><div><span style="font-family: arial;"><u>Answer</u>:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">one possibility: one error type to rule them all -> bad because coupling</span></li><li><span style="font-family: arial;">one effect per error type: annoying to track</span></li><li><span style="font-family: arial;">good: prism to extract a specific type out of a global error type. Note: we can do the same thing with lenses and State where we project only the state we need</span></li></ul></div><div><span style="font-family: arial;"><u>Question</u>: what about purity? 
what about having to lift?</span></div><div><span style="font-family: arial;"><u>Answer</u>: purity / impurity is not so much the question but value vs computation is more relevant.</span></div><div><span style="font-family: arial;"> For example when getting results if we get a value we get the "least privileged entity" in a sense</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">Functional Architecture open space</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">This was not part of the Haskell symposium but I had the opportunity to discuss functional architectures around:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">writing modular applications with records of functions (I quickly introduced registry)</span></li><li><span style="font-family: arial;">are large architectures still functional? (cf "turning the database inside out" and what is done at Standard Chartered with relational data as data structures)</span></li><li><span style="font-family: arial;">the pitfalls of domain modelling (we traded some examples)</span></li></ul><div><span style="font-family: arial;">Kudos to Mike Sperber for organizing this!</span></div></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial; font-size: large;">A Totally Predictable Outcome: An Investigation of Traversals of Infinite Structures</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Traversable functors have been fully characterized as being polynomial functors.</span></div><div><span style="font-family: arial;">But this proof only works in a finitary setting! 
What about Haskell with its infinite lists?</span></div><div><span style="font-family: arial;">For example we cannot traverse an infinite list with </span><span style="font-family: courier;">Maybe</span><span style="font-family: arial;"> because we need to know if there is a </span><span style="font-family: courier;">Nothing</span><span style="font-family: arial;"> in the list at some stage.</span></div><div><br /></div><div>Short answer: infinite traversals are productive iff they use "Predictable Applicative functors".</div><div><br /></div><div><span style="font-family: courier;">newtype Later a = Later a deriving (Functor, Applicative)</span></div><div><span style="font-family: courier;">predictable :: Later (f a) -> f (Later a)</span></div><div><br /></div><div><span style="font-family: arial;">There are more Predictables than Representables (ex Writer)</span></div><div><span style="font-family: arial;">There are Predictables that are not Applicatives</span></div><div><span style="font-family: arial;">There are Predictables that are not strictly positive</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">The paper explores all of this including the traversal of bi-infinite lists where you can infinitely append elements on both ends!</span></div><div><br /></div><div><span style="font-family: arial; font-size: large;">Open Transactional Actions: Interacting with non-transactional resources in STM Haskell</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">We want to be able to use IO in STM Haskell. 
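For context, plain STM (a standard stm-library sketch of my own, not the paper's API) composes transactional reads and writes atomically but offers no way to run IO inside a transaction:

```haskell
import Control.Concurrent.STM

-- Move n units between two transactional variables.
-- The body runs as one atomic transaction; no IO can happen inside it.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to n = do
  modifyTVar' from (subtract n)
  modifyTVar' to (+ n)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 30)
  balances <- (,) <$> readTVarIO a <*> readTVarIO b
  print balances
```

The OT type adds the missing piece: IO actions can join a transaction, with commit/abort compensations to keep the combination safe.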
<a href="https://github.com/researchanon/ota" target="_blank">Open Transactional Actions</a> allow this:</span></div><div><br /></div><div><span style="font-family: courier;">newtype OT a</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;">liftIO :: IO a -> OT a</span></div><div><span style="font-family: courier;">onCommit :: IO () -> OT ()</span></div><div><span style="font-family: courier;">onAbort :: IO () -> OT ()</span></div><div><span style="font-family: courier;">abort :: OT a</span></div><div><span style="font-family: courier;">runOT :: OT a -> STM a</span></div><div><br /></div><div><span style="font-family: arial;">Once you have described what it means to commit and abort you can mix IO and STM resources.</span></div><div><span style="font-family: arial;"><u>Examples</u>:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">file access with file locks</span></li><li><span style="font-family: arial;">unique id generator</span></li><li><span style="font-family: arial;">concurrent hash set</span></li><li><span style="font-family: arial;">concurrent linked list</span></li></ul></div></div><div><span style="font-family: arial; font-size: x-large;">Haskell / OCaml symposium Day 2</span></div><div><div style="font-family: arial;"><br /></div><div style="font-family: arial;"><span style="font-size: large;">Keynote: Industrial strength laziness</span></div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Can we build tools and publish about them?</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Publishing challenges:</div><div style="font-family: arial;"><ul style="text-align: left;"><li>expert users are expensive</li><li>inexperienced programmers are only one target group</li><li>how to do user studies well</li><li>what we can measure might not be the most relevant to practice. 
"my refactoring does not introduce bugs" is good but not the only thing</li><li>maintenance of tools gives no tenure</li><li>maintenance takes a lot of time (GHC churn etc...)</li></ul></div><div style="font-family: arial;">Some dreams</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Dream 1: context aware editor for actions. For example Scala and .</div><div style="font-family: arial;">Dream 2: incremental feedback</div><div style="font-family: arial;">Dream 3: Haskell debuggers</div><div style="font-family: arial;">Dream 4: Add typeclass laws to typeclass definitions -> this gets us automated refactorings! and we can generate tests</div><div style="font-family: arial;">Dream 5: Calculate programs (cf Idris with specification / case splitting), better integrate Liquid Haskell</div><div style="font-family: arial;">Dream 6: Extensible documentation (inspired by scribble) to add diagrams for example. Better integration of tutorials</div><div style="font-family: arial;">Dream 7: Static semantics for software evolution -> checking API compatibility</div><div style="font-family: arial;">Dream 8: How should big teams build big applications with Haskell?</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Make an impact</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Pay attention to what other communities value. For example design patterns: we can steal some in Haskell. As a high-pain-tolerance community we need to attract low-pain-tolerance people</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">The Haskell Foundation</div><div style="font-family: arial;"><ul style="text-align: left;"><li>funding CI for GHC</li><li>the Haskell Error Index. This is made to support other tools, like Cabal, Stack, HLS, etc...</li><li>security advisories: as a data source for cabal, dependabot. 
To ease ISO 27001 certification</li><li>the Haskell Interlude podcast</li><li>the Haskell Optimisation Handbook (organised by Jeffrey Young - from IOG)</li><li>support the community process and audit of the "lottery factor" (better name than the "bus factor" IMO)</li><li>technical working group, stability working group</li></ul></div><div style="font-family: arial;"><u>Questions</u>:</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">How to help with accessibility of the language?</div><div><ul style="text-align: left;"><li style="font-family: arial;">filter out the "good" libraries</li><li><span style="font-family: arial;">have a culture of using </span><span style="font-family: courier;">GHC2021</span><span style="font-family: arial;"> as the default extension</span></li><li style="font-family: arial;">promote only 2 books on the front page</li></ul></div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Suggestion: large scale refactoring tools, have hints of performance optimisations</div><div style="font-family: arial;">Suggestion: talk to M. Snoyman to pick up the case studies they did</div><div style="font-family: arial;">How to help people get on board with developing on GHC, HLS etc...?</div><div style="font-family: arial;">SPJ: how can we push existing programs to use LiquidHaskell?</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;"><span style="font-size: large;">Efficient “out of heap” pointers for multicore OCaml</span></div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">A page table classifies whether a pointer is inside the major heap. People started storing pointers to outside the OCaml heap. 
In OCaml 4.14: speed-up GC with prefetching.</div><div style="font-family: arial;">This work shows that a page table can speed up marking even with concurrency and has other uses: huge pages, etc...</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;"><span style="font-size: large;">Memo: an incremental computation library that powers Dune</span></div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Memo supports internal incremental building in Dune (the OCaml build tool) with</div><div style="font-family: arial;"><ul style="text-align: left;"><li>parallelism</li><li>memoization: we don't want to re-read a file many times</li><li>let* is a bind in OCaml</li><li>errors are de-duplicated (it kind of memoizes / de-duplicates errors and keeps stack traces)</li><li>it is incremental and if errors are fixed they disappear</li></ul></div><div style="font-family: arial;">Dune is not yet used at Jane Street because they have 30 million lines of code and Dune is not yet fast enough (the graph is too big).</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;"><u>Problem</u>: what about cycles / deadlocks in the build graph? Not trivial to detect deadlocks with concurrency</div><div style="font-family: arial;"><u>Solution</u>: use incremental cycle detection. The naive solution doesn't scale. Bender 2015 provides a solution and a working library (even proved in Coq). 
It is in O(m·√m) which is not yet optimal for Jane Street.</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;"><u>Idea</u>: skip the reading edges (they still took 35% of the build time)</div><div style="font-family: arial;"><u>Idea</u>: check only paths that lead to blocking edges (now the execution time is negligible - but there is no proof of correctness yet)</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Further work: graph flattening, bottom-up/top-down traversal (start from the leaves), storing memoization tables on disk, generalizing to other concurrency monads.</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;"><span style="font-size: large;">Stack allocation for OCaml</span></div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Short-lived allocations are cheap but not free:</div><div style="font-family: arial;"><ul style="text-align: left;"><li>space is not reused quickly</li><li>poor L1 cache usage</li><li>the GC advances towards the next release</li></ul></div><div style="font-family: arial;">We could write all our code in the same function, but how can we safely use several functions? </div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Rust with lifetime annotations allows this but is syntactically heavyweight. 
It can be polymorphic and higher-order functions become higher-rank which means that type inference stops working.</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">This work uses a different approach with modal types:</div><div><ul style="text-align: left;"><li><span style="font-family: courier;">local</span><span style="font-family: arial;"> or </span><span style="font-family: courier;">global</span><span style="font-family: arial;"> applied to variable bindings</span></li><li><span style="font-family: courier;">local</span><span style="font-family: arial;"> bindings never escape their region</span></li><li><span style="font-family: courier;">global</span><span style="font-family: arial;"> bindings can never refer to stack-allocated values</span></li><li style="font-family: arial;">less expressive than region variables but simpler</li></ul></div><div style="font-family: arial;">The modalities show in the function types</div><div style="font-family: arial;"><br /></div><div><span style="font-family: courier;"> string -> unit vs local_ string -> unit</span></div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Functions can also return values on the stack. They get allocated on the parent stack frame</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Examples</div><div style="font-family: arial;"><br /></div><div><span style="font-family: courier;">val iter : local_ ('a -> unit) -> 'a list -> unit</span></div><div><span style="font-family: courier;"> the closure can not be captured by iter</span></div><div><span style="font-family: courier;"> let count = ref</span></div><div><span style="font-family: courier;"> List.iter ... 
(use count)</span></div><div style="font-family: arial;"><br /></div><div><span style="font-family: arial;"> in that case the count ref can be local because it won't escape </span><span style="font-family: courier;">iter</span></div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">What about tail-recursion? To keep it working we need to make sure that we don't grow the stack.</div><div style="font-family: arial;"><br /></div><div><span style="font-family: arial;">What about currying? The first iter is not the same as </span><span style="font-family: courier;">val iter : local_ ('a -> unit) -> ('a list -> unit)</span></div><div><span style="font-family: arial;">it is more like </span><span style="font-family: courier;">val iter : local_ ('a -> unit) -> local_ ('a list -> unit)</span><span style="font-family: arial;"> so eta expansion might be necessary</span></div><div style="font-family: arial;"><br /></div><div><span style="font-family: arial;">What about the </span><span style="font-family: courier;">with_resource</span><span style="font-family: arial;"> pattern?</span></div><div style="font-family: arial;"><br /></div><div><span style="font-family: courier;">val with_file : filename: string -> local_ (local_ filehandle -> 'a) -> 'a</span></div><div style="font-family: arial;"><br />This means that this simple addition to the type system could be useful in other contexts (proving that the file handle is not captured, modifying mutable arrays etc...)</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;"><span style="font-size: large;">Continuous monitoring of runtime events</span></div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Run health checks in production, do some analysis. New in OCaml 5.0:</div><div style="font-family: arial;"><ul style="text-align: left;"><li>most probes in the default runtime</li><li>APIs (OCaml, C)</li><li>controllable with env. 
variables</li><li>very low overhead</li></ul></div><div style="font-family: arial;">Implemented with:<br /><ul style="text-align: left;"><li>per-domain ring buffers: one producer / many consumers (this means that events will be overwritten and can be missed)</li><li>file backed memory mapped so accessible from outside processes</li></ul></div><div style="font-family: arial;">30ns when enabled, 0.8% in retired instructions</div><div style="font-family: arial;"><br /></div><div style="font-family: arial;">Next:</div><div style="font-family: arial;"><ul style="text-align: left;"><li>new runtime probes</li><li>custom events</li><li>more work in libraries and tooling</li></ul></div><div><div><span style="font-family: arial; font-size: large;">Programming is (should be) fun!</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Programming is not coding</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">there's often no good specification to meet, just a vague understanding</span></li><li><span style="font-family: arial;">joint exploration of achievable specifications, possible implementations</span></li><li><span style="font-family: arial;">it rarely relies on physical parameters for mechanical tolerance etc...</span></li><li><span style="font-family: arial;">we are limited by our ideas and the arising complexity</span></li></ul></div><div><span style="font-family: arial;">This means that b</span><span style="font-family: arial;">ugs are not just programming errors, they are an opportunity to learn:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">they should have names and mitigation strategies</span></li><li><span style="font-family: arial;">it's ok to have bugs. 
We start with simple assumptions and we explore the space of what's possible</span></li></ul></div><div><span style="font-family: arial;">Gerald Sussman shared some of his insights with Maxwell's equations, eval/apply in a Scheme interpreter, electric circuits, classical mechanics, or how he (re)discovered automatic forward differentiation with a colleague using differential objects</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">Philosophy is also never far from programming with questions about: referent expressions, identity, mutation, the Ravens paradox, etc...</span></div><div><span style="font-family: arial;"><br /></span></div><div><span style="font-family: arial;">In summary programming is fun because it brings:</span></div><div><ul style="text-align: left;"><li><span style="font-family: arial;">the pleasure of analogies</span></li><li><span style="font-family: arial;">philosophical contemplation</span></li><li><span style="font-family: arial;">the pleasure of debugging, of the hunt</span></li><li><span style="font-family: arial;">the pleasure of discovery of good ideas</span></li><li><span style="font-family: arial;">the pleasure of clarity: make difficult subjects clear</span></li></ul></div><div><span style="font-family: arial;">and it's even better when we share it as free (libre) software: cf. gnu.org/philosophy/free-sw.html</span></div><div style="font-family: arial; font-size: xx-large;"><br /></div></div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-32028669154858734412019-10-11T23:15:00.000+09:002022-09-17T03:21:46.920+09:00A better "add" operator for HLists<p>At Haskell eXchange 2019, Yves Parès was presenting his “porcupine” library, a library to help scientists run data pipelines using the power of Haskell’s arrows. 
At some stage, he said, “you know if you’re using a ‘records’ library, like Vinyl, you have to build your HList by appending <code class="prettyprint">RNil</code> at the end”. And I thought: No!</p>
<p>This is a very small thing that has been bugging me for some time. If I want to build a <code class="prettyprint">HList</code>, why do I have to append <code class="prettyprint">HNil</code> at the end? As soon as I’m appending 2 things together to form an <code class="prettyprint">HList</code> the whole type should be determined, shouldn’t it? Let’s work on a bit of code.</p>
<p>Here is the standard definition of a <code class="prettyprint">HList</code> in Haskell:</p>
<pre><code class="prettyprint">data HList (l :: [*]) where
  HNil  :: HList '[]
  HCons :: e -> HList l -> HList (e ': l)

-- example
myHList :: HList [Int, Text]
myHList = HCons 1 (HCons "Hello" HNil)</code></pre>
<p>I can define a <code class="prettyprint">+:</code> operator to make the operation of appending an element a bit nicer:</p>
<pre><code class="prettyprint">infixr 5 +:

(+:) :: a -> HList as -> HList (a : as)
(+:) = HCons

myHList1 :: HList [Int, Text]
myHList1 =
  1
    +: "Hello"
    +: HNil</code></pre>
<p>I can also define an <code class="prettyprint"><+></code> operator to append 2 HLists together:</p>
<pre><code class="prettyprint">-- :++ is a type-level operator (not defined here)
-- for appending 2 lists of types together (see the Appendix)
infixr 4 <+>

(<+>) :: HList as -> HList bs -> HList (as :++ bs)
(<+>) HNil bs = bs
(<+>) (HCons a as) bs = HCons a (as <+> bs)

list1 :: HList [Int, Text]
list1 = 1 +: "Hello" +: HNil

list2 :: HList [Double, Bool]
list2 = 2.0 +: True +: HNil

lists :: HList [Int, Text, Double, Bool]
lists = list1 <+> list2</code></pre>
<p>All good so far, that’s a reasonable API. However we still need to specify <code class="prettyprint">HNil</code> every time we create a new <code class="prettyprint">HList</code>. Can we avoid it?</p>
<h2 id="a-more-polymorphic-operator">A more polymorphic operator</h2>
<p>In order to avoid using <code class="prettyprint">HNil</code> we need to have an operator, let’s call it <code class="prettyprint"><:</code>, to know what to do when:</p>
<ul>
<li>adding one element to another: <code class="prettyprint">a <: b</code></li>
<li>adding one element to a <code class="prettyprint">HList</code>: <code class="prettyprint">a <: bs</code></li>
</ul>
<p/>
<p>But even better we should be able to:</p>
<ul>
<li>append 2 <code class="prettyprint">HList</code> together: <code class="prettyprint">as <: bs</code></li>
<li>append an element at the end of a <code class="prettyprint">HList</code>: <code class="prettyprint">as <: b</code></li>
</ul>
<p/>
<p>We can already see that this operator cannot be a straightforward Haskell function, because the types of its first and second arguments are not always the same. Annoying. Wait, there’s a tool in Haskell to cope with variations in types like that: typeclasses!</p>
<pre><code class="prettyprint">infixr 5 <:

class AddLike a b c | a b -> c where
  (<:) :: a -> b -> c

instance {-# OVERLAPPING #-} (asbs ~ (as :++ bs)) =>
         AddLike (HList as) (HList bs) (HList asbs) where
  (<:) = (<+>)

instance (abs ~ (a : bs)) => AddLike a (HList bs) (HList abs) where
  (<:) = (+:)

instance AddLike a b (HList [a, b]) where
  (<:) a b = a +: b +: HNil

instance (asb ~ (as :++ '[b])) => AddLike (HList as) b (HList asb) where
  as <: b = as <+> (b +: HNil)
</code></pre>
<p>This <code class="prettyprint">AddLike</code> typeclass will deal with all the cases and now we can write:</p>
<pre><code class="prettyprint">a = 1 :: Int
b = "hello" :: Text
c = 2.0 :: Double
d = True :: Bool

ab = a <: b
bc = b <: c
cd = c <: d
abc = a <: bc
bca = bc <: a
abcd = ab <: cd</code></pre>
<p>That’s it, one operator for all the reasonable cases.</p>
<h2 id="appendix">Appendix</h2>
<p>Here is the full code:</p>
<pre><code class="prettyprint">{-# LANGUAGE DataKinds #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE PolyKinds #-}
-- extensions implied by the project configuration, listed here
-- so that the module compiles standalone
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# OPTIONS_GHC -fno-warn-unticked-promoted-constructors #-}

module AddLikeApi where

import Protolude

data HList (l :: [Type]) where
  HNil  :: HList '[]
  HCons :: e -> HList l -> HList (e ': l)

myHList :: HList [Int, Text]
myHList = HCons 1 (HCons "Hello" HNil)

infixr 5 +:

(+:) :: a -> HList as -> HList (a : as)
(+:) = HCons

myHList' :: HList [Int, Text]
myHList' =
  1
    +: "Hello"
    +: HNil

-- * Appendix

infixr 4 <+>

(<+>) :: HList as -> HList bs -> HList (as :++ bs)
(<+>) HNil bs = bs
(<+>) (HCons a as) bs = HCons a (as <+> bs)

infixr 5 <:

class AddLike a b c | a b -> c where
  (<:) :: a -> b -> c

instance {-# OVERLAPPING #-} (asbs ~ (as :++ bs)) =>
         AddLike (HList as) (HList bs) (HList asbs) where
  (<:) = (<+>)

instance (abs ~ (a : bs)) => AddLike a (HList bs) (HList abs) where
  (<:) = (+:)

instance AddLike a b (HList [a, b]) where
  (<:) a b = a +: b +: HNil

instance (asb ~ (as :++ '[b])) => AddLike (HList as) b (HList asb) where
  as <: b = as <+> (b +: HNil)

type family (:++) (x :: [k]) (y :: [k]) :: [k] where
  '[] :++ xs = xs
  (x : xs) :++ ys = x : (xs :++ ys)

-- examples
list1 :: HList [Int, Text]
list1 = 1 +: "Hello" +: HNil

list2 :: HList [Double, Bool]
list2 = 2.0 +: True +: HNil

lists :: HList [Int, Text, Double, Bool]
lists = list1 <+> list2

a = 1 :: Int
b = "hello" :: Text
c = 2.0 :: Double
d = True :: Bool

ab :: HList [Int, Text]
ab = a <: b

bc :: HList [Text, Double]
bc = b <: c

cd :: HList [Double, Bool]
cd = c <: d

abc' :: HList [Int, Text, Double]
abc' = ab <: c

abc :: HList [Int, Text, Double]
abc = a <: bc

abcd :: HList [Int, Text, Double, Bool]
abcd = ab <: cd
</code></pre>
<p><br/><br/><br/> Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-85928289053898171452019-09-09T11:16:00.000+09:002022-09-17T03:21:54.545+09:00Processing CSV files in Haskell<p>This blog post is the result of a little experiment. I wanted to check how hard it would be to use Haskell to write a small program to help me solve a “real-life” problem. I have always been pretty bad at doing accounting for the family, with a mix of Excel spreadsheets filled with random amounts and dates.</p>
<p>In order to improve our budgeting I decided to give a go to an application called <a href="https://www.youneedabudget.com/">YNAB</a> (“You Need A Budget”). Many applications of that nature require you to import your bank transactions in order to be really precise about your income and expenses. And now we have an IT problem, right at home. I have, for historical and practical reasons, a bunch of different bank accounts. Of course not one of them exports my transaction data in a format that’s compatible with what YNAB expects.</p>
<p>Is this going to stop a software engineer? No, a software engineer and devoted householder would find the right combination of <code class="prettyprint">awk</code> and <code class="prettyprint">sed</code> to do the job. But I am also a Haskeller and I wonder how difficult it is to solve that task using Haskell. More precisely I want to gauge what amount of Haskell knowledge is required to do this. Since I am not a beginner anymore (yay!) this is a bit biased but I think that it is very important to do our best to remember we were once beginners.</p>
<p>In the following sections I want to explain what I did and give some pointers to help beginners getting started with Haskell and being able to code a similar application:</p>
<ol type="1">
<li>set-up the project</li>
<li>create data types</li>
<li>decode CSV lines</li>
<li>write tests for the decoders</li>
<li>parse and process a full file</li>
<li>parse options from the command line</li>
<li>tie it all together in an application</li>
</ol>
<p>For each section I will recommend some things to start learning first and some others to learn later.</p>
<p>The full code can be found <a href="https://github.com/etorreborre/ledgit-blogpost">here</a>.</p>
<h2 id="setting-up-a-haskell-project">Setting-up a Haskell project</h2>
<p>This is something I didn’t have to do from scratch since I already had the Haskell build tool <a href="https://docs.haskellstack.org/en/stable/README/"><code class="prettyprint">stack</code></a> installed on my machine. From now on I am going to assume that you have installed <code class="prettyprint">stack</code> already. Creating your first Haskell project is not that obvious. You need to learn how to declare a few things according to the “Cabal” format:</p>
<ul>
<li>where to put your sources, your tests?</li>
<li>are you going to produce a library, an executable?</li>
<li>which libraries do you need as dependencies?</li>
</ul>
<p>Fear not, there is a great Haskell command-line tool helping you with all of this: <a href="https://kowainik.github.io/projects/summoner"><code class="prettyprint">summoner</code></a>. Go <code class="prettyprint">stack install summoner</code>. Just follow the prompts and create your first project in no time, with the corresponding Github project and CI configuration. I think this is the best way to get started on some immediate coding. You will have plenty of time later to learn Cabal/Stack/Hpack/nix and become a pro at setting up projects.</p>
<p>Funny enough, this step took me a bit of time. Indeed I am frequently using the <code class="prettyprint">ghci</code> <a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">REPL</a> (with <code class="prettyprint">stack ghci</code>, we will talk about it later) when programming in Haskell and I have a global set-up for it in <code class="prettyprint">.ghci</code></p>
<pre><code class="prettyprint">:set prompt "λ> "
import Prelude
:def hoogle \str -> return $ ":!hoogle --count=50 \"" ++ str ++ "\""
:def pointfree \str -> return $ ":!pointfree \"" ++ str ++ "\""
:def pointful \str -> return $ ":!pointful \"" ++ str ++ "\""</code></pre>
<p>This configuration file gives me a cute <code class="prettyprint">ghci</code> prompt but it also gives me access to some very useful Haskell tools like <a href="https://hoogle.haskell.org/"><code class="prettyprint">hoogle</code></a> for searching type signatures, right in my REPL. Unfortunately when I started my <code class="prettyprint">ghci</code> session, <code class="prettyprint">stack</code> informed me that it didn’t know about <code class="prettyprint">Prelude</code>. The reason is that the project created by <code class="prettyprint">summoner</code> is created with <a href="https://kowainik.github.io/projects/relude">a custom <code class="prettyprint">prelude</code></a> which removes the standard <code class="prettyprint">Prelude</code> from the search path. Custom preludes are definitely important to know in Haskell but they are also something which is best left for a bit later, when you want to get serious about Haskell development and make sure you are using “safe” functions as much as possible (no <code class="prettyprint">head :: [a] -> a</code> for example). In my case I decided to switch to another custom prelude, <a href="http://www.stephendiehl.com/posts/protolude.html"><code class="prettyprint">Protolude</code></a>.</p>
<p><em>Learn now</em></p>
<ul>
<li>install <a href="https://docs.haskellstack.org/en/stable/README/"><code class="prettyprint">stack</code></a></li>
<li>install <a href="https://kowainik.github.io/projects/summoner"><code class="prettyprint">summoner</code></a></li>
<li>create a project</li>
<li>start <a href="https://tech.fpcomplete.com/haskell/tutorial/stack-play"><code class="prettyprint">stack ghci</code></a></li>
<li>the <a href="https://kowainik.github.io/projects/relude"><code class="prettyprint">Relude</code></a> prelude and its main functions</li>
</ul>
<p><em>Learn later</em></p>
<ul>
<li>the <a href="https://cabal.readthedocs.io/en/latest/cabal-projectindex.html"><code class="prettyprint">cabal</code></a> format</li>
<li>the <a href="https://github.com/sol/hpack"><code class="prettyprint">hpack</code></a> format, as an alternate format</li>
<li>the <code class="prettyprint">cabal</code> or <code class="prettyprint">stack</code> commands for building/testing a project</li>
<li>other custom preludes: <a href="http://www.stephendiehl.com/posts/protolude.html"><code class="prettyprint">protolude</code></a>, <a href="https://github.com/snoyberg/mono-traversable/tree/master/classy-prelude#readme"><code class="prettyprint">classy-prelude</code></a>
<p/></li>
</ul>
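<p>To give an idea of what the <code class="prettyprint">hpack</code> format looks like, here is a minimal, hypothetical <code class="prettyprint">package.yaml</code> for a project like this one (the package name and the dependency list are made up for illustration):</p>
<pre><code class="prettyprint"># package.yaml (hpack format), converted to a .cabal file at build time
name: ledgit
version: 0.1.0.0

dependencies:
  - base
  - protolude
  - cassava
  - text
  - time

library:
  source-dirs: src

executables:
  ledgit:
    main: Main.hs
    source-dirs: app
    dependencies:
      - ledgit

tests:
  ledgit-test:
    main: test.hs
    source-dirs: test
    dependencies:
      - ledgit
      - hedgehog</code></pre>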
<h2 id="create-data-types">Create data types</h2>
<p>This is a real cool part of Haskell, cheap (to create) and powerful data types. For this application we want to have a datatype representing the input data and a data type for the output data. Wait, actually no. We just need a data type modelling what it means to be a transaction for YNAB and ways to:</p>
<ul>
<li>create values of that type from a CSV line (next section)</li>
<li>output a CSV line from values of that type (next section)</li>
</ul>
<p>Each transaction (or line in a ledger) must contain at least a date, an amount, a payee and possibly a category.</p>
<pre><code class="prettyprint">import Data.Text (Text)
import Data.Time (Day)

data LedgerLine = LedgerLine {
    date :: Day
  , amount :: Amount
  , reference :: Reference
  , category :: Maybe Category
  } deriving (Eq, Show)

data Category = Category Text
  deriving (Eq, Show)

data Reference = Reference Text
  deriving (Eq, Show)

data Amount = Amount Double
  deriving (Eq, Show)</code></pre>
<p>Here we are re-using some standard data types like <code class="prettyprint">Text</code> and <code class="prettyprint">Day</code> but wrapping them with custom data types. This is quite useful because we can’t make the mistake of putting a <code class="prettyprint">Reference</code> into a <code class="prettyprint">Category</code> for example. This also better documents the <code class="prettyprint">LedgerLine</code> fields. The <code class="prettyprint">deriving</code> clauses give us ways to display and compare values out of the box (think <code class="prettyprint">toString</code> and <code class="prettyprint">equals</code> in Java).</p>
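<p>To make that concrete, here is a quick sketch (using the first version of the datatype above, with a made-up transaction) showing how the wrappers prevent mixing up the fields:</p>
<pre><code class="prettyprint">import Data.Time.Calendar (fromGregorian)

line :: LedgerLine
line = LedgerLine (fromGregorian 2019 8 30)
                  (Amount (-15.0))
                  (Reference "mobilcom-debitel Kd")
                  (Just (Category "Home Phone and Internet"))

-- swapping the Reference and the Category does not compile:
-- bad = LedgerLine (fromGregorian 2019 8 30) (Amount (-15.0))
--                  (Category "phone") (Just (Reference "mobilcom"))</code></pre>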
<p>My actual datatypes are a bit more complicated:</p>
<pre><code class="prettyprint">data LedgerLine = LedgerLine {
    _date :: Maybe Day
  , _amount :: Amount
  , _reference :: Reference
  , _category :: Maybe Category
  } deriving (Eq, Show)

newtype Category = Category Text
  deriving (Eq, Show, IsString)
  deriving newtype FromField
  deriving newtype ToField

newtype Reference = Reference Text
  deriving (Eq, Show, IsString)
  deriving newtype FromField
  deriving newtype ToField

newtype Amount = Amount Double
  deriving (Eq, Show)
  deriving newtype FromField
  deriving newtype ToField
  deriving newtype Num</code></pre>
<p>First of all some CSV lines might not have a date yet if the transactions have been created today. Then:</p>
<ul>
<li>I use <code class="prettyprint">newtype</code> instead of <code class="prettyprint">data</code> for <code class="prettyprint">Category</code>, <code class="prettyprint">Amount</code>, <code class="prettyprint">Reference</code> to avoid paying the cost at runtime of wrapping a type</li>
<li>an <code class="prettyprint">IsString</code> instance is used for <code class="prettyprint">Category</code> and <code class="prettyprint">Reference</code> to be able to use strings directly in tests (to write <code class="prettyprint">"restaurant"</code> instead of <code class="prettyprint">Category "restaurant"</code>)</li>
<li>I am declaring a <code class="prettyprint">Num</code> instance to be able to <code class="prettyprint">+</code>, <code class="prettyprint">negate</code>,… amounts later as if they were <code class="prettyprint">Double</code>s</li>
<li>the field names are prefixed with <code class="prettyprint">_</code> to avoid potential clashes with variables having similar names <code class="prettyprint">amount</code>, <code class="prettyprint">category</code> etc…</li>
<li>there are some instances for <code class="prettyprint">FromField</code> and <code class="prettyprint">ToField</code> for… see the next section :-)</li>
</ul>
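<p>As a small illustration of the <code class="prettyprint">IsString</code> point above (assuming the <code class="prettyprint">OverloadedStrings</code> extension is enabled):</p>
<pre><code class="prettyprint">{-# LANGUAGE OverloadedStrings #-}

-- the literal goes through fromString, no constructor needed
restaurant :: Category
restaurant = "restaurant"

-- without the IsString instance we would have to write:
restaurant' :: Category
restaurant' = Category "restaurant"</code></pre>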
<p><em>Learn now</em></p>
<ul>
<li>how to create data types and the difference between <code class="prettyprint">data</code> and <code class="prettyprint">newtype</code></li>
<li>typeclasses and instances: <code class="prettyprint">Show</code>, <code class="prettyprint">Eq</code>, <code class="prettyprint">Num</code>,…</li>
</ul>
<p><em>Learn later</em></p>
<ul>
<li>record fields, how they appear as top-level functions and how to <a href="http://www.parsonsmatt.org/overcoming-records/#/">avoid</a> <a href="https://gist.github.com/mtesseract/1b69087b0aeeb6ddd7023ff05f7b7e68">clashes</a></li>
<li>different ways of <a href="https://typeclasses.com/ghc/deriving-strategies">deriving instances for data types</a></li>
<li><a href="https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-String.html#t:IsString"><code class="prettyprint">IsString</code></a> for text-like data types
<p/></li>
</ul>
<h2 id="decode-a-csv-line">Decode a CSV line</h2>
<p>This is becoming more involved. We need to find a library knowing how to parse CSV lines. The standard library for CSV files in Haskell is <a href="https://hackage.haskell.org/package/cassava"><code class="prettyprint">cassava</code></a>. Like many other libraries for encoding / decoding data structures it uses type classes:</p>
<ul>
<li><code class="prettyprint">FromField</code> to specify how to parse a value in a CSV column and transform it to a data type value</li>
<li><code class="prettyprint">FromNamedRecord</code> to specify how to parse a full CSV row and how to assemble the parsed values</li>
</ul>
<p>In our case we want to parse at least 3 formats, from different banks: <code class="prettyprint">Commerzbank</code>, <code class="prettyprint">N26</code>, <code class="prettyprint">Revolut</code>, so we need an auxiliary data type:</p>
<pre><code class="prettyprint">data InputLedgerLine =
CommerzbankLine LedgerLine
| N26Line LedgerLine
| RevolutLine LedgerLine
deriving (Eq, Show)</code></pre>
<p>and we can start defining parsers for each format:</p>
<pre><code class="prettyprint">instance FromNamedRecord InputLedgerLine where
  parseNamedRecord r =
        parseCommerzBank
    <|> parseN26
    <|> parseRevolut
    where
      parseCommerzBank = fmap CommerzbankLine $
        LedgerLine
          <$> (fmap unCommerzbankDay <$> r .: "Transaction date")
          <*> r .: "Amount"
          <*> r .: "Booking text"
          <*> r .: "Category"
      parseN26 = panic "todo N26"
      parseRevolut = panic "todo Revolut"

newtype CommerzbankDay = CommerzbankDay { unCommerzbankDay :: Day }
  deriving (Eq, Show)

instance FromField CommerzbankDay where
  parseField f = CommerzbankDay <$>
    parseTimeM True defaultTimeLocale "%d.%m.%Y" (toS f)</code></pre>
<p>This is a whole jump in complexity all of a sudden, but also quite some power! Think about it, in a few lines of code we have:</p>
<ul>
<li>specified how to parse rows for the <code class="prettyprint">Commerzbank</code> file format</li>
<li>specified how to parse each field and what are the field names in the CSV file</li>
<li>specified a date format for dates like <code class="prettyprint">26.08.2019</code></li>
<li>specified that other parsers must be tried if the first parser fails (when we are parsing another format)</li>
</ul>
<p>I am not going to unpack everything here but give you some pointers what to learn.</p>
<p><em>Learn now</em></p>
<ul>
<li><a href="https://hackage.haskell.org/package/cassava"><code class="prettyprint">cassava</code></a> for parsing CSV files</li>
<li>what is an <a href="https://hackage.haskell.org/package/base-4.12.0.0/docs/Control-Applicative.html"><code class="prettyprint">Applicative</code></a> and how to <a href="https://eli.thegreenplace.net/2017/deciphering-haskells-applicative-and-monadic-parsers/">parse data types with <code class="prettyprint">MyData <$> p1 <*> p2 <*> p3</code></a></li>
<li>what is an <a href="https://hackage.haskell.org/package/monadplus-1.4.2/docs/Control-Applicative-Alternative.html"><code class="prettyprint">Alternative</code></a> to try another parser with <code class="prettyprint"><|></code></li>
<li>the <a href="https://hackage.haskell.org/package/time-1.9.3/docs/Data-Time.html"><code class="prettyprint">Data.Time</code></a> library to parse dates and times</li>
<li>why <a href="https://wiki.haskell.org/Newtype">newtypes</a> are sometimes necessary to tell Haskell to use a specific type instance (this is why I introduced a special <code class="prettyprint">CommerzbankDay</code> here)</li>
</ul>
<p><em>Learn later</em></p>
<ul>
<li>other parsers: <a href="https://hackage.haskell.org/package/attoparsec"><code class="prettyprint">attoparsec</code></a>, <a href="https://hackage.haskell.org/package/megaparsec"><code class="prettyprint">megaparsec</code></a>
<p/></li>
</ul>
<h2 id="write-tests-for-the-decoders">Write tests for the decoders</h2>
<p>Pretty cool! If you understand how the parsers in the above section work, you should be able to open a GHCi session and try them out (read the documentation of the <code class="prettyprint">cassava</code> library for the <code class="prettyprint">decodeByName</code> function).</p>
<pre><code class="prettyprint">λ> import Data.Csv
λ> let commerzbankHeader = "Transaction date,Value date,Transaction type,Booking text,Amount,Category"
λ> let line = "30.08.2019,30.08.2019,debit,\"mobilcom-debitel Kd\",-15.00,Home Phone and Internet"
λ>
λ> fmap snd $ decodeByName @InputLedgerLine $ commerzbankHeader <> "\n" <> line
Right [CommerzbankLine (LedgerLine {
_date = Just 2019-08-30,
_amount = Amount (-15.0),
_reference = Reference "mobilcom-debitel Kd",
_category = Just (Category "Home Phone and Internet")})]</code></pre>
<p>It works!</p>
<p>Perhaps we want to make sure this code will still work if we make further modifications, so it is time to… write tests! There are many alternatives for writing tests in Haskell and I have my own preferences :-). I reached for my own library, <code class="prettyprint">registry-hedgehog</code>, which is a layer on top of several libraries:</p>
<ul>
<li><code class="prettyprint">hedgehog</code> for writing property-based tests</li>
<li><code class="prettyprint">registry</code> for assembling data generators without using typeclasses</li>
<li><code class="prettyprint">tasty-hedgehog</code> for executing <code class="prettyprint">hedgehog</code> properties as <code class="prettyprint">Tasty</code> tests</li>
<li><code class="prettyprint">tasty-discover</code> to automatically find tests in files and assemble them into a large suite</li>
</ul>
<p>This is totally overblown for that little project since I haven’t written a single property so far. But I know the API well and like it since I made it to my taste :-). What do the tests look like?</p>
<pre><code class="prettyprint">test_parse_commerzbank_with_date = test "we can parse the commerzbank format with a date" $ do
  let line = "30.08.2019,30.08.2019,debit,\"mobilcom-debitel Kd\",-15.00,Home Phone and Internet"
  let result = fmap snd $ decodeByName (toS $ unlines [header, line])
  result === Just (CommerzbankLine $ LedgerLine {
      _reference = "mobilcom-debitel Kd"
    , _date = Just (fromGregorian 2019 8 30)
    , _amount = Amount (-15.0)
    , _category = Just ("Home Phone and Internet")
    })</code></pre>
<p>A test is simply a piece of text describing the intention, some action (<code class="prettyprint">decodeByName</code>) and an assertion (with <code class="prettyprint">===</code>). This is very similar to what I tried on the command line earlier.</p>
<p><em>Learn now</em></p>
<ul>
<li><a href="https://hspec.github.io/"><code class="prettyprint">hspec</code></a>: an easy library to start writing unit tests</li>
</ul>
<p><em>Learn later</em></p>
<ul>
<li><a href="https://hackage.haskell.org/package/QuickCheck"><code class="prettyprint">quickcheck</code></a>/<a href="https://hackage.haskell.org/package/hedgehog"><code class="prettyprint">hedgehog</code></a>: for writing property tests</li>
<li><a href="https://hackage.haskell.org/package/tasty"><code class="prettyprint">tasty</code></a>: a test framework dedicated to the structuring and the running of test suites</li>
<li><a href="https://hspec.github.io/hspec-discover.html"><code class="prettyprint">hspec-discover</code></a>/<a href="https://hackage.haskell.org/package/tasty-discover"><code class="prettyprint">tasty-discover</code></a>: to avoid having to manually create test suites from tests in test modules</li>
<li><a href="https://github.com/etorreborre/registry-hedgehog"><code class="prettyprint">registry-hedgehog</code></a>: for an alternative to typeclasses when creating data generators
<p/></li>
</ul>
<h2 id="parse-and-process-a-full-file">Parse and process a full file</h2>
<p>Now we are entering serious territory. When we parse files we have to be conscious about:</p>
<ul>
<li>memory usage: it is not advised to read the full content of a file before processing it</li>
<li>resource usage: files must be properly closed after use to avoid leaking resources</li>
</ul>
<p>None of this really matters for my application since the files I am processing are quite small (< 1 Mb) and the application exits right after processing. Anyway, I wanted to see whether doing the “right thing” was as easy as going for a quick and dirty solution.</p>
<p>There is a beautiful library for streaming data in Haskell, <code class="prettyprint">streaming</code>, which I used before. I am in luck since someone created a <code class="prettyprint">streaming-cassava</code> library to stream rows decoded by <code class="prettyprint">cassava</code>. It provides a function <code class="prettyprint">decodeByName</code> which is the equivalent of the <code class="prettyprint">Data.Csv.decodeByName</code> I used in the tests but now operates on “streams” of data. A similar function, <code class="prettyprint">encodeByName</code>, also exists to encode values as CSV rows. That’s fine, but we also need to read and write those rows. I am going to decompose the whole processing into 6 parts and explain the data types involved in each step:</p>
<ol type="1">
<li>read an input file to get a <code class="prettyprint">ByteString m ()</code> which is a stream of bytes</li>
<li>decode the rows with <code class="prettyprint">decodeByName</code> to get a <code class="prettyprint">Stream (Of InputLedgerLine) m ()</code></li>
<li>deal with decoding errors</li>
<li>process the input ledger lines and transform them to <code class="prettyprint">LedgerLine</code></li>
<li>encode the lines as CSV rows with <code class="prettyprint">encodeByName</code> to get back a stream of bytes <code class="prettyprint">ByteString m ()</code></li>
<li>write those bytes to an output file</li>
</ol>
<h3 id="read-a-file-as-a-stream-of-bytes">Read a file as a Stream of bytes</h3>
<p>Again we are lucky, the <code class="prettyprint">streaming-with</code> library gives us a function, <code class="prettyprint">withBinaryFileContents</code> to read the contents of a file as a stream:</p>
<pre><code class="prettyprint">withBinaryFileContents filePath $ \(contents :: ByteString m ()) ->
  doSomething contents</code></pre>
<p>Not only are the contents streamed using the <code class="prettyprint">ByteString m ()</code> data structure, but <code class="prettyprint">withBinaryFileContents</code> also makes sure the file is closed when the processing (<code class="prettyprint">doSomething</code>) is finished, even if exceptions are thrown.</p>
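<p>This “pass a consumer and let the library handle the cleanup” guarantee is the classic <code class="prettyprint">bracket</code> pattern from <code class="prettyprint">base</code>, which functions like <code class="prettyprint">withBinaryFileContents</code> build on. A minimal, self-contained sketch (the <code class="prettyprint">withResource</code> function and its <code class="prettyprint">IORef</code> log are invented for the demo, standing in for a real file handle):</p>

```haskell
import Control.Exception (SomeException, bracket, try)
import Data.IORef

-- The "with" pattern: acquire a resource, hand it to a consumer,
-- and guarantee that the release action runs even if the consumer throws.
withResource :: IORef [String] -> (String -> IO r) -> IO r
withResource logRef =
  bracket
    (modifyIORef logRef ("open" :) >> pure "resource")  -- acquire
    (\_ -> modifyIORef logRef ("close" :))              -- release, always

main :: IO ()
main = do
  logRef <- newIORef []
  -- the consumer fails half-way through...
  _ <- try (withResource logRef (\_ -> ioError (userError "boom")))
         :: IO (Either SomeException ())
  -- ...but the resource was still released
  readIORef logRef >>= print . reverse  -- prints ["open","close"]
```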
<h3 id="decode-the-lines">Decode the lines</h3>
<p>The <code class="prettyprint">Streaming.Cassava.decodeByName</code> function does the job for us. It takes a <code class="prettyprint">ByteString m ()</code> and returns a <code class="prettyprint">Stream (Of InputLedgerLine) m ()</code>, provided we have a <code class="prettyprint">FromNamedRecord</code> typeclass instance for <code class="prettyprint">InputLedgerLine</code>. Now is a good time to talk about those streaming data types: <code class="prettyprint">ByteString</code> and <code class="prettyprint">Stream</code>.</p>
<h4 id="what-is-a-stream-of-data">What is a stream of data?</h4>
<p>Indeed I owe you a bit of an explanation on the “streaming” types <code class="prettyprint">ByteString m r</code> and <code class="prettyprint">Stream (Of a) m r</code>. Why so many type parameters to represent streams? I will only explain <code class="prettyprint">Stream</code> here because <code class="prettyprint">ByteString m r</code> is essentially a specialization of <code class="prettyprint">Stream</code> to the case where we are streaming bytes.</p>
<p><em>NOTE</em>: The <code class="prettyprint">ByteString</code> name in Haskell (found in <code class="prettyprint">Data.ByteString</code> or <code class="prettyprint">Data.ByteString.Lazy</code>) could make you believe that we are dealing with strings and their underlying bytes. It is better to think of it as just a collection of bytes. The same goes for <code class="prettyprint">Data.ByteString.Streaming.ByteString m ()</code>, which is a stream of bytes.</p>
<p>So, what is a <code class="prettyprint">Stream (Of a) m r</code>? If you run <code class="prettyprint">:info Stream</code> in GHCi, you will more or less read (I’m simplifying a bit here) that it is either:</p>
<ul>
<li><code class="prettyprint">Return r</code>: returning a value <code class="prettyprint">r</code>, nothing more to do. If you use the <code class="prettyprint">fmap</code> operation you can “map” this value to something else (so <code class="prettyprint">Stream</code> is a <code class="prettyprint">Functor</code>)</li>
<li><code class="prettyprint">Effect (m (Stream (Of a) m r))</code>: creating a stream with the effect <code class="prettyprint">m</code>. For example <code class="prettyprint">m = IO</code> when we read from a file</li>
<li><code class="prettyprint">Step (Of a (Stream (Of a) m r))</code>: producing a value <code class="prettyprint">a</code> and another stream of values: “what comes next”. Think of <code class="prettyprint">Of</code> as a pair whose first element is strictly evaluated</li>
</ul>
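<p>To make those three cases concrete, here is a stripped-down version of the <code class="prettyprint">Stream</code> type with a couple of pure helpers. This is a sketch: the real type in the <code class="prettyprint">streaming</code> package has the same constructors but extra performance machinery, and <code class="prettyprint">each'</code>/<code class="prettyprint">toList'</code> are simplified stand-ins for the library’s <code class="prettyprint">each</code> and <code class="prettyprint">toList</code>:</p>

```haskell
import Data.Functor.Identity (Identity (..))

-- A strict pair: the first element (the emitted value) is evaluated eagerly
data Of a b = !a :> b

-- The three cases described above
data Stream f m r
  = Return r                    -- done, with a result r
  | Effect (m (Stream f m r))   -- an effect in m producing the rest of the stream
  | Step (f (Stream f m r))     -- with f = Of a: one value, then "what comes next"

-- Build a pure stream from a list (no Effect needed)
each' :: [a] -> Stream (Of a) m ()
each' = foldr (\a s -> Step (a :> s)) (Return ())

-- Run a pure stream, collecting the values and the final result
toList' :: Stream (Of a) Identity r -> ([a], r)
toList' (Return r)         = ([], r)
toList' (Effect m)         = toList' (runIdentity m)
toList' (Step (a :> rest)) = let (as, r) = toList' rest in (a : as, r)

main :: IO ()
main = print (toList' (each' [1 :: Int, 2, 3]))  -- prints ([1,2,3],())
```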
<p>I found it a bit confusing at first because of the various type variables (“do we really need a type for the return value? Yes we do”) but after a while I realized that it was the simplest thing to do to stream values and already super-powerful!</p>
<h3 id="deal-with-decoding-errors">Deal with decoding errors</h3>
<p>I think this part is difficult for beginners. I wrote that <code class="prettyprint">Streaming.Cassava.decodeByName</code> was returning <code class="prettyprint">Stream (Of InputLedgerLine) m ()</code>. No error in sight there. How are the parsing errors signaled then? On the monad <code class="prettyprint">m</code>. The <code class="prettyprint">decodeByName</code> full signature is:</p>
<pre><code class="prettyprint">decodeByName :: (MonadError CsvParseException m, FromNamedRecord a) =>
  ByteString m r -> Stream (Of a) m r</code></pre>
<p>Meaning that the monad <code class="prettyprint">m</code> must support errors of type <code class="prettyprint">CsvParseException</code>. For example <code class="prettyprint">m</code> can be <code class="prettyprint">ExceptT CsvParseException n</code> where <code class="prettyprint">n</code> is another monad. On one hand this is quite nice because we get back a data type <code class="prettyprint">Stream (Of a) m r</code> where we don’t have to think too much about errors; it is mostly a stream of parsed values. It is easier to work with than <code class="prettyprint">Stream (Of (Either CsvParseException a)) m r</code> for example. On the other hand the constraint on <code class="prettyprint">m</code> is going to propagate to the rest of the application, and things can become awkward, for example if another part of the application requires <code class="prettyprint">MonadError OtherException m</code>. Then the compilation errors can become confusing and it is not immediately obvious how the error types can be aligned. In this application we nip the problem in the bud by doing the following:</p>
<ul>
<li>catch the error as soon as possible</li>
<li>rethrow it as an exception in <code class="prettyprint">IO</code></li>
</ul>
<pre><code class="prettyprint">rethrow :: (Exception e, MonadIO m) => ExceptT e m a -> m a
rethrow ma = do
  r <- runExceptT ma
  case r of
    Left e  -> liftIO (throwIO e)
    Right a -> pure a</code></pre>
<p><code class="prettyprint">rethrow</code> assumes that we are working with values <code class="prettyprint">a</code> in a monad which is <code class="prettyprint">ExceptT e m</code>. It catches the errors of type <code class="prettyprint">e</code> and, assuming that <code class="prettyprint">m</code> is capable of doing <code class="prettyprint">IO</code>, rethrows them with <code class="prettyprint">throwIO</code>. What we do here is essentially transforming a constraint <code class="prettyprint">MonadError CsvParseException m</code> into <code class="prettyprint">MonadIO m</code>. We lose a bit in terms of abstraction, since <code class="prettyprint">m</code> is less general than it could be, but we gain in terms of interoperability with other parts of the application.</p>
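<p>To see <code class="prettyprint">rethrow</code> in action, here is a self-contained demo using only <code class="prettyprint">base</code> and <code class="prettyprint">transformers</code> (which ships with GHC); <code class="prettyprint">ParseError</code> is an invented exception type standing in for <code class="prettyprint">CsvParseException</code>:</p>

```haskell
import Control.Exception (Exception, throwIO, try)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)

-- an invented exception type, standing in for CsvParseException
newtype ParseError = ParseError String deriving Show
instance Exception ParseError

-- catch the ExceptT error and rethrow it as a regular IO exception
rethrow :: (Exception e, MonadIO m) => ExceptT e m a -> m a
rethrow ma = do
  r <- runExceptT ma
  case r of
    Left e  -> liftIO (throwIO e)
    Right a -> pure a

main :: IO ()
main = do
  -- the success case passes through unchanged
  ok <- rethrow (pure "all good" :: ExceptT ParseError IO String)
  putStrLn ok
  -- the error case resurfaces as an IO exception, catchable with try
  r <- try (rethrow (throwE (ParseError "bad row"))) :: IO (Either ParseError ())
  print r  -- prints Left (ParseError "bad row")
```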
<p>Well, that is, if we can even apply <code class="prettyprint">rethrow</code> to our stream! What we need is a function <code class="prettyprint">Stream (Of a) m r -> Stream (Of a) n r</code> where <code class="prettyprint">m</code> is <code class="prettyprint">ExceptT CsvParseException n</code>. Such a function exists in much more general settings than <code class="prettyprint">Stream</code>. It is called <code class="prettyprint">hoist</code>; it works on data types of the form <code class="prettyprint">t m a</code> (<code class="prettyprint">t = Stream</code> here) and is defined in the <code class="prettyprint">mmorph</code> library. This is probably the most complicated transformation of this whole project. However, situations with nested “monads/containers” (<code class="prettyprint">t</code> and <code class="prettyprint">m</code>) appear quite frequently in Haskell, so after a while you will reach for <code class="prettyprint">hoist</code> quite naturally.</p>
<h4 id="what-if-i-hadnt-done-any-of-this">What if I hadn’t done any of this?</h4>
<p>The <code class="prettyprint">MonadError CsvParseException m</code> constraint would have “bubbled up” to the top level, up to the <code class="prettyprint">main</code> function, where Haskell would have asked me to do something like <code class="prettyprint">runExceptT</code> to make sure I dealt with parsing errors.</p>
<h3 id="process-values">Process values</h3>
<p>The values we read are of type <code class="prettyprint">InputLedgerLine</code> but we want a single format, <code class="prettyprint">LedgerLine</code>. We are not that far since each parser is already normalizing the input values to a <code class="prettyprint">LedgerLine</code>. We only need to extract that line from each <code class="prettyprint">InputLedgerLine</code> case:</p>
<pre><code class="prettyprint">toLedgerLine :: InputLedgerLine -> LedgerLine
toLedgerLine (CommerzbankLine l) = l
toLedgerLine (N26Line l) = l
toLedgerLine (RevolutLine l) = l</code></pre>
<p>Now, how can we use <code class="prettyprint">toLedgerLine</code> to convert the lines in a <code class="prettyprint">Stream (Of InputLedgerLine) m ()</code> to get <code class="prettyprint">Stream (Of LedgerLine) m ()</code>? By using the <code class="prettyprint">map</code> function in <code class="prettyprint">Streaming.Prelude</code>:</p>
<pre><code class="prettyprint">import Streaming.Prelude as SP
let decoded = decodeByName contents :: Stream (Of InputLedgerLine) m ()
let processed = SP.map toLedgerLine decoded :: Stream (Of LedgerLine) m ()</code></pre>
<p>I really encourage you to read the documentation on <code class="prettyprint">Streaming.Prelude</code> because you will find there most of the operations you generally use on lists but this time on streams.</p>
<h3 id="encode-the-lines-as-csv">Encode the lines as CSV</h3>
<p>Again <code class="prettyprint">streaming-cassava</code> helps us here. <code class="prettyprint">encodeByName</code> encodes our values, <code class="prettyprint">Stream (Of LedgerLine) m ()</code> to a <code class="prettyprint">ByteString m ()</code>, provided we have a <code class="prettyprint">ToNamedRecord</code> instance:</p>
<pre><code class="prettyprint">instance ToNamedRecord LedgerLine where
  toNamedRecord (LedgerLine date amount reference category) =
    namedRecord [
        "Date"   .= date
      , "Amount" .= amount
      , "Payee"  .= reference
      , "Memo"   .= category
      ]</code></pre>
<p>Since all our fields have <code class="prettyprint">ToField</code> instances which are derived automatically because they are newtypes of well-known types like <code class="prettyprint">Text</code> and <code class="prettyprint">Double</code>, we just have to specify the name of the fields in the output file, so that <code class="prettyprint">cassava</code> knows in which column to put the values.</p>
<h3 id="write-a-stream-to-a-file">Write a <code class="prettyprint">Stream</code> to a file</h3>
<p><code class="prettyprint">streaming-with</code> gives us <code class="prettyprint">writeBinaryFile</code> which takes a <code class="prettyprint">ByteString m ()</code> and writes to an output file, again making sure that resources are properly cleaned-up even if there is an exception in the meantime.</p>
<p>To sum-up all those transformations in a block of code:</p>
<pre><code class="prettyprint">processAll =
  withBinaryFileContents inputFilePath $ \contents -> do
    let decoded   = decodeByName contents
    let processed = Streaming.map toLedgerLine $ hoist rethrow decoded
    let encoded   = encodeByName ynabHeader processed
    writeBinaryFile outputFilePath encoded</code></pre>
<p>Thanks to all those libraries we have a nice isolation of responsibilities, and guarantees about memory and file handle usage!</p>
<p><em>Learn now</em></p>
<ul>
<li>all the type classes and functions of <a href="https://hackage.haskell.org/package/cassava"><code class="prettyprint">cassava</code></a></li>
<li>the <a href="https://hackage.haskell.org/package/streaming"><code class="prettyprint">streaming</code></a>, <a href="https://hackage.haskell.org/package/streaming-bytestring"><code class="prettyprint">streaming-bytestring</code></a>, <a href="https://hackage.haskell.org/package/streaming-with"><code class="prettyprint">streaming-with</code></a> libraries to stream data</li>
<li>the <a href="https://github.com/kqr/gists/blob/master/articles/gentle-introduction-monad-transformers.md">rudiments</a> of <a href="https://www.schoolofhaskell.com/user/commercial/content/monad-transformers">monad transformers</a> like <code class="prettyprint">ExceptT</code></li>
</ul>
<p><em>Learn later</em></p>
<ul>
<li>other streaming libraries: there are many great alternatives (but slightly more complex) to <code class="prettyprint">streaming</code>: <a href="https://hackage.haskell.org/package/conduit"><code class="prettyprint">conduit</code></a>, <a href="https://hackage.haskell.org/package/pipes"><code class="prettyprint">pipes</code></a>, <a href="https://hackage.haskell.org/package/streamly"><code class="prettyprint">streamly</code></a></li>
<li>the <a href="https://hackage.haskell.org/package/resourcet"><code class="prettyprint">ResourceT</code></a> <a href="https://www.fpcomplete.com/blog/2017/06/understanding-resourcet">monad transformer</a> to deal with resources</li>
<li>the <a href="https://hackage.haskell.org/package/managed"><code class="prettyprint">Managed</code></a> monad to simplify situation where the <code class="prettyprint">withXXX</code> pattern is nesting too many calls</li>
<li>different ways of <a href="https://wiki.haskell.org/Error_vs._Exception">dealing with errors in Haskell</a>
<p/></li>
</ul>
<h2 id="parse-options-from-the-command-line">Parse options from the command line</h2>
<p>At the minimum we need to be able to read the name of the input file. This can be done with <code class="prettyprint">System.Environment.getArgs :: IO [String]</code> and would be sufficient for this application. However, you are going to need more elaborate parsing of command line options for a non-trivial CLI application. I have used a very well-known library for this: <a href="https://hackage.haskell.org/package/optparse-applicative"><code class="prettyprint">optparse-applicative</code></a>.</p>
<p>With this library, we define a data type for the data we want to read from the command line:</p>
<pre><code class="prettyprint">data CliOptions = CliOptions {
    inputFile  :: Text
  , outputFile :: Maybe Text
  } deriving (Eq, Show)</code></pre>
<p>The output file is left optional, since we can provide either a hard-coded name for the output file, <code class="prettyprint">result.csv</code>, or append a piece of text to the input file name. The parser for <code class="prettyprint">CliOptions</code> looks like this:</p>
<pre><code class="prettyprint">cliOptionsParser :: Parser CliOptions
cliOptionsParser = CliOptions
  <$> strArgument
      ( metavar "INPUT FILE"
     <> help "Input CSV file" )
  <*> option auto
      ( long "output-file"
     <> short 'o'
     <> value Nothing
     <> help "Output CSV file" )</code></pre>
<p>This style of parser definition is very similar to the one we used for <code class="prettyprint">FromNamedRecord</code> to parse CSV fields. It relies on the notion of an <code class="prettyprint">Applicative</code> (hence the library name) and on a series of helper functions to specify the options:</p>
<ul>
<li><code class="prettyprint">strArgument</code> parses a string given as an argument (so it is not optional)</li>
<li><code class="prettyprint">option</code> parses an option (starting with <code class="prettyprint">--</code> on the command line); the exact type of parser is <code class="prettyprint">auto</code>, meaning that it can parse anything with a <code class="prettyprint">Read</code> instance</li>
</ul>
<p>You can also see some additional information, like the option names (long and short) for the output file. This information is used both for parsing and for documenting the command line options. Speaking of documentation, how do we provide a <code class="prettyprint">--help</code> option? <code class="prettyprint">optparse-applicative</code> gives us a way to “wrap” a <code class="prettyprint">Parser</code> with more information:</p>
<pre><code class="prettyprint">defineCliOptions :: ParserInfo CliOptions
defineCliOptions =
  info (cliOptionsParser <**> helper) $
    header "ledgit - massage ledger files" <>
    progDesc "Transform a CSV ledger file into a suitable YNAB file"</code></pre>
<p>In <code class="prettyprint">defineCliOptions</code> we enrich the <code class="prettyprint">CliOptions</code> with a <code class="prettyprint">helper</code> option and provide additional information to our parser with “modifiers”:</p>
<ul>
<li><code class="prettyprint">progDesc</code> adds a text description of the program displayed under the “Usage” section showing a summary of the options</li>
<li><code class="prettyprint">header</code> adds an additional header when we display the help</li>
<li>those 2 modifiers are being “appended” into one with <code class="prettyprint"><></code> (yes they form a <code class="prettyprint">Monoid</code>)</li>
</ul>
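<p>The “modifiers form a <code class="prettyprint">Monoid</code>” idea is easy to replicate: a modifier is just a function that tweaks a configuration record, and <code class="prettyprint"><></code> composes those functions. A self-contained sketch (the <code class="prettyprint">Info</code> record and the <code class="prettyprint">header'</code>/<code class="prettyprint">progDesc'</code> names are invented; this captures the spirit of <code class="prettyprint">optparse-applicative</code>’s modifier type, not its actual definition):</p>

```haskell
import Data.Monoid (Endo (..))

data Info = Info { infoHeader :: String, infoDesc :: String } deriving (Eq, Show)

-- a modifier is an endofunction on Info; Endo gives us <> as composition
type Mod = Endo Info

header' :: String -> Mod
header' h = Endo (\i -> i { infoHeader = h })

progDesc' :: String -> Mod
progDesc' d = Endo (\i -> i { infoDesc = d })

main :: IO ()
main =
  -- appending the two modifiers yields one function setting both fields
  print (appEndo (header' "ledgit - massage ledger files"
                  <> progDesc' "Transform a CSV ledger file") (Info "" ""))
```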
<p>While the whole library is quite powerful, there is quite a lot to explain if you really want to understand how it works: parsers, <code class="prettyprint">Applicative</code>, <code class="prettyprint">Monoid</code>, <code class="prettyprint">Read</code>,… Yet I more or less took the examples from the documentation, changed a few things and it worked immediately.</p>
<p><em>Learn now</em></p>
<ul>
<li>the <a href="https://hackage.haskell.org/package/optparse-applicative"><code class="prettyprint">optparse-applicative</code></a> library</li>
</ul>
<p><em>Learn later</em></p>
<ul>
<li>other options libraries: <a href="https://hackage.haskell.org/package/options"><code class="prettyprint">options</code></a>, <a href="https://hackage.haskell.org/package/cmdargs"><code class="prettyprint">cmdargs</code></a>
<p/></li>
</ul>
<h2 id="tie-it-all-together">Tie it all together</h2>
<p>In reality you could put all the code in one Haskell file (you could even create a <a href="https://tech.fpcomplete.com/haskell/tutorial/stack-script">stack script</a>) and you would be done. For fun I decided to create small components to isolate the different pieces of the application, using “records-of-functions”:</p>
<ul>
<li><code class="prettyprint">Data.hs</code> contains all the data types + the CSV encoders/decoders</li>
<li><code class="prettyprint">Importer.hs</code> contains the <code class="prettyprint">Importer</code> component tasked with reading the file and decoding it</li>
<li><code class="prettyprint">Exporter.hs</code> contains the <code class="prettyprint">Exporter</code> component which takes a stream of lines and outputs it to a file</li>
<li><code class="prettyprint">App.hs</code> just connects the 2</li>
<li><code class="prettyprint">Ledgit.hs</code> calls the options parser and creates the <code class="prettyprint">App</code></li>
</ul>
<h4 id="the-importer">The <code class="prettyprint">Importer</code></h4>
<p>Let’s have a closer look at those components. The <code class="prettyprint">Importer</code> is defined as:</p>
<pre><code class="prettyprint">data Importer m = Importer {
    importCsv :: (Stream (Of LedgerLine) m () -> m ()) -> m ()
  }</code></pre>
<p>It is kind of weird. Instead of just exposing an interface like <code class="prettyprint">importCsv :: Stream (Of LedgerLine) m ()</code> returning the decoded lines, it takes a “consumer” of <code class="prettyprint">Stream (Of LedgerLine) m ()</code> and executes it. This is because of a limitation of the <code class="prettyprint">Streaming</code> library and the libraries we have been using so far.</p>
<p>The <code class="prettyprint">Streaming</code> library does not support any resource management. Resource management (properly closing file handles) is done with <code class="prettyprint">withBinaryFileContents</code>, which takes a function consuming the file contents. If we want to use that library and define a component, we need to propagate the same pattern.</p>
<p>There is actually quite a profound principle at play here. In programming, some “things” can be either defined by how they are produced or how they are consumed. For example you can define the <code class="prettyprint">Maybe</code> datatype by either</p>
<pre><code class="prettyprint">data Maybe a = Just a | Nothing</code></pre>
<p>or</p>
<pre><code class="prettyprint">newtype Maybe a = Maybe (forall b . ((a -> b), b) -> b)</code></pre>
<p>In the second case you specify how to “consume” a value: the <code class="prettyprint">a -> b</code> function says what to do with a <code class="prettyprint">Just a</code> and the <code class="prettyprint">b</code> is the result to return for <code class="prettyprint">Nothing</code>.</p>
<p>If you squint a bit you will also recognize a “continuation-like” type in <code class="prettyprint">importCsv :: (a -> r) -> r</code>. The computer science literature is full of such transformations, from “direct style” to “continuation-passing style”. This is a lot of hand-waving, just to justify the weird shape of the <code class="prettyprint">Importer</code> interface :-).</p>
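<p>The consumer-side definition can actually be written and run. Here is a sketch using the curried form of the same idea (<code class="prettyprint">MaybeC</code>, <code class="prettyprint">justC</code> and <code class="prettyprint">nothingC</code> are invented names for this Church-style encoding):</p>

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Maybe defined by how it is consumed: given a handler for the
-- "Just" case and a result for the "Nothing" case, produce a b
newtype MaybeC a = MaybeC { runMaybeC :: forall b. (a -> b) -> b -> b }

justC :: a -> MaybeC a
justC a = MaybeC (\onJust _ -> onJust a)

nothingC :: MaybeC a
nothingC = MaybeC (\_ onNothing -> onNothing)

main :: IO ()
main = do
  print (runMaybeC (justC (5 :: Int)) (+ 1) 0)        -- the Just handler fires: 6
  print (runMaybeC (nothingC :: MaybeC Int) (+ 1) 0)  -- the Nothing default: 0
```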
<p>You will also notice that the <code class="prettyprint">Importer</code> does not mention its “configuration”: there is no <code class="prettyprint">inputFilePath</code> to read from in its interface. This is because this data is provided by the wiring we do in <code class="prettyprint">Ledgit.hs</code>.</p>
<h4 id="the-exporter">The <code class="prettyprint">Exporter</code></h4>
<p>Nothing special here, we take a stream of lines and export each of them to a file. Underneath the implementation is using the functions we have seen before: <code class="prettyprint">writeBinaryFile</code>, <code class="prettyprint">encodeByName</code>.</p>
<pre><code class="prettyprint">data Exporter m = Exporter {
exportCsv :: Stream (Of LedgerLine) m () -> m ()
}</code></pre>
<h4 id="the-app">The <code class="prettyprint">App</code></h4>
<p>The <code class="prettyprint">App</code> just connects the two main components; its implementation is very simple:</p>
<pre><code class="prettyprint">data App m = App {
    runApp :: m ()
  }

newApp :: Importer m -> Exporter m -> App m
newApp Importer {..} Exporter {..} = App {..} where
  runApp = importCsv exportCsv</code></pre>
<h4 id="the-wiring">The “wiring”</h4>
<p>Now we need a way to make an <code class="prettyprint">App</code> with its <code class="prettyprint">Importer</code>, its <code class="prettyprint">Exporter</code> and the <code class="prettyprint">CliOptions</code> parsed from the command-line. For this we use the <code class="prettyprint">registry</code> library and define a <code class="prettyprint">registry</code> like so:</p>
<pre><code class="prettyprint">newRegistry :: CliOptions -> Registry _ _
newRegistry cliOptions =
     fun (newImporter @IO)
  <: fun (newExporter @IO)
  <: fun (newApp @IO)
  <: val cliOptions</code></pre>
<p>We put all the values and components constructors into a <code class="prettyprint">Registry</code> and later ask for an all-wired application:</p>
<pre><code class="prettyprint">runApplication :: IO ()
runApplication = do
  cliOptions <- execParser defineCliOptions
  let registry = newRegistry cliOptions
  let app = make @(App IO) registry
  runApp app</code></pre>
<p>That’s it, <code class="prettyprint">registry</code> automatically calls all the constructor functions and wires the <code class="prettyprint">App</code>. You can also write this code by hand, there’s no real need to use registry for such a simple application.</p>
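<p>Written by hand, the wiring is just ordinary function application. Here is a self-contained sketch where plain lists of strings stand in for the streaming and CSV machinery (the component shapes match the ones above; the stub constructors are invented for the demo):</p>

```haskell
-- Records-of-functions components; [String] stands in for Stream (Of LedgerLine) m ()
newtype Importer m = Importer { importCsv :: ([String] -> m ()) -> m () }
newtype Exporter m = Exporter { exportCsv :: [String] -> m () }
newtype App m = App { runApp :: m () }

-- stub: "reads" two hard-coded lines instead of decoding a CSV file
newImporter :: Importer IO
newImporter = Importer (\consume -> consume ["line 1", "line 2"])

-- stub: "exports" by printing instead of writing a file
newExporter :: Exporter IO
newExporter = Exporter (mapM_ putStrLn)

-- the wiring: the importer feeds its lines to the exporter
newApp :: Importer m -> Exporter m -> App m
newApp importer exporter = App (importCsv importer (exportCsv exporter))

main :: IO ()
main = runApp (newApp newImporter newExporter)  -- prints the two lines
```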
<p><em>Learn now</em></p>
<ul>
<li>learn about the <a href="https://jaspervdj.be/posts/2018-03-08-handle-pattern.html"><code class="prettyprint">Handle</code> pattern</a></li>
</ul>
<p><em>Learn later</em></p>
<p>There are many other ways to define and wire Haskell applications:</p>
<ul>
<li><a href="https://www.reddit.com/r/haskell/comments/9crtdd/questions_about_mtlstyle_structuring_applications/">MTL and monad transformers</a></li>
<li>effect libraries: <a href="https://hackage.haskell.org/package/polysemy"><code class="prettyprint">polysemy</code></a>, <a href="https://hackage.haskell.org/package/fused-effects"><code class="prettyprint">fused-effects</code></a></li>
<li>The <a href="https://www.fpcomplete.com/blog/2017/06/readert-design-pattern">ReaderT pattern</a></li>
<li><a href="https://github.com/etorreborre/registry"><code class="prettyprint">registry</code></a>
<p/></li>
</ul>
<h2 id="summary">Summary</h2>
<p>This blog post presents a simple Haskell application which can be seen as a “template” for many CLI applications. We have:</p>
<ul>
<li>command-line options parsing</li>
<li>“business” data types</li>
<li>files input / output</li>
<li>streaming</li>
<li>encoding / decoding</li>
</ul>
<p>There is nonetheless a learning curve which we should not underestimate; we need to:</p>
<ul>
<li>know how to set up a new project</li>
<li>know how to compile, run tests, install the application</li>
<li>know how to find relevant libraries in the Haskell ecosystem</li>
<li>learn about <code class="prettyprint">data</code> and <code class="prettyprint">newtype</code></li>
<li>learn about type classes and instances</li>
<li>be comfortable with the <code class="prettyprint">Applicative</code> typeclass and combinators</li>
<li>understand a minimum of monad transformers</li>
</ul>
<p>I hope this blog post will contribute to making this learning curve less steep by giving pointers on things to start learning right away, and other things to read and practice later.</p>
<h2 id="concluding-thoughts">Concluding thoughts</h2>
<p>It occurred to me that being computer literate will be an important part of the “citizen toolkit” in the future. There is no reason why we should not be able to access all of our data through well-crafted APIs. When this happens, I hope someone will use Haskell and write a similar blog post about REST access (or whatever API standard), blockchain auditing, security libraries, etc.</p>
<h1 id="icfp-2019">ICFP 2019 (published 2019-08-24)</h1>
<p>ICFP 2019 in Berlin is now over. It was a fantastic week packed with great content. I also particularly enjoyed the “Hallway track” this year and the numerous discussions I had around effects, Haskell, Functional Programming, and modularity.</p>
<p>You will find below my notes for some of the talks I went to, they are most likely to be useful to me alone but don’t hesitate to ask me for more details or to head to the <a href="https://icfp19.sigplan.org/program/program-icfp-2019">conference program</a> to find links to specific papers that have sparked your interest.</p>
<h1 id="pre-conference">Pre-conference</h1>
<h2 id="idris-2">Idris 2</h2>
<p><a href="https://github.com/edwinb/Idris2">Idris 2</a> is the reimplementation of the Idris language using</p>
<ol type="1">
<li>Idris 1 as its implementation language</li>
<li>a new type theory: QTT (Quantified Type Theory)</li>
<li>Chez Scheme to compile the QTT language
<p/></li>
</ol>
<p>Outcomes</p>
<ol type="1">
<li><p>implementing Idris in Idris shows whether the language scales to large programs (answer: not entirely, hence Idris 2). Some other “productivity” features have been added, for example “generate” to automatically split cases and find an implementation. And thanks to QTT, more implementations can now be found.</p></li>
<li><p>QTT is nice because it allows the safe erasure of types which are just proofs at run-time. Such an “erasable” proof is annotated with <code class="prettyprint">0</code>. Some other types are dependent types which are needed at run-time for pattern matching (annotated with <code class="prettyprint">1</code>). This is also a way to introduce linear typing: <code class="prettyprint">lmap : (1 a -> b) -> (1 xs : List a) -> List b</code> (a linear <code class="prettyprint">map</code> function). That’s still limited for now, there is no polymorphism for linearity annotations and no “borrow checker”, but Edwin Brady said he would love to work on that (and even safely support imperative programming with types!)</p></li>
<li><p>The compilation time and the generated code are now competitive with GHC (except for higher-order functions, where there is more work to do) thanks to the compilation to Chez Scheme (other Schemes are also supported with different trade-offs, like the ease of creating an executable). In the future OCaml could be a good backend as well (in the ML workshop there was a talk on how to FFI Idris with OCaml to access an HTTP library)</p></li>
</ol>
<h2 id="shifted-names">Shifted names</h2>
<p>You remember all the presentations on the lambda calculus talking about renaming and substitution and the difficulty of dealing with free variables to make sure that names don’t clash?</p>
<p>The usual answer is “say we introduce a ‘fresh’ name here”, meaning we need to keep some state around. “Shifted names” are a simple idea: prior to doing anything with something named <code class="prettyprint">x</code>, put an index on all variables named <code class="prettyprint">x</code> in a term and increment the index (so <code class="prettyprint">x</code> becomes <code class="prettyprint">x0</code> the first time, <code class="prettyprint">x1</code> next and so on). This means that we can substitute variables for values, rename variables, introduce names, and so on without requiring any “freshness condition”. All those operations have been formalized and proven correct in Coq, which then helps when proving programs, like proving the correctness of a CPS transformation (because a proof might just have to establish that a variable named <code class="prettyprint">x</code> is not the same as a variable named <code class="prettyprint">k</code>, which fortunately Coq has no trouble believing).</p>
<h2 id="deferring-the-details-and-deriving-programs">Deferring the details and deriving programs</h2>
<p>How can you do imperative-style programming in Agda (swap 2 variables, loop over a list to sum its elements,…) while separating the proofs from the program, to make the code easier to read?</p>
<p>Liam introduces an Applicative to delay proofs. Those proofs can be discharged by preconditions on the program: for example, the loop invariant when summing a list, <code class="prettyprint">i < length list</code>, tells you that it is safe to access an element of the list with <code class="prettyprint">list !! i</code> inside the loop.</p>
<p>The code is available at <a href="https://github.com/liamoc/dddp" class="uri">https://github.com/liamoc/dddp</a></p>
<h2 id="cubes-cats-effects">Cubes, cats, effects</h2>
<p>I didn’t get much of that talk but it was entertaining nonetheless!</p>
<p>Conor McBride explains that the equality used by type theorists is usually quite boring because it starts from <code class="prettyprint">x = x</code>. Not only is this quite obvious, but the real <em>proof</em> that those 2 labels ‘x’ are the same depends on the implementation of your language. As a result, the equality proofs that you get in proofs about type systems are generally quite weak.</p>
<p>So the main idea here is to develop a notion of types where we know much better how a value of a given type was built. It must have a “path” from a type <code class="prettyprint">T0</code> to a type <code class="prettyprint">T1</code>. Not only that but we track which part of the typing context we used to produce a given type judgment. Conor hopes that this will give us a much tighter way to write proofs because some of the functions we use to write typing judgments will now be accompanied with corresponding proofs showing which type equalities hold. He makes a link with category theory in the sense that it is like putting our hands on a value proving that “the diagram commutes”.</p>
<h2 id="generic-enumerators">Generic Enumerators</h2>
<p>How to enumerate constrained data for testing? We can use type indexed families to constrain data, but how can we enumerate them?</p>
<p>It is easy to generically enumerate plain recursive types (think “sums of products”). We can go further by using dependent types and generate indexed families (types indexed by other types), by seeing an indexed family as a function from an index to the description of a data type. A generalized coproduct can be enumerated by providing a selector and a list of enumerators.</p>
<p>This is a dependent pair, because the chosen selector determines a different data type. The paper shows an example of how to generate sized trees (with a statically-known size): you supply an enumerator for all the ways to split the size <code class="prettyprint">l</code> into <code class="prettyprint">n + m</code> and the library does the rest.</p>
<h2 id="augmenting-type-signatures-for-program-synthesis">Augmenting type signatures for program synthesis</h2>
<p>This is a way to add more information to type signatures to get more programs to be synthesized. The application is to be able to program for more back-ends: CPUs, GPUs,… by modelling libraries for those backends and automatically use the best of them in user code.</p>
<p>The author shows how to synthesize code for a C function <code class="prettyprint">void gemv(int m, int n, float *a, float *x, float *y)</code> once a few properties on <code class="prettyprint">m, n, a, x, y</code> are given. The program synthesizer then uses code templates, like a <code class="prettyprint">for</code> loop or an <code class="prettyprint">if then else</code>, to try to find code fulfilling the properties. The user can also use a small logic language with facts, conjunction, unification, negation,… to drive the heuristics of the synthesis.</p>
<p>They got results for rewriting some existing code to better libraries for ML, signal processing, string processing, etc…</p>
<h2 id="freezeml-complete-and-easy-type-inference-for-first-class-polymorphism">FreezeML: complete and easy type inference for first-class polymorphism</h2>
<p>Can we have something between ML and System F in terms of polymorphism and inference?</p>
<ul>
<li>ML inference is decidable but doesn’t have first class polymorphism</li>
<li>that’s the opposite for System F</li>
</ul>
<p>For example if you have <code class="prettyprint">singleton :: forall a . a -> [a]</code> and <code class="prettyprint">id :: forall a . a -> a</code>, what is the type of <code class="prettyprint">singleton id</code>?</p>
<ul>
<li>in ML: <code class="prettyprint">[a -> a]</code> (<code class="prettyprint">id</code> is instantiated).</li>
<li>in System F: <code class="prettyprint">[forall a . a -> a]</code>
<p/></li>
</ul>
<p>The idea? “Freeze” which instantiation you want with 2 operators, <code class="prettyprint">@</code> or <code class="prettyprint">$</code>, to indicate whether or not you want to keep “forall” polymorphism. This brings back complete type inference.</p>
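<p>For comparison, GHC has since gained a related form of first-class polymorphism via the “Quick Look” algorithm (the <code class="prettyprint">ImpredicativeTypes</code> extension, usable from GHC 9.2): give the polymorphic type explicitly and the System F answer type-checks. A small sketch (my own example, not from the FreezeML paper):</p>

```haskell
{-# LANGUAGE ImpredicativeTypes #-}

-- With an explicit impredicative annotation, GHC keeps the forall:
polyIds :: [forall a. a -> a]
polyIds = [id]

-- Each element can then be used at several different types:
useHead :: [forall a. a -> a] -> (Int, Bool)
useHead (f : _) = (f 1, f True)
useHead []      = (0, False)
```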
<h1 id="icfp-2019-day-1">ICFP 2019 Day 1</h1>
<h2 id="blockchains-are-functional">Blockchains are functional</h2>
<p>Manuel Chakravarty gave a great presentation of blockchains and how functional programming and formal methods apply to that space. The general approach at IOHK is to go from formal specifications, verified using Coq (plus Agda and Isabelle), to a direct translation to Haskell.</p>
<p>Since the resulting Haskell code is not performant and lacks concrete considerations like networking and persistence, they program the real implementation in Haskell and use property-based testing to test that implementation against the model.</p>
<p>The other important point is that they use a model of transactions which is called the UTxO (Unspent transaction output) model. That model is a more functional approach than the “account model” found in Ethereum (where you just model the state of wallets). Thanks to this model and a restricted functional language used for on-chain code, they can predict more precisely how much in fees will have to be paid for the execution of a contract, contrary to Ethereum again, where some contracts cannot be executed because they haven’t got enough “gas” (gas is a fee system to incentivise nodes to validate transactions).</p>
<p>Which language are they using for those validations? Plutus Core: it is based on System F-omega with recursion, a very well-studied type system. That said, they realized that going from the System F papers to a real implementation raised many interesting questions, which they cover in some upcoming papers this year. They formalized the language in Agda and have an abstract interpreter for it.</p>
<p>The whole “smart contract” language is Plutus where you write Haskell programs, delimiting what needs to be executed on the chain with TemplateHaskell (enabling the generation of Plutus core code thanks to a GHC plugin).</p>
<p>It is still hard to program in Plutus for regular “application developers” so they have a DSL dedicated to financial contracts called Marlowe. Similar DSLs for other domains are in preparation.</p>
<h2 id="compiling-with-continuations-or-without-whatever.">Compiling with continuations, or without? Whatever.</h2>
<p><a href="https://www.cs.purdue.edu/homes/rompf/papers/cong-preprint201811.pdf" class="uri">https://www.cs.purdue.edu/homes/rompf/papers/cong-preprint201811.pdf</a></p>
<p>Should we use an intermediate language with continuations or not?</p>
<p>The idea of this paper is that it is possible to introduce a “control” operator which decides what gets translated to “2nd class continuations” when it makes sense. Experiments on subsets of Scala show that more labels and gotos are emitted this way, leading to more stack allocations (and hence fewer heap allocations).</p>
<p>Upcoming work: adding parametric polymorphism and first class delimited continuations in the user language (by building a type-and-effect system).</p>
<h2 id="lambda-calculus-with-algebraic-simplification-for-reduction-parallelization-by-equational-reasoning">Lambda calculus with algebraic simplification for reduction parallelization by equational reasoning</h2>
<p>How to parallelize the evaluation of complex expressions?</p>
<p>For example sums over an array, folds of a tree? A lambda calculus with “algebraic simplification” can help there. It consists of:</p>
<ul>
<li>the normal lambda calculus + semi-ring operations (addition / multiplication)</li>
<li>a “delta abstraction” where the evaluation of the body can be simplified with the algebraic properties of <code class="prettyprint">+</code> and <code class="prettyprint">*</code> before passing arguments</li>
</ul>
<p>This allows the parallel evaluation of functions and arguments. This even works for more complex expressions containing conditionals:</p>
<pre><code class="prettyprint">-- "breaking" sum: stops as soon as a negative element is reached
sum []       acc = acc
sum (x : xs) acc =
  if x < 0 then acc else sum xs (acc + x)</code></pre>
<p>This function, applied to <code class="prettyprint">[5, 3, 1, 7, -6, 2]</code> for example, will be split into two evaluations, one for <code class="prettyprint">[5, 3, 1]</code> and one for <code class="prettyprint">[7, -6, 2]</code>. The first list is then partially evaluated with a continuation, so that we try to evaluate <code class="prettyprint">\f -> if ... else if 5 > 0 && 3 > 0 && 1 > 0 then f (5 + 3 + 1)</code>. Then we can simplify <code class="prettyprint">5 + 3 + 1</code> to <code class="prettyprint">9</code>.</p>
<p>The other important part of this work is a type system based on linearity conditions that says when it is “safe” to expand or factor expressions. Interestingly, the best expansions or factorings of expressions happen when variables (in polynomials, for example) are used linearly.</p>
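<p>A small Haskell sketch of the splitting idea (my own illustration, with made-up names): the second half of the list becomes a function of the accumulator it will eventually receive, which could be evaluated on another core and simplified before the first half finishes.</p>

```haskell
-- The "breaking" sum from above, including its base case.
breakingSum :: [Int] -> Int -> Int
breakingSum []       acc = acc
breakingSum (x : xs) acc =
  if x < 0 then acc else breakingSum xs (acc + x)

-- Split-and-combine evaluation: the right half is partially applied and
-- only later receives the accumulator produced by the left half. This
-- transformation is only faithful when the left half does not break
-- early, which is what the guard checks.
splitSum :: [Int] -> Int -> Int
splitSum xs acc =
  let (l, r) = splitAt (length xs `div` 2) xs
      rightK = breakingSum r         -- could be simplified in parallel
  in if all (>= 0) l
       then rightK (breakingSum l acc)
       else breakingSum l acc
```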
<h2 id="fairness-in-responsive-parallelism">Fairness in Responsive Parallelism</h2>
<p>For cooperative scheduling of threads, a notion of fairness is developed, together with an algorithm that comes with a provable estimate of the expected execution time of high-priority tasks.</p>
<h2 id="narcissus-correct-by-construction-derivation-of-decoders-and-encoders-from-binary-formats">Narcissus: Correct-By-Construction Derivation of Decoders and Encoders from Binary Formats</h2>
<p>Encoding and decoding binary formats can harbour lots of bugs, in particular in network stacks, and those bugs can be exploited by attackers. Some issues: formats are context-sensitive (how to parse the rest of a message can depend on a length field), and they are under-specified.</p>
<p>This framework allows the definition of non-deterministic specifications of formats in Coq and then generates encoders and decoders from that specification.</p>
<p>Interesting considerations:</p>
<ul>
<li>an encoder can fail! Indeed encoders can have invariants which must be respected: if you pass a buffer to write into, the buffer might be too small, or the number of elements in a list might be too big.</li>
<li>a decoder might decode more formats than what the encoder can encode</li>
<li>encoders need to handle state to support things like the building of compression tables
<p/></li>
</ul>
<p>Key features of Narcissus:</p>
<ul>
<li>readability of the format specifications</li>
<li>they are extensible, context-sensitive</li>
<li>there can be multiple choices in encoding (to version the payload for example)</li>
<li>there’s an extension for checksums
<p/></li>
</ul>
<p>This work was validated on many encoders for the TCP/IP stack (Ethernet, ARP, IPv4, TCP, UDP) and is used in MirageOS (Coq can extract OCaml code). A new format like UDP takes 10-20 lines of format specification plus 10-20 lines of special data types. The performance loss is around 10% compared to manual code.</p>
<p>More work to come: certified extensions to extract code in other languages without an OCaml dependency.</p>
<p>Question: how hard would it be to apply this to <code class="prettyprint">protobuf</code>? There is a paper on verifying <code class="prettyprint">protobuf</code> (<a href="https://www.cs.purdue.edu/homes/bendy/Narcissus/protobuf.pdf" class="uri">https://www.cs.purdue.edu/homes/bendy/Narcissus/protobuf.pdf</a>)</p>
<h2 id="closure-conversion-is-safe-for-space">Closure Conversion is Safe for Space</h2>
<p>Formally verified compilers (like CakeML) cannot reason about intensional properties: is my generated program safe for space and time?</p>
<p>There are already some theorems for memory-managed languages. However they don’t cover closures, and there are some notorious examples where V8 JavaScript closures leak memory.</p>
<p>This work: first formal proof that closure conversion is safe for space by adding profiling semantics for source and target.</p>
<p>The main drawback of this approach, IMO: the closure environment they use is proved safe, but it is also not memory-efficient. On the other hand, the closure environment chosen in V8 is unsafe (and fully mutable). This is a known issue with V8 and no one knows how to do better yet.</p>
<h2 id="selective-applicative-functors">Selective Applicative Functors</h2>
<p>I’m a big fan of Andrey Mokhov’s work (for example what he has started with <a href="https://hackage.haskell.org/package/algebraic-graphs">algebraic graphs</a>). Andrey presents “Selective Applicative Functors” which are kind of between:</p>
<ul>
<li>applicative functors where you have independent computations that you can combine</li>
<li>and Monads where you can have dependent, arbitrary computations where the result of one computation creates a new one.
<p/></li>
</ul>
<p>Selective applicative functors are for the case where you want to start independent computations, but can use the result of one of them if it finishes earlier and cancel the other. So on one hand the “shape” of the computations is known in advance, but you still have “conditional” computations. He dubs this super-power “speculative execution”. The many applications of selective functors can be found in the <a href="http://delivery.acm.org/10.1145/3350000/3341694/icfp19main-p121-p.pdf">paper</a>.</p>
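<p>The class itself is small; here is a self-contained copy of its core (matching the <code class="prettyprint">selective</code> package on Hackage), with a <code class="prettyprint">Maybe</code> instance for illustration:</p>

```haskell
-- select: the effect of the second argument is only needed for Left
-- values, so an implementation may skip it, or speculate on it.
class Applicative f => Selective f where
  select :: f (Either a b) -> f (a -> b) -> f b

instance Selective Maybe where
  select mx mf = mx >>= either (\a -> fmap ($ a) mf) pure

-- Branching combinators are derived from select:
branch :: Selective f => f (Either a b) -> f (a -> c) -> f (b -> c) -> f c
branch x l r = select (select (fmap (fmap Left) x) (fmap (fmap Right) l)) r

ifS :: Selective f => f Bool -> f a -> f a -> f a
ifS x t e = branch (fmap (\b -> if b then Left () else Right ()) x)
                   (fmap const t)
                   (fmap const e)
```

<p>Because the untaken branch is skipped, <code class="prettyprint">ifS (Just False) Nothing (Just 2)</code> evaluates to <code class="prettyprint">Just 2</code>: something a plain Applicative cannot express, and a Monad would over-sequence.</p>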
<h2 id="coherence-of-type-class-resolution">Coherence of Type Class Resolution</h2>
<p>A very well-rehearsed presentation showing how to prove the coherence of typeclass resolution with no overlapping instances nor orphan instances. In the presence of super-classes, can we be sure that the same function is always picked when it comes from 2 different type classes inheriting from the same parent? Does the resolution algorithm also work in the presence of the <code class="prettyprint">FlexibleContexts</code> extension, where it is possible to locally specify what one of the types should be when a type class has several type parameters?</p>
<p>The answer seems intuitively “yes” but it turns out to be <em>really</em> hard to prove! <a href="https://arxiv.org/pdf/1907.00844.pdf" class="uri">https://arxiv.org/pdf/1907.00844.pdf</a></p>
<h1 id="icfp-2019-day-2">ICFP 2019 Day 2</h1>
<h2 id="keynote-rosette---solver-aided-tools">Keynote: Rosette - solver-aided tools</h2>
<p>Rosette is built on top of Racket and is a platform for implementing DSLs which can be translated to SMT constraints (and solved with Z3, for example). The trick they employ is a variant of “bounded model-checking” where they can efficiently explore, using symbolic variables, the possible executions of a program without incurring an exponential number of generated constraints (still, even polynomially many can be a lot!).</p>
<p>They have all sort of applications, from languages used for education, OS verification, type systems proofs, and so on. They can show that implementations are correct (they found several bugs in a sub-module of Linux), find counter-examples, do program synthesis to fix programs and so on.</p>
<h2 id="demystifying-differentiable-programming-shiftreset-the-penultimate-backpropagator">Demystifying differentiable programming: shift/reset the penultimate backpropagator</h2>
<p>When doing machine learning and computing gradient descent we need to compute derivatives of functions with multi-variables.</p>
<p>There are 2 ways of doing this: Forward Differentiation which is simple but not very efficient when we compute a large number of variables to get a single result, and Backward Differentiation which is more complicated because more intermediate results have to be kept around (but more performant).</p>
<p>It turns out that the structure of those computations form continuations. The paper shows how to use Scala continuations to write code which can be backward-differentiated automatically using the <code class="prettyprint">shift/reset</code> continuation operators for its implementation.</p>
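<p>The forward mode mentioned above is easy to sketch with dual numbers (a textbook construction, not the paper’s shift/reset machinery):</p>

```haskell
-- A dual number carries a value together with its derivative ("tangent").
data Dual = Dual { primal :: Double, tangent :: Double }
  deriving (Eq, Show)

-- Arithmetic propagates derivatives by the usual calculus rules.
instance Num Dual where
  Dual x dx + Dual y dy = Dual (x + y) (dx + dy)
  Dual x dx * Dual y dy = Dual (x * y) (dx * y + x * dy)
  negate (Dual x dx)    = Dual (negate x) (negate dx)
  abs    (Dual x dx)    = Dual (abs x) (dx * signum x)
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0

-- Derivative of f at x: seed the tangent with 1 and read it back.
deriv :: (Dual -> Dual) -> Double -> Double
deriv f x = tangent (f (Dual x 1))
```

<p>This needs one pass per input variable, which is exactly why the backward mode wins when there are many inputs and a single output.</p>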
<h2 id="efficient-differentiable-programming-in-a-functional-array-processing-language">Efficient differentiable programming in a functional array-processing language</h2>
<p>This talk is almost the opposite of the previous one! It takes a Forward Differentiation approach for a functional language (a subset of F# with efficient array operations) and shows that a whole lot of techniques can be used to optimise the generated code (sub-expressions elimination, fusion, loop fission, partial evaluation,…) and be on par with the best imperative libraries.</p>
<h1 id="bobkonf-summer-edition">BobKonf summer edition</h1>
<h2 id="relative-functional-reactive-programming">Relative Functional Reactive Programming</h2>
<p>This is a very promising idea which I can see possibly replacing actors with something better typed. The idea is to extend functional reactive programming, which uses events and behaviours across time, to events and behaviours across space <em>and</em> time.</p>
<p>Then messages become “perceptions” which travel across space and time (until they reach a node) and a “dialog” between a client and a server becomes a list of perceptions. Using this modelling + CRDTs the talk shows that we can develop a full peer-to-peer todo application.</p>
<p>A library is currently being implemented on top of the reflex framework in Haskell (to show that it works for real :-)) but I would like to see how this could be integrated with Ivan Perez’s “Monadic streams”: <a href="http://www.cs.nott.ac.uk/~psxip1/papers/2016-HaskellSymposium-Perez-Barenz-Nilsson-FRPRefactored-short.pdf" class="uri">http://www.cs.nott.ac.uk/~psxip1/papers/2016-HaskellSymposium-Perez-Barenz-Nilsson-FRPRefactored-short.pdf</a>. There are also a few hard questions to solve, which are the subject of the presenter’s thesis: for example, how do you “garbage-collect” events which are no longer necessary (the current approach assumes infinite memory!).</p>
<h2 id="statistical-testing-of-software">Statistical testing of software</h2>
<p>Can we really measure software quality? Two development processes, Cleanroom software engineering and Software reliability engineering, give us some answers:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Cleanroom_software_engineering">Clean room</a> aims to totally prevent bugs, not fix them. You start from a formal specification; the team splits into developers refining the spec and testers writing tests only to measure the quality of the product. The developers cannot test, not even access a syntax checker!</li>
<li><a href="http://johnmusa.com/">Software reliability engineering</a>: describe and quantify the expected use of your system to spend most of your resources on the cases which really matter.
<p/></li>
</ul>
<p>This gives us some ideas for doing statistical testing:</p>
<ul>
<li>model the usage of a product</li>
<li>what is really called? How often?</li>
<li>how often are calls related to each other?
<p/></li>
</ul>
<p>From that you can create QuickCheck/Hedgehog generators which can vary their behaviour based on some current state. This means that we are effectively working with Markov chains (a state diagram containing the probabilities of firing transitions). And there is a lot of literature on how to cover Markov chains: how often we reach a given state, and so on.</p>
<p>Then we can measure reliability by counting the number of tests passing before the first failure. Even better, we can measure how often each edge in the Markov-chain graph fails, so that failing paths which are rarely exercised do not count for much.</p>
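<p>As a toy illustration of that kind of analysis (all states and probabilities here are made up), iterating the transition matrix of a usage model tells us how often a test run visits each state:</p>

```haskell
-- A usage model as a Markov chain over three API states:
-- 0 = Login, 1 = Browse, 2 = Logout (rows: from, columns: to).
transition :: [[Double]]
transition =
  [ [0.0, 0.8, 0.2]   -- from Login: mostly Browse, sometimes Logout
  , [0.1, 0.6, 0.3]   -- from Browse: occasionally back to Login
  , [1.0, 0.0, 0.0]   -- from Logout: restart at Login
  ]

-- One step of the chain: redistribute the probability mass.
step :: [Double] -> [Double]
step d = [ sum [ (d !! i) * (transition !! i !! j) | i <- [0 .. 2] ]
         | j <- [0 .. 2] ]

-- Distribution over states after n steps, starting from Login.
visitAfter :: Int -> [Double]
visitAfter n = iterate step [1, 0, 0] !! n
```

<p>States that keep a low probability after many steps are exactly the paths a usage-weighted generator rarely exercises, so failures there count for less.</p>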
<h2 id="liquidate-your-assets">Liquidate your assets</h2>
<p>This is an example of using Liquid Haskell with a “Tick” monad to count the number of times a resource is accessed. From there we can statically enforce that some algorithms have specific bounds, and check properties such as how long it should take to sort an already-sorted list. Niki Vazou also showed how you can more or less interactively write proofs using Liquid Haskell and a small proof library.</p>
<h2 id="a-functional-reboot-for-deep-learning">A functional reboot for deep learning</h2>
<p>Any talk by Conal Elliott is worth attending (cf. “The essence of automatic differentiation” or “Compiling to categories”). This time Conal wants to reboot machine learning by removing all non-essential complexity and generalizing things where they are unnecessarily specialized.</p>
<p>For now this is more a statement of intent than an accomplished vision, but he already has some of the bricks thanks to his previous work on automatic differentiation and his Haskell compiler plugin for “compiling to categories”.</p>
<p>Conal argues that modelling neural networks as graphs is wrong. It hides the fact that we are effectively working with functions, and it forces a sequentiality of computations (in “layers”) which doesn’t have to be there. We also have much better ways to represent the search space we are exploring than forcing everything into multi-dimensional arrays. In particular, representable functors offer a much better way to represent our data, while still being isomorphic to arrays, which permits efficient computing on GPUs.</p>
<p>He is asking for help to pursue his vision, both from the Haskell community (there is still work to do on his plugin) and from the ML community (to better understand some statistical aspects and ways of thinking of ML problems).</p>
<h1 id="icfp-day-3">ICFP Day 3</h1>
<h2 id="call-by-need-is-clairvoyant-call-by-value">Call by need is clairvoyant call by value</h2>
<p>It is surprising to me that there are still things to say about the lambda calculus. In particular, this talk addresses the difficulty of dealing with the operational semantics of call by need.</p>
<p>Because call by need memoizes the function arguments once they are evaluated, a proper description of its semantics needs to maintain a heap for those values. This makes it hard to predict the runtime behaviour of programs, as any Haskeller debugging a memory leak knows. It turns out that things get much easier with an evaluation strategy which behaves as if you had evaluated only the arguments which <em>will</em> be used later on. This is a form of non-determinism: what if we knew the result in the future? I don’t fully understand the details, but it seems to be like programming in a monad evaluating a tree of possibilities, eventually being able to say which path should be taken. Ultimately this simplifies proving properties of call-by-need languages.</p>
<h2 id="lambda-the-ultimate-sublanguage">Lambda: the ultimate sublanguage</h2>
<p>What if you taught functional programming using a much simpler language than Haskell? That language is System F-omega, which is effectively the core of Haskell.</p>
<p>By using only this language Jeremy Yallop is teaching an 8-week course, progressively introducing the lambda calculus, polymorphism, modules, existential types,… One of the assignments, for example, presents the Leibniz equality: “2 things are the same if you cannot distinguish them” and the Church equality: “2 things are equal if they can always be reduced to the same thing” (I don’t think I got this right) and uses the calculus to show that they are the same notion of equality (one gives you the other).</p>
<p>Most participants report that the course was pleasingly challenging and gave them a thorough understanding of the field. You can find the course online here: <a href="https://www.cl.cam.ac.uk/teaching/1617/L28/materials.html" class="uri">https://www.cl.cam.ac.uk/teaching/1617/L28/materials.html</a></p>
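<p>The Leibniz notion is easy to write down in Haskell too (a standard encoding, also found in the <code class="prettyprint">eq</code> package on Hackage): two types are equal if one can be substituted for the other in any context <code class="prettyprint">f</code>.</p>

```haskell
{-# LANGUAGE RankNTypes #-}

-- "Equal if indistinguishable": substitute a for b in any context f.
newtype Equal a b = Equal { subst :: forall f. f a -> f b }

refl :: Equal a a
refl = Equal id

-- Symmetry follows by choosing the context cleverly:
newtype Flip a b = Flip { unFlip :: Equal b a }

sym :: Equal a b -> Equal b a
sym e = unFlip (subst e (Flip refl))
```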
<h1 id="workshops">Workshops</h1>
<h2 id="minikanren-tutorial">MiniKanren tutorial</h2>
<p>MiniKanren is a “relational programming” library with only 4 operations. It started in Scheme but has now been ported to 15 languages.</p>
<p>It can be put to many uses, from unification in type systems, to program synthesis (it can find quines!), to better querying pharmaceutical data. This should probably be my first thought if I ever stumble on a problem requiring difficult unification in the wild.</p>
<h2 id="modular-effects-in-haskell-through-effect-polymorphism-and-explicit-dictionary-applications---a-new-approach-and-the-μverifast-verifier-as-a-case-study">Modular effects in Haskell through effect polymorphism and explicit dictionary applications - A new approach and the μVeriFast verifier as a case study</h2>
<p>Dominique Devriese proposed last year an extension to Haskell called <a href="https://icfp18.sigplan.org/details/haskellsymp-2018-papers/13/Coherent-Explicit-Dictionary-Application-for-Haskell"><code class="prettyprint">DictionaryApplications</code></a>. With this extension you can locally pass an explicit dictionary where a type class instance is required. More importantly the paper proves that it is possible to have this feature without compromising coherence and global uniqueness of instances. How? By making “as if” you were creating a “wrapper” newtype on the fly and coercing it back and forth with the data type for which you want an instance.</p>
<p>Ok, that gives you super powers, but how useful is this in practice? To show the usefulness of this feature, Dominique took an existing OCaml application which used lots of effects lumped together (reader, writer, non-determinism, state, etc…) and rewrote it with typeclasses and dictionary applications.</p>
<p>The resulting code is a lot clearer and quite flexible, since it allows the local creation and passing of instances which can rely on <code class="prettyprint">IO</code> for their creation. For example you can pass a <code class="prettyprint">State s</code> effect around (to computations needing <code class="prettyprint">MonadState s</code>) which is effectively implemented with a mutable ref. The paper goes further and even abstracts over the creation of such a “mutable cell” by making it an effect in itself.</p>
<h1 id="haskell-implementors-workshop">Haskell Implementors Workshop</h1>
<h2 id="haskell-use-and-abuse">Haskell use and abuse</h2>
<p>Haskell was used on a secret project at X, the Google incubator. They had 3 months to prove the viability of a system containing both hardware and software (that looked a lot like accelerating machine learning to me).</p>
<p>The result was 800 files, 200,000 lines of code, and more than 70 extensions used. What worked, and what didn’t?</p>
<ul>
<li><p>the good: lens library to index complex data structures, a good data model representing the hardware, an array library with statically-typed shapes, smart people attracted to the project</p></li>
<li><p>the bad: one person full-time to integrate the project to the rest of the infrastructure of Google, too many programming styles, too many DSLs that only a few team members understand, ugly type errors, ugly type signatures (because dependent typing is too weak)</p></li>
</ul>
<p>In passing they showed a trick: how do you pass parameters to a pattern synonym in Haskell? Use implicit parameters! (and one more extension,…)</p>
<h2 id="configuration-but-without-cpp">Configuration, but without CPP</h2>
<p>How to deal with <code class="prettyprint">#ifdef</code> in many Haskell libraries?</p>
<p>Problems: it is hard to test (we rarely test all the configurations), it’s text-based (so tooling is out), and it’s dynamically scoped, hence hard to write correctly.</p>
<p>A quick analysis shows it is used for:</p>
<ul>
<li>not configurable: ghc version, base version, platform</li>
<li>configurable: package dependencies, user defined predicates
<p/></li>
</ul>
<p>How to replace the configurable part?</p>
<p>Solutions: there are already solutions (100 papers!) to this problem. One promising idea is “variational typing”: describe in a type the different possibilities for an expression:</p>
<pre><code class="prettyprint"> e :: D <Int, Char>
e = D <5, 'a'></code></pre>
<p>For example the syntax for having GHCJS specific code would look like:</p>
<pre><code class="prettyprint"> GHCJS< liftIO $ ..., return Nothing></code></pre>
<p>This type system encodes static choices about a program; a choice applies to the whole program. A few rules can be formalized:</p>
<ul>
<li>distribution: <code class="prettyprint">D < A -> B, C -> D> = D < A, C > -> D < B, D ></code></li>
<li>idempotence: <code class="prettyprint">D <A, A> = A</code></li>
<li>domination: <code class="prettyprint">D < D <A, B>, C> = D <A, C></code> (if you choose the left side, always choose the left side)
<p/></li>
</ul>
<p>In terms of performance, typechecking should still be fast (in particular because, practically speaking, few people nest <code class="prettyprint">#ifdefs</code>).</p>
<p>There are some variants of this idea for module systems, for example: “Variation aware module system” where import and exports can depend on the variation.</p>
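<p>A value-level toy (my own illustration; the real system works at the type level) makes the idempotence and domination rules concrete:</p>

```haskell
-- A choice D<x, y> as a pair, plus the two ways to resolve it.
data Choice a b = Choice a b deriving (Eq, Show)

selL :: Choice a b -> a
selL (Choice x _) = x

selR :: Choice a b -> b
selR (Choice _ y) = y
```

<p>Idempotence says <code class="prettyprint">D&lt;A, A&gt;</code> resolves the same either way; domination says that once you pick the left side of <code class="prettyprint">D</code> you keep picking it, so <code class="prettyprint">D&lt;D&lt;x, y&gt;, z&gt;</code> behaves like <code class="prettyprint">D&lt;x, z&gt;</code>.</p>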
<h2 id="hie-files-in-ghc-8.8">HIE files in GHC 8.8</h2>
<p><code class="prettyprint">hiedb</code> indexes all the source files and gives fast documentation, types, jump-to-definition (local or not), find usages, etc… (and there’s a Haskell API).</p>
<p>Future work:</p>
<ul>
<li>show typeclass evidence (generate that into hie and make it searchable)</li>
<li>integrate with .hi files to keep only one format</li>
<li>capture more type information in the typed AST (some leaves are not currently typed)
<p/></li>
</ul>
<p>References:</p>
<ul>
<li><a href="https://github.com/wz1000/HieDb" class="uri">https://github.com/wz1000/HieDb</a></li>
<li><a href="http://github.com/wz1000/hie-lsp" class="uri">http://github.com/wz1000/hie-lsp</a></li>
<li><a href="https://github.com/wz1000/vscode-hie-server" class="uri">https://github.com/wz1000/vscode-hie-server</a></li>
</ul>
<h2 id="explicit-dictionary-application-from-theory-to-practice">Explicit dictionary application: from theory to practice</h2>
<p>With explicit dictionary application we</p>
<ul>
<li>remove the need to have <code class="prettyprint">*By</code> functions like <code class="prettyprint">nubBy</code></li>
<li>avoid the creation of <code class="prettyprint">And</code>, <code class="prettyprint">Or</code>, … newtypes to get different monoids with <code class="prettyprint">foldMap</code></li>
<li>use an <code class="prettyprint">IORef s</code> to implement a <code class="prettyprint">MonadState s</code> instance</li>
<li>replace <code class="prettyprint">DerivingVia</code>, for example using a function <code class="prettyprint">semigroupFromMonoid</code> to implement a <code class="prettyprint">Semigroup</code> instance</li>
</ul>
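<p>For contrast, here is what those bullet points force you to write in today’s Haskell, simulating “a different dictionary” with <code class="prettyprint">*By</code> functions and newtypes from <code class="prettyprint">base</code>:</p>

```haskell
import Data.List (nubBy, sortBy)
import Data.Monoid (Any (..))
import Data.Ord (Down (..), comparing)

-- nubBy instead of nub with an alternative Eq "dictionary":
distinctMod3 :: [Int]
distinctMod3 = nubBy (\a b -> a `mod` 3 == b `mod` 3) [1, 2, 3, 4]

-- sortBy (comparing Down) instead of sort with a reversed Ord:
descending :: [Int]
descending = sortBy (comparing Down) [2, 1, 3]

-- the Any newtype instead of foldMap with an alternative Monoid:
anyEven :: Bool
anyEven = getAny (foldMap (Any . even) [1, 2, 3])
```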
<p>Potential problems:</p>
<ul>
<li>global uniqueness of instances, this should not be allowed</li>
</ul>
<pre><code class="prettyprint">insert :: Ord a => a -> Set a -> Set a
insert @{reverseOrd} 1 (insert 1 (insert 2 empty))
-> [1, 2, 1]</code></pre>
<ul>
<li>coherence: the same <code class="prettyprint">Eq</code> instance should be used whether it comes directly or via <code class="prettyprint">Ord</code>’s superclass</li>
</ul>
<pre><code class="prettyprint">whichEq :: (Eq a, Ord a) => a -> Bool
whichEq x = x == x</code></pre>
<p>The previous work showed that using representational roles allows the type checker to exclude problematic programs. This year a plan for a better implementation is proposed, but that proposal needs some (wo)man-power to be completed.</p>
<p>So the talk is also a big ask for help!</p>
<h2 id="state-of-ghc">State of GHC</h2>
<p>There’s a lot of activity in GHC (in no particular order, and missing many more items):</p>
<ul>
<li>8.8 (Aug 19)</li>
<li>visible kind application: <code class="prettyprint">g :: t @Type a -> ...</code></li>
<li>pattern signatures to bring types into scope: <code class="prettyprint">Just (x :: a) -> [x, x] :: [a]</code></li>
<li>around 70 new contributors each year (for the past 6 years)</li>
<li>10 PRs per day</li>
<li>1400 new issues per year (not sure if that’s a good sign!)</li>
<li>linear types are coming!</li>
<li>standalone kind signatures</li>
<li>hole-fit plugin</li>
<li>pattern matching with better overlap/coverage checks</li>
<li>better compile-time performance</li>
<li>visualisations of GHC core
<p/></li>
</ul>
<p>More things cooking:</p>
<ul>
<li>class morphisms: proposal coming up</li>
<li>GHCJS: plugins, more like normal GHC, profiling</li>
<li>8.10: better codegen</li>
<li><code class="prettyprint">NoFieldSelectors</code> proposal</li>
</ul>
<p/>
<p>GHC proposals:</p>
<ul>
<li>56 proposals accepted, some of them still need to be implemented</li>
<li>more proposals accepted than rejected, thanks to the good discussions happening before a proposal is submitted</li>
<li>but it is not always smooth. For example <code class="prettyprint">ExtraCommas</code>: 166 messages on the thread, unclear outcome, people unhappy
<p/></li>
</ul>
<p>GHC devops:</p>
<ul>
<li>5 releases since last year</li>
<li>everything moved to gitlab</li>
<li>build on Hadrian</li>
<li>8.8.1 this weekend</li>
<li>8.10.1 release next February (small release)</li>
<li>8.12.1 branch in May 20</li>
<li>more platform tests (e.g. Alpine Linux), integer-simple is better tested -> releasing core libraries is more of a challenge</li>
<li>Hackage repository to check ~ 150 packages against a GHC head (<code class="prettyprint">head.hackage</code>)</li>
<li>compiler performance is now tested in CI
<p/></li>
</ul>
<p>If Hadrian goes well we can deprecate <code class="prettyprint">make</code> in 8.12</p>
<p><strong>AND THAT’S IT!</strong></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-6828670569051892642018-09-29T08:32:00.000+09:002022-09-17T03:22:05.862+09:00ICFP 2018<p>This year again ICFP and the Haskell Symposium are full of interesting talks that I want to shortly present. I highlighted a single sentence in each section to try to summarize the main idea so that you can skim the whole thing and stop at what looks interesting to you.</p>
<h2 id="icfp-day-1">ICFP Day 1</h2>
<p/>
<h4 id="capturing-the-future-by-replaying-the-past-functional-pearl"><a href="https://arxiv.org/pdf/1710.10385.pdf">Capturing the Future by Replaying the Past (Functional Pearl)</a></h4>
<p>Delimited continuations are incredibly powerful because they allow us to implement any monadic effect. They also allow some elegant implementations, for example the <a href="https://en.wikipedia.org/wiki/Eight_queens_puzzle">8 queens problem</a> can be implemented using a non-determinism effect. Unfortunately delimited continuations are not supported by every programming language. Or are they? Actually the authors of this paper show that <strong>mutable state + exceptions is all that is required to implement delimited continuations</strong> (except in C, read the paper for the details). There is a performance hit compared with languages that support delimited continuations natively, but it is not that bad.</p>
<p>For me this paper was the opportunity, one more time, to try to wrap my head around continuations. For example I realized that the notation with <code class="prettyprint">reset</code> and <code class="prettyprint">shift</code> was a bit confusing. In an expression such as <code class="prettyprint">reset (1 + shift (\k -> k 2))</code> I actually need to focus my attention on the body of the <code class="prettyprint">shift</code> rather than the body of the <code class="prettyprint">reset</code>. The body of the shift says “do <code class="prettyprint">k 2</code>”. What’s the <code class="prettyprint">k</code>? Well it is whatever is outside of the <code class="prettyprint">shift</code> and inside the <code class="prettyprint">reset</code>, so that’s <code class="prettyprint">1 + _</code>, and eventually the result is <code class="prettyprint">3</code>. It becomes a bit trickier to interpret expressions like <code class="prettyprint">reset (shift (\k -> 1 + k 2) * shift (\k -> 1 + k 3))</code> but it is still possible. “Do <code class="prettyprint">1 + k 2</code>”. What is <code class="prettyprint">k</code>? It is <code class="prettyprint">x * shift (\k -> 1 + k 3)</code>, which by the previous reasoning is <code class="prettyprint">k = \x -> 1 + x * 3</code>. Applying it in <code class="prettyprint">1 + k 2</code> gives <code class="prettyprint">1 + (1 + 2 * 3)</code>, which equals <code class="prettyprint">8</code>. This also explains the term “thermometer continuations”: the authors store a list of executions created by the various <code class="prettyprint">shift</code>s in a <code class="prettyprint">reset</code> expression.</p>
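<p>Those two examples can be replayed with the <code class="prettyprint">Cont</code> monad from the transformers library (<code class="prettyprint">Control.Monad.Trans.Cont</code> exports <code class="prettyprint">reset</code>, <code class="prettyprint">shift</code> and <code class="prettyprint">evalCont</code>); the do-notation makes the captured context explicit:</p>
<pre><code class="prettyprint">import Control.Monad.Trans.Cont (evalCont, reset, shift)

-- reset (1 + shift (\k -> k 2))  evaluates to 3
ex1 :: Int
ex1 = evalCont $ reset $ do
  x <- shift (\k -> pure (k 2))   -- k is the context "1 + _"
  pure (1 + x)

-- reset (shift (\k -> 1 + k 2) * shift (\k -> 1 + k 3))  evaluates to 8
ex2 :: Int
ex2 = evalCont $ reset $ do
  a <- shift (\k -> pure (1 + k 2))
  b <- shift (\k -> pure (1 + k 3))
  pure (a * b)</code></pre>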
<h4 id="versatile-event-correlation-with-algebraic-effects"><a href="http://www.informatik.uni-marburg.de/~seba/publications/event-correlation-algebraic-effects.pdf">Versatile Event Correlation with Algebraic Effects</a></h4>
<p>This paper presents the design of a language supporting “versatile joins”, that is the many ways we would like to <strong>correlate different sources of (asynchronous) events</strong>: “when you receive a ‘stock price update’ event and ‘user buy event’ for the same quantity and a ‘market is opened event’ then emit a ‘stock buy event’”. Conceptually this means making a cartesian product of all the event sources and restricting the resulting tuples to the ones that are “interesting”. Said another way, this can be interpreted as having a way to enumerate this cartesian product, filter out some tuples, combine them in some way and <code class="prettyprint">yield</code> the result. It turns out that all those actions can be implemented as effects in a language supporting algebraic effects and effect handlers.</p>
<p>For example selecting the last event for a given event stream corresponds to the handling of a <code class="prettyprint">push</code> effect for that stream, and joining several streams corresponds to the interpretation of several <code class="prettyprint">push 1, push 2, ..., push n</code> effects. This provides a very modular way to describe combinations like “with the latest occurrence of stock and price for a given article emit an availability event”.</p>
<p>The paper shows that with a limited set of effects, like <code class="prettyprint">push</code> but also <code class="prettyprint">trigger</code> to trigger the materialization of a tuple or <code class="prettyprint">get/set</code> to temporarily store events, we can reproduce most of the behaviours exposed by so-called “complex event processing” (CEP) libraries: windowing, linear/affine consumption, zipping, and so on.</p>
<h4 id="the-simple-essence-of-automatic-differentiation"><a href="https://arxiv.org/abs/1804.00746">The simple essence of automatic differentiation</a></h4>
<p>One of the distinguished papers of the conference, classic Conal Elliott work, a work of art. The explosion of machine learning methods using neural networks brought back the need to have efficient ways to compute the derivative of functions. When you try to fit the parameters of a neural network to match observed data you typically use “gradient descent” methods which require computing the partial derivatives of functions with thousands of input variables and generally just one output!</p>
<p>The basic idea behind “automatic differentiation” is to build functions with their associated derivative functions. Since many functions are not differentiable, what we can do is to build the ones that are! You start by creating simple functions for which you know the derivative and use some rules for creating more complex functions from the simple ones, calculating the corresponding derivative as you go, using well-known rules for deriving functions. For example the “chain rule” for the derivative of the composition of 2 functions. In practice the derivative of a function can be built out of a few operations: function composition, parallel product, cartesian product, and linear functions (like <code class="prettyprint">+</code>).</p>
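<p>The “build functions together with their derivatives” idea can be illustrated with the classic forward-mode sketch using dual numbers (this is only an illustration, not the paper’s categorical construction):</p>
<pre><code class="prettyprint">-- a value paired with its derivative
data D = D Double Double deriving (Eq, Show)

instance Num D where
  D x x' + D y y' = D (x + y) (x' + y')
  D x x' - D y y' = D (x - y) (x' - y')
  D x x' * D y y' = D (x * y) (x' * y + x * y') -- the product rule
  negate (D x x') = D (negate x) (negate x')
  fromInteger n   = D (fromInteger n) 0         -- constants have derivative 0
  abs             = error "not differentiable at 0"
  signum          = error "not differentiable at 0"

-- the derivative of f at x: seed the input with derivative 1
deriv :: (D -> D) -> Double -> Double
deriv f x = let D _ x' = f (D x 1) in x'</code></pre>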
<p>Actually those operations are pretty universal. If we abstract a bit by using category theory concepts we can define the derivative of a function in terms of operations from a category, a “monoidal” category, a “cartesian” category. Then by varying the category, taking other categories than Haskell functions for example, we can derive very useful programs. This is not new and was presented in <a href="http://bit.ly/2xRjptV">“compiling to categories”</a>, there is even a <a href="https://github.com/conal/concat/tree/master/plugin">Haskell plugin</a> supporting this construction automatically!</p>
<p>The paper builds on this idea for automatic differentiation and shows that using the “continuation of a category”, or the “dual of a category” or taking matrices as a category we get straightforward implementations of the differentiation of functions. In particular we get <strong>a simple implementation of the “reverse mode” of differentiation which does not mutate state like traditional algorithms and which can hence be easily parallelizable</strong>.</p>
<h2 id="icfp-day-2">ICFP Day 2</h2>
<p/>
<h4 id="competitive-parallelism-getting-your-priorities-right"><a href="https://arxiv.org/abs/1807.03703">Competitive Parallelism: Getting Your Priorities Right</a></h4>
<p>This is yet another “let’s create a language to solve a specific problem” but an important one which is the definition of priorities in a concurrent program. What we typically want is to be able to specify a partial order to define which thread “has a higher priority”, make sure that high-priority threads don’t depend on low-priority ones (the <a href="https://en.wikipedia.org/wiki/Priority_inversion#Consequences">“priority inversion” problem</a>), get an efficient way to schedule those threads on processors and get some bounds on the total computation time/latency of a high priority thread for a given program.</p>
<p>The authors have defined and implemented a language called “PriML” extending Standard ML with some new primitives, <code class="prettyprint">priority</code>, <code class="prettyprint">order</code>, a modal type system and a scheduler to support all these objectives. I wonder how we could design languages and compilers so that anyone can benefit from those features sooner rather than later, but this seems to provide <strong>a good solution to the priority inversion problem that jeopardized the Mars Pathfinder mission</strong>.</p>
<h4 id="fault-tolerant-functional-reactive-programming-functional-pearl"><a href="https://dl.acm.org/citation.cfm?id=3236791">Fault Tolerant Functional Reactive Programming (Functional Pearl)</a></h4>
<p>Ivan Perez and his team have already shown how to use a <a href="http://bit.ly/2xOSMG1">“Monadic Stream” representation</a> to implement all the operators of various FRP (Functional Reactive Programming) libraries in Haskell. FRP can be used to represent various components of a Mars rover for example where there are various sensors and processors controlling the behaviour of the rover.</p>
<p>Since this “monadic streams” representation allows you to change the “base” monad for streaming values they now use a variant of the <code class="prettyprint">Writer</code> monad to represent fault information in computations: “what is the probability that the value returned by a sensor is incorrect?”, “if it is incorrect, what is the likely cause of the failure?”. Then <strong>combining different reactive values will accumulate failure probabilities</strong>. However you can also introduce redundant components which will reduce the failure rate! Their library also tracks the failure causes at the type level so that you can have ways to handle a failure and the compiler can check that you have handled all possible failures for a given system.</p>
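<p>Here is a toy model of that idea (my own sketch, not the paper’s library): a value tagged with a failure probability, where combination accumulates failures and redundancy reduces them:</p>
<pre><code class="prettyprint">-- a sensor reading tagged with the probability that it is faulty
data Faulty a = Faulty Double a deriving (Eq, Show)

-- combining two independent readings: the result is faulty
-- as soon as either input is
combine :: (a -> b -> c) -> Faulty a -> Faulty b -> Faulty c
combine f (Faulty p a) (Faulty q b) = Faulty (1 - (1 - p) * (1 - q)) (f a b)

-- a redundant pair of sensors: both must fail for the reading to fail
redundant :: Faulty a -> Faulty a -> Faulty a
redundant (Faulty p a) (Faulty q _) = Faulty (p * q) a</code></pre>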
<h4 id="report-on-icfp-and-climate-change">Report on ICFP and Climate Change</h4>
<p>Not a technical paper but a report on what SIGPLAN plans to do to reduce carbon emissions. Benjamin Pierce presented the rationale for <em>doing something</em> and announced that next year <strong>ICFP 2019 might become carbon neutral by raising the price of tickets to buy carbon offsets</strong>. I personally <a href="https://www.carbonfootprint.com">bought my own carbon offset</a> for my trip to ICFP this year and I hope this will inspire other people to do the same, we just don’t have much time left to act on climate change.</p>
<h4 id="what-you-needa-know-about-yoneda-profunctor-optics-and-the-yoneda-lemma-functional-pearl"><a href="https://www.cs.ox.ac.uk/jeremy.gibbons/publications/proyo.pdf">What You Needa Know about Yoneda: Profunctor Optics and the Yoneda Lemma (Functional Pearl)</a></h4>
<p>What category theory is good for again? Well some results from category theory show up in many contexts and give us practical ways to transform some problems into others that are easier to deal with. The Yoneda lemma is one such result. It kind of says that the result of “transforming” a value <code class="prettyprint">a</code>, noted <code class="prettyprint">f a</code> is entirely determined by how all the objects that have a relation with <code class="prettyprint">a</code> are being transformed: <code class="prettyprint">forall x . (a -> x) -> f x</code>.</p>
<p>Using Haskell and assuming that <code class="prettyprint">f</code> is a functor the equivalence is pretty obvious. If I have <code class="prettyprint">forall x . (a -> x) -> f x</code> I can just set <code class="prettyprint">x</code> to be <code class="prettyprint">a</code> to get a <code class="prettyprint">f a</code>. In the other direction, if I have an <code class="prettyprint">f a</code> I also have a <code class="prettyprint">forall x . (a -> x) -> f x</code>, that’s the <code class="prettyprint">fmap</code> operation! In retrospect that’s kind of meh, what’s the big deal? Well we can prove tons of stuff with this lemma. For example monads have a <code class="prettyprint">bind</code> operation: <code class="prettyprint">bind :: m a -> (a -> m b) -> m b</code>. By using the Yoneda lemma we can prove that this is totally equivalent to having a <code class="prettyprint">join</code> operation: <code class="prettyprint">join :: m (m a) -> m a</code>!</p>
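<p>The two directions of the lemma, and the <code class="prettyprint">bind</code>-from-<code class="prettyprint">join</code> consequence, can be written down directly (a standalone sketch, not tied to any library):</p>
<pre><code class="prettyprint">{-# LANGUAGE RankNTypes #-}

-- one direction: an `f a` gives the polymorphic transformer, via fmap
toYoneda :: Functor f => f a -> (forall x. (a -> x) -> f x)
toYoneda fa = \f -> fmap f fa

-- the other direction: instantiate x to a with the identity function
fromYoneda :: (forall x. (a -> x) -> f x) -> f a
fromYoneda k = k id

-- the consequence mentioned above: `bind` is definable from `join`
bindFromJoin :: Monad m => (forall a. m (m a) -> m a) -> m b -> (b -> m c) -> m c
bindFromJoin join' mb f = join' (fmap f mb)</code></pre>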
<p><strong>Another application of this lemma</strong> and the meat of this paper <strong>is the derivation of profunctor optics as found in the Haskell lens library, from the “classical” representation as a getter and setter</strong>. Why is that even useful? It is useful because the profunctor formulation uses a simple function to support lens operations. This means that to compose 2 lenses you can just use function composition. The Yoneda lemma helped us going from one representation to a more useful one!</p>
<h4 id="generic-deriving-of-generic-traversals"><a href="https://arxiv.org/pdf/1805.06798.pdf">Generic Deriving of Generic Traversals</a></h4>
<p>This is a new Haskell library which gives us lenses for any datatype deriving <code class="prettyprint">Generic</code>. One application is to solve the “record problem” in Haskell where there can be clashes when 2 different records have the same name for one of their fields. In that case <code class="prettyprint">person ^. field@"name"</code> will return the name of a <code class="prettyprint">Person</code> and <code class="prettyprint">dog ^. field@"name"</code> will work similarly. But the library can do much more. <strong>It allows you to select/update elements in a data structure by type</strong>, position, constraint (all the elements of the <code class="prettyprint">Num</code> typeclass for example) and even by structure! So you can focus on all the fields of a <code class="prettyprint">Scrollbar</code> that make it a <code class="prettyprint">Widget</code> and apply a function to modify the relevant <code class="prettyprint">Widget</code> fields. This will return a <code class="prettyprint">Scrollbar</code> with modified fields.</p>
<h2 id="icfp-day-3">ICFP Day 3</h2>
<p/>
<h4 id="partially-static-data-as-free-extension-of-algebras"><a href="https://www.cl.cam.ac.uk/~jdy22/papers/partially-static-data-as-free-extension-of-algebras.pdf">Partially-Static Data as Free Extension of Algebras</a></h4>
<p>Some programs can use “multi-stage” programming to create much faster versions of themselves when we know some of the parameters. For example a <code class="prettyprint">power</code> function for a given power, say 4, can be specialized into <code class="prettyprint">x * x * x * x</code> which requires no conditionals, no recursion. However this misses another optimisation. Once we have computed the first <code class="prettyprint">x * x</code> we could reuse it to compute the final result <code class="prettyprint">let y = x * x in y * y</code>.</p>
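<p>To fix ideas, here are the two stages written by hand (the paper’s point is to generate and optimise the second one automatically):</p>
<pre><code class="prettyprint">-- the generic function: a conditional and a recursive call at every step
power :: Int -> Double -> Double
power 0 _ = 1
power n x = x * power (n - 1) x

-- what staging plus the algebraic optimisation can produce for n = 4:
-- no conditionals, no recursion, and the shared `x * x` computed once
power4 :: Double -> Double
power4 x = let y = x * x in y * y</code></pre>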
<p>This paper shows how to <strong>use the properties of some algebras to optimize code generated by staged programs</strong>. This is a nice improvement and general framework for a domain that is not mainstream yet (staged programming) but we know that this is probably the future for writing programs which are both <a href="https://arxiv.org/abs/1612.06668">nicely abstract and very efficient</a>.</p>
<h4 id="relational-algebra-by-way-of-adjunctions"><a href="http://www.cs.ox.ac.uk/jeremy.gibbons/publications/reladj-dbpl.pdf">Relational Algebra by Way of Adjunctions</a></h4>
<p>Wonderful presentation by Jeremy Gibbons and Nicolas Wu where they decide to present their paper as a conversation around a cup of coffee and some equations and diagrams on napkins. The essence of their paper is to present <strong>operations and optimisations on SQL tables (the “relational algebra”) as given by adjunctions</strong>. Adjunctions are a tool from category theory showing how some “problems” in a given domain can be translated and solved in another domain, then the solution can be “brought back” to the original domain (this is all of course a lot more formal than what I just wrote :-)).</p>
<p>Something to be noted, in their description of table joins they need to use a monad which is a “graded” monad where each <code class="prettyprint">bind</code> operation aggregates some “tags”, here the keys to the elements of the tables. This is way over my head but circling around the subject over and over I might understand everything one day :-).</p>
<h4 id="strict-and-lazy-semantics-for-effects-layering-monads-and-comonads"><a href="http://www.cs.cornell.edu/~ross/publications/sleffects/">Strict and Lazy Semantics for Effects: Layering Monads and Comonads</a></h4>
<p>If you consider throwing exceptions as having a monadic effect and laziness (consuming or not a value) a comonadic effect then you can transform expressions having those effects into a pure language having monads and comonads. Unfortunately they don’t always compose, unless they are “distributive”, i.e. there is a way to permute the monad <code class="prettyprint">m</code> and the comonad <code class="prettyprint">c</code>. The paper proposes to “force” one or the other interpretation to solve this dilemma: either have a “monadic priority”, which gives strict semantics to expressions having both effects, or a “comonadic priority”, which gives them lazy semantics.</p>
<p><strong>The cool theorem is that if the monad and comonad distribute then both choices give the same result</strong>. The corollary is that if they don’t distribute then choosing lazy or strict semantics always gives different results!</p>
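<p>For a concrete case where the distribution does exist: the Env (environment/pair) comonad distributes over the <code class="prettyprint">Maybe</code> monad (my own small example, not taken from the paper):</p>
<pre><code class="prettyprint">-- a distributive law c (m a) -> m (c a), with c = ((,) e) and m = Maybe
distEnvMaybe :: (e, Maybe a) -> Maybe (e, a)
distEnvMaybe (e, ma) = fmap (\a -> (e, a)) ma</code></pre>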
<h4 id="whats-the-difference-a-functional-pearl-on-subtracting-bijections"><a href="http://ozark.hendrix.edu/~yorgey/pub/GCBP-author-version.pdf">What’s the Difference? A Functional Pearl on Subtracting Bijections</a></h4>
<p>This is more of a mathematical puzzle than something that’s useful for day to day programming. When trying to evaluate some number of combinations (“how many ways can 3 people of a group have the same birthday?”), it can be useful to “remove” parts of some sets that we can count because they are in bijection with other well known sets (like the set of all the natural numbers). This is also a tricky operation.</p>
<p>The paper proves that <strong>there is an algorithm for subtracting bijections which makes sense and which always terminates</strong>. I still have the naive feeling that the algorithm they presented could be simplified but I’m probably very wrong.</p>
<h4 id="ready-set-verify-applying-hs-to-coq-to-real-world-haskell-code-experience-report"><a href="https://arxiv.org/abs/1803.06960">Ready, Set, Verify! Applying hs-to-coq to Real-World Haskell Code (Experience Report)</a></h4>
<p>Good news: the Haskell containers library is bug-free! How was that proven? By <strong>taking the Haskell code and translating it to Coq code, using typeclass laws and QuickCheck properties from the test suite as a specification</strong>. This tool <code class="prettyprint">hs-to-coq</code> is now used in other contexts to write some specifications in Haskell and prove them with Coq. It cannot yet be used to prove concurrency properties but the team is thinking about it.</p>
<h2 id="haskell-symposium-day-1">Haskell Symposium Day 1</h2>
<p/>
<h4 id="neither-web-nor-assembly">Neither Web nor Assembly</h4>
<p>Indeed <strong>WebAssembly is designed to be a low-level VM, performant and secure</strong>. The design of WebAssembly is supported by a formal specification and a mechanized proof in Isabelle (Coq and K versions are underway). It is entirely standardised and, that’s a first, supported by all major browser vendors. So much so that the proposal process mandates that any proposal must be implemented by at least 2 major implementations.</p>
<p>So far mostly C/C++ and Rust are supported, because some higher-level language features required by other languages are still not available: tail calls (kind of important for functional languages :-)), exception management and garbage collection. A Haskell compiler (Asterius) is underway, developed by Tweag, and the vision is to be able to replace GHCJS on the mobile client wallets which execute code for the Cardano blockchain.</p>
<h4 id="autobench"><a href="https://github.com/mathandley/AutoBench">AutoBench</a></h4>
<p>This is a neat <strong>tool coupling QuickCheck and Criterion to benchmark different functions</strong> and see which ones have the best runtime behaviour. Some limitations though: benchmarks are only performed across one notion of “size” whereas general functions might depend on various parameters and also this is limited to pure functions for now.</p>
<h4 id="improving-typeclass-relations-by-being-open"><a href="https://www.fceia.unr.edu.ar/~mauro/pubs/cm-conf.pdf">Improving Typeclass Relations by Being Open</a></h4>
<p>A great idea. Introduce a <strong>typeclass “morphism” to automatically create instances</strong>. For example if you have a typeclass for a partial order you would like to automatically get an instance for it every time you have an <code class="prettyprint">Ord a</code> instance for any type <code class="prettyprint">a</code>. This new declaration <code class="prettyprint">class morphism Ord a => PartialOrd a</code> would have solved many compatibility issues with the introduction of the infamous <code class="prettyprint">Functor / Applicative / Monad</code> modification in Haskell.</p>
<h4 id="rhine-frp-with-type-level-clocks"><a href="https://www.manuelbaerenz.de/sites/default/files/Rhine_0.pdf">Rhine: FRP with Type-Level Clocks</a></h4>
<p>This is an addition to the “monadic streaming” framework introduced by Ivan Perez and his team. This provides some clock abstraction and synchronization mechanisms to make sure that different event sources emitting events at different rates can properly work together. <strong>It also provides a conceptual unification of “events” and “behaviours” from FRP: “events are clocks and behaviours are clock-polymorphic”</strong>.</p>
<h4 id="ghosts-of-departed-proofs-functional-pearl"><a href="http://kataskeue.com/gdp.pdf">Ghosts of departed proofs (Functional Pearl)</a></h4>
<p>“Existential types” are a great way to hide information in an API. Indeed a function can give you a value of a given type without you being able to do anything with that value other than returning it to the API, proving that you are really returning something you have been given, not something you have fabricated. But you don’t necessarily need a value to enforce constraints on your API: you can use a “phantom type” to tag a computation with a specific type for which there is no corresponding value.</p>
<p>One cool thing you can do is <strong>encode proofs with these phantom types</strong>. For example you can embed the proof that a list has been sorted. This is all free at runtime because those proofs only exist at compile time. Even better you can create a type representing proofs, with all the classical logical combinators <code class="prettyprint">and</code>, <code class="prettyprint">or</code>, <code class="prettyprint">implies</code>,… and associate them with values. This is all implemented in the gdp library.</p>
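<p>Here is a minimal sketch of the idea with hypothetical names (the gdp library’s actual API is richer): the phantom tag <code class="prettyprint">Sorted</code> is evidence that can only be produced by actually sorting:</p>
<pre><code class="prettyprint">import Data.List (sort)

-- a value tagged with a compile-time-only proof
newtype Named proof a = Named a deriving (Eq, Show)

data Sorted  -- a phantom tag: it has no runtime values

-- the only way to obtain the Sorted evidence is to sort
sorted :: Ord a => [a] -> Named Sorted [a]
sorted = Named . sort

-- this function can rely on the evidence instead of re-sorting
smallest :: Named Sorted [a] -> Maybe a
smallest (Named [])      = Nothing
smallest (Named (x : _)) = Just x</code></pre>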
<h2 id="haskell-symposium-day-2">Haskell Symposium Day 2</h2>
<p/>
<h4 id="deriving-via-or-how-to-turn-hand-written-instances-into-an-anti-pattern"><a href="https://dl.acm.org/citation.cfm?doid=3242744.3242746">Deriving Via: or, How to Turn Hand-Written Instances into an Anti-pattern</a></h4>
<p><code class="prettyprint">DerivingVia</code> is a way to reduce boilerplate which just landed in GHC 8.6.1. <strong>With DerivingVia you can declare trivial instances for data types by reusing known instances for other types</strong>. For example you can declare a Monoid instance for the datatype <code class="prettyprint">newtype Pair = Pair (Int, Int)</code> with <code class="prettyprint">deriving Monoid via (Sum Int, Product Int)</code>. The generated instance will add the first elements of two pairs and multiply the second ones.</p>
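<p>A minimal sketch (note the tuple field, which makes <code class="prettyprint">Pair</code> representationally equal to <code class="prettyprint">(Sum Int, Product Int)</code>, a requirement for the <code class="prettyprint">via</code> strategy):</p>
<pre><code class="prettyprint">{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE DerivingStrategies #-}

import Data.Monoid (Sum (..), Product (..))

-- first components are added, second components are multiplied
newtype Pair = Pair (Int, Int)
  deriving stock (Eq, Show)
  deriving (Semigroup, Monoid) via (Sum Int, Product Int)</code></pre>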
<h4 id="type-variables-in-patterns"><a href="https://arxiv.org/pdf/1806.03476.pdf">Type Variables in Patterns</a></h4>
<p>This is the complement to type applications in expressions: <strong>type applications in patterns</strong>. Being able to “extract” types from a pattern lets us properly name a given type and use it to type another expression. For example it will be possible to bind the existential type <code class="prettyprint">a</code> in <code class="prettyprint">data MyGadt b where Mk :: (Show a) => a -> MyGadt Int</code> in a pattern match by writing</p>
<pre><code class="prettyprint">case expr of
  Mk @a x -> print x
    where
      -- here the type `a` has been bound in the pattern match
      print :: a -> Text
      print = T.pack . show</code></pre>
<p>The implementation should start before the end of the year.</p>
<h4 id="the-thoralf-plugin-for-your-fancy-type-needs"><a href="https://dl.acm.org/citation.cfm?id=3242754">The Thoralf Plugin: For Your Fancy Type Needs</a></h4>
<p>This Haskell plugin uses <strong>a SMT solver to solve some of the obvious type constraints which GHC is not able to solve by itself</strong>.</p>
<h4 id="suggesting-valid-hole-fits-for-typed-holes-experience-report"><a href="https://mpg.is/papers/gissurarson2018suggesting.pdf">Suggesting Valid Hole Fits for Typed-Holes (Experience Report)</a></h4>
<p>Another useful tool for programming in Haskell. We can use “typed holes” giving the type that is expected in a given place and also some values which can fit that hole. Unfortunately typed holes can be pretty useless with some expressions, when using lenses for example. Now <strong>the suggestions for what can be put in the hole have been vastly improved by looking at all the functions in scope</strong> which could create a value for the hole. For example <code class="prettyprint">_ :: [Int] -> Int</code> will suggest <code class="prettyprint">Prelude.head</code> and <code class="prettyprint">Prelude.last</code>. But we can go further and ask for some “refinement”: which function, itself having a hole, could be used to fill in the hole? In the example above <code class="prettyprint">foldr</code> could also be used provided that <code class="prettyprint">foldr</code> has the right parameters.</p>
<p>And… this is all available in GHC 8.6.1 already and even able to use the new <code class="prettyprint">doc</code> feature adding documentation to the suggestions.</p>
<h4 id="a-promise-checked-is-a-promise-kept-inspection-testing"><a href="https://arxiv.org/pdf/1803.07130.pdf">A Promise Checked Is a Promise Kept: Inspection Testing</a></h4>
<p>When the documentation of a library like <code class="prettyprint">generic-lens</code> says “the generated code is performant because it is the same as manually written code”, can you really trust it? It turns out that version <code class="prettyprint">0.4.0.0</code> of that library was breaking that promise and it was fixed in <code class="prettyprint">0.4.0.1</code>.</p>
<p>Fortunately, <code class="prettyprint">inspection-testing</code> is a GHC plugin to save the day. <strong>With that plugin you can instruct GHC to check the generated GHC Core code</strong>: <code class="prettyprint">inspect $('genericAgeLens === 'manualAgeLens)</code>. You can also test if you have unwanted allocations in your code. For example the <code class="prettyprint">Data.Text</code> package has many functions which are supposed to be “subject to fusion” but inspection testing proves that they are not.</p>
<h4 id="branching-processes-for-quickcheck-generators"><a href="http://www.cse.chalmers.se/~russo/publications_files/haskell2018.pdf">Branching Processes for QuickCheck Generators</a></h4>
<p>Dragen is a library generating QuickCheck’s <code class="prettyprint">Arbitrary</code> instances for recursive data structures. The cool thing is that <strong>you can specify the “depth” and the distribution of all constructors in the generated values</strong>. So you can ask for trees of depth 5 with a proportion of <code class="prettyprint">Int</code> nodes which is twice the number of <code class="prettyprint">Text</code> nodes.</p>
<h4 id="coherent-explicit-dictionary-application-for-haskell"><a href="https://arxiv.org/pdf/1807.11267.pdf">Coherent Explicit Dictionary Application for Haskell</a></h4>
<p>Finally <strong>a way to avoid the “one instance of a typeclass per type only!” limitation</strong>. This implemented proposal makes it possible to materialize a typeclass instance as a “dictionary” and then do an explicit “dictionary application” to specify which dictionary you want to apply. For example you can write <code class="prettyprint">nubCI = nub @{ eqOn (map toLower) }</code> where <code class="prettyprint">eqOn</code> produces a dictionary.</p>
<p>The main concern with this type of proposal is that we could break coherence, which is guaranteed when there is only one instance per type. <code class="prettyprint">Set</code> operations for example cannot be used with 2 different <code class="prettyprint">Ord</code> dictionaries. This can actually be checked by looking at the “role” of the type variable <code class="prettyprint">a</code> in <code class="prettyprint">Set a</code>: it is “nominal”, whereas this proposal only works with “representational” roles. And if you try to do a dictionary application where there is any possibility of ending up with 2 different instances for the same typeclass, you get a warning.</p>
<h4 id="theorem-proving-for-all-equational-reasoning-in-liquid-haskell-functional-pearl"><a href="https://arxiv.org/pdf/1806.03541.pdf">Theorem Proving for All: Equational Reasoning in Liquid Haskell (Functional Pearl)</a></h4>
<p>Such a brilliant idea: since theorems are types and proofs are code, let’s <strong>write the theorems as Liquid Haskell types and proofs as Haskell code</strong>! Then Liquid Haskell is able to check whether your proof of <code class="prettyprint">reverse (reverse xs) == xs</code> is correct. You will not get all the support you can get from a general theorem prover environment but I think this is a neat teaching tool at least. And the proofs disappear at runtime, nice.</p>
<p>I have now been working with Haskell for around 3 months and I can say that it is a pretty enjoyable
experience. Using Functional Programming is a lot more straightforward than with Scala and I rarely have to fight the
syntax. I haven't been severely bitten by laziness yet (meaning I cannot call myself an "experienced" Haskell
developer :-)) and I am even starting to see its advantages in terms of mental flow: "I don't have to care about this,
if it's not evaluated it will cost me nothing".</p>
<p>I have also had my share of head scratching with the <code class="prettyprint lang-hs">mtl</code> (Monad Transformer Library) and with lenses. But this is not
what I want to write about today. My big preoccupation is not so much how we can use technique X or Y to write
something in the small, but rather how we can write code in the large. Already at the level of a service using a
database, writing to S3 and publishing/subscribing to an event broker the question of code organisation becomes a major
one.</p>
<h3>Organising Haskell code</h3>
<p>In languages like Scala you have packages, traits and classes and various dependency injection libraries to give you
some guidance. In Haskell you will have to read a handful of blog posts and make your own opinion. Should you use
<a href="http://degoes.net/articles/modern-fp-part-2">Free Monads</a> and an "onion architecture"? Should you use the <a href="https://hackage.haskell.org/package/mtl">mtl</a>, the
<a href="https://github.com/gwils/next-level-mtl-with-classy-optics">"next level mtl with classy optics"</a> or a
<a href="https://ocharles.org.uk/blog/posts/2016-01-26-transformers-free-monads-mtl-laws.html">bit of both</a>? Or maybe the
<a href="https://www.schoolofhaskell.com/user/meiersi/the-service-pattern">Service Pattern</a>? Should we call it the
<a href="https://jaspervdj.be/posts/2018-03-08-handle-pattern.html#handle-polymorphism">Handle Pattern</a>? There is also a library
for <a href="https://hackage.haskell.org/package/hs-di">dependency injection</a> using Template Haskell.</p>
<p>With my previous Scala project I experimented with new ideas for dependency injection and we open-sourced a library
called
<a href="https://github.com/zalando/grafter">Grafter</a> to support this approach. This library has served us well even if it
could definitely be improved. In Grafter we use a cool technique called "tree rewriting" to help with the wiring and
re-wiring of our application. But that library relies on some "tricks" which are not available in Haskell, like the
capability to collect objects of a given type at runtime (effectively doing an <code class="prettyprint lang-hs">isInstanceOf</code> test) or reflection to access the
properties of a given case class. Not only can we not do that in Haskell, but we don't even have objects to begin with!</p>
<p>So with Haskell I was forced to revisit my choices for structuring code. Maybe I should start with what I want to
achieve?</p>
<h3>My objectives</h3>
<p>My overall goal is to be fairly productive producing code across several applications and teams, possibly sharing libraries,
in an environment where objectives can shift really fast.</p>
<p>This means:</p>
<ol>
<li>
<p>be able to declare components having a public interface which is just a list of functions. "Information hiding" is not
just something we learn in textbooks; it is very important to isolate from changes and prevent everything from becoming
a huge bowl of spaghetti. Also in the kind of systems I am working on, code reuse and evolution does not happen at
the function level. One day you use service 1 to get product definitions, the next you have to use service 2 and up
to 10 functions need to be reimplemented. So the good old idea of "component" still makes sense.</p>
</li>
<li>
<p>have simple function definitions, with mostly <code class="prettyprint lang-hs">IO</code> for an easy interoperability of different components. In
Haskell, functions,
especially the effectful ones, can easily be abstracted to be returning some monad <code class="prettyprint lang-hs">m</code> with all sorts of
constraints:
<code class="prettyprint lang-hs">MonadReader</code>, <code class="prettyprint lang-hs">MonadBaseControl</code>, <code class="prettyprint lang-hs">MonadCatch</code>, <code class="prettyprint lang-hs">MonadFileSystem</code>, <code class="prettyprint lang-hs">MonadLogger</code> and so on. At first glance this doesn't
look like a major issue but my (limited) experience tells me that this leads to: 1. complex type signatures 2. playing
lots of type tetris when interacting with several such functions. Not fun.</p>
</li>
<li>
<p>have components declare their dependencies and configuration at their declaration site. This way they are kind of
"self-contained" and it should be easy to just extract one of them to a new library for sharing</p>
</li>
<li>
<p>have a way to easily wire a full application from a list of all wanted components and configurations.
If a new dependency is added to a component, we shouldn't have to change the wiring code. This really helps with
maintaining the code. You need a new service? Just add a dependency. You want to split an existing component in 2?
This is just 2 or 3 steps.</p>
</li>
<li>
<p>have a way to replace specific components in an existing application. This is useful in all sorts of situations. For
example you want to test how well your application performs if one endpoint responds faster, or always responds "ok".
Just write <code class="prettyprint lang-hs">replace component</code> or something similar and you're done. When you want to test a complex piece of
business logic orchestrating
different services you can just mock them out, all of them, or just some of them.</p>
</li>
</ol>
<p>My hunch is that all of this is crucial for scaling development, whether growing a medium-size application or growing a team working
on interrelated services.</p>
<h3>A design space</h3>
<p/>
<h4>Interfaces</h4>
<p>Objective n.1 can be satisfied by either using typeclasses or creating a record of functions. Typeclasses can declare
their requirements: I can do <code class="prettyprint lang-hs">MonadFileSystem</code> if I know how to do <code class="prettyprint lang-hs">MonadIO</code> for example. And typeclasses can have
instances, even with some rules of overriding them. Having some language support is nice! Or is it?</p>
<p>First of all you need to understand how typeclass resolution works and understand the difference between <code class="prettyprint lang-hs">Overlapping</code>
and <code class="prettyprint lang-hs">Overlappable</code>. Then you might have to jump through some hoops when mixing different components together having
different constraints. Haskell is so polymorphic that it is only when you are really executing an expression that
the underlying data structure supporting the constraints "appears". For example if I <code class="prettyprint lang-hs">runReaderT action r</code> then the
<code class="prettyprint lang-hs">MonadReader r</code> constraint on <code class="prettyprint lang-hs">action</code> gets resolved and <code class="prettyprint lang-hs">action</code> appears to be a <code class="prettyprint lang-hs">ReaderT r m a</code>. This possibility of
delaying the selection of the concrete data structure in Haskell is a great force, but it makes the code also hard to
understand because you have to run the typeclass resolution algorithm in your head to understand what's going on.</p>
<p>Worse, I experienced a strange bug 2 weeks ago with the <a href="https://github.com/mtesseract/nakadi-client"><code class="prettyprint lang-hs">nakadi-client</code></a>
library.
After a refactoring I was making requests which were unauthenticated. How is that possible?
I am using <code class="prettyprint lang-hs">local</code> to modify the environment of <code class="prettyprint lang-hs">MonadNakadi</code> which is some sort of <code class="prettyprint lang-hs">MonadReader</code> to access Zalando event bus.
And I am indeed setting an authentication token on the requests. It turns out that I had 2 layers of <code class="prettyprint lang-hs">Reader</code> in my stack
and I was probably not modifying the right one. I eventually refactored the code to set authentication earlier in
the process without <code class="prettyprint lang-hs">local</code> and things were back to normal. My inability to understand what was going on led to a bug
which could not be caught by the compiler. Not cool. (the <code class="prettyprint lang-hs">nakadi-client</code> library is very cool though, I'm so grateful
it exists :-)).</p>
<p>When I read about the <a href="https://jaspervdj.be/posts/2018-03-08-handle-pattern.html#handle-polymorphism">Handle Pattern</a>,
it was a revelation: this is what I want, a simple collection of functions with a simple interface.</p>
<p>For example I might want to declare a <code class="prettyprint lang-hs">Calculator</code> as:</p>
<pre><code class="prettyprint lang-hs">data Module =
  Module
    { add      :: Int -> Int -> IO Int
    , multiply :: Int -> Int -> IO Int
    }
</code></pre>
<p>When I get such a record, the implementation is completely hidden, I am protected from any evolution of the implementation.
How do I even create such a component? Like anything in Haskell, with a function:</p>
<pre><code class="prettyprint lang-hs">new :: Adder.Module -> Multiplier.Module -> Module
new adder multiplier = Module (addWith adder) (multiplyWith multiplier)
</code></pre>
<p>Besides the guilty pleasure of reusing a well-known OO keyword this gives us a way to declare the dependencies for our
component. And <code class="prettyprint lang-hs">addWith</code>, <code class="prettyprint lang-hs">multiplyWith</code> are (private) part of the module implementation.</p>
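<p>For completeness, here is a sketch of what such an <code class="prettyprint lang-hs">Adder</code> module could look like (illustrative names, written here as one unqualified record instead of a separate <code class="prettyprint lang-hs">Adder.Module</code> file):</p>
<pre><code class="prettyprint lang-hs">-- the same record-of-functions pattern, one level down
data AdderModule = AdderModule
  { add :: Int -> Int -> IO Int }

-- the module's only exported constructor function
newAdder :: AdderModule
newAdder = AdderModule { add = \a b -> pure (a + b) }
</code></pre>
<p>A caller only ever sees the record, so the implementation behind <code class="prettyprint lang-hs">add</code> can change freely.</p>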
<h4>Interoperability</h4>
<p>Each function returns <code class="prettyprint lang-hs">IO</code> so the interoperability is maximal, no crazy constraints to accommodate.
There is a small issue though. A small issue which drove me mad.</p>
<p>We need to operate our services, not just develop them.
When something goes wrong in production it is incredibly useful to have a <code class="prettyprint lang-hs">FlowId</code> identifying requests coming from
clients and flowing through all of our services. But if you have components returning <code class="prettyprint lang-hs">IO</code> values there is no way to pass
this <code class="prettyprint lang-hs">FlowId</code> from one component to another. This necessitates a <code class="prettyprint lang-hs">MonadReader FlowId</code> constraint, or maybe a
<code class="prettyprint lang-hs">ReaderT FlowId IO a</code> return type. Then we are back to more complex type signatures for something which is actually a
very small concern in the scale of things. And that concern is "polluting" our whole ecosystem! Same if we want to use
a logging library like <a href="https://hackage.haskell.org/package/katip">katip</a> we need to add something like <code class="prettyprint lang-hs">MonadReader KatipContext</code>
everywhere. Just for a single small concern!</p>
<p>I tried many different ideas to get around this issue and end up back in IO but nothing worked. Because I believe that nothing
can work in a proper functional programming context. If you want the callee to know about its context and its caller, you
<em>need</em> to pass some information! So you need a form of <code class="prettyprint lang-hs">Reader</code> and we decided to extend the <code class="prettyprint lang-hs">IO</code> type to <code class="prettyprint lang-hs">RIO</code>, just one
more letter:</p>
<pre><code class="prettyprint lang-hs">newtype RIO a = RIO { runRIO :: Env -> IO a }
</code></pre>
<p>where <code class="prettyprint lang-hs">Env</code> is defined as</p>
<pre><code class="prettyprint lang-hs">data Env =
  Env
    { context   :: Maybe Context
    , namespace :: Namespace
    } deriving (Eq, Show)
</code></pre>
<p>This means that in addition to doing IO the functions of a component can require a bit of knowledge from the caller:</p>
<ul>
<li>
<p>a <code class="prettyprint lang-hs">Context</code> for example to pass the <code class="prettyprint lang-hs">FlowId</code></p>
</li>
<li>
<p>a Namespace (a list of names) for example to describe some nested processes:
<code class="prettyprint lang-hs">processing event { "eventId" = "123" } > getting master data > authenticating { "role" = "admin" }</code></p>
</li>
</ul>
<p><code class="prettyprint lang-hs">Context</code> and <code class="prettyprint lang-hs">Namespace</code> are essentially "stringly" data just modelling the context of the caller with the following
properties:</p>
<ul>
<li>contexts can be replaced, setting a context removes the previous one</li>
<li>namespaces are appended, like breadcrumbs on a website</li>
</ul>
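<p>These two properties suggest combinators along these lines (a sketch with assumed "stringly" representations for <code class="prettyprint lang-hs">Context</code> and <code class="prettyprint lang-hs">Namespace</code>, not the actual code):</p>
<pre><code class="prettyprint lang-hs">type Context   = [(String, String)]  -- assumed representation
type Namespace = [String]

data Env = Env { context :: Maybe Context, namespace :: Namespace }

newtype RIO a = RIO { runRIO :: Env -> IO a }

-- setting a context replaces the previous one
withContext :: Context -> RIO a -> RIO a
withContext c (RIO f) = RIO (\env -> f env { context = Just c })

-- namespaces are appended, like breadcrumbs
withNamespace :: String -> RIO a -> RIO a
withNamespace n (RIO f) = RIO (\env -> f env { namespace = namespace env ++ [n] })
</code></pre>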
<p>I hear some of you saying that we could get fancy and use something like:</p>
<pre><code class="prettyprint lang-hs">newtype RIO r a = RIO { runRIO :: r -> IO a }
</code></pre>
<p>Now we don't have to be concrete in the environment type. However we lose a lot in terms of simplicity and we risk
having to deal with <code class="prettyprint lang-hs">RIO r1 a</code> and <code class="prettyprint lang-hs">RIO r2 b</code> and have to find ways to unify <code class="prettyprint lang-hs">r1</code> and <code class="prettyprint lang-hs">r2</code>. No problem, let's use
lenses! And then we go through even more complex type signatures.</p>
<p>I am clearly making a compromise here. By not using the most typed signature I expose myself to the danger of programming
with strings. But I think this is worth it because we get easier code for things which matter a lot more than flow ids or
contextual logging.</p>
<h4>Configuration</h4>
<p>Inside each "Module" file there should be a declaration for the module configuration:</p>
<pre><code class="prettyprint lang-hs">data Config =
  Config
    { invertSigns :: Bool }
</code></pre>
<p>And <code class="prettyprint lang-hs">new</code> becomes:</p>
<pre><code class="prettyprint lang-hs">new :: Config -> Adder.Module -> Multiplier.Module -> Module
new (Config invert) adder multiplier =
  Module (addWith invert adder)
         (multiplyWith multiplier)

-- when `invertSigns` is set, negate the result of the addition
addWith :: Bool -> Adder.Module -> Int -> Int -> RIO Int
addWith False adder a b = (adder & add) a b
addWith True  adder a b = negate <$> (adder & add) a b
</code></pre>
<p>This way components are self-contained and easy to extract into libraries.</p>
<h3>Replacing components</h3>
<p>What do we do about wiring a bunch of components and in particular how to replace one of them, right at the bottom of
the stack? Do you need to recreate the full application, calling <code class="prettyprint lang-hs">new</code> all over the place?</p>
<p>I found a neat trick to do this: a "registry", and ... the <code class="prettyprint lang-hs">State</code> monad.</p>
<p>Let's create a data structure holding all the components we want to build:</p>
<pre><code class="prettyprint lang-hs">data Modules =
  Modules
    { _adder      :: Maybe Adder.Module
    , _multiplier :: Maybe Multiplier.Module
    , _calculator :: Maybe Calculator.Module
    , _config     :: Config
    }
</code></pre>
<p>How do we make a <code class="prettyprint lang-hs">calculator</code>? We get it from <code class="prettyprint lang-hs">Modules</code>. If missing we create it with <code class="prettyprint lang-hs">new</code> by first getting its dependencies:</p>
<pre><code class="prettyprint lang-hs">makeCalculator :: State Modules Calculator.Module
makeCalculator = do
  modules <- get
  case _calculator modules of
    Just c  -> pure c
    Nothing -> do
      c <- new <$> makeConfig <*> makeAdder <*> makeMultiplier
      modify (\s -> s { _calculator = Just c })
      pure c
</code></pre>
<p>Then we put the newly created component in the registry and we are done (this happens recursively for all the dependencies!).
To run <code class="prettyprint lang-hs">makeCalculator</code> we can pass a <code class="prettyprint lang-hs">Modules</code> value where all the components are set to <code class="prettyprint lang-hs">Nothing</code>:</p>
<pre><code class="prettyprint lang-hs">prodModules =
  Modules
    { _adder      = Nothing
    , _multiplier = Nothing
    , _calculator = Nothing
    , _config     = prodConfig
    }
</code></pre>
<p>Replacing a component is super easy, just set it in the registry:</p>
<pre><code class="prettyprint lang-hs">testModules =
  prodModules
    { _adder = Just testAdder
    }
</code></pre>
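<p>To see the whole mechanism in miniature, here is a self-contained version with a single "component" (invented names, an <code class="prettyprint lang-hs">Int</code> standing in for a module): building is memoized in the registry, and overriding a component is just pre-filling its slot:</p>
<pre><code class="prettyprint lang-hs">import Control.Monad.State

data Registry = Registry
  { _answer :: Maybe Int  -- the "component" slot
  , _base   :: Int        -- its "configuration"
  }

makeAnswer :: State Registry Int
makeAnswer = do
  r <- get
  case _answer r of
    Just a  -> pure a             -- already built (or overridden)
    Nothing -> do
      let a = _base r * 6         -- "new" from the configuration
      put r { _answer = Just a }  -- register it for later callers
      pure a
</code></pre>
<p>With this, <code class="prettyprint lang-hs">evalState makeAnswer (Registry Nothing 7)</code> builds the component, while <code class="prettyprint lang-hs">evalState makeAnswer (Registry (Just 1) 7)</code> returns the replacement untouched.</p>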
<p>We can automate a bit of that with 2 typeclasses:</p>
<pre><code class="prettyprint lang-hs">-- means that there is a possibility to get the Module from a registry s
-- (and possibly get nothing) and also to set it in the registry s
class Register s m where
  access   :: s -> Maybe m
  register :: s -> m -> s

-- Make s m means that there is a way to build the module m given an initial configuration s
class Make s m where
  make :: State s m
</code></pre>
<p>Then for a given module a <code class="prettyprint lang-hs">Make</code> instance can be created like this:</p>
<pre><code class="prettyprint lang-hs">instance ( Register s Module
         , Make s Config
         , Make s Adder.Module
         , Make s Multiplier.Module
         ) => Make s Module where
  make = create3 new register make make make
</code></pre>
<p>This declares all the dependencies for a given component, and <code class="prettyprint lang-hs">create3</code> is just a function generalizing the <code class="prettyprint lang-hs">makeCalculator</code>
above to any constructor, using the typeclass instances so there's nothing to implement. Adding or removing a dependency of a
component becomes pretty trivial.</p>
<p>Inside the top level application we also need to create some <code class="prettyprint lang-hs">Register</code> instances to describe how to interact with the
"registry" for each component and how to "make" the various <code class="prettyprint lang-hs">Config</code> values from it:</p>
<pre><code class="prettyprint lang-hs">-- | find the Adder module in the registry
instance Register Modules Adder.Module where
  access s = _adder s
  register s m = s { _adder = Just m }

-- | declare that `Adder.Config` can be made directly
-- by extracting the `Adder.Config` from the `Modules`
-- data structure.
-- This uses the `config` and `adderConfig` lenses generated for `Modules`
instance Make Modules Adder.Config where
  make = get <&> (^. config . adderConfig)
</code></pre>
<p>This part will be generated using Template Haskell in the future.</p>
<h3>Starting services</h3>
<p>The real story is always more complicated :-). Some components are "stateful" because they hold a database connection
or a cache, so they have to be started. The solution to this new requirement is not difficult: use a <code class="prettyprint lang-hs">new</code> function
returning a <code class="prettyprint lang-hs">RIO Module</code> and starting things. Then when doing the application wiring we operate in <code class="prettyprint lang-hs">StateT RIO</code> instead
of just <code class="prettyprint lang-hs">State</code>. This is also what makes the big difference between <code class="prettyprint lang-hs">Config</code> and <code class="prettyprint lang-hs">Module</code> in a component file: <code class="prettyprint lang-hs">Config</code>
is for pure data and <code class="prettyprint lang-hs">Module</code> can trigger some side effects for its creation.</p>
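<p>As a sketch (invented names, plain <code class="prettyprint lang-hs">IO</code> instead of <code class="prettyprint lang-hs">RIO</code> for brevity), a "stateful" component whose creation allocates some state looks like this:</p>
<pre><code class="prettyprint lang-hs">import Data.IORef

data CounterConfig = CounterConfig { start :: Int }

newtype Counter = Counter { increment :: IO Int }

-- the constructor itself is effectful: it allocates the component's state
newCounter :: CounterConfig -> IO Counter
newCounter (CounterConfig n0) = do
  ref <- newIORef n0
  pure (Counter { increment = atomicModifyIORef' ref (\n -> (n + 1, n + 1)) })
</code></pre>
<p>For a real component the allocated state would be a connection pool or a cache rather than an <code class="prettyprint lang-hs">IORef</code>, but the shape is the same: <code class="prettyprint lang-hs">new</code> returns the module wrapped in an effect.</p>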
<h3>Conclusion</h3>
<p>This is all very new and currently only tested on a medium-size service. But I really like this approach: not a lot of type
magic, enough flexibility, clear guidance on how to write code. It is probable that we will get more
questions along the road (otherwise why would the world need so big dependency injection libraries?), I will report
on them. In the meantime please share your ideas, add your comments. Especially if you are thinking that we are making
a huge mistake somewhere that should be addressed right now!</p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-78034963782758693042017-11-23T00:09:00.001+09:002022-09-17T03:22:50.440+09:00If at first it doesn't succeed...<h5 id="if-at-first-it-doesnt-succeed-try-something-else">If at first it doesn’t succeed, try something else</h5>
<p>Today we start from a very common programmer need and we end up checking laws! With some considerations about software modularity.</p>
<p>I am currently implementing a command-line application in Haskell. This application needs an <code class="prettyprint">OAuthToken</code> to access some webservice. There are 2 ways to get a token:</p>
<ul>
<li>with an environment variable: <code class="prettyprint">OAUTH_TOKEN</code></li>
<li>with a call to another command-line tool, <code class="prettyprint">ztoken</code>, which gives you a token based on your credentials
<p/></li>
</ul>
<p>This translates to the following Haskell functions</p>
<pre><code class="prettyprint">getTokenFromEnvironment :: ExceptT GetTokenError IO OAuthToken
getTokenFromCommandLine :: ExceptT GetTokenError IO OAuthToken</code></pre>
<p>Here I am using:</p>
<ul>
<li><p><code class="prettyprint">IO</code> as my base monad for interacting with the external world</p></li>
<li><p><code class="prettyprint">ExceptT GetTokenError</code> as a way to declare an error if I cannot get a token because either I don’t have an <code class="prettyprint">OAUTH_TOKEN</code> variable in my environment or <code class="prettyprint">ztoken</code> does not accept my credentials. I leave other issues, like network access problems to exceptions in the <code class="prettyprint">IO</code> monad</p></li>
</ul>
<pre><code class="prettyprint">data GetTokenError =
  CommandLineError String |
  EnvironmentError String</code></pre>
<p>What I really want now is a way to combine those 2 calls into one</p>
<pre><code class="prettyprint">getToken :: ExceptT GetTokenError IO OAuthToken</code></pre>
<p>The first thing which came to my mind was the <code class="prettyprint">Alternative</code> type class and in particular the <code class="prettyprint"><|></code> operator:</p>
<pre><code class="prettyprint">(<|>) :: f a -> f a -> f a</code></pre>
<p>This typeclass has an instance for <code class="prettyprint">ExceptT e</code> provided that there is a <code class="prettyprint">Monoid e</code>, meaning that I can accumulate errors. Actually what I just need is the <code class="prettyprint">Alt</code> typeclass which just requires a <code class="prettyprint">Semigroup e</code>. For this I need to adapt my error type a little bit:</p>
<pre><code class="prettyprint">data GetTokenError =
  CommandLineError String |
  EnvironmentError String |
  RepeatedErrors GetTokenError GetTokenError
  deriving (Eq, Show)</code></pre>
<p>The <code class="prettyprint">RepeatedErrors</code> case gives us the possibility to accumulate errors in the case of repeated failed calls to retrieve an <code class="prettyprint">OAuthToken</code>. You could argue that it is not particularly well modelled because <code class="prettyprint">RepeatedErrors</code> stores errors more like a tree than a list. But let’s leave it at that for now.</p>
<p>The <code class="prettyprint">Semigroup</code> instance for <code class="prettyprint">GetTokenError</code> looks like this:</p>
<pre><code class="prettyprint">instance Semigroup GetTokenError where
  e1 <> e2 = RepeatedErrors e1 e2</code></pre>
<p>I can finally define:</p>
<pre><code class="prettyprint">getToken :: ExceptT GetTokenError IO OAuthToken
getToken =
  getTokenFromEnvironment <|>
  getTokenFromCommandLine</code></pre>
<p>This is really nice because this is exactly what I want to express!</p>
<ul>
<li>get the token from the environment</li>
<li>if that doesn’t work get it from the command line</li>
</ul>
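<p>Here is a runnable miniature of this behaviour (the error type and messages are invented; note that the <code class="prettyprint">Alternative</code> instance for <code class="prettyprint">ExceptT</code> requires a full <code class="prettyprint">Monoid</code>, whereas <code class="prettyprint">Alt</code> only needs a <code class="prettyprint">Semigroup</code>):</p>
<pre><code class="prettyprint">import Control.Applicative ((<|>))
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)

newtype Errors = Errors [String] deriving (Eq, Show)

instance Semigroup Errors where Errors a <> Errors b = Errors (a <> b)
instance Monoid    Errors where mempty = Errors []

fromEnv, fromCli :: ExceptT Errors IO Int
fromEnv = throwE (Errors ["no OAUTH_TOKEN in the environment"])
fromCli = throwE (Errors ["ztoken failed"])
</code></pre>
<p>Running <code class="prettyprint">runExceptT (fromEnv <|> pure 1)</code> recovers with <code class="prettyprint">Right 1</code>, while <code class="prettyprint">runExceptT (fromEnv <|> fromCli)</code> yields a <code class="prettyprint">Left</code> holding both messages.</p>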
<h3 id="more-abstraction-on-the-way">More abstraction on the way</h3>
<p>This looks all good but something is annoying. My functions are using <code class="prettyprint">ExceptT GetTokenError IO</code> which is a very concrete monad stack. I am going to make further calls to other monad stacks to make HTTP calls and I might have to <code class="prettyprint">lift</code> all over the place to align all the different stacks. This is even more annoying if I create a small library for getting tokens because I impose my stack choices on all clients of the API.</p>
<p>There is a way out of this, the monad transformers library (<code class="prettyprint">mtl</code>). In the mtl there are all sorts of typeclasses abstracting features provided by monad transformers. One of them is <code class="prettyprint">MonadError</code>:</p>
<pre><code class="prettyprint">class Monad m => MonadError e m | m -> e where
  throwError :: e -> m a
  catchError :: m a -> (e -> m a) -> m a</code></pre>
<p>If a <code class="prettyprint">Monad</code> has a <code class="prettyprint">MonadError</code> instance then you can catch and throw errors. Nice. However there is a catch (no pun intended :-)). The <code class="prettyprint">| m -> e</code> part of <code class="prettyprint">MonadError</code> is a functional dependency: the error type <code class="prettyprint">e</code> is uniquely determined by the monad <code class="prettyprint">m</code> you eventually select to support the <code class="prettyprint">MonadError</code> functionality, so a given monad can only carry one error type.</p>
<p>Concretely this means that this type signature propagates to the top of the application:</p>
<pre><code class="prettyprint">getToken :: (MonadError GetTokenError m, MonadIO m) => m OAuthToken</code></pre>
<p>Indeed, if I mix other calls to <code class="prettyprint">getToken</code>, for example</p>
<pre><code class="prettyprint">getPartitions :: (MonadError GetTokenError m, MonadIO m) => m [Partition]
getPartitions = do
  token <- getToken
  callService token partitionsRequest</code></pre>
<p>Then the <code class="prettyprint">MonadError GetTokenError m</code> constraint stays because <code class="prettyprint">m</code> is tied to it forever. And if <code class="prettyprint">callService</code> declares its own error type, I won’t be able to mix both <code class="prettyprint">getToken</code> and <code class="prettyprint">callService</code> calls:</p>
<pre><code class="prettyprint">callService :: (MonadError HttpError m, MonadIO m) => OAuthToken -> Request a -> m a</code></pre>
<p>Maybe one solution would be to define <code class="prettyprint">getPartitions</code> as:</p>
<pre><code class="prettyprint">getPartitions :: (MonadError (Either GetTokenError HttpError) m, MonadIO m) => m [Partition]</code></pre>
<p>In a way we are back to the previous problem. We now have a “concrete stack” of errors where we just want “a” structure capable of holding both <code class="prettyprint">GetTokenError</code> and <code class="prettyprint">HttpError</code>.</p>
<h3 id="lenses-to-the-rescue">Lenses to the rescue</h3>
<p>A partial solution to the problem is given by lenses and is wonderfully explained by Georges Wilson in <a href="https://github.com/gwils/next-level-mtl-with-classy-optics/blob/master/Slides.pdf">“next level mtl with classy optics”</a> (I also recommend <a href="https://gist.github.com/nkpart/c3bcb48c97c5ded6e277">this gist</a> by Nick Partridge):</p>
<pre><code class="prettyprint">-- create prisms for GetTokenError
makeClassyPrisms ''GetTokenError

-- now e is anything having a Prism allowing us to extract or inject a GetTokenError
getToken :: (MonadError e m, MonadIO m, AsGetTokenError e) => m OAuthToken</code></pre>
<p>So it becomes kind of easier to mix calls having different error types:</p>
<pre><code class="prettyprint">getPartitions :: (MonadError e m, AsGetTokenError e, AsHttpError e, MonadIO m) => m [Partition]</code></pre>
<p>This is probably a better way to mix different error types with <code class="prettyprint">MonadError</code>. We don’t get rid of the functional constraint though. The error types still “bubble-up” to the top and the method used to do the authentication is exposed to the clients of <code class="prettyprint">getPartitions</code>. So if we switch the authentication mechanism and get different error types we will have to change all the functions calling <code class="prettyprint">getPartitions</code>.</p>
<p>I think we need to really take care of this kind of situation because it makes software a lot harder to evolve when a small change has ripple effects across all the software layers.</p>
<h3 id="errors-translation-encapsulation">Errors translation / encapsulation</h3>
<p>What we can do is to define a new error type subsuming <code class="prettyprint">GetTokenError</code> and <code class="prettyprint">HttpError</code>:</p>
<pre><code class="prettyprint">data ServiceError =
  AuthenticationError GetTokenError |
  CallError HttpError

makeClassyPrisms ''ServiceError

instance AsGetTokenError ServiceError
instance AsHttpError ServiceError</code></pre>
<p>And we need a bit of boilerplate to do the translation between <code class="prettyprint">getToken</code> and <code class="prettyprint">callService</code>. To be able to “liberate” those 2 functions from their <code class="prettyprint">MonadError e m, MonadIO m</code> constraints we can run them with the minimum stack which satisfies these constraints, this is <code class="prettyprint">ExceptT <error type> IO a</code>:</p>
<pre><code class="prettyprint">runOAuthToken :: (MonadError e m, AsServiceError e, MonadIO m) => ExceptT GetTokenError IO a -> m a
runOAuthToken = runExceptTIO (review _AuthenticationError)

runCallService :: (MonadError e m, AsServiceError e, MonadIO m) => ExceptT HttpError IO a -> m a
runCallService = runExceptTIO (review _CallError)

runExceptTIO :: (MonadError e m, MonadIO m) => (f -> e) -> ExceptT f IO a -> m a
runExceptTIO mapError ioa = do
  valueOrError <- liftIO (runExceptT ioa)
  fromEitherM (throwError . mapError) valueOrError</code></pre>
<p>(<code class="prettyprint">fromEitherM</code> comes from the <code class="prettyprint">from-sum</code> package)</p>
<p>Then we can make both calls and “unify” them under new constraints:</p>
<pre><code class="prettyprint">getPartitions :: (MonadError e m, AsServiceError e, MonadIO m) => m [Partition]
getPartitions = do
  token <- runOAuthToken getToken
  runCallService (callService token partitionsRequest)</code></pre>
<p/>
<h3 id="alternatives">Alternatives?</h3>
<p>Unfortunately being more abstract with <code class="prettyprint">getToken</code> breaks the use of <code class="prettyprint"><|></code> which was so convenient. One option for fixing it is to define a similar operator for <code class="prettyprint">mtl</code> classes:</p>
<pre><code class="prettyprint">(<|>) :: (MonadError e m, Semigroup e) => m a -> m a -> m a
(<|>) ma1 ma2 =
  catchError ma1 (\e1 ->
    catchError ma2 (\e2 ->
      throwError (e1 <> e2)))</code></pre>
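<p>On the stock <code class="prettyprint">Either</code> instance of <code class="prettyprint">MonadError</code> this operator behaves as we would hope (a quick self-contained check, with a list of strings as the accumulating error):</p>
<pre><code class="prettyprint">import Control.Monad.Except (MonadError (..))

-- same definition as above, usable on any MonadError with a Semigroup error
(<|>) :: (MonadError e m, Semigroup e) => m a -> m a -> m a
(<|>) ma1 ma2 =
  catchError ma1 (\e1 ->
    catchError ma2 (\e2 ->
      throwError (e1 <> e2)))

recovered, accumulated :: Either [String] Int
recovered   = Left ["env"] <|> Right 1      -- Right 1
accumulated = Left ["env"] <|> Left ["cli"] -- Left ["env","cli"]
</code></pre>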
<p>But this still doesn’t work given the way we have defined our errors as prisms over a general error type! We need to define an extension of <code class="prettyprint"><|></code> which uses the appropriate <code class="prettyprint">Prism</code> to aggregate errors:</p>
<pre><code class="prettyprint">(<|?>) :: (MonadError e m, Semigroup o) => Prism' e o -> m a -> m a -> m a
(<|?>) p ma1 ma2 =
  catchError ma1 (\e1 ->
    case e1 ^? p of
      Nothing -> throwError e1
      Just o1 ->
        catchError ma2 (\e2 ->
          case e2 ^? p of
            Nothing -> throwError e2
            Just o2 -> throwError (review p (o1 <> o2))))</code></pre>
<p>And finally</p>
<pre><code class="prettyprint">getToken :: (MonadError e m, MonadIO m, AsGetTokenError e) => m OAuthToken
getToken =
  (<|?>) _GetTokenError
    getTokenFromEnvironment
    getTokenFromCommandLine</code></pre>
<p>This is not syntactically as nice as before but we can improve that:</p>
<pre><code class="prettyprint">infix 4 <!>
infix 3 <?>

data Alternating a = Alternating a a

(<!>) :: MonadError e m => m a -> m a -> Alternating (m a)
(<!>) = Alternating

(<?>) :: (MonadError e m, Semigroup o) => Prism' e o -> Alternating (m a) -> m a
(<?>) = error "left to the reader"

getToken =
  _GetTokenError <?>
    getTokenFromEnvironment <!>
    getTokenFromCommandLine</code></pre>
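<p>One possible way to fill in the definition left to the reader: <code class="prettyprint"><?></code> only needs to unpack the <code class="prettyprint">Alternating</code> pair and delegate to <code class="prettyprint"><|?></code>. The sketch below repeats the earlier definitions to be self-contained (it needs the <code class="prettyprint">lens</code> and <code class="prettyprint">mtl</code> packages; the identity prism in <code class="prettyprint">main</code> is only there for demonstration):</p>
<pre><code class="prettyprint">{-# LANGUAGE RankNTypes #-}
import Control.Lens (Prism', prism', review, (^?))
import Control.Monad.Except (MonadError, catchError, throwError)

infix 4 <!>
infix 3 <?>

data Alternating a = Alternating a a

(<!>) :: MonadError e m => m a -> m a -> Alternating (m a)
(<!>) = Alternating

-- <|?> as defined earlier in the post
(<|?>) :: (MonadError e m, Semigroup o) => Prism' e o -> m a -> m a -> m a
(<|?>) p ma1 ma2 =
  catchError ma1 (\e1 ->
    case e1 ^? p of
      Nothing -> throwError e1
      Just o1 ->
        catchError ma2 (\e2 ->
          case e2 ^? p of
            Nothing -> throwError e1
            Just o2 -> throwError (review p (o1 <> o2))))

-- one answer to the exercise: unpack the pair and delegate to <|?>
(<?>) :: (MonadError e m, Semigroup o) => Prism' e o -> Alternating (m a) -> m a
(<?>) p (Alternating ma1 ma2) = (<|?>) p ma1 ma2

main :: IO ()
main =
  -- with the identity prism on String errors, both failures are aggregated
  print ((prism' id Just <?> throwError "a" <!> throwError "b") :: Either String Int)
  -- Left "ab"</code></pre>
<p>Given the fixities, <code class="prettyprint">p <?> ma1 <!> ma2</code> parses as <code class="prettyprint">p <?> (ma1 <!> ma2)</code>, which is what the nicer syntax relies on.</p>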
<p/>
<h3 id="what-about-laws">What about laws?</h3>
<p>This is not the happy end of the story. The <code class="prettyprint">Alternative</code> type class specifies that <code class="prettyprint"><|></code> must be an associative operation. This means that we probably need to prove that our <code class="prettyprint"><|></code> and its evil twin <code class="prettyprint"><|?></code> are associative operators. What does that mean for instances of <code class="prettyprint">MonadError e m</code>? By the way what is even a <code class="prettyprint">MonadError e m</code>?</p>
<p>I was pretty shocked to realize that <code class="prettyprint">MonadError</code> doesn’t come with any laws in Haskell! We need to turn to <a href="https://github.com/purescript/purescript-transformers/blob/master/src/Control/Monad/Error/Class.purs#L18">Purescript</a> for this:</p>
<pre><code class="prettyprint">-- | - Left zero: `throwError e >>= f = throwError e`
-- | - Catch: `catchError (throwError e) f = f e`
-- | - Pure: `catchError (pure a) f = pure a`</code></pre>
<p>And we probably need an additional law for our <code class="prettyprint"><|></code> operator:</p>
<pre><code class="prettyprint">-- | - Catch associativity: `catchError (catchError ma f1) f2 =
-- catchError ma (\e -> catchError (f1 e) f2)`</code></pre>
<p>Since many instances already exist for <code class="prettyprint">MonadError</code> the best we can do is to check if those laws hold for these instances.</p>
<p>I was at first very enthusiastic about that. There are 2 libraries in Haskell which can be used to retroactively find properties for a given API: <code class="prettyprint">quickspec</code> and <code class="prettyprint">speculate</code>. However I was unable to get either of them to compile / find laws. Since I had been shaving too many yaks on this application already I decided to stop there, but I hope I will get some time to make those libraries find laws.</p>
<p>Anyway, the best I could do was to check the laws for all the instances of <code class="prettyprint">MonadError</code>: <code class="prettyprint">EitherT</code>, <code class="prettyprint">ListT</code>, <code class="prettyprint">MaybeT</code>, <code class="prettyprint">ExceptT</code> and so on. I verified a bunch of them using QuickCheck and they seem to hold. Actually if that wasn’t the case I would be surprised and would probably question the implementation. I also <em>proved</em> that the laws hold for the <code class="prettyprint">Either</code> instance of <code class="prettyprint">MonadError</code>. It goes like this, for the associativity of <code class="prettyprint">catch</code>:</p>
<pre><code class="prettyprint">-- this is what we want to prove
catchError (catchError m k2) k1 == catchError m (\e -> catchError (k2 e) k1)

--> case 1: m = Left t
catchError (catchError (Left t) k2) k1 == catchError (Left t) (\e -> catchError (k2 e) k1)
-- apply, on each side, the definition of catchError for Either on a Left value
-- (which is to call the handler): both sides reduce to the same expression, QED
catchError (k2 t) k1 == catchError (k2 t) k1

--> case 2: m = Right a
catchError (catchError (Right a) k2) k1 == catchError (Right a) (\e -> catchError (k2 e) k1)
-- apply, on each side, the definition of catchError for Either on a Right value
-- (which is to leave it as it is)
catchError (Right a) k1 == Right a
-- apply it once more on the left-hand side, QED
Right a == Right a</code></pre>
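<p>The QuickCheck verification mentioned above can be sketched like this for the <code class="prettyprint">Either</code> instance (the property name is mine; <code class="prettyprint">Fun</code> lets QuickCheck generate and show random handler functions):</p>
<pre><code class="prettyprint">import Control.Monad.Except (catchError)
import Test.QuickCheck

-- catch associativity, specialised to the Either String instance of MonadError
prop_catchAssociativity :: Either String Int
                        -> Fun String (Either String Int)
                        -> Fun String (Either String Int)
                        -> Bool
prop_catchAssociativity m (Fn k2) (Fn k1) =
  catchError (catchError m k2) k1 == catchError m (\e -> catchError (k2 e) k1)

main :: IO ()
main = quickCheck prop_catchAssociativity</code></pre>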
<p>I suspect that it is possible to prove the laws for other <code class="prettyprint">MonadError</code> instances, <code class="prettyprint">StateT</code> would probably be interesting.</p>
<h3 id="what-have-we-learned">What have we learned?</h3>
<p>There are plenty of lessons for me in this exercise:</p>
<ul>
<li><p>Haskell is full of interesting generalizations, <code class="prettyprint"><|></code> is one of them and worth spotting in production code</p></li>
<li><p>Haskell libraries are not necessarily complete and can probably benefit from contributions</p></li>
<li><p>The modularity story of Haskell is not as obvious as what you might think. I had to think hard to find a satisfying solution and one huge problem is still unsolved. How can I test my full application code with a dummy authentication, which would always succeed? This will be the subject of a following post!</p></li>
</ul>
<p/>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-15847948304552885172017-09-10T07:16:00.000+09:002022-09-17T03:22:16.552+09:00ICFP 2017<p>I already blogged last year on <a href="http://icfp17.sigplan.org">ICFP</a>, the awesome conference on Functional Programming, to share a <a href="http://etorreborre.blogspot.nl/2016/09/a-neat-trick-from-icfp-2016.html">neat trick</a>.</p>
<p>This year I would like to do a small brain dump on some of the sessions which I really liked.</p>
<h3 id="workshop-on-higher-programming-with-effects">Workshop on higher-order programming with effects</h3>
<h4 id="programming-a-web-server-with-effects">Programming a web server with effects</h4>
<p>This is one of several talks showing the benefits of programming in a language which natively supports the description of <a href="https://github.com/koka-lang/koka">effects in the type system</a>: Koka. Not only is Daan Leijen a great presenter whose enthusiasm is contagious, but with Koka it feels entirely natural to write <a href="https://pdfs.semanticscholar.org/3e95/8f0a945bd82a60163a8eecc54e6b5a3aab05.pdf">asynchronous code with interleaving, cancellation, resource-cleanup and so on</a>.</p>
<p>The talk also hints at something which got me really interested: the <code class="prettyprint">yield</code> operator to create iterators can be seen as an effect. Coupled with the async support above, this makes me think that a whole streaming library could be created as a set of effects!</p>
<h3 id="workshop-on-type-driven-development">Workshop on type-driven development</h3>
<h4 id="type-directed-diffing-of-structured-data">Type-directed diffing of structured data</h4>
<p>A fantastic idea: can we do better than line-based diffing and diff ASTs instead? The presenter introduced a language for representing AST patches (not as easy as it seems), then different possible algorithms for computing the diff (clearly a hard problem if you want to find the minimal diff).</p>
<h3 id="icfp">ICFP</h3>
<h4 id="super-8-languages-for-making-movies">Super 8 languages for making movies</h4>
<p>I’m not a big fan of dynamic languages but the power of <a href="https://racket-lang.org/">Racket</a> is impressive. The presenter shows how she created a “Tower of DSLs” to build a video-editing tool, complete with parser, GUI, documentation and integration with video libraries, in far fewer lines of code than anything else I know.</p>
<h4 id="faster-coroutine-pipelines">Faster coroutine pipelines</h4>
<p><a href="https://themonadreader.files.wordpress.com/2011/10/issue19.pdf">Coroutine pipelines</a> are <em>the</em> functional way to create streaming libraries. However they can be made faster. How? By using an encoding based on continuations rather than the “direct” encoding. This trick seems to be pervasive. For example, for algebraic effects, <a href="https://github.com/b-studios/scala-effekt">ScalaEffekt</a> uses such an encoding and gets much better performance than <a href="https://github.com/atnos-org/eff">eff</a>.</p>
<h4 id="a-unified-approach-for-solving-seven-programming-problems">A unified approach for solving seven programming problems</h4>
<p>Racket is at it again but with a secret weapon, <a href="http://minikanren.org">miniKanren</a>. If you view a computation as a “logical” relationship between a program and some outputs, it is very reasonable to ask your solver to find the programs which give you the right outputs. And to find, for example, <a href="https://en.wikipedia.org/wiki/Quine_(computing)">Quines</a>, which are programs that output themselves!</p>
<h4 id="generic-functional-parallel-algorithms-parallel-scan">Generic functional parallel algorithms: parallel scan</h4>
<p>Scans, computing the prefix sums of a list of ints, are an important ingredient in some algorithms, like regular expression search. If they can be performed in parallel it’s even better! This talk shows that a scan algorithm can be defined for generic data structures, just defined by sums, products and functor composition. Then by adjusting the decomposition you get different amounts of parallelism and work to compose results.</p>
<h4 id="effect-driven-quickchecking-of-compilers">Effect-driven quickchecking of compilers</h4>
<p>Very entertaining presentation showing that it is entirely possible to generate well-typed terms to test a compiler.</p>
<h4 id="compiling-to-categories">Compiling to categories</h4>
<p>My favourite talk this year. What if you consider <code class="prettyprint">lambda</code> and <code class="prettyprint">apply</code> in the lambda calculus as 2 operations which could be interpreted differently? Conal Elliott developed a GHC plugin to transform Haskell expressions into morphisms in a category, where function composition becomes morphism composition. What does that give us? Instead of applying functions and getting a result we can generate circuits computing them, differentiate them, or do interval computations where the results are intervals instead of plain values.</p>
<h4 id="staged-generic-programming">Staged Generic Programming</h4>
<p>Generic programming is cool. Write a few lines of code and let the compiler figure out the right data structure and / or program. Unfortunately the performance might be very bad compared to hand-written code. With staging we can let the compiler be smarter and generate “unrolled” code which will perform as well as the hand-written one.</p>
<p>I was also glad to get to talk to Jeremy Yallop, the presenter, who is a very nice guy, now working at Docker on unikernels, something to keep an eye on!</p>
<h4 id="whip-higher-order-contracts-for-modern-services">Whip: higher-order contracts for modern services</h4>
<p>This looks more like a CUFP talk, with real applications in the industry :-). The idea is to install some “decorators” on webservices to check where contracts are being broken. They have a <a href="http://whip.services/">nice website</a> and I am tempted to try this at work.</p>
<h3 id="haskell-symposium">Haskell Symposium</h3>
<h4 id="algebraic-graphs-with-class">Algebraic graphs with Class</h4>
<p>A simple but very powerful way to describe graphs “algebraically”. Some atoms: the empty graph, the singleton graph, and 2 operations: <code class="prettyprint">overlay</code> (sort of addition) and <code class="prettyprint">connect</code> (sort of multiplication). Based on that you can represent any graph and encode any algorithm against that representation, which can then be optimised for different purposes.</p>
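<p>The algebra is small enough to sketch in a few lines (the function names below are mine, not the library’s; the real library adds type classes and several optimised representations):</p>
<pre><code class="prettyprint">-- the four constructors: empty graph, single vertex, overlay (put two
-- graphs side by side) and connect (overlay plus an edge from every
-- vertex on the left to every vertex on the right)
data Graph a = Empty | Vertex a | Overlay (Graph a) (Graph a) | Connect (Graph a) (Graph a)

vertices :: Graph a -> [a]
vertices Empty         = []
vertices (Vertex x)    = [x]
vertices (Overlay g h) = vertices g ++ vertices h
vertices (Connect g h) = vertices g ++ vertices h

edges :: Graph a -> [(a, a)]
edges Empty         = []
edges (Vertex _)    = []
edges (Overlay g h) = edges g ++ edges h
edges (Connect g h) = edges g ++ edges h ++ [(x, y) | x <- vertices g, y <- vertices h]

main :: IO ()
main = print (edges (Connect (Vertex 1) (Overlay (Vertex 2) (Vertex 3))))
-- [(1,2),(1,3)]</code></pre>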
<h4 id="quickspec-speculate">Quickspec / Speculate</h4>
<p>Those 2 tools will generate the equations that your API is supposed to satisfy. For example, given <code class="prettyprint">[]</code>, <code class="prettyprint">++</code> and <code class="prettyprint">reverse</code> they should be able to generate <code class="prettyprint">reverse ([] ++ xs) == reverse xs</code>. But even more, they can now generate implications and inequalities!</p>
<h4 id="composable-networks-and-remote-monads">Composable networks and remote monads</h4>
<p>The remote monad explores different ways of “grouping” operations when addressing remote services. After trying to do something similar for <code class="prettyprint">eff</code> I appreciate the subject a lot more :-).</p>
<h4 id="testing-with-crowbar">Testing with crowbar</h4>
<p>Impressive example about finding bugs in a unicode library in 10 minutes which QuickCheck cannot find in 1 week. The secret? Fuzzing and code coverage. Generate random sequences, turn them into test cases and mutate the sequences until you reach some “interesting” part of the code. Those parts are likely to contain bugs.</p>
<h3 id="ml-family-workshop">ML Family workshop</h3>
<h4 id="state-machines-all-the-way-down">State machines all the way down</h4>
<p>My periodic reminder that I need to find some time to use Idris. Even more so after Edwin Brady said that programming state machines with the type system tracking before/after states subsumes the effects library! He also told me that he was implementing the Idris compiler in Idris and that this was a good example for him of “when not to go too far with types”.</p>
<h4 id="effects-without-monads-non-determinism">Effects without monads: non determinism</h4>
<p>Oleg Kiselyov is back, implementing non-determinism with TTFI (<a href="http://okmij.org/ftp/tagless-final/course/lecture.pdf">typed tagless final interpreters</a>). This example is particularly interesting because he shows that monads are not necessary to interpret this mini-language. Not only that, but they cannot even be used if your translation is about generating code. Because you cannot write code which will write code!</p>
<h4 id="bioinformatics-the-typed-tagless-final-way">Bioinformatics: the typed tagless final way</h4>
<p>A concrete example of people using TTFI in real life for large and extensible DSLs.</p>
<h3 id="cufp">CUFP</h3>
<h4 id="bonsai-dsl-for-serverless-decisioning">Bonsai: DSL for serverless decisioning</h4>
<p>This DSL is pretty simple, it is just decision trees (<code class="prettyprint">if / then / else</code>) but compiled for maximum throughput for the ad industry.</p>
<h4 id="gens-n-roses-appetite-for-reduction">Gens N’ Roses: appetite for reduction</h4>
<p>Jacob Stanley’s talk on <a href="https://github.com/hedgehogqa/haskell-hedgehog">Hedgehog</a> where shrinks come for free and failures are beautiful. A “must-use” for Haskell users.</p>
<h3 id="haskell-implementors-workshop">Haskell implementors workshop</h3>
<h4 id="syntactic-musings">Syntactic musings</h4>
<p>This was a lightning talk but it made me laugh. These are proposals to improve Haskell syntax with almost zero chance of being adopted :-):</p>
<p>How to write <code class="prettyprint">bracket</code> without parenthesis:</p>
<p>Instead of</p>
<pre class="prettyprint"><code class="prettyprint">bracket
(acquire some resources)
(do your stuff which might be
pretty long)
(and clean up)</code></pre>
<p>Go (yes, bullet points!)</p>
<pre class="prettyprint"><code class="prettyprint">bracket
. acquire some resources
. do your stuff which might be
pretty long
. and clean up</code></pre>
<p>Then set a value for a bunch of functions</p>
<pre class="prettyprint"><code class="prettyprint">context fixed (value) where
add a = value + a
mul a = value * a</code></pre>
<p>to avoid the repetition of <code class="prettyprint">value</code> and to avoid accidentally messing with it. This screams “module/functor envy” to me.</p>
<p>The last one is crazy but brilliant. Do away with parentheses for grouping by removing whitespace where you want to indicate higher precedence:</p>
<pre class="prettyprint"><code class="prettyprint">a+b * c+d</code></pre>
<p>Why not :-)?</p>
<h4 id="an-experiment-in-fragment-based-code-distribution">An experiment in fragment-based code distribution</h4>
<p>I thought this idea could never fly but now I am intrigued. Following Joe Armstrong’s <a href="http://erlang.org/pipermail/erlang-questions/2011-May/058768.html">“Why do we need modules at all?”</a> Philip Schuster implemented a build system for Haskell where each program is decomposed into “slices” containing just one function and its dependencies, everything identified by numbers. Advantages: you just compile and package what you need, and you never care about version numbers. But the job of curating all of that to get a coherent snapshot, oh boy! As a data point, running anything on Scotty requires 12,000 slices.</p>
<h3 id="conclusion">Conclusion</h3>
<p>A very intense week, as in the previous editions I have been to. I don’t know yet what will stick with me and influence my future work but there’s plenty to think about. I also enjoyed very much the discussions I had with Jonathan and Philipp, the authors of <a href="https://github.com/b-studios/scala-effekt">scala-effekt</a>.</p>
<br/><br/><br/>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-7000747298669337692017-08-20T07:29:00.000+09:002017-10-03T01:26:18.457+09:00Specs2 4.x<p>It has now been almost 3 years since the last <a href="http://etorreborre.blogspot.de/2014/12/specs2-three.html">major version of specs2</a>. But why would a new major version of a testing library even be needed? Because the world doesn’t stop revolving!</p>
<p>In particular Scala can now be <a href="https://github.com/scala-js/scala-js">compiled to JavaScript</a> and even <a href="https://github.com/scala-native/scala-native">natively</a>. It is then only natural that <s2>specs2</s2> users want to use specs2 on their Scala.js projects:</p>
<div class="figure">
<img src="https://user-images.githubusercontent.com/10988/29239242-503023ea-7f4a-11e7-8050-b0de4e3558ad.png" alt="" />
</div>
<p>How hard can this be? First of all <s2>specs2 3.x</s2> relies on <a href="https://github.com/scalaz/scalaz-stream">scalaz-stream</a> which doesn’t have a Scala.js build, and more generally on <a href="https://github.com/scalaz/scalaz">scalaz</a> (which does have a <a href="https://github.com/scalaz/scalaz/commit/d63dcaeeb444ef4e2609df5b181326a12d5d5fcb">Scala.js build</a>).</p>
<p>This is where an <a href="https://hbr.org/2012/06/turning-a-problem-into-an-oppo.html">issue becomes an opportunity</a>! If I need to find an alternative to <code class="prettyprint">scalaz-stream</code> why not remove <code class="prettyprint">scalaz</code> altogether? Indeed it is very inconvenient for a test library to rely on third-party libraries which might conflict with users’ libraries. Because of this conflict I have to publish several versions of <s2>specs2</s2> for different versions of <code class="prettyprint">scalaz</code> (<code class="prettyprint">7.0.x</code>, <code class="prettyprint">7.1.x</code>, <code class="prettyprint">7.2.x</code> and so on). This is pretty time-consuming, not only because I have to wait for more than 30 minutes for all the jars to be published but also because I have to adapt to small differences between each <code class="prettyprint">scalaz</code> version.</p>
<p>Removing <code class="prettyprint">scalaz</code> and <code class="prettyprint">scalaz-stream</code> is easier said than done:</p>
<ol style="list-style-type: decimal">
<li><p><s2>specs2</s2> relies heavily on functional programming so I need the usual FP suspects: <code class="prettyprint">Functor</code>, <code class="prettyprint">Applicative</code>, <code class="prettyprint">Monad</code> and so on</p></li>
<li><p><s2>specs2</s2> interacts with the file system, writes out to the console, executes specifications concurrently so it needs a way to control <em>effects</em></p></li>
<li><p><s2>specs2</s2> is structured around the idea of a “stream of specification fragments” so it needs a streaming abstraction</p></li>
</ol>
<p>All of this before I can even consider moving to a Scala.js build!</p>
<h3 id="specs2-own-fp-streaming-infrastructure"><s2>specs2</s2>’s own FP / streaming infrastructure</h3>
<p>Fortunately I have developed on the side many of the pieces needed above:</p>
<ol style="list-style-type: decimal">
<li><p><a href="https://github.com/atnos-org/eff"><code class="prettyprint">eff</code></a> is a library to handle effects, including concurrency and resources management</p></li>
<li><p><a href="https://github.com/atnos-org/producer"><code class="prettyprint">producer</code></a> is a simple streaming library designed to work with any <code class="prettyprint">Monad</code>, including the <code class="prettyprint">Eff</code> monad</p></li>
<li><p><a href="https://github.com/atnos-org/origami"><code class="prettyprint">origami</code></a> implements a notion of composable “folds” which can be applied to any stream, which is a pattern I (re)-discovered in earlier versions of <s2>specs2</s2>: reporting the results of an executed specification is just “folding” the specification, a bit like using <code class="prettyprint">foldLeft</code> on a <code class="prettyprint">List</code></p></li>
</ol>
<p>A copy of those libraries is now included in the <code class="prettyprint">specs2-common</code> module.</p>
<p>The only thing missing was the FP abstractions. It turns out that it is not too hard to implement the whole menagerie:</p>
<ul>
<li><code class="prettyprint">Functor</code>, <code class="prettyprint">Applicative</code>, <code class="prettyprint">Monad</code>, <code class="prettyprint">Traverse</code>, <code class="prettyprint">Foldable</code>, <code class="prettyprint">Monoid</code>. Those typeclasses are very tied together</li>
<li><code class="prettyprint">Tree</code> and <code class="prettyprint">TreeLoc</code> for manipulating trees
<p/></li>
</ul>
<p>I am greatly indebted to the Scalaz and cats projects for this code, which I reduced to the only parts I needed for <s2>specs2</s2>. Those abstractions and datatypes are now available in the <code class="prettyprint">specs2-fp</code> module (not intended for external use).</p>
<h3 id="execution-model">Execution model</h3>
<p>At the heart of <s2>specs2</s2> is a <code class="prettyprint">Fragment</code>, which is a <code class="prettyprint">Description</code> and an <code class="prettyprint">Execution</code> yielding a <code class="prettyprint">Result</code>. The <code class="prettyprint">Description</code> holds an informal description of the specified system and the <code class="prettyprint">Execution</code> runs some Scala code to verify that behaviour.</p>
<p>Using Scala.js imposes a big change in <s2>specs2</s2>. We can never <code class="prettyprint">await</code> to get the result of executing a <code class="prettyprint">Fragment</code>. Also, since <code class="prettyprint">scalaz.concurrent.Task</code> is out of the equation, in <s2>specs2 4.x</s2> an <code class="prettyprint">Execution</code> is implemented using a <code class="prettyprint">scala.concurrent.Future</code>, like this:</p>
<pre><code class="prettyprint">case class Execution(
  run:       Option[Env => Future[() => Result]],
  executing: Option[Throwable Either Future[Result]])</code></pre>
<p>This is a lot more complicated than just <code class="prettyprint">Future[Result]</code>. Why is that?</p>
<p><code class="prettyprint">run</code> takes an action to execute <code class="prettyprint">() => Result</code>, possibly concurrently <code class="prettyprint">Future[() => Result]</code> and which might depend on the environment <code class="prettyprint">Env => Future[() => Result]</code>. You might wonder why we don’t simply use <code class="prettyprint">Env => Future[Result]</code>. This is to allow the user to add some behaviour <em>around</em> the result, for example with the <code class="prettyprint">AroundEach</code> trait and the <code class="prettyprint">def around[R : AsResult](r: =>R): Result</code> method.</p>
<p>We also want to “stream” results which are executed concurrently and display them as soon as they are available. This is done by having <code class="prettyprint">executing</code> hold either a <code class="prettyprint">Throwable</code> if we could not even start the execution or the <code class="prettyprint">Result</code> being currently computed.</p>
<p>Note that we can <strong><em>never</em></strong> call <code class="prettyprint">await</code> in a Scala.js program so when we are <em>executing</em> a <code class="prettyprint">Fragment</code> the best we can get is the equivalent of <code class="prettyprint">Future[Result]</code> (in reality an <code class="prettyprint">Action[Result]</code> which is a type possibly having more effects than just concurrency).</p>
<p>This triggers major changes in the implementation of <s2>specs2</s2> and there are rippling effects up to the <code class="prettyprint">Runners</code> used to run a specification. However most users of <s2>specs2</s2> should not be concerned with these changes as the “user” DSL for <s2>specs2</s2> isn’t modified. That said, if you are on the JVM and really need to <code class="prettyprint">await</code> to get the result of a fragment execution you can do so by using one of the methods in the <code class="prettyprint">org.specs2.control.ExecuteActions</code> object. For example <code class="prettyprint">result.runAction(executionEnv)</code> will return an <code class="prettyprint">Error Either Result</code>. Running the same method on Scala.js will throw an exception.</p>
<h3 id="scala.js-modules">Scala.js modules</h3>
<p>“<s2>specs2</s2> runs on Scala.js” is only ever going to be a partial truth. There are many limitations related to Scala.js, which means that, while <code class="prettyprint">specs2-core</code> can be used on any Scala.js project, some <s2>specs2</s2> features and modules cannot be used with Scala.js. Here is a list (to the best of my current knowledge):</p>
<ol style="list-style-type: decimal">
<li><p>the <code class="prettyprint">isolate</code> argument to run each example in its own copy of the specification cannot be used (because this uses reflection to instantiate classes)</p></li>
<li><p>no custom <code class="prettyprint">Selector</code>/<code class="prettyprint">Executor</code>/<code class="prettyprint">Printer</code>/<code class="prettyprint">Notifier</code> can be passed from the command line (for the same reason)</p></li>
<li><p>the <code class="prettyprint">RandomSequentialExecution</code> trait cannot be used because the implementation uses a <code class="prettyprint">TrieMap</code> not available on Scala.js. This could be fixed in the future.</p></li>
<li><p>running only the examples which previously failed with <code class="prettyprint">was x</code> on the command-line is not possible because this accesses the file system</p></li>
<li><p>all the modules where <code class="prettyprint">scala-xml</code> is used don’t have a Scala.js version because there’s no <code class="prettyprint">scala-xml</code> for Scala.js: <code class="prettyprint">specs2-form</code>, <code class="prettyprint">specs2-html</code></p></li>
<li><p>modules accessing the file system cannot be used: <code class="prettyprint">specs2-html</code>, <code class="prettyprint">specs2-markdown</code></p></li>
<li><p>the <code class="prettyprint">specs2-gwt</code> module defines fragments based on the result of previous fragments, so it currently needs to <code class="prettyprint">await</code>. This could be fixed potentially with a smart enough implementation in the future.</p></li>
</ol>
<h3 id="breaking-changes">Breaking changes</h3>
<p>A major version number is also the opportunity to fix some of the API flaws. I tried to limit the changes in the low-level API to the strict minimum but I also wanted to clean up one important aspect of the high-level API.</p>
<p>When <a href="http://etorreborre.blogspot.de/2014/12/specs2-three.html"><s2>specs2 3.x</s2></a> came out I envisioned that there could be ways to create a specification from arguments passed on the command line, or from an <code class="prettyprint">ExecutionContext</code> (to execute futures). With the addition of some traits like <code class="prettyprint">CommandLineArguments</code> you could declare:</p>
<pre><code class="prettyprint">class MySpec extends Specification with CommandLineArguments {
  def is(args: CommandLine) = s2"""
  try this $ok
  """
}</code></pre>
<p>or</p>
<pre><code class="prettyprint">class MySpec extends Specification { def is = s2"""
  try this $ex1
  """
  def ex1 = { implicit ec: ExecutionContext => Future(1) must be_==(1).await }
}</code></pre>
<p>I later realized that it would be a lot more convenient to directly “inject” the command line arguments or the execution context in the specification like so:</p>
<pre><code class="prettyprint">class MySpec(args: CommandLine) extends Specification { def is = s2"""
  try this $ex1
  """
  def ex1 = { (1 + 1) must be_==(2).when(args.contains("universe-is-sane")) }
}</code></pre>
<pre><code class="prettyprint">class MySpec(implicit ec: ExecutionContext) extends Specification { def is = s2"""
  try this $ex1
  """
  def ex1 = { Future(1) must be_==(1).await }
}</code></pre>
<p>Now there were 2 ways to do the same thing, the second one being a lot easier than the first one. In <s2>specs2 4.x</s2> the traits supporting the first behavior <code class="prettyprint">org.specs2.specification.Environment</code>, <code class="prettyprint">org.specs2.specification.CommandLineArguments</code>, <code class="prettyprint">org.specs2.specification.ExecutionEnvironment</code> have been removed. Same thing for the implicit conversions for functions like <code class="prettyprint">implicit ee: ExecutionEnv => Result</code> to create examples.</p>
<p>This will demand some migration efforts for some users but the result will be more concise specifications.</p>
<p>Another word on injected execution environments. Since specs2 <code class="prettyprint">3.8.9</code>, 2 distinct execution environments have been used in <s2>specs2</s2>: one for the execution of the specification itself, the other for the execution of the user examples (to map futures or await for results, for example). This is to make sure that incorrect user code doesn’t prevent the full specification from being executed.</p>
<p>With <s2>specs2 4.x</s2> we can go one step further and allow each specification to get its own dedicated execution environment and make sure that one specification execution doesn’t impact another one:</p>
<pre><code class="prettyprint">import org.specs2.specification.core.{Env, OwnExecutionEnv}

class MySpec(val env: Env) extends Specification with OwnExecutionEnv { def is = s2"""
  ...
  """
}</code></pre>
<p>The injected <code class="prettyprint">env</code> is used to pass command-line arguments to a copy of the user execution env, implicitly available as a member of the <code class="prettyprint">OwnExecutionEnv</code> trait. That trait also takes care of shutting down the execution environment once the specification has finished executing, so as not to leak resources.</p>
<h3 id="feedback-is-essential">Feedback is essential</h3>
<p>In conclusion, moving to <s2>specs2 4.x</s2> from <s2>specs2 3.x</s2> should be a smaller gap than moving from <s2>specs2 2.x</s2> to <s2>specs2 3.x</s2>. With a lot more to offer!</p>
<ol style="list-style-type: decimal">
<li><p>no more dependencies for <code class="prettyprint">specs2-core</code></p></li>
<li><p>a Scala.js version for <code class="prettyprint">specs2-core</code></p></li>
<li><p>better control on execution environments</p></li>
</ol>
<p/>
<p>The release notes can be found <a href="https://github.com/etorreborre/specs2/releases/tag/SPECS2-4.0.0">here</a>.</p>
<p>Please, please, please ask on <a href="https://gitter.im/etorreborre/specs2">gitter</a>, on the <a href="https://groups.google.com/forum/#!forum/specs2-users">mailing-list</a>, on <a href="https://twitter.com/specs2org">twitter</a> if you have any issue. And a big thank you for reporting any mistake, big or small, that I may have made on this release!<br/><br/><br/>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-36736026451488845872016-09-21T18:25:00.003+09:002022-09-17T03:22:31.683+09:00A neat trick from ICFP 2016<p>ICFP is an awesome conference where I can’t pretend I understand every presentation I attend but there is definitely lots of food for thought and some gems which make it immensely worthwhile.</p>
<p>The gem I want to share here is a paper called <a href="http://dl.acm.org/citation.cfm?id=2951949">“All sorts of permutations”</a>.</p>
<p>This paper shows that you can use the <code class="prettyprint">filterM</code> function in the list monad with a predicate like <code class="prettyprint">\_ -> [True, False]</code> to generate all the sublists of a list.</p>
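<p>To make the trick concrete, here is a small, self-contained sketch (plain GHC, no extra libraries): in the list monad the “predicate” answers both <code class="prettyprint">True</code> and <code class="prettyprint">False</code> for every element, so <code class="prettyprint">filterM</code> enumerates every combination of keep/drop decisions:</p>

```haskell
import Control.Monad (filterM)

-- Each element is both kept (True) and dropped (False) in the list
-- monad, so filterM produces every sublist of the input
sublists :: [a] -> [[a]]
sublists = filterM (\_ -> [True, False])

main :: IO ()
main = print (sublists "ab")  -- ["ab","a","b",""]
```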
<p>Similarly, you can evolve the <code class="prettyprint">sort</code> function into <code class="prettyprint">sortM</code>, taking a monadic comparison function: <code class="prettyprint">a -> a -> m Bool</code>. If the comparison function is <code class="prettyprint">\_ _ -> [True, False]</code> in the list monad, the <code class="prettyprint">sortM</code> function will generate the <em>permutations</em> of a list.</p>
<p>The paper goes on exploring if <code class="prettyprint">sortM</code> generates <em>all</em> the permutations of a list or <em>exactly</em> the permutations of a list depending on the implementation of the <code class="prettyprint">sortM</code> function (insertion sort, merge sort, quicksort,…) and tries to set various restrictions on the comparison function to get there.</p>
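<p>For the list-monad case, a quick self-contained check (using the insertion-sort implementation, the same one shown later in this post) confirms that we get exactly the permutations:</p>

```haskell
import Data.List (permutations, sort)

-- Insertion-sort-based sortM, as in the paper
sortM :: Monad m => (a -> a -> m Bool) -> [a] -> m [a]
sortM _ [] = return []
sortM cmp (x : xs) = do
  ys <- sortM cmp xs
  insertM cmp x ys

insertM :: Monad m => (a -> a -> m Bool) -> a -> [a] -> m [a]
insertM _ x [] = return [x]
insertM cmp x yys@(y : ys) = do
  b <- cmp x y
  if b then return (x : yys)
       else fmap (y:) (insertM cmp x ys)

main :: IO ()
main = do
  let ps = sortM (\_ _ -> [True, False]) [1, 2, 3 :: Int]
  -- exactly the 6 permutations of [1,2,3], in some order
  print (sort ps == sort (permutations [1, 2, 3]))  -- True
```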
<p>At the end of the talk <a href="http://homepages.inf.ed.ac.uk/wadler">Philip Wadler</a> asked the following question (paraphrased):</p>
<blockquote>
<p>This is a neat trick to generate permutations but why should I teach it to my students, considering that there are more ‘classical’ implementations for getting the permutations of a list?</p>
</blockquote>
<p>I think that it is worth showing students because this <code class="prettyprint">sortM</code> function gives us a very neat way of writing a QuickCheck generator for distinct elements!</p>
<p>It is indeed not uncommon to have to generate lists of distinct elements when testing applications. For example, you may want to test a function scheduling different tasks, where the tasks are supposed to be unique but can arrive in any order.</p>
<p>There is unfortunately no <code class="prettyprint">permutations :: [a] -> Gen [a]</code> combinator in QuickCheck. So how can you produce random Lists with distinct elements?</p>
<p>One way is to take an existing list of distinct elements and shuffle it to get a random permutation of those elements. This is exactly what <code class="prettyprint">sortM</code> gives us: we just need to use the right monad, which happens to be <code class="prettyprint">Gen</code> instead of <code class="prettyprint">[]</code>!</p>
<p>Let’s take an example. Say you want to generate all the permutations of <code class="prettyprint">["Homer", "Marge", "Bart", "Lisa"]</code></p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span class="kw">import </span><span class="dt">Test.QuickCheck</span>
<span class="co">-- An implementation of sortM</span>
<span class="ot">sortM ::</span> <span class="dt">Monad</span> m <span class="ot">=></span> (a <span class="ot">-></span> a <span class="ot">-></span> m <span class="dt">Bool</span>) <span class="ot">-></span> [a] <span class="ot">-></span> m [a]
sortM _ [] <span class="fu">=</span> return []
sortM cmp (x <span class="fu">:</span> xs) <span class="fu">=</span> <span class="kw">do</span>
  ys <span class="ot"><-</span> sortM cmp xs
  insertM cmp x ys
<span class="ot">insertM ::</span> <span class="dt">Monad</span> m <span class="ot">=></span> (a <span class="ot">-></span> a <span class="ot">-></span> m <span class="dt">Bool</span>) <span class="ot">-></span> a <span class="ot">-></span> [a] <span class="ot">-></span> m [a]
insertM _ x [] <span class="fu">=</span> return [x]
insertM cmp x yys<span class="fu">@</span>(y <span class="fu">:</span> ys) <span class="fu">=</span> <span class="kw">do</span>
  b <span class="ot"><-</span> cmp x y
  <span class="kw">if</span> b <span class="kw">then</span> return (x <span class="fu">:</span> yys)
       <span class="kw">else</span> fmap (y<span class="fu">:</span>) (insertM cmp x ys)
<span class="co">-- A generator for lists of distincts "Simpsons"</span>
<span class="kw">type</span> <span class="dt">Simpson</span> <span class="fu">=</span> <span class="dt">String</span>
simpsons <span class="fu">=</span> [<span class="st">"Homer"</span>, <span class="st">"Marge"</span>, <span class="st">"Bart"</span>, <span class="st">"Lisa"</span>]
<span class="ot">distinctSimpsons ::</span> <span class="dt">Gen</span> [<span class="dt">Simpson</span>]
distinctSimpsons <span class="fu">=</span> sortM (\_ _ <span class="ot">-></span> choose(<span class="dt">True</span>, <span class="dt">False</span>)) simpsons</code></pre></div>
<p>Let’s test it:</p>
<pre><code class="prettyprint">λ> sample distinctSimpsons
["Homer","Lisa","Marge","Bart"]
["Homer","Marge","Bart","Lisa"]
["Homer","Marge","Lisa","Bart"]
["Homer","Marge","Bart","Lisa"]
["Bart","Homer","Marge","Lisa"]
["Marge","Homer","Bart","Lisa"]
["Marge","Homer","Lisa","Bart"]
["Marge","Homer","Lisa","Bart"]
["Marge","Bart","Lisa","Homer"]
["Homer","Marge","Bart","Lisa"]
["Bart","Homer","Lisa","Marge"]</code></pre>
<p>I think this answers Philip’s question: this is a nice application of the <code class="prettyprint">sortM</code> “trick” which can be taught to students learning functional programming.</p>
<p>And what looks <em>really</em> promising is that this idea extends to the generation of other data structures. For example, we can generate the partitions of a given set with <code class="prettyprint">groupByM</code>:</p>
<pre><code class="prettyprint">partitionM :: Monad m => (a -> m Bool) -> [a] -> m ([a], [a])
partitionM _ [] = pure ([], [])
partitionM p (x : rest) = do
  b <- p x
  (xs, ys) <- partitionM p rest
  if b then pure (x:xs, ys)
       else pure (xs, x:ys)
groupByM :: Monad m => (a -> a -> m Bool) -> [a] -> m [[a]]
groupByM _ [] = pure []
groupByM cmp (x : rest) = do
  (xs, ys) <- partitionM (cmp x) rest
  ps <- groupByM cmp ys
  pure $ (x:xs):ps
<p>Then we can get all the set partitions of <code class="prettyprint">["Homer","Lisa","Marge","Bart"]</code></p>
<pre><code class="prettyprint">λ> pl $ groupByM (\_ _ -> [True, False]) simpsons
[["Homer","Marge","Bart","Lisa"]]
[["Homer","Marge","Bart"],["Lisa"]]
[["Homer","Marge","Lisa"],["Bart"]]
[["Homer","Marge"],["Bart","Lisa"]]
[["Homer","Marge"],["Bart"],["Lisa"]]
[["Homer","Bart","Lisa"],["Marge"]]
[["Homer","Bart"],["Marge","Lisa"]]
[["Homer","Bart"],["Marge"],["Lisa"]]
[["Homer","Lisa"],["Marge","Bart"]]
[["Homer","Lisa"],["Marge"],["Bart"]]
[["Homer"],["Marge","Bart","Lisa"]]
[["Homer"],["Marge","Bart"],["Lisa"]]
[["Homer"],["Marge","Lisa"],["Bart"]]
[["Homer"],["Marge"],["Bart","Lisa"]]
[["Homer"],["Marge"],["Bart"],["Lisa"]]</code></pre>
<p>or get some random partitions of that set:</p>
<pre><code class="prettyprint">λ> sample $ groupByM (\_ _ -> choose(True, False)) simpsons
[["Homer","Marge","Bart"],["Lisa"]]
[["Homer","Lisa"],["Marge","Bart"]]
[["Homer","Bart"],["Marge"],["Lisa"]]
[["Homer","Marge","Bart","Lisa"]]
[["Homer","Marge","Bart","Lisa"]]
[["Homer","Lisa"],["Marge"],["Bart"]]
[["Homer","Marge"],["Bart"],["Lisa"]]
[["Homer"],["Marge"],["Bart","Lisa"]]
[["Homer","Marge","Bart","Lisa"]]
[["Homer","Marge","Bart","Lisa"]]
[["Homer","Marge","Lisa"],["Bart"]]</code></pre>
<p>Some partitions can be drawn twice but the generation should be uniform over the set of all possible partitions (it would be better to not take my word for it and prove it though :-)).</p>
<p>I suspect that this is more than a useful trick and that this deserves a more systematic treatment to understand all the data structures which we can generate by using the appropriate functions taking predicates or relations as parameters. But for now it is already useful as it is! <br/><br/><br/>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-41475500627447752262015-07-29T13:31:00.001+09:002015-07-29T13:33:22.371+09:00Precise data typesHere is a new post about creating data types to encode different cases in applicationspp0pp, hosted on the Ambiata lab website: <a href="http://lab.ambiata.com/posts/precise-data-types/index.html">Precise data types</a>.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-48485894145656972582014-12-24T11:09:00.000+09:002015-02-21T13:56:54.799+09:00specs2-three<p>This post presents the next major version of specs2, <s2>specs2 3.0</s2>:</p>
<ul>
<li>what are the motivations for a major version?</li>
<li>what are the main benefits and changes?</li>
<li>when will it be available?</li>
</ul>
<h2 id="the-motivations">The motivations</h2>
<p>I started working on this new version a bit more than one year ago now. I had lots of different reasons for giving <s2>specs2</s2> a good face-lift.</p>
<h3 id="the-open-source-reason">The Open Source reason</h3>
<p><s2>specs2</s2> has largely been the effort of a single person. This has probably some advantages, like the possibility to maintain some kind of vision for the project, but also lots of drawbacks. Quality is one of them.</p>
<p>As a programmer I have all sorts of shortcomings. I have always been amazed by other people taking a look at my code and spotting obvious deficiencies either big or small (for example <a href="http://www.twitter.com/jedws">@jedws</a> introduced named threads to me, much easier for debugging!). I want to maximize the possibility that other people will jump in to fix and extend the library as necessary (and be able to go on holidays for 3 weeks without a laptop :-)). Improving the code base can only help other people review my implementation.</p>
<h3 id="the-design-reason">The Design reason</h3>
<p>In <s2>specs2</s2> I’ve had this vision of a flow of “specification fragments” that would get created, executed, and then reported, possibly with different reporters for various output formats. This is not as easy as it seems:</p>
<ul>
<li>I want fragments to be executed concurrently while being printed in sequence</li>
<li>the fragments should also be displayed as soon as executed</li>
<li>I want to be able to recreate a view of the sequence of fragments as a tree to be displayed in IDEs like Eclipse and Intellij</li>
</ul>
<p>This is all done and working in <s2>specs2 < 3.0</s2> but in a very clumsy way, subverting Scalaz Reducers to maintain state and try to compose reporters.</p>
<p>One of these reporters is an HTML reporter and I’ve always wanted to improve it. This was not something I was eager to change given the situation 1 year ago. Luckily <a href="http://github.com/scalaz/scalaz-stream"><code class="prettyprint">scalaz-stream</code></a> version 0.2 came out in December 2013 and allowed me to try out new ideas.</p>
<h3 id="the-functional-programming-reason">The Functional Programming reason</h3>
<p>The major difference between <a href="https://code.google.com/p/specs">specs</a> and <s2>specs2</s2> was the use of immutable data structures and the avoidance of exceptions for control flow. Yet there was still lots of side-effects!</p>
<p>I hadn’t fully grasped how to use the <code class="prettyprint">IO</code> monad to structure my program. Fortunately I happen to work with the terrific <a href="http://www.twitter.com/markhibberd">@markhibberd</a> and he showed me how to use a proper monad stack to track <code class="prettyprint">IO</code> effects but also how to thread in configuration data and track errors.</p>
<h2 id="the-main-benefits-and-changes">The main benefits and changes</h2>
<p>First of all, a happy maintainer! That goes without saying but my ability to fix bugs and add features will be improved a lot if I can better reason about the code :-).</p>
<p>Now for users…</p>
<p>For casual users there should be no changes! If you just use <code class="prettyprint">org.specs2.Specification</code> or <code class="prettyprint">org.specs2.mutable.Specification</code> with no other traits, you should not see any change (except in the User Guide, see below). For “advanced” users there are new benefits and API changes (in no particular order).</p>
<h3 id="refactored-user-guide">Refactored user guide</h3>
<p>The existing User Guide has been divided into a lot more pages (around 60) and follows a pedagogical progression:</p>
<ul>
<li><p>a Quick Start presenting a simple specification (and the mandatory link to the installation page)</p></li>
<li><p>some links from the Quick Start to the most common concepts: what is the structure of a Specification? Which matchers are available? How to run a Specification?</p></li>
<li><p>then, on each other page there is a presentation focusing on one topic plus additional links: <code class="prettyprint">"Now learn how to..."</code> (what is the next thing you will probably need?) and <code class="prettyprint">"If you want to know more"</code> (what is some more advanced topic that is related to this one?)</p></li>
</ul>
<p>In addition to this refactoring there are some “tools” to help users find faster what they are looking for:</p>
<ul>
<li><p>a search box</p></li>
<li><p>reference pages to summarize in one place some topics (matchers and run arguments for example)</p></li>
<li><p>a <code class="prettyprint">Troubleshooting</code> page with the most common issues</p></li>
</ul>
<p>You can have a first look at it <a href="http://etorreborre.github.io/specs2/guide-specs2-three/org.specs2.guide.UserGuide.html">here</a>.</p>
<h3 id="generalized-reader-pattern">Generalized Reader pattern</h3>
<p>One consequence of the “functional” re-engineering is that the environment is now available at different levels. By “environment”, I mean the <code class="prettyprint">org.specs2.specification.core.Env</code> class which gives you access to all the components necessary to execute a specification, among which:</p>
<ul>
<li>the command line arguments</li>
<li>the <code class="prettyprint">lineLogger</code> used to log results to the console (from Sbt)</li>
<li>the <code class="prettyprint">systemLogger</code> used to log issues when instantiating the Specification for example</li>
<li>the execution environment, containing a reference to the thread pool used to execute the examples</li>
<li>the <code class="prettyprint">statsRepository</code> to get and store execution statistics</li>
<li>the <code class="prettyprint">fileSystem</code> which mediates all interactions with the file system (to read and write files)</li>
</ul>
<p>I doubt that you will ever need all of this, but parts of the environment can be useful. For example, you can define the structure of your Specification based on command line arguments:</p>
<pre class="prettyprint"><code class="prettyprint">class MySpec extends Specification with CommandLineArguments { def is(args: CommandLine) = s2"""
Do something here with a command line parameter ${args.valueOr("parameter1", "not found")}
"""
}</code></pre>
<p>The <code class="prettyprint">CommandLineArguments</code> trait uses your definition of the <code class="prettyprint">def is(args: CommandLine): Fragments</code> method to build a more general function <code class="prettyprint">Env => Fragments</code> which is the internal representation of a Specification (fragments that depend on the environment). This means that now you don’t have to skip examples based on a condition (<code class="prettyprint">isDatabaseAvailable</code> for example), you can simply remove them!</p>
<p>You can also use the environment, or part of it, to define examples:</p>
<pre class="prettyprint"><code class="prettyprint">class MySpec extends Specification { def is = s2"""
Here are some examples using the environment.
You can access
the full environment $e1
the command line arguments $e2
the execution context to create a Scala future $e3
the executor service to create a Scalaz future $e4
"""
  def e1 = { env: Env =>
    env.statisticsRepository.getStatistics(getClass.getName).runOption.flatten.foreach { stats =>
      println("the previous results for this specification are "+stats)
    }
    ok
  }

  def e2 = { args: CommandLine =>
    if (args.boolOr("doit", false)) success
    else skipped
  }

  def e3 = { implicit executionContext: ExecutionContext =>
    scala.concurrent.Future(1); ok
  }

  def e4 = { implicit executorService: ExecutorService =>
    scalaz.concurrent.Future(1); ok
  }
}</code></pre>
<h3 id="better-reporting-framework">Better reporting framework</h3>
<p>This paragraph is mostly relevant to people who want to extend <s2>specs2</s2> with additional outputs. The reporting framework has been refactored around 4 concepts:</p>
<p>A <code class="prettyprint">Runner</code> (for example the <code class="prettyprint">SbtRunner</code>)</p>
<ul>
<li>instantiates the specification class to execute</li>
<li>creates the execution environment (arguments, thread pool)</li>
<li>instantiates a <code class="prettyprint">Reporter</code></li>
<li>instantiates <code class="prettyprint">Printers</code> and starts the execution</li>
</ul>
<p>A <code class="prettyprint">Reporter</code></p>
<ul>
<li>reads the previous execution statistics if necessary</li>
<li>selects the fragments to execute</li>
<li>executes the specification fragments</li>
<li>calls the printers for printing out the results</li>
<li>saves the execution statistics</li>
</ul>
<p>A <code class="prettyprint">Printer</code></p>
<ul>
<li>prepares the environment for printing</li>
<li>uses a <code class="prettyprint">Fold</code> to print or to gather execution data. For example the <code class="prettyprint">TextPrinter</code> prints results to the console as soon as they are available, while the <code class="prettyprint">HtmlPrinter</code> accumulates the results and only writes the HTML pages once the execution is finished</li>
</ul>
<p>A <code class="prettyprint">Fold</code></p>
<ul>
<li>has a <code class="prettyprint">Sink[Task, (T, S)]</code> (see scalaz-stream for the definition of a <code class="prettyprint">Sink</code>) to perform side-effects (like writing to a file)</li>
<li>has a <code class="prettyprint">fold: (T, S) => S</code> method to accumulate some state (to compute statistics for example, or create an index)</li>
<li>has an <code class="prettyprint">init: S</code> element to initialize the state</li>
<li>has a <code class="prettyprint">last(s: S): Task[Unit]</code> method to perform one last side-effect with the final state once all the fragments have been executed</li>
</ul>
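<p>The <code class="prettyprint">Fold</code> description above can be sketched with a simplified, self-contained model (an illustration only, not the actual specs2 types: <code class="prettyprint">Task</code> here is a plain thunk standing in for the <code class="prettyprint">scalaz.concurrent.Task</code> used by the real implementation):</p>

```scala
object FoldSketch {
  type Task[A] = () => A // stand-in for scalaz.concurrent.Task

  // Simplified model of the Fold concept: per-element side-effects,
  // state accumulation, and one final side-effect on the final state
  trait Fold[T] {
    type S
    def init: S                      // initial state
    def fold: (T, S) => S            // accumulate state
    def sink(t: T, s: S): Task[Unit] // side-effect per element (e.g. write to a file)
    def last(s: S): Task[Unit]       // final side-effect with the final state
  }

  // Run a fold over a sequence of executed elements
  def run[T](f: Fold[T], ts: List[T]): Unit = {
    val s = ts.foldLeft(f.init) { (s, t) =>
      f.sink(t, s)(); f.fold(t, s)
    }
    f.last(s)()
  }

  // Example: count elements and report the total at the end
  val counter: Fold[String] = new Fold[String] {
    type S = Int
    def init = 0
    def fold = (_, n) => n + 1
    def sink(t: String, s: Int) = () => ()
    def last(n: Int) = () => println(s"executed $n fragments")
  }
}
```

<p>Composing 2 such folds amounts to pairing their states and running both sinks, which is how a <code class="prettyprint">Printer</code> can compose 2 folds into 1.</p>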
<p>It is unlikely that you will create a new <code class="prettyprint">Runner</code> (except if you build an Eclipse plugin for example) but you can create custom reporters and printers by passing the <code class="prettyprint">reporter <classname></code> and <code class="prettyprint">printer <classname></code> options as arguments. Note also that <code class="prettyprint">Folds</code> are composable so if you need 2 outputs you can create a <code class="prettyprint">Printer</code> that will compose 2 folds into 1.</p>
<h3 id="the-html-printer">The Html printer</h3>
<h4 id="pandoc">Pandoc</h4>
<p>The Html printer has been reworked to use <a href="https://pandoc.org">Pandoc</a> as a templating system and Markdown engine. I decided to move to Pandoc for several reasons:</p>
<ul>
<li>Pandoc is one of the libraries that is officially endorsing the <a href="http://commonmark.org/">CommonMark</a> format</li>
<li>I’ve had less corner cases with rendering mixed html/markdown with Pandoc than previously</li>
<li>Pandoc opens the possibility to render other markup languages than CommonMark, <code class="prettyprint">LaTeX</code> for example</li>
</ul>
<p>However this comes with a huge drawback: you need to have Pandoc installed as a command-line tool on your machine. If Pandoc is not installed, <s2>specs2</s2> will use a default template renderer, but won’t render CommonMark.</p>
<h4 id="templates">Templates</h4>
<p>I’ve extracted a <code class="prettyprint">specs2.html</code> template (and a corresponding <code class="prettyprint">specs2.css</code> stylesheet) and it is possible for you to substitute another template (with the <code class="prettyprint">html.template</code> option) if you want your html files to be displayed differently. This template is using the Pandoc template system so it is pretty primitive but should still cover most cases.</p>
<h3 id="better-api">Better API</h3>
<p>The <s2>specs2</s2> API has been split into a lot more traits to support various objectives:</p>
<ul>
<li><p>support the new execution model with <code class="prettyprint">scalaz-stream</code></p></li>
<li><p>make it possible to separate the DSL methods from the core ones (see Lightweight spec)</p></li>
<li><p>offer a better <code class="prettyprint">Fragment</code> API</p></li>
</ul>
<p>Let’s start with the heart of <s2>specs2</s2>, the <code class="prettyprint">Fragment</code>.</p>
<h4 id="fragmentfactory-methods"><code class="prettyprint">FragmentFactory</code> methods</h4>
<p>Advanced <s2>specs2</s2> users need to tweak the creation of fragments. For example, when using a <a href="http://etorreborre.github.io/specs2/guide-specs2-three/org.specs2.guide.SpecificationTemplate.html">“template specification”</a>:</p>
<pre class="prettyprint"><code class="prettyprint">abstract class DatabaseSpec extends Specification {
  override def map(fs: => Fragments): Fragments =
    step(startDb) ^ fs ^ step(closeDb)
}</code></pre>
<p>In the <code class="prettyprint">DatabaseSpec</code> you are using different methods to work with <code class="prettyprint">Fragments</code>. The <code class="prettyprint">^</code> method to append them, the <code class="prettyprint">step</code> method to create a <code class="prettyprint">Step</code> fragment. Those 2 methods are part of the <code class="prettyprint">Fragment</code> API. Here is a list of the main changes, compared to <s2>specs2 < 3.0</s2></p>
<ul>
<li><p>first of all there is only one <code class="prettyprint">Fragment</code> type (instead of <code class="prettyprint">Text</code>, <code class="prettyprint">Step</code>, <code class="prettyprint">Example</code>,…). This type contains a <code class="prettyprint">Description</code> and an <code class="prettyprint">Execution</code>. By combining different types of <code class="prettyprint">Description</code>s and <code class="prettyprint">Execution</code>s it is possible to recreate all the previous <s2>specs2 < 3.0</s2> types</p></li>
<li><p>however you don’t need to create a <code class="prettyprint">Fragment</code> by yourself, what you do is invoke the <code class="prettyprint">FragmentFactory</code> methods: <code class="prettyprint">example</code>, <code class="prettyprint">step</code>, <code class="prettyprint">text</code>,… This now unifies the notation between immutable and mutable specifications because in <s2>specs2 < 3.0</s2> you would write <code class="prettyprint">step</code> in a mutable specification and <code class="prettyprint">Step</code> in an immutable one (<code class="prettyprint">Step</code> is now deprecated)</p></li>
<li><p>there is no <code class="prettyprint">ExampleFactory</code> trait anymore since it has been subsumed by methods on the <code class="prettyprint">FragmentFactory</code> trait (so this will break code for people who were intercepting <code class="prettyprint">Example</code> creation to inject additional behaviour)</p></li>
</ul>
<p>Finally those “core” objects have been moved under the <code class="prettyprint">org.specs2.specification.core</code> package, in order to restructure the <code class="prettyprint">org.specs2.specification</code> package into</p>
<ul>
<li><p><code class="prettyprint">core</code>: <code class="prettyprint">Fragment</code> <code class="prettyprint">Description</code>, <code class="prettyprint">SpecificationStructure</code>…</p></li>
<li><p><code class="prettyprint">dsl</code>: all the syntactic sugar <code class="prettyprint">FragmentsDsl</code>, <code class="prettyprint">ExampleDsl</code>, <code class="prettyprint">ActionDsl</code>…</p></li>
<li><p><code class="prettyprint">process</code>: the “processing” classes <code class="prettyprint">Selector</code>, <code class="prettyprint">Executor</code>, <code class="prettyprint">StatisticsRepository</code>…</p></li>
<li><p><code class="prettyprint">create</code>: traits to create the specification <code class="prettyprint">FragmentFactory</code>, <code class="prettyprint">AutoExamples</code>, <code class="prettyprint">S2StringContext</code> (for s2 string interpolation)…</p></li>
</ul>
<h4 id="fragmentsdsl-methods"><code class="prettyprint">FragmentsDsl</code> methods</h4>
<p>When you want to assemble <code class="prettyprint">Fragment</code>s together you will need the <code class="prettyprint">FragmentsDsl</code> trait to do so (it is mixed into the <code class="prettyprint">Specification</code> trait, so you don’t have to add it yourself).</p>
<p>The result of appending 2 <code class="prettyprint">Fragment</code>s is a <code class="prettyprint">Fragments</code> object. The <code class="prettyprint">Fragments</code> class has changed in <s2>specs2 3.0</s2>. It doesn’t hold a reference to the specification title and the specification arguments anymore, this is now the role of the <code class="prettyprint">SpecStructure</code>. So in summary:</p>
<ul>
<li><p>a <code class="prettyprint">Specification</code> is a function <code class="prettyprint">Env => SpecStructure</code></p></li>
<li><p>a <code class="prettyprint">SpecStructure</code> contains: a <code class="prettyprint">SpecHeader</code>, some <code class="prettyprint">Arguments</code> and <code class="prettyprint">Fragments</code></p></li>
<li><p><code class="prettyprint">Fragments</code> is a sequence of <code class="prettyprint">Fragment</code>s (actually a <code class="prettyprint">scalaz-stream</code> <code class="prettyprint">Process[Task, Fragment]</code>)</p></li>
</ul>
<p>The <code class="prettyprint">FragmentsDsl</code> API allows you to combine almost everything into <code class="prettyprint">Fragments</code> with the <code class="prettyprint">^</code> operator:</p>
<ul>
<li>a <code class="prettyprint">String</code> and a <code class="prettyprint">Seq[Fragment]</code></li>
<li>2 <code class="prettyprint">Fragments</code></li>
<li>1 <code class="prettyprint">Fragments</code> and a <code class="prettyprint">String</code></li>
</ul>
<p>One advantage of this fine-grained decomposition of the fragments API is that there is now a <code class="prettyprint">Spec</code> lightweight trait.</p>
<h3 id="lightweight-spec-trait">Lightweight <code class="prettyprint">Spec</code> trait</h3>
<p>Compilation times can be a problem with Scala and <s2>specs2</s2> makes it worse by providing lots of implicit methods in a standard <code class="prettyprint">Specification</code> to provide various DSLs. In <s2>specs2 3.0</s2> there is a <code class="prettyprint">Spec</code> trait which contains a reduced number of implicits to:</p>
<ul>
<li>create a <code class="prettyprint">s2</code> string for an “Acceptance Specification”</li>
<li>create <code class="prettyprint">should</code> and <code class="prettyprint">in</code> blocks in a “Unit Specification”</li>
<li>create expectations with <code class="prettyprint">must</code></li>
<li>add arguments to the specification (like <code class="prettyprint">sequential</code>)</li>
</ul>
<p>If you use that trait and you find yourself missing an implicit you will have to either:</p>
<ul>
<li><p>use the <code class="prettyprint">Specification</code> class instead</p></li>
<li><p>search <s2>specs2</s2> for the trait or object providing the missing implicit. There is no magic recipe for this but the <code class="prettyprint">MustMatchers</code> trait and the <code class="prettyprint">Spec2StringContext</code> trait should bring most of the missing implicits in scope</p></li>
</ul>
<p>It is possible that this trait will be adjusted to find the exact balance between expressivity and compile times but I hope it will remain pretty stable.</p>
<h3 id="durations">Durations</h3>
<p>When <s2>specs2</s2> started, the package <code class="prettyprint">scala.concurrent.duration</code> didn’t exist. This is why there was a <code class="prettyprint">Duration</code> type in <s2>specs2 < 3.0</s2> and a <code class="prettyprint">TimeConversions</code> trait. Of course this introduced annoying collisions with the implicits coming from <code class="prettyprint">scala.concurrent.duration</code> when that one came around.</p>
<p>There is no reason to go on using <s2>specs2</s2> Durations anymore, so you can now use the standard Scala durations everywhere <s2>specs2</s2> expects a <code class="prettyprint">Duration</code>.</p>
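<p>For example, with the standard library only:</p>

```scala
import scala.concurrent.duration._

object DurationsExample {
  // standard Scala durations now work everywhere specs2 expects a Duration
  val timeout: FiniteDuration = 10.seconds

  def main(args: Array[String]): Unit =
    println(timeout.toMillis) // 10000
}
```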
<h3 id="contexts">Contexts</h3>
<p>Context management has been slowly evolving in <s2>specs2</s2>. In <s2>specs2 3.0</s2> we end up with the following traits:</p>
<ul>
<li><code class="prettyprint">BeforeAll</code> do something before all the examples (you had to use a <code class="prettyprint">Step</code> in <s2>specs2 < 3.0</s2>)</li>
<li><code class="prettyprint">BeforeEach</code> do something before each example (was <code class="prettyprint">BeforeExample</code> in <s2>specs2 < 3.0</s2>)</li>
<li><code class="prettyprint">AfterEach</code> do something after each example (was <code class="prettyprint">AfterExample</code> in <s2>specs2 < 3.0</s2>)</li>
<li><code class="prettyprint">BeforeAfterEach</code> do something before/after each example (was <code class="prettyprint">BeforeAfterExample</code> in <s2>specs2 < 3.0</s2>)</li>
<li><code class="prettyprint">ForEach[T]</code> provide an element of type <code class="prettyprint">T</code> (a “fixture”) to each example (was <code class="prettyprint">FixtureExample[T]</code> in <s2>specs2 < 3.0</s2>)</li>
<li><code class="prettyprint">AfterAll</code> do something after all the examples (you had to use a <code class="prettyprint">Step</code> in <s2>specs2 < 3.0</s2>)</li>
<li><code class="prettyprint">BeforeAfterAll</code> do something before/after all the examples (you had to use a <code class="prettyprint">Step</code> in <s2>specs2 < 3.0</s2>)</li>
</ul>
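<p>As an illustration, here is a sketch of how these traits could combine (hypothetical: the exact method names each trait requires, such as <code class="prettyprint">beforeAll</code> and <code class="prettyprint">after</code>, should be checked against the actual API, and <code class="prettyprint">startDb</code>/<code class="prettyprint">cleanTables</code> are placeholders):</p>

```scala
import org.specs2.Specification
import org.specs2.specification.{AfterEach, BeforeAll}

// Sketch: start a database once before all the examples and clean the
// tables after each one. startDb and cleanTables are placeholder helpers.
class DatabaseSpec extends Specification with BeforeAll with AfterEach { def is = s2"""
  first example  $e1
  second example $e2
  """
  def e1 = success
  def e2 = success

  def beforeAll() = startDb()
  def after = cleanTables()

  def startDb() = println("starting the database")
  def cleanTables() = println("cleaning the tables")
}
```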
<p>There are some other cool things you can do. For example set a time-out for all examples based on a command line parameter:</p>
<pre class="prettyprint"><code class="prettyprint">trait ExamplesTimeout extends EachContext with MustMatchers with TerminationMatchers {

  def context: Env => Context = { env: Env =>
    val timeout = env.arguments.commandLine.intOr("timeout", 1000 * 60).millis
    upTo(timeout)(env.executorService)
  }

  def upTo(to: Duration)(implicit es: ExecutorService) = new Around {
    def around[T : AsResult](t: =>T) = {
      lazy val result = t
      val termination =
        result must terminate(retries = 10,
          sleep = (to.toMillis / 10).millis).orSkip((ko: String) => "TIMEOUT: "+to)

      if (!termination.toResult.isSkipped) AsResult(result)
      else termination.toResult
    }
  }
}</code></pre>
<p>The <code class="prettyprint">ExamplesTimeout</code> trait extends <code class="prettyprint">EachContext</code> which is a generalization of the <code class="prettyprint">xxxEach</code> traits. With the <code class="prettyprint">EachContext</code> trait you get access to the environment to define the behaviour used to “decorate” each example. So, in that case, we use a <code class="prettyprint">timeout</code> command line parameter to create an <code class="prettyprint">Around</code> context that will time out each example if necessary. You can also note that this <code class="prettyprint">Around</code> context uses the <code class="prettyprint">executorService</code> passed by the environment so you don’t have to worry about resource management for your Specification.</p>
<h3 id="included-specifications">Included specifications</h3>
<p>As I was reworking the implementation of <s2>specs2</s2> I also looked for ways to simplify its internal model. In <s2>specs2 < 3.0</s2> you can nest a specification inside another one. This adds some significant complexity because a nested specification has its own arguments and its own title. For example, during the execution of the inner specification we need to be careful enough to override the outer specification arguments with the inner ones.</p>
<p>I decided to let go of this functionality in favor of a view of specifications as “referencing” each other, with 2 types of references:</p>
<ul>
<li>“link” reference</li>
<li>“see” reference</li>
</ul>
<p>The idea is to model dependency relationships with “link” and weaker relationships with “see” (when you just want to mention that some information is present in another specification).</p>
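<p>For illustration, assuming the s2 interpolation shown elsewhere in this post and hypothetical specification names, the two kinds of references could look like this (the exact method names may differ in the final API):</p>
<pre class="prettyprint"><code class="prettyprint">class UserGuide extends Specification { def is = s2"""
  ${ link(new QuickStartSpec) } // a dependency: its status is reported with this specification
  ${ see(new GlossarySpec) }    // a weak reference: merely rendered as an HTML link
  """
}</code></pre>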
<p>Then there are 2 modes of execution:</p>
<ul>
<li>the default one</li>
<li>the “all” mode</li>
</ul>
<p>By default when a specification is executed, the Runner will try to display the status of “linked” specifications but not “see” specifications. If you use the <code class="prettyprint">all</code> argument then we collect all the “linked” specifications transitively and run them respecting dependencies (if s1 has a link to s2, then s2 is executed first).</p>
<p>This is particularly important for HTML reporting when the structure of “link” references is used to produce a table of contents and “see” references are merely used to display HTML links.</p>
<h3 id="online-specifications">Online specifications</h3>
<p>I find this exciting even if I don’t know if I will ever use this feature! (it has been requested in the past though).</p>
<p>In <s2>specs2 < 3.0</s2> there is a clear distinction between the “creation” time and the “execution” time of a specification. Once you have defined your examples you cannot add new ones based on your execution results. But wait! This is more or less the property of a <code class="prettyprint">Monad</code>! “Produce an action based on the value returned by another action”. Since <s2>specs2 3.0</s2> is using <code class="prettyprint">scalaz-stream</code> <code class="prettyprint">Process</code> under the covers, which is a <code class="prettyprint">Monad</code>, this means that it is now possible to do the following:</p>
<pre class="prettyprint"><code class="prettyprint">class WikipediaBddSpec extends Specification with Online { def is = s2"""
All the pages mentioning the term BDD must contain a reference to specs2 $e1
"""
def e1 = {
val pages = Wikipedia.getPages("BDD")
// if the page is about specs2, add more examples to check the links
(pages must contain((_:Page) must mention("specs2"))) continueWith
pagesSpec(pages)
}
/** create one example per linked page */
def pagesSpec(pages: Seq[Page]): Fragments = {
val specs2Links = pages.flatMap(_.getLinks).filter(_.contains("specs2"))
s2"""
The specs2 links must be active
${Fragments.foreach(specs2Links)(active)}
"""
}
def active(link: HtmlLink) =
s2"""
The page at ${link.url} must be active ${ link must beActive }"""
}</code></pre>
<p>The specification above is “dynamic” in the sense that it creates more examples based on the tested data. All Wikipedia pages for BDD must mention “specs2” and for each linked page (which we can’t know in advance) we create a new example specifying that the link must be active.</p>
<h3 id="scalacheck">ScalaCheck</h3>
<p>The ScalaCheck trait has been reworked and extended to provide the following features:</p>
<ul>
<li>you can specify <code class="prettyprint">Arbitrary[T]</code>, <code class="prettyprint">Gen[T]</code>, <code class="prettyprint">Shrink[T]</code>, <code class="prettyprint">T => Pretty</code> instances at the property level (for any or all of the arguments)</li>
<li>you can easily collect argument values by appending <code class="prettyprint">.collectXXX</code> to the property (<code class="prettyprint">XXX</code> depends on the arguments you want to collect: <code class="prettyprint">collect1</code> for the first, <code class="prettyprint">collectAll</code> for all)</li>
<li>you can override default parameters from the command line. For example pass <code class="prettyprint">scalacheck.mintestsok 10000</code></li>
<li>you can set individual <code class="prettyprint">before</code> and <code class="prettyprint">after</code> actions to do some setup/teardown around each property execution</li>
</ul>
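<p>As a sketch of how these features combine (<code class="prettyprint">.setGen</code> is my guess at the setter name; <code class="prettyprint">.collect1</code> is mentioned above):</p>
<pre class="prettyprint"><code class="prettyprint">class NumbersSpec extends Specification with ScalaCheck { def is = s2"""
  generated numbers must be positive $positives
  """
  // the generator is set at the property level and the first argument is collected
  def positives =
    prop { (n: Int) => n must be_>=(0) }.setGen(Gen.choose(0, 100)).collect1
}</code></pre>
<p>Running with <code class="prettyprint">scalacheck.mintestsok 10000</code> on the command line would then raise the number of required successful tests.</p>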
<p>Also, specs2 was previously doing some message reformatting on top of ScalaCheck but now ScalaCheck's original messages are preserved to keep consistency between the 2 libraries.</p>
<p><em>Note</em>: the <code class="prettyprint">ScalaCheck</code> trait stays in the <code class="prettyprint">org.specs2</code> package but all the traits it depends on now live in the <code class="prettyprint">org.specs2.scalacheck</code> package.</p>
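<p>Concretely, if you were importing one of those support traits directly, only the import should change, e.g. (taking a <code class="prettyprint">Parameters</code> class as an assumed example):</p>
<pre class="prettyprint"><code class="prettyprint">import org.specs2.ScalaCheck            // unchanged
import org.specs2.scalacheck.Parameters // now lives in the scalacheck sub-package</code></pre>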
<h3 id="bits-and-pieces">Bits and pieces</h3>
<p>This section is about various small things which have changed with <s2>specs2 3.0</s2>:</p>
<h5 id="implicit-context">Implicit context</h5>
<p>There is no longer an implicit context when you use the <code class="prettyprint">.await</code> method to match futures. This means that you have to either import the <code class="prettyprint">scala.concurrent.ExecutionContext.global</code> context or use a function <code class="prettyprint">ExecutionContext => Result</code> to define your examples:</p>
<pre class="prettyprint"><code class="prettyprint">s2"""
An example using an ExecutionContext $e1
"""
def e1 = { implicit ec: ExecutionContext =>
// use the context here
ok
}</code></pre>
<h5 id="foreach-methods">Foreach methods</h5>
<p>It is now possible to create several examples or results with a <code class="prettyprint">foreach</code> method which does <em>not</em> return <code class="prettyprint">Unit</code>:</p>
<pre class="prettyprint"><code class="prettyprint">// create several examples
Fragment.foreach(1 to 10)(i => "example "+i ! ok)
// create several examples with breaks in between
Fragments.foreach(1 to 10)(i => ("example "+i ! ok) ^ br)
// create several results for a sequence of numbers
Result.foreach(1 to 10)(i => i must_== i)</code></pre>
<h5 id="removed-syntax">Removed syntax</h5>
<ul>
<li><code class="prettyprint">(action: Any).before</code> to create a “before” context is removed (same thing for <code class="prettyprint">after</code>)</li>
<li><code class="prettyprint">function.forAll</code> to create a <code class="prettyprint">Prop</code> from a function</li>
</ul>
<h5 id="dependencies">Dependencies</h5>
<ul>
<li><s2>specs2 3.0</s2> uses scalacheck 1.12.1</li>
<li>you need to use a recent version of sbt, like 0.13.7</li>
<li>you need to upgrade to scalaz-specs2 0.4.0-SNAPSHOT for compatibility</li>
</ul>
<h2 id="can-i-use-it">Can I use it?</h2>
<p><s2>specs2 3.0</s2> is now available as <code class="prettyprint">specs2-core-3.0-M2</code> on Sonatype. I am making it available for early testing and feedback. Please use the mailing-list or the github issues to ask questions and tell me if there is anything going wrong with this new version. I will incorporate your comments in this blog post, which serves as a migration guide.</p>
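<p>If you want to try it, the sbt incantation should be something like this (the resolver may need adapting to your setup):</p>
<pre class="prettyprint"><code class="prettyprint">resolvers += "sonatype-releases" at "https://oss.sonatype.org/content/repositories/releases"

libraryDependencies += "org.specs2" %% "specs2-core" % "3.0-M2" % "test"</code></pre>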
<p><em>Special thanks</em></p>
<ul>
<li>to <a href="https://github.com/cfreeman">Clinton Freeman</a> who started the re-design of the specs2 home page more than one year ago and sparked this whole refactoring</li>
<li>to <a href="https://github.com/pchlupacek">Pavel Chlupacek</a> and <a href="https://github.com/fthomas">Frank Thomas</a> for patiently answering many of my questions about scalaz-stream</li>
<li>to <a href="https://github.com/pchiusano">Paul Chiusano</a> for starting scalaz-stream in the first place!</li>
<li>to <a href="https://github.com/markhibberd">Mark Hibberd</a> for his guidance with functional programming</li>
</ul>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-34532656203157399752014-03-06T07:06:00.000+09:002014-03-06T07:06:05.381+09:00Streaming with previous and next<status class="ok"><div style="display: show; text-indent:0px;"><p>The <a href="https://github.com/scalaz/scalaz-stream">Scalaz streams</a> library is very attractive but it might feel unfamiliar because this is not your standard collection library.</p><p>This short post shows how to produce a stream of elements from another stream so that we get a triplet with: the previous element, the current element, the next element.</p><a name="With+Scala+collections"><h3>With Scala collections</h3></a><p>With regular Scala collections, this is not too hard. We first create a list of all the previous elements. We create them as options because there will not be a previous element for the first element of the list. Then we create a list of next elements (also a list of options) and we zip everything with the input list: </p></div></status><status class="ok"><div style="display: show; text-indent:0px;"><pre><code class="prettyprint">def withPreviousAndNext[T] = (list: List[T]) => {
val previousElements = None +: list.map(Some(_)).dropRight(1)
val nextElements = list.drop(1).map(Some(_)) :+ None
// plus some flattening of the triplet
(previousElements zip list zip nextElements) map { case ((a, b), c) => (a, b, c) }
}
withPreviousAndNext(List(1, 2, 3))
</code></pre><p><code class="prettyprint">> List((None,1,Some(2)), (Some(1),2,Some(3)), (Some(2),3,None))</code></p><a name="And+streams"><h3>And streams</h3></a><p>The code above can be translated pretty straightforwardly to scalaz processes: </p></div></status><status class="ok"><div style="display: show; text-indent:0px;"><pre><code class="prettyprint">def withPreviousAndNext[F[_], T] = (p: Process[F, T]) => {
val previousElements = emit(None) fby p.map(Some(_))
val nextElements = p.drop(1).map(Some(_)) fby emit(None)
(previousElements zip p zip nextElements).map { case ((a, b), c) => (a, b, c) }
}
val p1 = emitAll((1 to 3).toSeq).toSource
withPreviousAndNext(p1).runLog.run
</code></pre><p><code class="prettyprint">> Vector((None,1,Some(2)), (Some(1),2,Some(3)), (Some(2),3,None))</code></p><p>However what we generally want with streams is combinators which you can pipe onto a given Process. We want to write </p>
<pre><code class="prettyprint">def withPreviousAndNext[T]: Process1[T, (Option[T], T, Option[T])] = ???
val p1 = emitAll((1 to 3).toSeq).toSource
// produces the stream of (previous, current, next)
p1 |> withPreviousAndNext
</code></pre><p>How can we write this?</p><a name="As+a+combinator"><h3>As a combinator</h3></a><p>The trick is to use recursion to keep state and this is actually how many of the <code class="prettyprint">process1</code> combinators in the library are written. Let's see how this works on a simpler example. What happens if we just want a stream where elements are zipped with their previous value? Here is what we can write: </p></div></status><status class="ok"><div style="display: show; text-indent:0px;"><pre><code class="prettyprint">def withPrevious[T]: Process1[T, (Option[T], T)] = {
def go(previous: Option[T]): Process1[T, (Option[T], T)] =
await1[T].flatMap { current =>
emit((previous, current)) fby go(Some(current))
}
go(None)
}
val p1 = emitAll((1 to 3).toSeq).toSource
(p1 |> withPrevious).runLog.run
</code></pre><p><code class="prettyprint">> Vector((None,1), (Some(1),2), (Some(2),3))</code></p><p>Inside the <code class="prettyprint">withPrevious</code> method we recursively call <code class="prettyprint">go</code> with the state we need to track. In this case we want to keep track of each previous element (and the first call is with <code class="prettyprint">None</code> because there is no previous element for the first element of the stream). Then <code class="prettyprint">go</code> awaits a new element. Each time there is a new element, we <code class="prettyprint">emit</code> it, then call recursively <code class="prettyprint">go</code> which is again going to wait for the next element, knowing that the <em>new</em> previous element is now <code class="prettyprint">current</code>.</p><p>We can do something similar, but a bit more complex for <code class="prettyprint">withNext</code>: </p></div></status><status class="ok"><div style="display: show; text-indent:0px;"><pre><code class="prettyprint">def withNext[T]: Process1[T, (T, Option[T])] = {
def go(current: Option[T]): Process1[T, (T, Option[T])] =
await1[T].flatMap { next =>
current match {
// accumulate the first element
case None => go(Some(next))
// if we have a current element, emit it with the next
// but when there's no more next, emit it with None
case Some(c) => (emit((c, Some(next))) fby go(Some(next))).orElse(emit((c, None)))
}
}
go(None)
}
val p1 = emitAll((1 to 3).toSeq).toSource
(p1 |> withNext).runLog.run
</code></pre><p><code class="prettyprint">> Vector((1,Some(2)), (2,Some(3)), (2,None))</code></p><p>Here, we start by accumulating the first element of the stream, and then, when we get to the next, we emit both of them. And we make a recursive call remembering what is now the current element. But the process we return in <code class="prettyprint">flatMap</code> has an <code class="prettyprint">orElse</code> clause. It says "by the way, if you don't have any more elements (no more <code class="prettyprint">next</code>), just emit current and <code class="prettyprint">None</code>".</p><p>Now with both <code class="prettyprint">withPrevious</code> and <code class="prettyprint">withNext</code> we can create a <code class="prettyprint">withPreviousAndNext</code> process: </p></div></status><status class="ok"><div style="display: show; text-indent:0px;"><pre><code class="prettyprint">def withPreviousAndNext[T]: Process1[T, (Option[T], T, Option[T])] = {
def go(previous: Option[T], current: Option[T]): Process1[T, (Option[T], T, Option[T])] =
await1[T].flatMap { next =>
current.map { c =>
emit((previous, c, Some(next))) fby go(Some(c), Some(next))
}.getOrElse(
go(previous, Some(next))
).orElse(emit((current, next, None)))
}
go(None, None)
}
val p1 = emitAll((1 to 3).toSeq).toSource
(p1 |> withPreviousAndNext).runLog.run
</code></pre></div></status><status class="ok"><div style="display: show; text-indent:0px;"><p><code class="prettyprint">> Vector((None,1,Some(2)), (Some(1),2,Some(3)), (Some(2),3,None))</code></p><p>The code is pretty similar but this time we keep track of both the "previous" element and the "current" one.</p><a name="emit%28last+paragraph%29"><h3><code class="prettyprint">emit(last paragraph)</code></h3></a><p>I hope this will help beginners like me get started with scalaz-stream and I'd be happy if scalaz-stream experts out there leave comments if there's anything which can be improved (is there an effective way to combine <code class="prettyprint">withPrevious</code> and <code class="prettyprint">withNext</code> to get <code class="prettyprint">withPreviousAndNext</code>?)</p><p>I finally need to add that, in order to get proper performance/side-effect control for the <code class="prettyprint">withNext</code> and <code class="prettyprint">withPreviousAndNext</code> processes you need to use the <a href="https://github.com/scalaz/scalaz-stream/tree/lazy">lazy branch of scalaz-stream</a>. It contains a fix for <code class="prettyprint">orElse</code> which prevents it from being evaluated <a href="https://github.com/scalaz/scalaz-stream/issues/51">more than necessary</a>.</p></div></status>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-5336273.post-29919326840945269452013-12-10T08:27:00.000+09:002013-12-10T08:27:35.655+09:00The revenge of the chunks<status class="ok"><p></p></status><status class="ok"><div style="display: show; text-indent:0px;"><p>This series of posts feels like a whole saga for something which should have been a quick and easy way to demonstrate the obvious superiority of functional programming over a simple loop. It all started with the <a href="http://etorreborre.blogspot.com.au/2013/12/runstate-for-scalaz-stream-process.html">first post</a>.
Then the <a href="http://etorreborre.blogspot.com.au/2013/12/runstate-0-combinators-1.html">second post</a> was about defining proper <code class="prettyprint">scalaz-stream</code> combinators to do the same thing, and particularly how to <a href="http://etorreborre.blogspot.com.au/2013/12/runstate-0-combinators-1.html#Chunky+streaming">"chunk"</a> the processing in order to get good performance.</p><p>However, as I was writing unit tests for my requirements I realized that the problem was harder than I thought. In particular, the files I'm processing can have several sections made of <code class="prettyprint">HEADERs</code> and <code class="prettyprint">TRAILERs</code>. When you create chunks of lines to process, this results in a number of combinations that need to be analysed. A chunk can:</p>
<ul>
<li>start with a <code class="prettyprint">HEADER</code> but not finish with a <code class="prettyprint">TRAILER</code> which is in another chunk</li>
<li>contain lines only</li>
<li>contain lines + a <code class="prettyprint">TRAILER</code> + a new <code class="prettyprint">HEADER</code></li>
<li>and so on...</li>
</ul><p>For each of these cases it is necessary to use the current state and the contents of the lines to determine if the file is malformed or not. This is a lot less easy than previously.</p><a name="All+the+combinations"><h3>All the combinations</h3></a><p>This is what I came up with:</p>
<pre><code class="prettyprint"> def process(path: String, targetName: String, chunkSize: Int = 10000): String \/ File = {
val targetPath = path.replace(".DAT", "")+targetName
val read =
linesRChunk(path, chunkSize) |>
validateLines.map(lines => lines.mkString("\n"))
val task =
((read |> process1.intersperse("\n") |>
process1.utf8Encode) to io.fileChunkW(targetPath)).run
task.attemptRun.leftMap(_.getMessage).map(_ => new File(targetPath))
}
/**
* validate that the lines have the right sequence of HEADER/column names/lines/TRAILER
* and the right number of lines
*/
def validateLines: Process1[Vector[String], Vector[String]] = {
// feed lines into the lines parser with a given state
// when it's done, follow by parsing with a new state
def parse(lines: Vector[String], state: LineState, newState: LineState) =
emit(lines) |> linesParser(state) fby linesParser(newState)
// parse chunks of lines
def linesParser(state: LineState): Process1[Vector[String], Vector[String]] = {
receive1[Vector[String], Vector[String]] { case lines =>
lines match {
case first +: rest if isHeader(first) =>
if (state.openedSection) fail("A trailer is missing")
else
parse(lines.drop(2),
state.open,
LineState(lines.count(isHeader) > lines.count(isTrailer),
lines.drop(2).size))
case first +: rest if isTrailer(first) =>
val expected = "\\d+".r.findFirstIn(first).map(_.toInt).getOrElse(0)
if (!state.openedSection)
fail("A header is missing")
else if (state.lineCount != expected)
fail(s"expected $expected lines, got ${state.lineCount}")
else {
val dropped = lines.drop(1)
parse(dropped,
state.restart,
LineState(dropped.count(isHeader) > dropped.count(isTrailer),
dropped.size))
}
case first +: rest =>
if (!state.openedSection) fail("A header is missing")
else {
val (first, rest) = lines.span(line => !isTrailer(line))
emit(first) fby
parse(rest, state.addLines(first.size), state.addLines(lines.size))
}
case Vector() => halt
}
}
}
// initialise the parsing expecting a HEADER
linesParser(LineState())
}
private def fail(message: String) = Halt(new Exception(message))
private def isHeader(line: String) = line.startsWith("HEADER|")
private def isTrailer(line: String) = line.startsWith("TRAILER|")
</code></pre><p>The bulk of the code is the <code class="prettyprint">validateLines</code> process which verifies the file structure:</p>
<ul>
<li><p>if the first line of this chunk is a <code class="prettyprint">HEADER</code> the next line needs to be skipped, we know we opened a new section, and we feed the rest to the lines parser again. However we <code class="prettyprint">fail</code> the process if we were not expecting a <code class="prettyprint">HEADER</code> there</p></li>
<li><p>if the first line of this chunk is a <code class="prettyprint">TRAILER</code> we do something similar but we also check the expected number of lines</p></li>
<li><p>otherwise we try to emit as many lines as possible until the next <code class="prettyprint">HEADER</code> or <code class="prettyprint">TRAILER</code> and we recurse</p></li>
</ul><p>This is a bit complex because we need to analyse the first element of the chunk, then emit the rest and calculate the new state we will have when this whole chunk is emitted. On the other hand the processor is easy to test because I don't have to read or write files to check it. This would be a bit more difficult to do with the loop version.</p><p>But unfortunately not all the tests are green. One is still not passing. What if there is no ending <code class="prettyprint">TRAILER</code> in the file? How can I raise an exception? There's no process to run, because there are no more lines to process! My test is pending for now, and I'll post the solution once I have it (maybe there's a smarter way to rewrite all of this?).</p><a name="Is+it+worth+it%3F"><h3>Is it worth it?</h3></a><p>This was definitely worth it for me in terms of learning the <code class="prettyprint">scalaz-stream</code> library. However, in terms of pure programmer "productivity", for this kind of requirement, it feels like overkill. The imperative solution is very easy to come up with and there are no problems with performance. This should change once <strong>streaming parsing</strong> is available (see the <a href="https://github.com/scalaz/scalaz-stream/wiki/Roadmap">roadmap</a>). Probably this use case will just be expressed as a one-liner.
In the light of this post I'm just curious how the implementation will deal with chunking.</p></div></status>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-5336273.post-76673391328423594262013-12-09T09:41:00.000+09:002013-12-09T13:49:24.264+09:00`runState` 0 - combinators 1<status class="ok"><p></p></status><status class="ok"><div style="display: show; text-indent:0px;"><p>In my <a href="http://etorreborre.blogspot.com.au/2013/12/runstate-for-scalaz-stream-process.html">previous blog post</a> I was trying to implement a <code class="prettyprint">runState</code> method with <a href="https://github.com/scalaz/scalaz-stream"><code class="prettyprint">scalaz-stream</code></a> to process a file and try to validate its internal structure. That was however not a good solution because:</p>
<ul>
<li>it doesn't use combinators but a special purpose <code class="prettyprint">runState</code> method</li>
<li>it stackoverflows on large files!</li>
</ul><p>It turns out that there is a much better way of dealing with this use case.</p><a name="Combinators"><h3>Combinators</h3></a><p>First of all it is possible to propagate some state with <code class="prettyprint">scalaz-stream</code> without having to write a special <code class="prettyprint">runState</code> method. The following uses only combinators to do the job:</p>
<pre><code class="prettyprint">def process(path: String, targetName: String): String \/ File = {
val HEADER = "HEADER(.*)".r
val TRAILER = "TRAILER\\|(\\d+)".r
val lineOrTrailer: Process1[String, String] = {
def go(lines: Int): Process1[String, String] =
receive1[String, String] {
case TRAILER(count) =>
if (count.toInt == lines) halt
else Halt(new Exception(s"Expected $count lines, but got $lines"))
case HEADER(h) =>
Halt(new Exception(s"Didn't expected a HEADER here: $h"))
case s =>
emit(s) fby go(lines + 1)
}
go(0)
}
val linesStructure =
discardRegex("HEADER.*") fby
discardLine fby
lineOrTrailer
val read = io.linesR(path) |> linesStructure
val targetPath = path.replace(".DAT", "")+targetName
val task =
((read |> process1.intersperse("\n") |>
process1.utf8Encode) to io.fileChunkW(targetPath)).run
task.attemptRun.leftMap(_.getMessage).map(_ => new File(targetPath))
}
val discardLine = receive1[String, String] { _ => halt }
/** discard a line if it matches the expected pattern */
def discardRegex(pattern: String): Process1[String,String] = {
val compiled = Pattern.compile(pattern)
receive1[String, String] { line =>
if (compiled.matcher(line).matches) halt
else Halt(new Exception(s"Failed to parse $line, does not match regex: $pattern"))
}
}
</code></pre><p>With the code above, processing a file amounts to:</p>
<ul>
<li>reading the lines</li>
<li>analysing them with <code class="prettyprint">linesStructure</code> which propagates the current state (the number of lines already processed) with a recursive method (<code class="prettyprint">go</code>) calling itself</li>
<li>writing the lines to a new file</li>
</ul><p>The <code class="prettyprint">linesStructure</code> method almost looks like a <a href="http://www.scala-lang.org/api/current/index.html#scala.util.parsing.combinator.Parsers">parser combinators</a> expression, with parsers sequenced by the <code class="prettyprint">fby</code> ("followed by") method.</p><p>That looks pretty good but... it performs horribly! With the good-old "loop school", it took 8 seconds to process a 700M file:</p>
<pre><code class="prettyprint">def processLoop(path: String, targetName: String): String \/ File = {
val targetPath = path.replace(".DAT", "")+targetName
val writer = new FileWriter(targetPath)
val source = scala.io.Source.fromFile(new File(path))
var count = 0
var skipNextLine = false
try {
source.getLines().foreach { line =>
if (line.startsWith("HEADER")) skipNextLine = true
else if (skipNextLine) { skipNextLine = false }
else if (line.startsWith("TRAILER")) {
val expected = "\\d+".r.findFirstIn(line).map(_.toInt).getOrElse(0)
if (expected != count) throw new Exception(s"expected $expected, got $count")
}
else {
count = count + 1
writer.write(line)
}
}
} catch {
case t: Throwable => t.getMessage.left
} finally {
source.close
writer.close
}
new File(targetPath).right
}
</code></pre><p>With the nice "no variables, no loops" method it took almost... 8 minutes!</p><a name="Chunky+streaming"><h3>Chunky streaming</h3></a><p>It is fortunately possible to recover decent performance by "chunking" the lines before processing them. To do this, we need a new combinator, very close to the <code class="prettyprint">io.linesR</code> combinator in <code class="prettyprint">scalaz-stream</code>:</p>
<pre><code class="prettyprint">// read a file, returning one "chunk" of lines at the time
def linesRChunk(filename: String, chunkSize: Int = 10000): Process[Task, Vector[String]] =
io.resource(Task.delay(scala.io.Source.fromFile(filename)))(src => Task.delay(src.close)) { src =>
lazy val lines = src.getLines.sliding(chunkSize, chunkSize) // A stateful iterator
Task.delay {
if (lines.hasNext) lines.next.toVector
else throw End
}
}
</code></pre><p>Now we can process each chunk with:</p>
<pre><code class="prettyprint">def process(path: String, targetName: String, bufferSize: Int = 1): String \/ File = {
val HEADER = "HEADER(.*)".r
val TRAILER = "TRAILER\\|(\\d+)".r
def linesParser(state: LineState): Process1[Vector[String], Vector[String]] = {
def onHeader(rest: Vector[String]) =
(emit(rest) |> linesParser(ExpectLineOrTrailer(0))) fby
linesParser(ExpectLineOrTrailer(rest.size))
def onLines(ls: Vector[String], actual: Int) =
emit(ls) fby linesParser(ExpectLineOrTrailer(actual + ls.size))
def onTrailer(ls: Vector[String], count: Int, actual: Int) =
if ((actual + ls.size) == count) emit(ls)
else fail(new Exception(s"expected $count lines, got $actual"))
receive1[Vector[String], Vector[String]] { case lines =>
(lines, state) match {
case (Vector(), _) =>
halt
case (HEADER(_) +: cols +: rest, ExpectHeader) =>
onHeader(rest)
case (_, ExpectHeader) =>
fail(new Exception("expected a header"))
case (ls :+ TRAILER(count), ExpectLineOrTrailer(n)) =>
onTrailer(ls, count.toInt, n)
case (ls, ExpectLineOrTrailer(n)) =>
onLines(ls, n)
}
}
}
val targetPath = path.replace(".DAT", "")+targetName
val read = linesRChunk(path, bufferSize) |>
linesParser(ExpectHeader).map(lines => lines.mkString("\n"))
val task =
((read |> process1.intersperse("\n") |>
process1.utf8Encode) to io.fileChunkW(targetPath)).run
task.attemptRun.leftMap(_.getMessage).map(_ => new File(targetPath))
}
</code></pre><p>The <code class="prettyprint">linesParser</code> method uses <code class="prettyprint">receive1</code> to analyse:</p>
<ul>
<li>the current state: are we expecting a <code class="prettyprint">HEADER</code>, or some lines followed by a <code class="prettyprint">TRAILER</code>?</li>
<li>the current chunk of lines</li>
</ul><p>When we expect a HEADER and we have one, we skip the row containing the column names (see <code class="prettyprint">onHeader</code>), we <code class="prettyprint">emit</code> the rest of the lines to the linesParser (this is the recursive call) and we change the state to <code class="prettyprint">ExpectLineOrTrailer</code>. If we get some lines with no <code class="prettyprint">TRAILER</code>, we <code class="prettyprint">emit</code> those lines and make a recursive call to <code class="prettyprint">linesParser</code> with an incremented count to signal how many lines we've emitted so far (in the <code class="prettyprint">onLines</code> method). Finally, if we get some lines and a <code class="prettyprint">TRAILER</code> we check that the expected number of lines is equal to the actual one before emitting the lines and stopping the processing (no more recursive call in <code class="prettyprint">onTrailer</code>).</p><p>For reference, here are the state objects used to track the current processing state:</p>
<pre><code class="prettyprint">sealed trait LineState
case object ExpectHeader extends LineState
case class ExpectLineOrTrailer(lineCount: Int = 0) extends LineState
</code></pre><p>This new way of processing lines gets us:</p>
<ul>
<li>a readable state machine with clear transitions, which was my first objective</li>
<li>adequate performances; it takes around 10 seconds to process a 700M file which is slightly more than the <code class="prettyprint">processLoop</code> version but acceptable</li>
</ul><a name="One+other+explored+avenue"><h3>One other explored avenue</h3></a><p>It took me a loooooooooooong time to get there. I think I hit <a href="https://github.com/scalaz/scalaz-stream/issues/51">this issue</a> when trying to use the built-in <code class="prettyprint">chunk</code> combinator. When using <code class="prettyprint">chunk</code>, my parser was being fed the same lines several times. For a chunk of 10 lines, I first had the first line, then the first 2, then the first 3,... Even with a modified version of <code class="prettyprint">chunk</code> the performances were still very bad. This is why I wrote my own <code class="prettyprint">linesRChunk</code>.</p><p>Now I got something working I hope that this will boost other's development time and show that it is possible to avoid loops + variables in that case!</p></div></status>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-30855735600911024772013-12-05T21:49:00.000+09:002013-12-05T21:49:12.204+09:00`runState` for a scalaz-stream Process<status class="ok"><p></p></status><status class="ok"><div style="display: show; text-indent:0px;"><p>I was preparing to post this on the scalaz mailing-list but I thought that a short blog post could serve as a reference for other people as well. The following assumes that you have a good knowledge of Scalaz (at least of what's covered in my <a href="http://etorreborre.blogspot.com.au/2011/06/essence-of-iterator-pattern.html">"Essence of the Iterator Pattern" post</a> and some familiarity with the <a href="https://github.com/scalaz/scalaz-stream"><code class="prettyprint">scalaz-stream</code></a> library.</p><a name="My+use+case"><h3>My use case</h3></a><p>What I want to do is very common, just process a bunch of files! More precisely I want to (this is slightly simplified):</p>
<ol>
<li><p>read some pipe delimited files</p></li>
<li><p>validate that the files have the proper internal structure:<br /> one(HEADER marker)<br /> one(column names)<br /> many(lines of pipe delimited values)<br /> one(TRAILER marker with total number of lines since the header)</p></li>
<li><p>output only the lines which are not markers to another file</p></li>
</ol><a name="Scalaz+stream"><h3>Scalaz stream</h3></a><p>The excellent chapter 15 of <a href="http://www.manning.com/bjarnason">Functional Programming in Scala</a> highlights some of the potential problems with processing files:</p>
<ul>
<li>you need to make sure you are closing resources properly even in the face of exceptions</li>
<li>you want to be able to easily compose small processing functions together instead of having a gigantic loop and a bunch of variables</li>
<li>you want to control the amount of data that is in memory at any moment in time</li>
</ul><p>Based on the ideas of the book, Paul Chiusano created <a href="https://github.com/scalaz/scalaz-stream"><code class="prettyprint">scalaz-stream</code></a>, a library providing lots of combinators for this kind of input/output streaming operation (and more!).</p><a name="A+state+machine+for+the+job"><h3>A state machine for the job</h3></a><p>My starting point for addressing our requirements is to devise a <code class="prettyprint">State</code> object representing both the expected file structure and the fact that some lines need to be filtered out. First of all I need to model the kind of lines I'm expecting when reading the file:</p>
<pre><code class="prettyprint">sealed trait LineState
case object ExpectHeader extends LineState
case object ExpectHeaderColumns extends LineState
case class ExpectLineOrTrailer(lineCount: Int = 0) extends LineState
</code></pre><p>As you can see <code class="prettyprint">ExpectLineOrTrailer</code> contains a counter to keep track of the number of lines seen so far.</p><p>Then I need a method (referred as the <em>State function</em> below) to update this state when reading a new line:</p>
<pre><code class="prettyprint">def lineState(line: String): State[Throwable \/ LineState, Option[String]] =
  State { state: Throwable \/ LineState =>
    def t(message: String) = new Exception(message).left

    (state, line) match {
      case (\/-(ExpectHeader), HeaderLine(_)) =>
        (ExpectHeaderColumns.right, None)
      case (\/-(ExpectHeaderColumns), _) =>
        (ExpectLineOrTrailer(0).right, None)
      case (\/-(ExpectHeader), _) =>
        (t("expecting a header"), None)
      case (\/-(ExpectLineOrTrailer(n)), HeaderLine(_)) =>
        (t("expecting a line or a trailer"), None)
      case (\/-(ExpectLineOrTrailer(n)), TrailerLine(e)) =>
        if (n == e) (ExpectHeader.right, None)
        else        (t(s"wrong number of lines, expecting $e, got $n"), None)
      case (\/-(ExpectLineOrTrailer(n)), _) =>
        (ExpectLineOrTrailer(n + 1).right, Some(line))
      case (-\/(e), _) =>
        (state, None)
    }
  }
</code></pre><p>The <code class="prettyprint">S</code> type parameter (in the <code class="prettyprint">State[S, A]</code> type) used to keep track of the "state" is <code class="prettyprint">Throwable \/ LineState</code>. I'm using the "Left" part of the disjunction to represent processing errors. The error type itself is a <code class="prettyprint">Throwable</code>. Originally I was using any type <code class="prettyprint">E</code> but we'll see further down why I had to use exceptions. The value type <code class="prettyprint">A</code> I extract from <code class="prettyprint">State[S, A]</code> is going to be <code class="prettyprint">Option[String]</code> in order to output <code class="prettyprint">None</code> when I encounter a marker line.</p><p>This is all pretty good, functional and testable. But how can I use this state machine with a <code class="prettyprint">scalaz-stream</code> <code class="prettyprint">Process</code>?</p><a name="runState"><h3><code class="prettyprint">runState</code></h3></a><p>After much head scratching and a little <a href="https://groups.google.com/d/msg/scalaz/tOeSbe7799Y/onbe49WSsd8J">help from the mailing-list</a> (thanks Pavel!) I realized that I had to write a new <code class="prettyprint">driver</code> for a <code class="prettyprint">Process</code>. Something which would understand what to do with a <code class="prettyprint">State</code>. Here is what I came up with:</p>
<pre><code class="prettyprint">def runState[F[_], O, S, E <: Throwable, A](p: Process[F, O])
    (f: O => State[E \/ S, Option[A]], initial: S)
    (implicit m: Monad[F], c: Catchable[F]) = {

  def go(cur: Process[F, O], init: S): F[Process[F, A]] = {
    cur match {
      case Halt(End) => m.point(Halt(End))
      case Halt(e)   => m.point(Halt(e))
      case Emit(h: Seq[O], t: Process[F, O]) => {
        println("emitting lines here!")
        val state = h.toList.traverseS(f)
        val (newState, result) = state.run(init.right)
        newState.fold(
          l => m.point(fail(l)),
          r => go(t, r).map(emitAll(result.toSeq.flatten) ++ _)
        )
      }
      case Await(req, recv, fb: Process[F, O], cl: Process[F, O]) =>
        m.bind(c.attempt(req.asInstanceOf[F[Any]])) { _.fold(
          { case End => go(fb, init)
            case e   => go(cl.causedBy(e), init) },
          o => go(recv.asInstanceOf[Any => Process[F, O]](o), init)) }
    }
  }
  go(p, initial)
}
</code></pre><p>This deserves some comments :-)</p><p>The idea is to recursively analyse what kind of <code class="prettyprint">Process</code> we're currently dealing with:</p>
<ol>
<li><p>if this is a <code class="prettyprint">Halt(End)</code> we've terminated processing with no errors. We then return the halted process <strong>in the context of <code class="prettyprint">F</code></strong> (hence the <code class="prettyprint">m.point</code> operation). <code class="prettyprint">F</code> is the monad that provides us input values so we can think of all the computations happening here as happening inside <code class="prettyprint">F</code> (probably a <code class="prettyprint">scalaz.concurrent.Task</code> when reading file lines)</p></li>
<li><p>if this is a <code class="prettyprint">Halt(error)</code> we use the <code class="prettyprint">Catchable</code> instance for <code class="prettyprint">F</code> to instruct the input process what to do in the case of an error (probably close the file, clean up resources,...)</p></li>
<li><p>if this is an <code class="prettyprint">Emit(values, rest)</code> we <code class="prettyprint">traverseS</code> the list of values in memory with our <code class="prettyprint">State</code> function and we use the initial value to get: 1. the state at the end of the traversal, 2. all the values returned by our <code class="prettyprint">State</code> at each step of its execution. Note that the traversal will happen on <em>all</em> the values in memory, there won't be any short-circuiting if the <code class="prettyprint">State</code> indicates an error. Also, this is important, the <code class="prettyprint">traverseS</code> method is <strong><em>not</em></strong> trampolined. This means that we will get StackOverflow exceptions if the "chunks" that we are processing are too big. On the other hand we avoid trampolining on each line so we should get good performance. If there was an error we stop all processing and return it; otherwise we emit all the values collected by the <code class="prettyprint">State</code> appended to a recursive call to <code class="prettyprint">go</code></p></li>
<li><p>if this is an <code class="prettyprint">Await</code> Process we attempt to read input values, with <code class="prettyprint">c.attempt</code>, and use the <code class="prettyprint">recv</code> function to process them. We can do that "inside the <code class="prettyprint">F</code> monad" by using the <code class="prettyprint">bind</code> (or <code class="prettyprint">flatMap</code>) method. The resulting <code class="prettyprint">Process</code> is sent to <code class="prettyprint">go</code> in order to be processed with the <code class="prettyprint">State</code> function</p></li>
</ol><p>Note what we do in case 3, when the <code class="prettyprint">newState</code> returns an <code class="prettyprint">exception.left</code>. We create a <code class="prettyprint">Process.fail</code> process with the exception. This is why I used a <code class="prettyprint">Throwable</code> to represent errors in the <code class="prettyprint">State</code> function.</p><p>Now let's see how to use this new "driver".</p><a name="Let%27s+use+it"><h3>Let's use it</h3></a><p>First of all, we create a test file:</p>
<pre><code class="prettyprint">import scalaz.stream._
import Process._
val lines = """|HEADER|file
|header1|header2
|val11|val12
|val21|val22
|val21|val22
|TRAILER|3""".stripMargin
// save the lines above 100 times to a file
fill(100)(lines).intersperse("\n").pipe(process1.utf8Encode)
  .to(io.fileChunkW("target/file.dat")).run.run
</code></pre><p>Then we read the file, buffering 50 lines at a time to control our memory usage:</p>
<pre><code class="prettyprint">val lines = io.linesR("target/file.dat").buffer(50)
</code></pre><p>We're now ready to run the state function:</p>
<pre><code class="prettyprint">// this task processes the lines with our State function;
// the initial State is `ExpectHeader` because this is what we expect the first line to be
val stateTask: Task[Process[Task, String]] = runState(lines)(lineState, ExpectHeader)

// this one outputs the lines to a result file,
// separating each line with a newline and encoding it in UTF-8
val outputTask: Task[Unit] = stateTask.flatMap(_.intersperse("\n").pipe(process1.utf8Encode)
  .to(io.fileChunkW("target/result.dat")).run)

// if the processing throws an Exception it will be retrieved here
val result: Throwable \/ Unit = outputTask.attemptRun
</code></pre><p>When we finally run the <code class="prettyprint">Task</code>, the result is either <code class="prettyprint">().right</code> if we were able to read, process, and write back to disk or <code class="prettyprint">exception.left</code> if there was any error in the meantime, including when checking if the file has a valid structure.</p><p>The really cool thing about all of this is that we can now precisely control the amount of memory consumed during our processing by using the <code class="prettyprint">buffer</code> method. In the example above we buffer 50 lines at a time and then process them in memory using <code class="prettyprint">traverseS</code>. This is why I left a <code class="prettyprint">println</code> statement in the <code class="prettyprint">runState</code> method. I wanted to see "with my own eyes" how buffering was working. We could probably load more lines but the trade-off is that the stack consumed by <code class="prettyprint">traverseS</code> will grow and we might face StackOverflow exceptions.</p><p>I haven't done any benchmarks yet but I can imagine lots of different ways to optimise the whole thing for our use case.</p><a name="try+%7B+blog+%7D+finally+%7B+closing+remarks+%7D"><h3>try { blog } finally { closing remarks }</h3></a><p>I'm only scratching the surface of the <code class="prettyprint">scalaz-stream</code> library and there is still a big possibility that I completely misunderstood something obvious!</p><p>First, it is important to say that you might not need to implement the <code class="prettyprint">runState</code> method if you don't have complex validation requirements. 
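<p>For a simple case like this one, the whole validation can even be written as a plain fold over the lines, with no streaming library at all. Here is a scalaz-free sketch of the same state machine (all the names below are mine, not scalaz-stream's, and the marker handling is simplified compared to the <code class="prettyprint">lineState</code> function above):</p>

```scala
// An Either-based version of the state machine, folded over the lines.
// Illustrative only: prefix tests replace the HeaderLine/TrailerLine extractors.
sealed trait LineState
case object ExpectHeader extends LineState
case object ExpectColumns extends LineState
case class ExpectLineOrTrailer(count: Int) extends LineState

def validate(lines: Seq[String]): Either[String, Vector[String]] = {
  val init: Either[String, (LineState, Vector[String])] =
    Right((ExpectHeader, Vector.empty))
  lines.foldLeft(init) {
    case (Left(error), _) => Left(error)                 // once failed, stay failed
    case (Right((ExpectHeader, out)), line) =>
      if (line.startsWith("HEADER|")) Right((ExpectColumns, out))
      else Left("expecting a header")
    case (Right((ExpectColumns, out)), _) =>             // column names, not emitted
      Right((ExpectLineOrTrailer(0), out))
    case (Right((ExpectLineOrTrailer(n), out)), line) =>
      if (line.startsWith("HEADER|")) Left("expecting a line or a trailer")
      else if (line.startsWith("TRAILER|")) {
        val expected = line.drop("TRAILER|".length).toInt // assumes a well-formed count
        if (n == expected) Right((ExpectHeader, out))
        else Left(s"wrong number of lines, expecting $expected, got $n")
      }
      else Right((ExpectLineOrTrailer(n + 1), out :+ line))
  }.map(_._2)
}
```

<p>Running <code class="prettyprint">validate</code> on one well-formed block returns only the data lines; a missing header or a wrong trailer count returns a <code class="prettyprint">Left</code>. The obvious price is that this loads everything it folds over, which is exactly the memory-control problem that <code class="prettyprint">Process</code> solves.</p>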
There are 2 methods, <code class="prettyprint">chunkBy</code> and <code class="prettyprint">chunkBy2</code>, which allow you to create "chunks" of lines based on a given line (for <code class="prettyprint">chunkBy</code>) or pair of lines (for <code class="prettyprint">chunkBy2</code>) naturally serving as "block" delimiters in the read file (for example a pair of "HEADER" followed by a "TRAILER" in my file).</p><p>Second, it is not yet obvious to me if I should use <code class="prettyprint">++</code> or <code class="prettyprint">fby</code> when I'm emitting state-processed lines + "the rest" (in step 3 when doing: <code class="prettyprint">emitAll(result.toSeq.flatten) ++ _</code>). The difference has to do with error/termination management (the <em>fallback</em> process of <code class="prettyprint">Await</code>) and I'm still unclear on how/when to use this.</p><p>Finally I would say that the <code class="prettyprint">scalaz-stream</code> library is intriguing in terms of types. A process is <code class="prettyprint">Process[F[_], O]</code> where O is the type of the output and the type of the input is... nowhere? Actually it is in the <code class="prettyprint">Await[F[_], A, O]</code> constructor as a <code class="prettyprint">forall</code> type. That's not all. In <code class="prettyprint">Await</code> you have the type of the request, <code class="prettyprint">F[A]</code>, a function to process elements of type <code class="prettyprint">A</code>: <code class="prettyprint">recv: A => Process[F, O]</code>, but no way to <em>extract</em> or <em>map</em> the value <code class="prettyprint">A</code> from the request to pass it to the <code class="prettyprint">recv</code> method! The only way to do that is to provide an additional constraint to the "driver method" by saying, for example, that there is an implicit <code class="prettyprint">Monad[F]</code> somewhere. 
This is the first time I have seen a design where we build structures and <em>then</em> give them properties when we want to use them. Very unusual.</p><p>I hope this can help other people exploring the library and, who knows, some of this might end up being part of it. Let's see what Paul and others think...</p></div></status><status class="ok"><p></p></status>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-68400624297606938602013-07-27T09:35:00.001+09:002013-07-27T09:35:58.534+09:00Endorsing the move on to Java 6This is a short public announcement to say that, as the maintainer of <a href="http://specs2.org">an open-source project</a>, I support the <a href="https://docs.google.com/document/d/1pi8OsiG-hPDjqSge4xqmpZTshryUkMdF4QLBeCf0GXo/edit">move to Java 6</a> this year.
I encourage other OSS projects, especially in the Scala ecosystem, to support this move as well.Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-5336273.post-28271059382826501842013-06-20T15:54:00.000+09:002013-06-20T19:45:34.413+09:00A Zipper and Comonad example<status class="ok"><p></p></status><status class="ok"><div style="display: show; text-indent:0px;"><p>There are some software concepts which you hear about and after some time you roughly understand what they are. But you still wonder: "where can I use this?". "Zippers" and "Comonads" are like that. This post will show an example of:</p>
<ul>
<li>using a <code class="prettyprint" style="white-space:pre-wrap">Zipper</code> for a list</li>
<li>using the Comonad <code class="prettyprint" style="white-space:pre-wrap">cojoin</code> operation for the <code class="prettyprint" style="white-space:pre-wrap">Zipper</code></li>
<li>using the new specs2 <code class="prettyprint" style="white-space:pre-wrap">contain</code> matchers to specify collection properties</li>
</ul><p>The context for this example is simply to specify the behaviour of the following function:</p><p><code class="prettyprint" style="white-space:pre-wrap">def partition[A](seq: Seq[A])(relation: (A, A) => Boolean): Seq[NonEmptyList[A]]</code></p><p>Intuitively we want to partition the sequence <code class="prettyprint" style="white-space:pre-wrap">seq</code> into groups so that all the elements in a group have "something in common" with at least one other element. Here is a concrete example:</p></div></status><status class="ok"><div style="display: show; text-indent:0px;"><pre><code class="prettyprint" style="white-space:pre-wrap">val near = (n1: Int, n2: Int) => math.abs(n1 - n2) <= 1
partition(Seq(1, 2, 3, 7, 8, 9))(near)
</code></pre><p><code class="prettyprint" style="white-space:pre-wrap">> List(NonEmptyList(1, 2, 3), NonEmptyList(7, 8, 9))</code></p><a name="Properties"><h4>Properties</h4></a><p>If we want to encode this behaviour with ScalaCheck properties we need to check at least 3 things:</p>
<ol>
<li>for each element in a group, there exists at least another related element in the group</li>
<li>for each element in a group, there doesn't exist a related element in any other group</li>
<li>2 elements which are not related must end up in different groups</li>
</ol><p>How do we translate this to some nice Scala code?</p><a name="Contain+matchers"><h4>Contain matchers</h4></a><p><strong><em>"for each element in a group, there exists another related element in the same group"</em></strong> </p>
<pre><code class="prettyprint" style="white-space:pre-wrap">prop { (list: List[Int], relation: (Int, Int) => Boolean) =>
  val groups = partition(list)(relation)
  groups must contain(relatedElements(relation)).forall
}
</code></pre><p>The property above uses a random list, a random relation, and does the partitioning into <code class="prettyprint" style="white-space:pre-wrap">groups</code>. We want to check that all groups satisfy the property <code class="prettyprint" style="white-space:pre-wrap">relatedElements(relation)</code>. This is done by:</p>
<ul>
<li>using the <code class="prettyprint" style="white-space:pre-wrap">contain</code> matcher</li>
<li>passing it the <code class="prettyprint" style="white-space:pre-wrap">relatedElements(relation)</code> function to check a given group</li>
<li>do this check <code class="prettyprint" style="white-space:pre-wrap">forall</code> groups</li>
</ul><p>The <code class="prettyprint" style="white-space:pre-wrap">relatedElements(relation)</code> function we pass has type <code class="prettyprint" style="white-space:pre-wrap">NEL[Int] => MatchResult[NEL[Int]]</code> (<code class="prettyprint" style="white-space:pre-wrap">type NEL[A] = NonEmptyList[A]</code>) and is testing each group. What does it do? It checks that each element of a group has at least one other element related to it.</p>
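<p>To make that check concrete before looking at the real code, here is a hand-rolled, scalaz-free sketch. The <code class="prettyprint" style="white-space:pre-wrap">Zipper</code> and <code class="prettyprint" style="white-space:pre-wrap">allZippers</code> names are mine (not scalaz's), and a plain <code class="prettyprint" style="white-space:pre-wrap">Boolean</code> stands in for the <code class="prettyprint" style="white-space:pre-wrap">MatchResult</code>:</p>

```scala
// Minimal list zipper: elements to the left, a focused element,
// elements to the right. Illustrative only, not the scalaz Zipper.
case class Zipper[A](lefts: List[A], focus: A, rights: List[A])

// All the ways to focus on one element of a list
// (this is the job that cojoin does on a real Zipper)
def allZippers[A](list: List[A]): List[Zipper[A]] =
  list.indices.toList.map(i => Zipper(list.take(i), list(i), list.drop(i + 1)))

// "each element of the group is related to at least one other element"
def relatedElements(relation: (Int, Int) => Boolean)(group: List[Int]): Boolean =
  allZippers(group).forall { z =>
    (z.lefts ++ z.rights).exists(relation(z.focus, _))
  }

val near = (n1: Int, n2: Int) => math.abs(n1 - n2) <= 1
```

<p>With <code class="prettyprint" style="white-space:pre-wrap">near</code>, <code class="prettyprint" style="white-space:pre-wrap">relatedElements(near)(List(1, 2, 3))</code> holds while <code class="prettyprint" style="white-space:pre-wrap">relatedElements(near)(List(1, 5))</code> does not. The real version gets "all the ways to focus" for free from <code class="prettyprint" style="white-space:pre-wrap">toZipper.cojoin</code>:</p>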
<pre><code class="prettyprint" style="white-space:pre-wrap">def relatedElements(relation: (Int, Int) => Boolean) = (group: NonEmptyList[Int]) => {
  group.toZipper.cojoin.toStream must contain { zipper: Zipper[Int] =>
    (zipper.lefts ++ zipper.rights) must contain(relation.curried(zipper.focus)).forall
  }
}
</code></pre><p>This function is probably a bit mysterious so we need to dissect it.</p><a name="Zipper"><h4>Zipper</h4></a><p>In the <code class="prettyprint" style="white-space:pre-wrap">relatedElements</code> function we need to check each element of a group <strong><em>in relation to the other elements</em></strong>. This means that we need to <strong><em>traverse the sequence, while keeping the context of where we are in the traversal</em></strong>. This is <em>exactly</em> what a <code class="prettyprint" style="white-space:pre-wrap">Zipper</code> is good at!</p><p>A <a href="http://eed3si9n.com/learning-scalaz/Zipper.html"><code class="prettyprint" style="white-space:pre-wrap">List Zipper</code></a> is a structure which keeps the <code class="prettyprint" style="white-space:pre-wrap">focus</code> on one element of the list and can return the elements on the left or the elements on the right. So in the code above we transform the <code class="prettyprint" style="white-space:pre-wrap">group</code> into a <code class="prettyprint" style="white-space:pre-wrap">Zipper</code> with the <code class="prettyprint" style="white-space:pre-wrap">toZipper</code> method. Note that this works because the group is a <code class="prettyprint" style="white-space:pre-wrap">NonEmptyList</code>. This wouldn't work with a regular <code class="prettyprint" style="white-space:pre-wrap">List</code> because a <code class="prettyprint" style="white-space:pre-wrap">Zipper</code> cannot be empty, it needs something to focus on: </p>
<pre><code class="prettyprint" style="white-space:pre-wrap">// a zipper for [1, 2, 3, 4, 5, 6, 7, 8, 9]
//   lefts       focus   rights
// [ [1, 2, 3]     4     [5, 6, 7, 8, 9] ]
</code></pre><p>We now have a <code class="prettyprint" style="white-space:pre-wrap">Zipper</code> focusing on one element of the group. But we don't want to test only one element, we want to test all of them, so we need to get <em>all</em> the possible zippers over the original group!</p><a name="Cojoin"><h4>Cojoin</h4></a><p>It turns out that there is a method doing exactly this for <code class="prettyprint" style="white-space:pre-wrap">Zippers</code>, it is called <code class="prettyprint" style="white-space:pre-wrap">cojoin</code>. I won't go into the full explanation of what a <code class="prettyprint" style="white-space:pre-wrap">Comonad</code> is here, but the important points are:</p>
<ul>
<li><code class="prettyprint" style="white-space:pre-wrap">Zipper</code> has a <code class="prettyprint" style="white-space:pre-wrap">Comonad</code> instance</li>
<li><code class="prettyprint" style="white-space:pre-wrap">Comonad</code> has a <code class="prettyprint" style="white-space:pre-wrap">cojoin</code> method with this signature <code class="prettyprint" style="white-space:pre-wrap">cojoin[A](zipper: Zipper[A]): Zipper[Zipper[A]]</code></li>
</ul><p>Thanks to <code class="prettyprint" style="white-space:pre-wrap">cojoin</code> we can create a <code class="prettyprint" style="white-space:pre-wrap">Zipper</code> of all the <code class="prettyprint" style="white-space:pre-wrap">Zippers</code>, turn it into a <code class="prettyprint" style="white-space:pre-wrap">Stream[Zipper[Int]]</code> and do the checks that really matter to us:</p>
<pre><code class="prettyprint" style="white-space:pre-wrap">def relatedElements(relation: (Int, Int) => Boolean) = (group: NonEmptyList[Int]) => {
  group.toZipper.cojoin.toStream must contain { zipper: Zipper[Int] =>
    val otherElements = zipper.lefts ++ zipper.rights
    otherElements must contain(relation.curried(zipper.focus))
  }
}
</code></pre><p>We get the <code class="prettyprint" style="white-space:pre-wrap">focus</code> of the <code class="prettyprint" style="white-space:pre-wrap">Zipper</code>, an element, and we check it is related to at least one other element in that group. This is easy because the <code class="prettyprint" style="white-space:pre-wrap">Zipper</code> gives us all the other elements on the <code class="prettyprint" style="white-space:pre-wrap">left</code> and on the <code class="prettyprint" style="white-space:pre-wrap">right</code>.</p><a name="Cobind"><h4>Cobind</h4></a><p>If you know a little bit about Monads and Comonads you know that there is a dualism between <code class="prettyprint" style="white-space:pre-wrap">join</code> in Monads and <code class="prettyprint" style="white-space:pre-wrap">cojoin</code> in Comonads. But there is also one between <code class="prettyprint" style="white-space:pre-wrap">bind</code> and <code class="prettyprint" style="white-space:pre-wrap">cobind</code>. Is it possible to use <code class="prettyprint" style="white-space:pre-wrap">cobind</code> then to implement the <code class="prettyprint" style="white-space:pre-wrap">relatedElements</code> function? Yes it is, and the result is slightly different (arguably less understandable): </p>
<pre><code class="prettyprint" style="white-space:pre-wrap">def relatedElements(relation: (Int, Int) => Boolean) = (group: NonEmptyList[Int]) => {
  group.toZipper.cobind { zipper: Zipper[Int] =>
    val otherElements = zipper.lefts ++ zipper.rights
    otherElements must contain(relation.curried(zipper.focus))
  }.toStream must contain((_: MatchResult[_]).isSuccess).forall
}
</code></pre><p>In this case we <code class="prettyprint" style="white-space:pre-wrap">cobind</code> each zipper with a function that will check if there are related elements in the group. This gives us back a <code class="prettyprint" style="white-space:pre-wrap">Zipper</code> of results and we need to make sure that it is full of <code class="prettyprint" style="white-space:pre-wrap">success</code> values.</p><a name="Second+property"><h4>Second property</h4></a><p><strong><em>"for each element in a group, there doesn't exist a related element in another group"</em></strong> </p>
<pre><code class="prettyprint" style="white-space:pre-wrap">prop { (list: List[Int], relation: (Int, Int) => Boolean) =>
  val groups = partition(list)(relation)
  groups match {
    case Nil          => list must beEmpty
    case head :: tail => nel(head, tail).toZipper.cojoin.toStream must not contain(relatedElementsAcrossGroups(relation))
  }
}
</code></pre><p>This property applies the same technique but now across groups of elements by creating a <code class="prettyprint" style="white-space:pre-wrap">Zipper[NonEmptyList[Int]]</code> instead of a <code class="prettyprint" style="white-space:pre-wrap">Zipper[Int]</code> as before: </p>
<pre><code class="prettyprint" style="white-space:pre-wrap">def relatedElementsAcrossGroups(relation: (Int, Int) => Boolean) = (groups: Zipper[NonEmptyList[Int]]) =>
  groups.focus.list must contain { e1: Int =>
    val otherGroups = (groups.lefts ++ groups.rights).map(_.list).flatten
    otherGroups must contain(relation.curried(e1))
  }
</code></pre><p>Note that the ability to "nest" the new specs2 <code class="prettyprint" style="white-space:pre-wrap">contain</code> matchers is very useful in this situation.</p><a name="Last+property"><h4>Last property</h4></a><p>Finally the last property is much easier because it doesn't require any context to be tested. For this property we use a relation where no element is ever related to another one and check that the elements all end up in distinct groups.</p><p><strong><em>"2 elements which are not related must end up in different groups"</em></strong> </p></div></status><status class="ok"><div style="display: show; text-indent:0px;"><pre><code class="prettyprint" style="white-space:pre-wrap">prop { (list: List[Int]) =>
  val neverRelated = (n1: Int, n2: Int) => false
  val groups = partition(list)(neverRelated)
  groups must have size(list.size)
}
</code></pre><a name="Conclusion"><h4>Conclusion</h4></a><p>Building an intuition for those crazy concepts is really what counts. For me it was "traversal with a context". Then I was finally able to spot it in my own code.</p></div></status>Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-5336273.post-56573377744383329202013-06-08T09:15:00.001+09:002015-02-21T13:56:41.580+09:00Specs2 2.0 - Interpolated - RC2<status class="ok"><div style="display: show; text-indent:0px;"><p><p>This is a quick update to present the main differences with specs2 2.0-RC1. I have been fixing a <a href="https://github.com/etorreborre/specs2/issues/161">few</a> <a href="https://github.com/etorreborre/specs2/issues/163">bugs</a> but more importantly I have:</p>
<ul>
<li>made the <code class="prettyprint">Tags</code> trait part of the standard specification</li>
<li>removed some arguments for reporting and made the formatting of specifications more granular</li>
</ul><p>This all started with an <a href="https://github.com/etorreborre/specs2/issues/162">issue on Github</a>...</p><a name="Formatting"><h3>Formatting</h3></a><p>Creating reports for specifications is a bit tricky. On one hand you have different possible "styles" for the specifications: "old" acceptance style (with the <code class="prettyprint">^</code> operator), "new" acceptance style (with interpolated strings), "unit" style... Then, on the other hand, you want to report the results in the console, where information is logged on a line-by-line basis, and in HTML files, where newlines, whitespace and indentation all need great care.</p><p>I don't think I got it quite right yet, especially for HTML, but working on <a href="https://github.com/etorreborre/specs2/issues/162">issue #162</a> forced me to make the specs2 implementation and API a bit more flexible. In particular, in specs2 < 2.0, you could set some arguments to control the display of the specification in the console and/or HTML. For example <code class="prettyprint">noindent</code> is a Specification argument saying that you don't want the automatic indentation of text and examples. And <code class="prettyprint">markdown = false</code> means that you don't want text to be parsed as Markdown before being rendered to HTML.</p><p>However <a href="https://github.com/etorreborre/specs2/issues/162">issue 162</a> shows that setting formatting properties at the level of the whole specification doesn't play well with other features like specification inclusion. I decided to fix this issue by using an existing specs2 feature: tags.</p><a name="Tags+and+Specification"><h3>Tags and Specification</h3></a><p>Tags in specs2 are different from tags you can find in other testing libraries. Not only can you tag single examples, you can also mark a full section of a specification with some tags. 
We can use this capability to select specific parts of a specification for execution but we can also use it to direct the formatting of the specification text. For example you can now write: </p></p></div></status><status class="ok"><div style="display: show; text-indent:0px;"><p><pre><code class="prettyprint">class MySpec extends Specification { def is = s2""" ${formatSection(verbatim = false)}
This text uses Markdown when printed to html, however if some text is indented with 4 spaces
it should *not* be rendered as a code block because `verbatim` is false.
"""
}
</code></pre><p>Given the versatile use of tags now, I decided to include the <code class="prettyprint">Tags</code> trait, by default, in the <code class="prettyprint">Specification</code> class. I resisted doing that in the past because I didn't want to encumber the <code class="prettyprint">Specification</code> namespace too much with something that was rarely used. This leads me to the following tip on how to use the <code class="prettyprint">Specification</code> class:</p>
<ul>
<li><p>when starting a new project or prototyping some code, use the <code class="prettyprint">Specification</code> class directly with all inherited features</p></li>
<li><p>when making your project more robust and production-like, create your own <code class="prettyprint">Spec</code> trait, generally inheriting from the <code class="prettyprint">BaseSpecification</code> class for basic features, and mix in only the traits you think you will generally use</p></li>
</ul><p>This should give you more flexibility and choice over which specs2 features you want to use, with a minimal cost in terms of namespace footprint and compile times (because each new implicit you bring in might have an impact on performance).</p><a name="API+changes"><h3>API changes</h3></a><p>The consequence of this evolution is yet another API break:</p>
<ul>
<li>the <code class="prettyprint">Text</code> and <code class="prettyprint">Example</code> classes now use a <code class="prettyprint">FormattedString</code> class containing the necessary parameters to display that string as HTML or in the console</li>
<li>for implementation reasons I have actually changed the constructor parameters of all <code class="prettyprint">Fragment</code> classes to avoid storing state as private variables</li>
<li>the <code class="prettyprint">noindent</code>, <code class="prettyprint">markdown</code> arguments are now gone (you need to replace them with <code class="prettyprint">${formatSection(flow=true)}</code> and <code class="prettyprint">${formatSection(markdown=true)}</code>, see below)</li>
<li>the <code class="prettyprint">Tags</code> trait is mixed in the <code class="prettyprint">Specification</code> class so if you had methods like <code class="prettyprint">def tag</code> you might get conflicts</li>
</ul><p>And there are now 2 methods <code class="prettyprint">formatSection(flow: Boolean, markdown: Boolean, verbatim: Boolean)</code> and <code class="prettyprint">formatTag(flow: Boolean, markdown: Boolean, verbatim: Boolean)</code> to tag specification fragments with the following parameters:</p>
<ul>
<li><code class="prettyprint">flow</code>: the fragment (<code class="prettyprint">Text</code> or <code class="prettyprint">Example</code>) shouldn't be reported with automatic indenting (default = <code class="prettyprint">false</code>, set automatically to <code class="prettyprint">true</code> when using <code class="prettyprint">s2</code> interpolated strings)</li>
<li><code class="prettyprint">markdown</code>: the fragment is using Markdown (default = <code class="prettyprint">true</code>)</li>
<li><code class="prettyprint">verbatim</code>: indented text with more than 4 spaces must be rendered as a code block (default = <code class="prettyprint">true</code> but can be set to <code class="prettyprint">false</code> to solve <a href="https://github.com/etorreborre/specs2/issues/162">#162</a>)</li>
</ul><a name="HTML+reports"><h3>HTML reports</h3></a><p>I'm currently thinking that I should try out a brand new way of translating an executed specification with interpolated text into HTML. My first attempts were not completely successful and I find it hard to preserve the original layout of the specification text, especially with the <code class="prettyprint">Markdown</code> translation in the middle. Yet, I must say a word on the Markdown library I'm using, <a href="https://github.com/sirthias/pegdown">Pegdown</a>. I found this library extremely easy to adapt for my current needs (to implement the <code class="prettyprint">verbatim = false</code> option) and I send my kudos to <a href="https://github.com/sirthias">Mathias</a> for such a great job.</p>
<hr /><p>This is it. Download RC2, use it and provide feedback as usual, thanks!</p></p></div></status>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-14975763945411286282013-05-21T09:32:00.001+09:002013-05-21T09:33:36.737+09:00Specs2 2.0 - Interpolated<status class="ok"></status><status class="ok"></status><status class="ok"><div class="level0" style="display: show"><p><p>The latest release of specs2 (2.0) deserves a little bit more than just release notes. It needs explanations, apologies and a bit of celebration!</p><p><em>Explanations</em></p>
<ul>
<li>why is there another (actually several!) new style(s) of writing acceptance specifications</li>
<li>what are <code class="prettyprint">Script</code>s and <code class="prettyprint">ScriptTemplate</code>s</li>
<li>what has been done for compilation times</li>
<li>what you can do with <code class="prettyprint">Snippets</code></li>
<li>what is an <code class="prettyprint">ExampleFactory</code></li>
</ul><p><em>Apologies</em></p>
<ul>
<li>the <code class="prettyprint">>></code> / <code class="prettyprint">in</code> problem</li>
<li>API breaks</li>
<li><code class="prettyprint">Traversable</code> matchers</li>
<li><code class="prettyprint">AroundOutside</code> and <code class="prettyprint">Fixture</code></li>
<li>the never-ending quest for <code class="prettyprint">Given/When/Then</code> specifications</li>
</ul><p><em>Celebration</em></p>
<ul>
<li>compiler-checked documentation!</li>
<li>"operator-less" specifications!</li>
<li>more consistent <code class="prettyprint">Traversable</code> matchers!</li>
</ul><a name="Explanations"><h3>Explanations</h3></a><p>Scala 2.10 is a game changer for specs2, thanks to 2 features: <a href="http://docs.scala-lang.org/overviews/core/string-interpolation.html">String interpolation</a> and <a href="http://docs.scala-lang.org/overviews/macros/overview.html">Macros</a>.</p><a name="String+interpolation"><h4>String interpolation</h4></a><p>Specs2 has been designed from the start with the idea that it should be <a href="http://etorreborre.github.io/specs2/guide/org.specs2.guide.Philosophy.html">immutable by default</a>. This has led to the definition of <em>Acceptance specifications</em> with lots of operators, or, as someone put it elegantly, <a href="http://bit.ly/179dDzq">"code on the left, brainfuck on the right"</a>: </p>
<pre><code class="prettyprint">class HelloWorldSpec extends Specification { def is =

  "This is a specification to check the 'Hello world' string"  ^
                                                               p^
  "The 'Hello world' string should"                            ^
    "contain 11 characters"                                    ! e1^
    "start with 'Hello'"                                       ! e2^
    "end with 'world'"                                         ! e3^
                                                               end

  def e1 = "Hello world" must have size(11)
  def e2 = "Hello world" must startWith("Hello")
  def e3 = "Hello world" must endWith("world")
}
</code></pre><p>Fortunately Scala 2.10 now offers a great alternative with String interpolation. In itself, String interpolation is not revolutionary. A string starting with <code class="prettyprint">s</code> can have interpolated variables: </p></p></div></status><status class="ok"><div class="level0" style="display: show"><p><pre><code class="prettyprint">val name = "Eric"
s"Hello $name!"
</code></pre><p><code class="prettyprint">Hello Eric!</code></p><p>But the great powers behind Scala realized that they could both provide standard String interpolation <em>and</em> give you the ability to make your own. Exactly what I needed to make these pesky operators disappear! </p>
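As an aside, making your own interpolator only requires an implicit class on <code class="prettyprint">StringContext</code>. Here is a minimal, hypothetical <code class="prettyprint">shout</code> interpolator, sketched for illustration only (this is not how <code class="prettyprint">s2</code> is actually implemented):

```scala
// A made-up `shout` interpolator: it interpolates like `s`, then upcases.
object InterpolatorDemo {
  // Writing shout"Hello $name!" desugars to
  // StringContext("Hello ", "!").shout(name), so an implicit class
  // on StringContext is all that is needed to define it.
  implicit class ShoutHelper(val sc: StringContext) extends AnyVal {
    def shout(args: Any*): String = {
      val parts  = sc.parts.iterator // the literal pieces around the holes
      val values = args.iterator     // the interpolated expressions
      val sb     = new StringBuilder(parts.next())
      while (values.hasNext) {
        sb.append(values.next().toString).append(parts.next())
      }
      sb.toString.toUpperCase        // the "custom" behaviour
    }
  }

  def main(args: Array[String]): Unit = {
    val name = "Eric"
    println(shout"Hello $name!") // prints HELLO ERIC!
  }
}
```

specs2's <code class="prettyprint">s2</code> follows the same structural pattern, except that its interpolation method assembles specification fragments instead of returning a String.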
<pre><code class="prettyprint">class HelloWorldSpec extends Specification { def is = s2"""

 This is a specification to check the 'Hello world' string

 The 'Hello world' string should
   contain 11 characters $e1
   start with 'Hello'    $e2
   end with 'world'      $e3
                         """

  def e1 = "Hello world" must have size(11)
  def e2 = "Hello world" must startWith("Hello")
  def e3 = "Hello world" must endWith("world")
}
</code></pre><p>What has changed in the specification above is that text <code class="prettyprint">Fragment</code>s are now regular strings in the multiline <strong><em><code class="prettyprint">s2</code></em></strong> string and the examples are now inserted as interpolated variables. Let's explore in more details some aspects of this new feature:</p>
<ul>
<li>layout</li>
<li>examples descriptions</li>
<li>other fragments</li>
<li>implicit conversions</li>
<li>auto-examples</li>
</ul><a name="Layout"><h5>Layout</h5></a><p>If you run the <code class="prettyprint">HelloWorldSpec</code> you will see that the indentation of each example is respected in the output:</p>
<pre><code class="prettyprint">This is a specification to check the 'Hello world' string

The 'Hello world' string should
  + contain 11 characters
  + start with 'Hello'
  + end with 'world'
</code></pre><p>This means that you no longer have to worry about the layout of the text, or use the <code class="prettyprint">p</code>, <code class="prettyprint">t</code>, <code class="prettyprint">bt</code>, <code class="prettyprint">end</code>, <code class="prettyprint">endp</code> formatting fragments as before.</p><a name="Examples+descriptions"><h5>Examples descriptions</h5></a><p>On the other hand, the string taken as the example description is not as well delimited anymore, so it is now chosen, by convention, to be everything on the same line. For example this is what you get with the new interpolated string: </p></p></div></status><status class="ok"><div class="level0" style="display: show"><p><pre><code class="prettyprint">s2"""
 My software should
   do something that is pretty long to explain,
   so long that it needs 2 lines ${ 1 must_== 1 }
"""
</code></pre>
<pre><code class="prettyprint">My software should
  do something that is pretty long to explain,
  + so long that it needs 2 lines
</code></pre><p>If you want the 2 lines to be included in the example description you will need to use the "old" form of creating an example: </p></p></div></status><status class="ok"><div class="level0" style="display: show"><p><pre><code class="prettyprint">s2"""
 My software should
   ${ """do something that is pretty long to explain,
         so long that it needs 2 lines""" ! { 1 must_== 1 } }
"""
</code></pre>
<pre><code class="prettyprint">My software should
  + do something that is pretty long to explain,
    so long that it needs 2 lines
</code></pre><p>But I suspect that there will be very few times when you will want to do that.</p><a name="Other+fragments+and+variables"><h5>Other fragments and variables</h5></a><p>Inside the <strong><em><code class="prettyprint">s2</code></em></strong> string you can interpolate all the usual specs2 fragments: Steps, Actions, included specifications, Forms... However you will quickly realize that you cannot interpolate arbitrary objects. Indeed, apart from specs2 objects, the only other 2 types which you can use as variables are <code class="prettyprint">Snippets</code> (see below) and Strings.</p><p>The restriction is there to remind you that, in general, interpolated expressions are "unsafe". If the expression you're interpolating throws an Exception, as is commonly the case with tested code, there is no way to catch that exception. If that exception is uncaught, the <em>whole</em> specification will fail to be built. Why is that?</p><a name="Implicit+conversions"><h5>Implicit conversions</h5></a><p>When I first started to experiment with interpolated strings I thought that they could even be used to write <em>Unit Specifications</em>: </p>
<pre><code class="prettyprint">s2"""
 This is an example of conversion using integers ${
   val (a, b) = ("1".toInt, "2".toInt)
   (a + b) must_== 3
 }
 """
</code></pre><p>Unfortunately such specifications will horribly break if there is an error in one of the examples. For instance if the example was:</p>
<pre><code class="prettyprint">This is an example of conversion using integers ${
  // oops, this is going to throw a NumberFormatException!
  val (a, b) = ("!".toInt, "2".toInt)
  (a + b) must_== 3
}
</code></pre><p>Then the whole string and the <em>whole</em> specification will fail to be instantiated! </p><p>The reason is that everything you interpolate is converted, through an implicit conversion, to a "SpecPart" which will be interpreted differently depending on its type. If it is a <code class="prettyprint">Result</code> then we will interpret this as the body of an <code class="prettyprint">Example</code> and use the preceding text as the description. If it is just a simple string then it is just inserted in the specification as a piece of text. But implicit conversions of a block of code, as above, do not convert the whole block. They are merely <a href="http://bit.ly/11qietE">converting the last value</a>! So if anything before the last value throws an Exception you will have absolutely no way to catch it and it will bubble up to the top.</p><p>That means that you need to be very prudent when interpolating arbitrary blocks. One work-around is to do something like this: </p>
<pre><code class="prettyprint">import execute.{AsResult => >>}

s2"""
 This is an example of conversion using integers ${>>{
   val (a, b) = ("!".toInt, "2".toInt)
   (a + b) must_== 3
 }}
 """
</code></pre><p>But you have to admit that the whole <code class="prettyprint">${>>{...}}</code> is not exactly gorgeous.</p><a name="Auto-examples"><h5>Auto-examples</h5></a><p>One clear win of Scala 2.10 for specs2 is the use of macros to capture code expressions. This is particularly interesting with so-called "auto-examples". This feature is really useful when your examples are so self-descriptive that a textual description feels redundant. For example if you want to specify the <code class="prettyprint">String.capitalize</code> method: </p></p></div></status><status class="ok"><div class="level0" style="display: show"><p><pre><code class="prettyprint">s2"""
The `capitalize` method verifies
${ "hello".capitalize === "Hello" }
${ "Hello".capitalize === "Hello" }
${ "hello world".capitalize === "Hello world" }
"""
</code></pre>
<pre><code class="prettyprint"> The `capitalize` method verifies
+ "hello".capitalize === "Hello"
+ "Hello".capitalize === "Hello"
+ "hello world".capitalize === "Hello world"
</code></pre><p>It turns out that the method implementing the <strong><em><code class="prettyprint">s2</code></em></strong> interpolation uses a macro to extract the text for each interpolated expression, so, if there is no preceding text on a given line, we take the captured expression as the example description. It is important to note that this will only work properly if you enable the <code class="prettyprint">-Yrangepos</code> scalac option (in sbt: <code class="prettyprint">scalacOptions in Test := Seq("-Yrangepos")</code>). </p><p>However the drawback of using that option is the compilation-time cost it incurs (around 10% in my own measurements). If you don't want (or you forget :-)) to use that option there is a default implementation which should do the trick in most cases but which might not capture all the text in some edge cases.</p><a name="Scripts"><h4>Scripts</h4></a><p>The work on <code class="prettyprint">Given/When/Then</code> specifications has led to a nice generalisation. Since the new <code class="prettyprint">GWT</code> trait decouples the specification text from the steps and examples to create, we can push this idea a bit further and create "classical" specifications where the text is not annotated at all and examples are described somewhere else.</p><p>Let's see what we can do with the <code class="prettyprint">org.specs2.specification.script.Specification</code> class: </p>
<pre><code class="prettyprint">import org.specs2._
import specification._

class StringSpecification extends script.Specification with Grouped { def is = s2"""

Addition
========

 It is possible to add strings with the + operator
  + one string and an empty string
  + 2 non-empty strings

Multiplication
==============

 It is also possible to duplicate a string with the * operator
  + using a positive integer duplicates the string
  + using a negative integer returns an empty string
"""

  "addition" - new group {
    eg := ("hello" + "") === "hello"
    eg := ("hello" + " world") === "hello world"
  }

  "multiplication" - new group {
    eg := ("hello" * 2) === "hellohello"
    eg := ("hello" * -1) must beEmpty
  }
}
</code></pre><p>With <code class="prettyprint">script.Specification</code>s you just provide a piece of text where examples start with a <code class="prettyprint">+</code> sign, and you specify <em>example groups</em>. Example groups were introduced in a previous version of specs2 with the idea of providing standard names for examples in Acceptance specifications. </p><p>When the specification is executed, the first 2 example lines are mapped to the examples of the first group, and the example lines from the next block (as delimited with a Markdown title) are used to build examples by taking expectations in the second group (those groups are automatically given names, <code class="prettyprint">g1</code> and <code class="prettyprint">g2</code>, but you can specify them yourself: <code class="prettyprint">"addition" - new g1 {...</code>).</p><p>This seems to be a lot of "convention over configuration" but this is actually all configurable! The <code class="prettyprint">script.Specification</code> class is an example of a <code class="prettyprint">Script</code> and it is associated with a <code class="prettyprint">ScriptTemplate</code> which defines how to parse text to create fragments based on the information contained in the <code class="prettyprint">Script</code> (we will see another example of this in action below with the <code class="prettyprint">GWT</code> trait which proposes another type of <code class="prettyprint">Script</code> named <code class="prettyprint">Scenario</code> to define <code class="prettyprint">Given/When/Then</code> steps).</p><p>There are lots of advantages in adopting this new <code class="prettyprint">script.Specification</code> class:</p>
<ul>
<li><p>it is "operator-free", there's no need to annotate your specification on the right with strange symbols</p></li>
<li><p>tags are automatically inserted for you so that it's easy to re-run a specific example or group of examples by name: <code class="prettyprint">test-only StringSpecification -- include g2.e1</code></p></li>
<li><p>examples are marked as pending if you haven't yet implemented them</p></li>
<li><p>it is configurable to accommodate other templates (you could even create <a href="http://www.cukes.info">Cucumber</a>-like specifications if that's your thing!)</p></li>
</ul><p>The obvious drawback is the decoupling between the text and the examples code. If you restructure the text you will have to restructure the examples accordingly, and knowing which example is described by which piece of text is not obvious. This, or operators on the right-hand side, choose your poison :-)</p><a name="Compilation+times"><h4>Compilation times</h4></a><p>Scala's typechecking and JVM interoperability come at a big price in terms of compilation times. Moderately-sized projects can take minutes to compile, which is very annoying for someone coming from Java or Haskell. </p><p>Bill Venners has tried to do a <a href="http://www.artima.com/articles/compile_time.html">systematic study</a> of which features in testing libraries seem to have the biggest impact. It turns out that implicits, traits and by-name parameters have a significant impact on compilation times. Since specs2 uses those features more than any other test library, I tried to do something about it.</p><p>The easiest thing to do was to make <code class="prettyprint">Specification</code> an abstract class, not a trait (and provide the <code class="prettyprint">SpecificationLike</code> trait in its place). My unscientific estimation is that this single change removed 0.5 seconds per compiled file (from 313s to 237s for the specs2 build, and a memory reduction of 55Mb, from 225Mb to 170Mb).</p><p>Then, the next very significant improvement was to use interpolated specifications instead of the previous style of Acceptance specifications. The result is impressive: from 237 seconds to 150 seconds, and a memory reduction of more than 120Mb, from 170Mb to 47Mb!</p><p>On the other hand, when I tried to remove some of the by-name parameters (the left part of <code class="prettyprint">a must_== b</code>) I didn't observe a real impact on compilation times (only 15% less memory).</p><p>The last thing I did was to remove some of the default matchers (and to add a few others). 
Those matchers are the "content" matchers: <code class="prettyprint">XmlMatchers</code>, <code class="prettyprint">JsonMatchers</code>, <code class="prettyprint">FileMatchers</code>, <code class="prettyprint">ContentMatchers</code> (and I added the <code class="prettyprint">TryMatchers</code> instead). I did this to remove some implicits from the scope when compiling code, but also to reduce the namespace footprint every time you extend the <code class="prettyprint">Specification</code> class. However I couldn't see a major improvement in compile-time performance with this change.</p><a name="Snippets"><h4>Snippets</h4></a><p>One frustration of software documentation writers is that it is very common to have stale or incorrect code because the API has moved on. What if it was possible to write some code, in the documentation, that will be checked by the compiler? And automatically refactored when you change a method name? </p><p>This is exactly what <code class="prettyprint">Snippets</code> will do for you. When you want to capture and display a piece of code in a Specification you create a <code class="prettyprint">Snippet</code>: </p>
<pre><code class="prettyprint">s2"""
 This is an example of addition: ${snippet{
   // who knew?
   1 + 1 == 2
 }}
 """
</code></pre><p>This renders as:</p>
<pre><code class="prettyprint">This is an example of addition
// who knew?
1 + 1 == 2
</code></pre><p>And yes, you guessed it right, the Snippet above was extracted by using another Snippet! I encourage you to read the <a href="http://etorreborre.github.io/specs2/guide/org.specs2.guide.HowTo.html#Capture+snippets">documentation on Snippets</a> to see what you can do with them, the main features are:</p>
<ul>
<li><p>code evaluation: the last value can be displayed as a result</p></li>
<li><p>checks: the last value can be checked and reported as a failure in the Specification</p></li>
<li><p>code hiding: it is possible to hide parts of the code (initialisations, results) by enclosing them in "scissors" comments of the form <code class="prettyprint">// 8<--</code></p></li>
</ul><a name="Example+factory"><h4>Example factory</h4></a><p>Every now and then I get <a href="http://bit.ly/ZBc4rO">a question</a> from users who want to intercept the creation of examples and use the example description to do interesting things before or after the example execution. It is now possible to do so by providing another <code class="prettyprint">ExampleFactory</code> rather than the default one: </p>
<pre><code class="prettyprint">import specification._

class PrintBeforeAfterSpec extends Specification { def is =
  "test" ! ok

  case class BeforeAfterExample(e: Example) extends BeforeAfter {
    def before = println("before "+e.desc)
    def after  = println("after "+e.desc)
  }

  override def exampleFactory = new ExampleFactory {
    def newExample(e: Example) = {
      val context = BeforeAfterExample(e)
      e.copy(body = () => context(e.body()))
    }
  }
}
</code></pre><p>The <code class="prettyprint">PrintBeforeAfterSpec</code> will print the name of each example before and after executing it.</p><a name="Apologies"><h3>Apologies</h3></a><a name="the+%3E%3E+%2F+in+problem"><h4>the <code class="prettyprint">>></code> / <code class="prettyprint">in</code> problem</h4></a><p><a href="https://github.com/etorreborre/specs2/issues/140">This issue</a> has come up at <a href="https://groups.google.com/forum/#!msg/specs2-users/YCPeld1H5kA/yDppSJ8ddo4J">different times</a> and one lesson is: <code class="prettyprint">Unit</code> means "anything" so don't try to be too smart about it. So I owe an apology to the users for this poor API design choice and for the breaking API change that is now ensuing. Please read the thread in the Github issue to learn how to fix compile errors that would result from this change.</p><a name="API+breaks"><h4>API breaks</h4></a><p>While we're on the subject of API breaks, let's make a list:</p>
<ul>
<li><p>Unit values in <code class="prettyprint">>></code> / <code class="prettyprint">in</code>: now you need to explicitly declare if you mean "a list of examples created with foreach" or "a list of expectations created with foreach"</p></li>
<li><p><code class="prettyprint">Specification</code> is not a trait anymore so you should use the <code class="prettyprint">SpecificationLike</code> trait instead if that's what you need (see the <a href="#Compilation+times">Compilation times</a> section)</p></li>
<li><p>Some matchers traits have been removed from the default matchers (XML, JSON, File, Content) so you need to explicitly mix them in (see the <a href="#Compilation+times">Compilation times</a> section)</p></li>
<li><p>The <code class="prettyprint">Given/When/Then</code> functionality has been extracted as a deprecated trait <code class="prettyprint">specification.GivenWhenThen</code> (see the <a href="#Given%2FWhen%2FThen%3F">Given/When/Then?</a> section)</p></li>
<li><p>the negation of the <code class="prettyprint">Map</code> matchers <a href="http://bit.ly/YDebIv">has changed</a> (this can be considered as a fix but this might be a run-time break for some of you)</p></li>
<li><p>many of the <code class="prettyprint">Traversable</code> matchers have been deprecated (see the next section)</p></li>
</ul><a name="Traversable+matchers"><h4><code class="prettyprint">Traversable</code> matchers</h4></a><p>I've had this nagging thought in my mind for some time now but it only reached my consciousness recently. I always felt that specs2 matchers for collections were a bit ad-hoc, with not-so-obvious ways to do simple things. After lots of fighting with implicit classes, overloading and subclassing, I think that I have something better to propose.</p><p>With the new API we generalize the type of checks you can perform on elements:</p>
<ul>
<li><p><code class="prettyprint">Seq(1, 2, 3) must contain(2)</code> just checks for the presence of one element in the sequence</p></li>
<li><p>this is equivalent to writing <code class="prettyprint">Seq(1, 2, 3) must contain(equalTo(2))</code> which means that you can pass a matcher to the <code class="prettyprint">contain</code> method. For example <code class="prettyprint">containAnyOf(1, 2, 3)</code> is <code class="prettyprint">contain(anyOf(1, 2, 3))</code> where <code class="prettyprint">anyOf</code> is just another matcher</p></li>
<li><p>and more generally, you can pass any function returning a result! <code class="prettyprint">Seq(1, 2, 3) must contain((i: Int) => i must beEqualTo(2))</code> or <code class="prettyprint">Seq(1, 2, 3) must contain((i: Int) => i == 2)</code> (you can even return a ScalaCheck <code class="prettyprint">Prop</code> if you want)</p></li>
</ul><p>Then we can use combinators to specify how many times we want the check to be performed:</p>
<ul>
<li><p><code class="prettyprint">Seq(1, 2, 3) must contain(2)</code> is equivalent to <code class="prettyprint">Seq(1, 2, 3) must contain(2).atLeastOnce</code></p></li>
<li><p><code class="prettyprint">Seq(1, 2, 3) must contain(2).atMostOnce</code></p></li>
<li><p><code class="prettyprint">Seq(1, 2, 3) must contain(be_>=(2)).atLeast(2.times)</code></p></li>
<li><p><code class="prettyprint">Seq(1, 2, 3) must contain(be_>=(2)).between(1.times, 2.times)</code></p></li>
</ul><p>This covers lots of cases where you would previously use <code class="prettyprint">must have oneElementLike(partialFunction)</code> or <code class="prettyprint">must containMatch(...)</code>. This can also be used instead of the <code class="prettyprint">forall</code>, <code class="prettyprint">atLeastOnce</code> methods. For example <code class="prettyprint">forall(Seq(1, 2, 3)) { (i: Int) => i must be_>=(0) }</code> is <code class="prettyprint">Seq(1, 2, 3) must contain((i: Int) => i must be_>=(0)).forall</code>.</p><p>The other type of matching which you want to perform on collections involves several checks at a time. For example:</p>
<ul>
<li><code class="prettyprint">Seq(1, 2, 3) must contain(allOf(2, 3))</code></li>
</ul><p>This seems similar to the previous case but the combinators you might want to use with several checks are different. <code class="prettyprint">exactly</code> is one of them:</p>
<ul>
<li><code class="prettyprint">Seq(1, 2, 3) must contain(exactly(3, 1, 2))</code> // we don't expect ordered elements by default</li>
</ul><p>Or <code class="prettyprint">inOrder</code></p>
<ul>
<li><code class="prettyprint">Seq(1, 2, 3) must contain(exactly(be_>(0), be_>(1), be_>(2)).inOrder)</code> // with matchers here</li>
</ul><p>One important thing to note though is that, when you are not using <code class="prettyprint">inOrder</code>, the comparison is done greedily: we don't try all the possible combinations of input elements and checks to see if there would be a possibility for the whole expression to match.</p><p>Please explore this new API and report any issue (bug, compilation error) you find. Most certainly the failure reporting can be improved. The description of failures is much more centralized with this new implementation but also a bit more generic. For now, the failure messages just list which elements did not pass the checks, but they do not output something nice like: <code class="prettyprint">The sequence Seq(1, 2, 3) does not contain exactly the elements 4 and 3 in order: 4 is not found</code>.</p><a name="AroundOutside+vs+Fixture"><h4><code class="prettyprint">AroundOutside</code> vs <code class="prettyprint">Fixture</code></h4></a><p>My approach to context management in specs2 has been very progressive. First I provided the ability to insert code (and more precisely effects) before or after an Example, reproducing standard JUnit capabilities. Then I introduced <code class="prettyprint">Around</code> to place things "in" a context, and <code class="prettyprint">Outside</code> to pass data to an example. And finally <code class="prettyprint">AroundOutside</code> as the ultimate combination of both capabilities.</p><p>I thought that with <code class="prettyprint">AroundOutside</code> you could do whatever you needed to do, end of story. It turns out that it's not so simple. <code class="prettyprint">AroundOutside</code> is not general enough because the generation of <code class="prettyprint">Outside</code> data cannot be controlled by the <code class="prettyprint">Around</code> context. 
This proved to be <em>very</em> problematic for me on a specific use case where I needed to re-run the same example, based on different parameters, with slightly different input data each time. <code class="prettyprint">AroundOutside</code> was just not doing it. The solution? A good old <a href="http://en.wikipedia.org/wiki/Test_fixture"><code class="prettyprint">Fixture</code></a>. Very simply, a <code class="prettyprint">Fixture[T]</code> is a trait like this: </p>
<pre><code class="prettyprint">trait Fixture[T] {
  def apply[R : AsResult](f: T => R): Result
}
</code></pre><p>You can define an implicit fixture for all the examples: </p>
<pre><code class="prettyprint">class s extends Specification { def is = s2"""
 first example using the magic number  $e1
 second example using the magic number $e2
 """

  implicit def magicNumber = new specification.Fixture[Int] {
    def apply[R : AsResult](f: Int => R) = AsResult(f(10))
  }

  def e1 = (i: Int) => i must be_>(0)
  def e2 = (i: Int) => i must be_<(100)
}
</code></pre><p>I'm not particularly happy to add this to the API because it adds to the overall API footprint and learning curve, but in some scenarios this is just indispensable.</p><a name="Given%2FWhen%2FThen%3F"><h4>Given/When/Then?</h4></a><p>With the new "interpolated" style I had to find another way to write <code class="prettyprint">Given/When/Then</code> (GWT) steps. But this is tricky. The trouble with GWT steps is that they are intrinsically dependent. You cannot have a <code class="prettyprint">Then</code> step being defined before a <code class="prettyprint">When</code> step for example.</p><p>The "classic" style of acceptance specification is enforcing this at compile time because, in that style, you explicitly chain calls and the types have to "align": </p>
<pre><code class="prettyprint">class GivenWhenThenSpec extends Specification with GivenWhenThen { def is =

  "A given-when-then example for a calculator"   ^ br^
    "Given the following number: ${1}"           ^ aNumber^
    "And a second number: ${2}"                  ^ aNumber^
    "And a third number: ${6}"                   ^ aNumber^
    "When I use this operator: ${+}"             ^ operator^
    "Then I should get: ${9}"                    ^ result^
                                                 end

  val aNumber: Given[Int]                 = (_:String).toInt
  val operator: When[Seq[Int], Operation] = (numbers: Seq[Int]) => (s: String) => Operation(numbers, s)
  val result: Then[Operation]             = (operation: Operation) => (s: String) => { operation.calculate must_== s.toInt }

  case class Operation(numbers: Seq[Int], operator: String) {
    def calculate: Int = if (operator == "+") numbers.sum else numbers.product
  }
}
</code></pre><p>We can probably do better than this. What is required?</p>
<ul>
<li>to extract strings from text and transform them to well-typed values</li>
<li>to define functions using those values so that types are respected</li>
<li>to restrict the composition of functions so that a proper order of <code class="prettyprint">Given/When/Then</code> is respected</li>
<li>to transform all of this into <code class="prettyprint">Steps</code> and <code class="prettyprint">Examples</code></li>
</ul><p>So, with apologies for coming up with <em>yet-another-way</em> of doing the same thing, let me introduce you to the <code class="prettyprint">GWT</code> trait: </p></p></div></status><status class="ok"><div class="level0" style="display: show"><p><pre><code class="prettyprint">import org.specs2._
import specification.script.{GWT, StandardRegexStepParsers}

class GWTSpec extends Specification with GWT with StandardRegexStepParsers { def is = s2"""

 A given-when-then example for a calculator       ${calculator.start}
   Given the following number: 1
   And a second number: 2
   And a third number: 6
   When I use this operator: +
   Then I should get: 9
   And it should be >: 0                          ${calculator.end}
                                                  """

  // a step parser extracting the single character after the colon at the end of the line
  val anOperator = readAs(".*: (.)$").and((s: String) => s)

  val calculator =
    Scenario("calculator").
      given(anInt).
      given(anInt).
      given(anInt).
      when(anOperator) { case op :: i :: j :: k :: _ => if (op == "+") (i+j+k) else (i*j*k) }.
      andThen(anInt)   { case expected :: sum :: _   => sum === expected }.
      andThen(anInt)   { case expected :: sum :: _   => sum must be_>(expected) }
}
</code></pre><p>In the specification above, <code class="prettyprint">calculator</code> is a <code class="prettyprint">Scenario</code> object which declares some steps through the <code class="prettyprint">given</code>/<code class="prettyprint">when</code>/<code class="prettyprint">andThen</code> methods. The <code class="prettyprint">Scenario</code> class provides a <a href="http://en.wikipedia.org/wiki/Fluent_interface">fluent interface</a> in order to restrict the order of calls. For example, if you try to call a <code class="prettyprint">given</code> step after a <code class="prettyprint">when</code> step you will get a compilation error. Furthermore, steps which use extracted values from previous steps must use the proper types: what you pass to the <code class="prettyprint">when</code> step has to be a partial function taking in a <a href="https://github.com/milessabin/shapeless"><code class="prettyprint">Shapeless</code></a> <code class="prettyprint">HList</code> of the right type.</p><p>You will also notice that the <code class="prettyprint">calculator</code> is using <code class="prettyprint">anInt</code> and <code class="prettyprint">anOperator</code>. Those are <code class="prettyprint">StepParsers</code>, which are simple objects extracting values from a line of text and returning <code class="prettyprint">Either[Exception, T]</code> depending on the correct conversion of text to a type <code class="prettyprint">T</code>. By default you have access to 2 types of parsers. The first one is <code class="prettyprint">DelimitedStepParser</code> which expects that values to extract are enclosed in <code class="prettyprint">{}</code> delimiters (this is configurable). The other one is <code class="prettyprint">RegexStepParser</code> which uses a regular expression with groups in order to know what to extract. 
For example <code class="prettyprint">anOperator</code> defines that the operator to extract will be just after the colon at the end of the line.</p><p>Finally the <code class="prettyprint">calculator</code> scenario is inserted into the <strong><em><code class="prettyprint">s2</code></em></strong> interpolated string to delimit the text it applies to. <code class="prettyprint">Scenario</code> being a specific kind of <code class="prettyprint">Script</code> it has an associated <code class="prettyprint">ScriptTemplate</code> which defines that the last lines of the text should be paired with the corresponding <code class="prettyprint">given/when/then</code> method declarations. This is configurable and we can imagine other ways of pairing text to steps (see the <code class="prettyprint">org.specs2.specification.script.GWT.BulletTemplate</code> class for example).</p><p>For reasons which are too long to expose here I've never been a big fan of <code class="prettyprint">Given/When/Then</code> specifications and I guess that the multitude of ways to do that in specs2 shows it. I hope however that the GWT fans will find this approach satisfying and customisable to their taste.</p><a name="Celebration%21"><h3>Celebration!</h3></a><p>I think there are some really exciting things in this upcoming specs2 release for "Executable Software Specifications" lovers.</p><a name="Compiler-checked+documentation"><h5>Compiler-checked documentation</h5></a><p>Having compiler-checked snippets is incredibly useful. I've fixed quite a few bugs in both the specs2 and <a href="http://nicta.github.com/scoobi">Scoobi</a> user guides and I hope that I made them more resistant to future changes that will happen through refactoring (when just renaming things for example). I'm also very happy that, thanks to macros, the ability to capture code was extended to "auto-examples".
In previous specs2 versions, this was implemented by looking at stack traces and doing horrendous calculations on where a piece of code would be. This gives me the shivers every time I have to look at that code!</p><a name="No+operators"><h5>No operators</h5></a><p>The second thing is <code class="prettyprint">Scripts</code> and <code class="prettyprint">ScriptTemplates</code>. There is a trade-off when writing specifications. On one hand we would like to read pure text, without the encumbrance of implementation code; on the other hand, when we read specification code, it's nice to have a short sentence explaining what it does. With this new release there is a continuum of solutions on this trade-off axis:</p>
<ol>
<li>you can have pure text, with no annotations but no navigation is possible to the code (with <code class="prettyprint">org.specs2.specification.script.Specification</code>)</li>
<li>you can have annotated text, with some annotations to access the code (with <code class="prettyprint">org.specs2.Specification</code>)</li>
<li>you can have text interspersed with the code (with <code class="prettyprint">org.specs2.mutable.Specification</code>)</li>
</ol><a name="New+matchers"><h5>New matchers</h5></a><p>I'm pretty happy to have new <code class="prettyprint">Traversable</code> matchers covering a lot more use cases than before in a straightforward manner. I hope this will reduce the thinking time between "I need to check that" and "Ok, this is how I do it".</p>
<hr /><p>Please try out the new Release Candidate, report bugs, propose enhancements and have fun!</p></p></div></status><status class="ok"><p></p></status>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-58787395848354295582012-06-18T00:27:00.000+09:002012-10-17T07:01:38.656+09:00Strong functional programming<p>Programming in a statically typed functional language (think <a href="http://haskell.org">Haskell</a>) gives quite a few safety guarantees as well as a lot of expressivity thanks to the use of higher-order functions. However the type system lets you write some problematic functions like:</p>
<pre><code class="prettyprint">loop :: Int -> Int
loop n = 1 + loop n
</code></pre><p>Why is that problematic? For one, this function never terminates, which can't be good. But also, and this is very annoying, we can no longer use <a href="http://www.haskell.org/haskellwiki/Equational_reasoning_examples">equational reasoning</a>:</p>
<pre><code class="prettyprint">loop 0 = 1 + loop 0
-- subtract (loop 0) from both sides, and use x - x = 0
0 = 1
</code></pre><p><code class="prettyprint">0 = 1</code> can then allow you to <a href="">prove anything</a> <code class="prettyprint">=> #fail</code>.</p><p>In this post I'm going to present the ideas of "Strong functional programming" (or "Total functional programming") and show how they can help avoid situations like the one above, where unrestricted recursion goes wrong.</p><p><em>Note: most of this post is a transliteration of an article by David Turner, <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.5271">"Total Functional Programming"</a>, 2004. I'm also using pieces of <a href="http://kar.kent.ac.uk/21427/1/Ensuring_Streams_Flow.pdf">"Ensuring Streams Flow"</a> and <a href="http://blog.sigfpe.com/2007/07/data-and-codata.html">this blog article</a> by @sigfpe.</em></p><a name="Life+without+%E2%8A%A5+%28bottom%29"><h3>Life without ⊥ (bottom)</h3></a><p>The type <code class="prettyprint">Int</code> which is used for the definition of the <code class="prettyprint">loop</code> function is not as simple as it looks. It is actually Int plus ⊥, where ⊥ (called "bottom") is a value denoting run-time errors and non-terminating programs. "Total functional programming" doesn't allow this kind of value and this is why it is called "Total": functions are always defined on their domain (their "input values"); they cannot return an <code class="prettyprint">undefined</code> value.</p><p>Not having to deal with ⊥ has a lot of advantages:</p><a name="Simpler+proofs"><h4>Simpler proofs</h4></a><p>As we've seen in the introduction, we can't really assume that <code class="prettyprint">(e - e = 0)</code> when <code class="prettyprint">(e: Nat)</code> without having to worry about the ⊥ case. When ⊥ is present, proof by induction cannot be used as simply as in math textbooks. To refresh your memory, this proof principle states that for any property <code class="prettyprint">P</code>:</p>
<pre><code class="prettyprint">if P(0)
and for all n, P(n) => P(n + 1)
then for all n, P(n)
</code></pre><p>However if you allow ⊥ to sneak in, <a href="http://en.wikibooks.org/wiki/Haskell/Denotational_semantics">this principle becomes a tad more complex</a>.</p><a name="Simpler+language+design"><h4>Simpler language design</h4></a><a name="Arguments+evaluation"><h5>Arguments evaluation</h5></a><p>With no bottom value, one divide between programming languages just disappears. We don't really care anymore about strict evaluation (as in Scheme, Scala) vs lazy evaluation (as in Haskell), the evaluation of a function doesn't depend on the evaluation of its arguments:</p>
<pre><code class="prettyprint">-- a function returning the first argument
first a b = a
-- with strict evaluation
first 1 ⊥ = ⊥
-- with lazy evaluation
first 1 ⊥ = 1
</code></pre><a name="Pattern+matching"><h5>Pattern matching</h5></a><p>But that's not the only difference, another one is pattern matching. In Haskell, when you expect a pair of values, you won't be able to match that against ⊥:</p>
<pre><code class="prettyprint">-- will not match if (a, b) is ⊥
first (a, b) = a
</code></pre><p>You can argue that this is a reasonable language decision to make but that's not the only possible one. In Miranda for instance, the match will always succeed:</p>
<pre><code class="prettyprint">-- a bottom value can be "lifted" to a pair of bottom values
(⊥a, ⊥b) = ⊥
</code></pre><a name="%26+operator"><h5>& operator</h5></a><p>Same thing goes with the evaluation of operators like <code class="prettyprint">&</code> (logical AND). It is defined on Booleans by:</p>
<pre><code class="prettyprint">True & True = True
True & False = False
False & True = False
False & False = False
</code></pre><p>If you introduce ⊥ you have to define:</p>
<pre><code class="prettyprint">⊥ & y = ?
x & ⊥ = ?
</code></pre><p>Most programming languages decide to be "left-strict", i.e. they evaluate the first argument before trying to evaluate the whole expression:</p>
<pre><code class="prettyprint">⊥ & y = ⊥
x & ⊥ = False if x = False
x & ⊥ = ⊥ otherwise
</code></pre><p>Again, that might seem completely obvious to us after years of using this convention, but we can imagine other alternatives (doubly-strict, right-strict, doubly-non-strict), and this breaks the classical symmetry of the <code class="prettyprint">&</code> operator in mathematics.</p><a name="More+flexibility+in+language+implementation"><h4>More flexibility in language implementation</h4></a><a name="Reduction"><h5>Reduction</h5></a><p>Let's introduce a bit of programming language theory. When you define a programming language, you need to specify:</p>
<ul>
<li><p>how to build language expressions from "atomic" blocks. For example, how to build an <code class="prettyprint">if</code> expression from a boolean expression and 2 other expressions</p><p><code class="prettyprint">if booleanExpression then expression1 else expression2</code></p></li>
<li><p>how to "execute" the language expressions, usually by specifying how to replace values in expressions with other values. For example using a specific boolean value in an <code class="prettyprint">if</code> expression:</p><p><code class="prettyprint">a = true</code><br /><code class="prettyprint">if a then b else c => b</code></p></li>
</ul><p>As you can see, executing a program means "reducing" large expressions to smaller ones, using specific rules. This raises some immediate questions:</p>
<ol>
<li><p>can we reach a final, irreducible, expression (a <em>normal</em> form)?</p></li>
<li><p>if several "reduction paths" are possible (by using the reduction rules in different ways), do they converge on the same normal form?</p></li>
</ol><p>If you can show that your programming language satisfies 2. when 1. is true, then you can say that it has the <a href="http://en.wikipedia.org/wiki/Church%E2%80%93Rosser_theorem">"Church-Rosser"</a> property or it is <em>confluent</em>. Another version of this property, the "strong" form, says that <em>every</em> reduction leads to a normal form and it is unique whatever reduction path you took.</p><p>Total functional programming can be called "Strong" because it has this property. The consequence is that the language implementation is free to select various evaluation strategies, for example to get better memory consumption or parallelism, without any fear of changing the program's meaning.</p><a name="Not+all+green"><h3>Not all green</h3></a><p>If Strong Functional Programming is so easy, why don't we all use it right now? What are the disadvantages then?</p><a name="Not+Turing+complete"><h4>Not Turing complete</h4></a><p>The first issue with Strong Functional Programming is that it is not <a href="http://en.wikipedia.org/wiki/Turing_completeness">Turing complete</a>. It computes programs which always terminate but cannot compute <em>all the programs that always terminate</em>. In particular it cannot compute its own interpreter. Why is that?</p><p>Let's say the interpreter is a function <code class="prettyprint">eval</code> taking as arguments a program <code class="prettyprint">P</code> and its input <code class="prettyprint">I</code> (this example comes from <a href="http://www.haskell.org/pipermail/haskell-cafe/2003-May/004343.html">here</a>). Any program is a big sequence of symbols and can be coded into a number, so our eval interpreter can be seen as a function :: Nat -> Nat -> Nat. Based on that I can build an <code class="prettyprint">evil</code> function in my total language:</p>
<pre><code class="prettyprint">evil :: Nat -> Nat
evil code = 1 + eval code code
</code></pre><p>But there is also a number corresponding to the <code class="prettyprint">evil</code> program. Let's call that number <code class="prettyprint">666</code>:</p>
<pre><code class="prettyprint">-- this means "apply the evil program aka '666' to the value 666"
-- by definition of the eval interpreter
eval 666 666 = evil 666
-- but if we go back to the definition of evil
evil 666 = 1 + (eval 666 666)
-- we can't get a value for 'evil 666', because it's equivalent to the equation 0 = 1
evil 666 = 1 + evil 666
</code></pre><p>So <code class="prettyprint">evil</code> is a non-terminating program of my language, which is however supposed to terminate because it is built from <code class="prettyprint">1</code> and <code class="prettyprint">eval</code> (a terminating program by our hypothesis) <code class="prettyprint">=> fail</code>.</p><p>This is a troubling conclusion. One way out is to create a hierarchy of languages where each language above has enough power to write an interpreter for the language below. And, at the top of the hierarchy, we use a Turing-complete language. This is not dissimilar to isolating a typed, functional world from a non-typed, side-effecting one (aka "the real world").</p><a name="Non+terminating%21"><h4>Non terminating!</h4></a><p>Yes, by definition. But non-terminating programs are super useful: you wouldn't want your operating system to stop after a few keystrokes saying "I'm done, buddy".</p><p>The good news is that we can actually deal with this situation by using <em>codata</em>. But before we look at codata in detail, we're going to enumerate some of the rules of an "Elementary Strong Functional Programming" language (or ESFPL).</p><a name="The+rules+for+non-termination"><h3>The rules for non-termination</h3></a><p>Let's adopt a Haskell-like syntax for our ESFPL. One thing we must be able to do is to define the "usual" datatypes:</p>
<pre><code class="prettyprint">data Bool = True | False
data Nat = Zero | Suc Nat
data List a = Nil | Cons a (List a)
data Tree a = Nilt | Node a (Tree a) (Tree a)
</code></pre><p>and some functions on those types:</p>
<pre><code class="prettyprint">-- size of a Tree
size a :: Tree a -> Nat
size Nilt = 0
size (Node a l r) = 1 + size l + size r
-- filter a List with a function
filter f Nil = Nil
filter f (Cons a rest) = Cons a (filter f rest), if f a
                       = filter f rest, otherwise
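-- for example, assuming an "even" predicate on Nat:
-- filter even (Cons 1 (Cons 2 Nil)) = Cons 2 Nil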
</code></pre><a name="Rule+n.1%3A+all+case+analysis+must+be+complete"><h4>Rule n.1: all case analysis must be complete</h4></a><p>This one should feel obvious to anyone having used data types and pattern matching before. If you forget a case in your pattern matching analysis, you face the risk of a runtime exception: <code class="prettyprint">NoMatchError</code>.</p><p>But this rule has a larger impact. You will also have to change your "standard" library. For example you can't define the function taking the first element of a list, <code class="prettyprint">head</code>, as before. You need to provide a value to use when the list is empty:</p>
<pre><code class="prettyprint">-- taking the first element of a list
head a :: List a a -> a
head Nil default = default
head (Cons a rest) default = a
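-- the default value makes head total, for example:
-- head Nil 0 = 0
-- head (Cons 1 Nil) 0 = 1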
</code></pre><p>The other alternative is to program with a data type where things <em>can't</em> go wrong:</p>
<pre><code class="prettyprint">data NonEmptyList a = NCons a (NonEmptyList a)
-- taking the first element of a non-empty list
head a :: NonEmptyList a -> a
head (NCons a rest) = a
</code></pre><p>This kind of discipline doesn't seem so harsh, but other difficulties arise with the use of built-in arithmetic operators, starting with <code class="prettyprint">divide</code>. What do you do about <code class="prettyprint">divide 1 0</code>?</p><p>There are <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.3442">proposals</a> to build data types where "<code class="prettyprint">1 / 0</code>" is defined and equal to "<code class="prettyprint">1</code>"! It looks so weird to me that I'd certainly try to go with a <code class="prettyprint">NonZeroNat</code> data type before anything else.</p><a name="Rule+n.2%3A+type+recursion+must+be+covariant"><h4>Rule n.2: type recursion must be covariant</h4></a><p>I like this rule because it was not obvious to me at all when I first read it. What does it mean?</p>
<ol>
<li>You can define data types recursively</li>
<li>You can use functions as values</li>
<li><p>So you can build the following data type</p><p><code class="prettyprint">data Silly a = Very (Silly a -> a)</code></p></li>
</ol><p>With the <code class="prettyprint">Silly</code> data type I can recurse indefinitely:</p>
<pre><code class="prettyprint">bad a :: Silly a -> a
bad (Very f) = f (Very f)
ouch :: a
ouch = bad (Very bad)
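-- unfolding ouch shows the loop that rule n.2 prevents:
-- ouch = bad (Very bad) = bad (Very bad) = ...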
</code></pre><p>The <code class="prettyprint">ouch</code> value is a non-terminating one, because the evaluation of <code class="prettyprint">bad</code> will use <code class="prettyprint">bad</code> itself with no base case for termination. Rule n.2 is here to prohibit this. You can't define a data type with a function that uses this data type as an input value. The fact that this rule was not obvious raises questions:</p>
<ul>
<li>how do we know that we have found all the rules for ESFP?</li>
<li>is there a "minimal" set of rules?</li>
</ul><p>I have no answer to those questions unfortunately :-(.</p><a name="Rule+n.3%3A+recursion+must+be+structural"><h4>Rule n.3: recursion must be structural</h4></a><p>The previous rule was about data type recursion, this one is about function recursion. More precisely the rule says that: "each recursive function call must be on a syntactic sub-component of its formal parameter". This means that if you make a recursive call with the function being defined, it has to be on only a part of the input data, like with the ubiquitous <code class="prettyprint">factorial</code> function:</p>
<pre><code class="prettyprint">factorial :: Nat -> Nat
factorial Zero = 1
factorial (Suc Zero) = 1
-- we recurse with a sub-component of (Suc n)
factorial (Suc n) = (Suc n) * (factorial n)
</code></pre><p>This rule is not too hard to understand. If you always recurse on data that is provably "smaller" than your input data, you can prove that your function will terminate. This rule can also accommodate the Ackermann function, which has 2 <code class="prettyprint">Nat</code> parameters:</p>
<pre><code class="prettyprint">ack :: Nat Nat -> Nat
ack 0 n = n + 1
-- m + 1 is a shortcut for (Suc m)
ack (m + 1) 0 = ack m 1
ack (m + 1) (n + 1) = ack m (ack (m + 1) n)
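-- a small evaluation, for intuition:
-- ack 1 1 = ack 0 (ack 1 0) = ack 0 (ack 0 1) = ack 0 2 = 3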
</code></pre><p>By the way, if you don't know the Ackermann function, <a href="http://en.wikipedia.org/wiki/Ackermann_function">look it up</a>! It grows amazingly fast, even for very small inputs. Try to evaluate <code class="prettyprint">ack 4 3</code> for fun :-)</p><p>How restrictive is this rule? Not so much. It allows you to program:</p>
<ol>
<li><p><a href="http://en.wikipedia.org/wiki/Primitive_recursive_functions">"primitive recursive"</a> functions. Those are the functions, like <code class="prettyprint">factorial</code>, which are defined using only simple recursion (think how <code class="prettyprint">addition (Suc n) m</code> can be defined in terms of <code class="prettyprint">addition n m</code>) and by composing other primitive functions</p></li>
<li><p>other <a href="http://en.wikipedia.org/wiki/Total_recursive_function">"total recursive"</a> functions, like <code class="prettyprint">ack</code> ("total" meaning "terminating" here). Indeed <code class="prettyprint">ack</code> is not "primitive recursive". It was actually <em>especially</em> created to show that not all "total recursive" functions were "primitive recursive"</p></li>
</ol><p>It turns out that all the functions which can be proven to terminate by using first-order logic (without <code class="prettyprint">forall</code> and <code class="prettyprint">exists</code>) can be programmed with structural recursion. That's <em>a lot</em> of functions, but we have to change our programming practices in order to use only structural recursion. Let's look at an example.</p><a name="Coding+fast+exponentiation"><h5>Coding fast exponentiation</h5></a><p>We want to code the "power" function. Here's a naive version:</p>
<pre><code class="prettyprint">pow :: Nat -> Nat -> Nat
pow x n = 1, if n == 0
= x * (pow x (n - 1)), otherwise
</code></pre><p>It is primitive recursive but not very efficient because we need <code class="prettyprint">n</code> calls to get the result. We can do better than this:</p>
<pre><code class="prettyprint">pow :: Nat -> Nat -> Nat
pow x n = 1, if n == 0
= x * pow (x * x) (n / 2), if odd n
= pow (x * x) (n / 2), otherwise
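-- the number of calls is now logarithmic in n, e.g.:
-- pow x 5 = x * pow (x*x) 2 = x * pow (x*x*x*x) 1 = x * (x*x*x*x)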
</code></pre><p>Unfortunately this version is not primitive recursive, because when we call <code class="prettyprint">pow</code> we're not going from <code class="prettyprint">n + 1</code> to <code class="prettyprint">n</code>. It is however obvious that we're "descending" and that this algorithm will terminate. How can we re-code this algorithm with primitive recursion?</p><p>The trick is to encode the "descent" in a data type:</p>
<pre><code class="prettyprint">-- representation of a binary digit
data Bit = On | Off
-- we assume a built-in bits function returning the binary
-- digits of a number, least significant bit first
bits :: Nat -> List Bit
-- then the definition of pow is primitive recursive, because we descend on the Bit data type
pow :: Nat -> Nat -> Nat
pow x n = pow1 x (bits n)
pow1 :: Nat -> List Bit -> Nat
pow1 x Nil = 1
pow1 x (Cons On rest) = x * (pow1 (x * x) rest)
pow1 x (Cons Off rest) = pow1 (x * x) rest
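-- for example, assuming bits 5 = Cons On (Cons Off (Cons On Nil)):
-- pow x 5 = x * pow1 (x*x) (Cons Off (Cons On Nil))
--         = x * pow1 (x*x*x*x) (Cons On Nil) = x * (x*x*x*x)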
</code></pre><p>David Turner conjectures that many of our algorithms can be coded this way and that for cases where it gets a bit hairier (like for Euclid's <code class="prettyprint">gcd</code>) we could authorize another type of recursion called "Walther's recursion". With this kind of recursion we can recognize a larger class of programs where recursion is guaranteed to terminate because we only use operations effectively "reducing" the data types (acknowledging that <code class="prettyprint">n / 2</code> is necessarily lower than <code class="prettyprint">n</code>). Using "Walther's recursion" with a language having functions as first-class values is still an unsolved problem though.</p><a name="Codata+for+%22infinite%22+computations"><h3>Codata for "infinite" computations</h3></a><p>After having looked at the rules for bounding recursion and avoiding non-termination, we would still like to be able to program an operating system. The key insight here is that our functions need to terminate but our data doesn't need to. For example, the <code class="prettyprint">Stream</code> data type is infinite:</p>
<pre><code class="prettyprint">data Stream a = Cons a (Stream a)
</code></pre><p>In this perspective, an operating system is a bunch of functions acting on an infinite stream of input values. Let's call this type of infinite data, "codata", and see how we can keep things under control.</p><p>Here's our first definition, <code class="prettyprint">Colist</code> (similar to the <code class="prettyprint">Stream</code> type above):</p>
<pre><code class="prettyprint">-- "a <> Colist a" stands for "Cocons a (Colist a)"
codata Colist a = Conil | a <> Colist a
</code></pre><p>Does it breach the "Strong normalization" property that we described earlier? It doesn't if we say that every expression starting with a coconstructor is not reducible (or is in "normal form"): there is nothing which can be substituted. Whereas with a regular <code class="prettyprint">List</code> you can evaluate:</p>
<pre><code class="prettyprint">Cons (1 + 2) rest = Cons 3 rest
</code></pre><p>This is why we need to have a new keyword <code class="prettyprint">codata</code> to explicitly define this type of data. In a way, this is making a strong distinction between "lazy" and "strict" data, instead of using lazy evaluation to implement infinite data types.</p><a name="Primitive+corecursion"><h4>Primitive corecursion</h4></a><p>The equivalent of Rule n.3 for codata is that all recursion on codata must be "primitive corecursive". Do you have any idea what that is? It is kind of the opposite / dual of the structural recursion for data types. Instead of proving that we always reduce the input value when making a recursive call we need to prove that we "augment" the result, by always using a "coconstructor" to create the result:</p>
<pre><code class="prettyprint">-- functions on codata must always use a coconstructor for their result
function a :: Colist a -> Colist a
function a <> rest = 'something' <> (function 'something else')
</code></pre><p>Then we can define functions like:</p>
<pre><code class="prettyprint">ones :: Colist Nat
ones = 1 <> ones
fibonacci :: Colist Nat
fibonacci = f 0 1
where f a b = a <> (f b (a + b))
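-- extracting the values one at a time gives:
-- fibonacci = 0 <> 1 <> 1 <> 2 <> 3 <> 5 <> ...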
</code></pre><p>Another way of looking at the difference between data/recursion and codata/corecursion is that:</p>
<ul>
<li>recursion on data is safe if we break-up the data in smaller pieces <em>then</em> recurse on those pieces</li>
<li>corecursion on codata is safe if we apply the recursion first <em>then</em> put that in a coconstructor which declares that we have infinite data</li>
</ul><p>Codata is data which is (potentially) infinite, but <em>observable</em>. There are methods to safely extract results, one at a time, in a terminating way. That's exactly what the fibonacci function does. It builds a sequence of results, so that you can safely extract the first one, then you have a way to build the rest of the values.</p><p>This is why people also talk about the duality between data and codata as a duality between <em>constructors</em> and <em>destructors</em>. Or a duality between <em>accessing the internals</em> of a data type and <em>observing the behavior</em> of a codata type.</p><a name="Coinductive+proofs"><h4>Coinductive proofs</h4></a><p>When we try to reason about codata and corecursion, we can use coinduction (you saw that one coming, didn't you? :-)). The principle of coinduction is the following:</p><p>2 pieces of codata are the same, <em>"bisimilar"</em> in the literature, if:</p>
<ul>
<li>their finite parts are equal (the "<code class="prettyprint">a</code>" in "<code class="prettyprint">a <> rest</code>")</li>
<li>their infinite parts are the same</li>
</ul><p>In other words, if the 2 sequences always produce the same values. Using that principle we can show the following theorem on infinite structures:</p>
<pre><code class="prettyprint">-- iterate a function: x, f x, f (f x), f (f (f x)),...
iterate f x = x <> iterate f (f x)
-- map a function on a colist
comap f Conil = Conil
comap f a <> rest = (f a) <> (comap f rest)
</code></pre><p>Now, we'd like to show that <code class="prettyprint">iterate f (f x) = comap f (iterate f x)</code>:</p>
<pre><code class="prettyprint">iterate f (f x) = (f x) <> iterate f (f (f x)) -- 1. by definition of iterate
= (f x) <> comap f (iterate f (f x)) -- 2. by hypothesis
= comap f (x <> iterate f (f x)) -- 3. by definition of comap
= comap f (iterate f x) -- 4. by definition of iterate
</code></pre><p>The coinduction principle is used exactly in step 2. If "<code class="prettyprint">iterate f (f (f x))</code>" and "<code class="prettyprint">comap f (iterate f (f x))</code>" are the same, then adding a new value "<code class="prettyprint">(f x)</code>" will preserve equality.</p><a name="Limitations"><h4>Limitations</h4></a><p>The definition of "primitive corecursive" is a bit restrictive. It prevents useful, and well-founded, definitions like:</p>
<pre><code class="prettyprint">-- not primitive corecursive: the recursive call to "evens" is inside comap, not directly under <>
evens = 2 <> (comap (+2) evens)
</code></pre><p>Note that we can't allow any kind of construction with "<>". For example the <code class="prettyprint">bad</code> function below is <em>not</em> terminating:</p>
<pre><code class="prettyprint">-- infinite lists
codata Colist a = a <> Colist a
-- the tail of an infinite list
cotail a :: Colist a -> Colist a
cotail a <> rest = rest
-- don't do this at home
bad = 1 <> (cotail bad)
</code></pre><p>In <a href="http://kar.kent.ac.uk/21427/1/Ensuring_Streams_Flow.pdf">"Ensuring streams flow"</a>, David Turner shows that it is possible to develop an algorithm which will check where the corecursion is safe and where it isn't. The formal argument takes a few pages of explanation but the idea is simple. We need to count the "levels of guardedness":</p>
<ul>
<li><p>having <code class="prettyprint">evens</code> recursing <em>inside</em> <code class="prettyprint">comap</code> is not a problem, because <code class="prettyprint">comap</code> will add a new coconstructor</p></li>
<li><p>having <code class="prettyprint">bad</code> recursing <em>inside</em> <code class="prettyprint">cotail</code> is a problem, because <code class="prettyprint">cotail</code> "removes" a coconstructor</p></li>
</ul><p>So the idea of the algorithm is to count the number of times where we "add" or "remove" coconstructors to determine if the corecursion will be safe or not.</p><a name="Comadness"><h4>Comadness</h4></a><p>To conclude with the presentation of codata, I want to briefly introduce comonads and give a short example to give an intuition of what it is.</p><p>If you look at the definition in the books and listen to the category theory people, you will read or hear something like: "get the monad definition, change the arrows direction and you have a comonad":</p>
<pre><code class="prettyprint">-- the comonad operations
-- the dual of return or unit for monads
extract :: W a -> a
-- the dual of bind or flatMap for monads
cobind :: (W a -> b) -> W a -> W b
</code></pre><p>Intuitively you can think of a comonad as:</p>
<ul>
<li><p>"I can extract things from a Context" (to contrast with the "I can put things in a context" of monads)</p></li>
<li><p>If I have a function which "computes values from input values which might change depending on the context": <code class="prettyprint">W a -> b</code>, then I can use "values in context", <code class="prettyprint">W a</code>, to return other "values in context", <code class="prettyprint">W b</code></p></li>
</ul><p>This is still not quite clear, so let's give a simple example based on <code class="prettyprint">Colist</code>. <code class="prettyprint">Colist</code> is a comonad which we can use like that:</p>
<pre><code class="prettyprint">-- a Colist of Nats where each new value is the previous value + 1
nats = 0 <> comap (+1) nats
-- a function taking the first 2 elements of a Colist
firstTwo a :: Colist a -> (a, a)
firstTwo a <> b <> rest = (a, b)
-- now, let's cobind firstTwo to nats
cobind firstTwo nats = (0, 1) <> (1, 2) <> (2, 3) <> ...
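-- and extract (the dual of return), which here observes the first value:
-- extract nats = 0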
</code></pre><p>Sometimes you can read that <code class="prettyprint">List</code> is a monad representing the fact that a function can produce non-deterministic results ("zero or many"). By analogy with the <code class="prettyprint">List</code> monad, we can say that <code class="prettyprint">Colist</code> is a comonad representing the fact that a function can have non-deterministic inputs.</p><a name="Conclusion"><h3>Conclusion</h3></a><p>I touched on a lot of subjects with this post, for which there would be a <em>lot</em> to say (and certainly with less approximation than I did). In the first part we saw that, when dealing with finite data, the notions of termination, computation and recursion are intimately related. In the second part we saw that it can be beneficial to give a special status to infinite data and to provide for it the same kind of tools (proofs with corecursion, comonads) that we usually have for data.</p><p>I hope that the subject of codata will become more and more popular (this <a href="http://www.cl.cam.ac.uk/~dao29/drafts/codo-notation-orchard12.pdf">recent proposal for a <code class="prettyprint">codo</code> notation in Haskell</a> might be a good sign). We can see a lot of architectures these days turning to <a href="http://martinfowler.com/eaaDev/EventSourcing.html">"Event Sourcing"</a>.
I suspect that Codata, Comonads, Costate and all the <a href="http://www.haskell.org/wikiupload/e/e9/Typeclassopedia.pdf">Cotypeclassopedia</a> will prove very useful in dealing with these torrents of data.</p>Unknownnoreply@blogger.com15tag:blogger.com,1999:blog-5336273.post-186477111162361552012-03-22T20:54:00.000+09:002012-03-27T09:05:11.321+09:00Coming full circle<p>In this post I want to show a few of the upcoming features in specs2-1.9 and also take a step back on how specs2 features and implementation unfolded since I started the project one year ago.</p><a name="Why+would+you+want+to+rewrite+from+scratch%3F"><h4>Why would you want to rewrite from scratch?</h4></a><p>Good question, that seems to be very masochistic :-). Not only because of the sheer amount of features to re-implement but more importantly because of the difficulty to move users to a new version.</p><p>I explained in details the reasons why I thought the rewrite was necessary, how it would benefit users and how to migrate in a previous <a href="http://etorreborre.blogspot.com.au/2011/05/specs2-migration-guide.html">blog post</a>. What I want to emphasize here is the compromise I decided to make at that time:</p>
<ul>
<li>more concurrency and reliability</li>
<li>less practicality</li>
</ul><p>The main reason for this compromise was immutability. Forcing myself to immutability had opened the gates of robust and concurrent software but at the same time I had to cut back on the syntax I had proposed for the original specs1 project. For example it wasn't possible anymore to write:</p>
<pre><code class="prettyprint">// this example can only be "registered" if there are side-effects
"hello must have 5 letters" in {
  "hello" must have size(5)
}
"world must have 5 letters" in {
  "world" must have size(5)
}
</code></pre><p>Instead, I proposed to write:</p>
<pre><code class="prettyprint">// examples are "linked" together with the ^ operator
"hello must have 5 letters" ! e1 ^
"world must have 5 letters" ! e2

def e1 = "hello" must have size(5)
def e2 = "world" must have size(5)
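
// note: e1 and e2 are only referenced, not executed, when the specification
// is built, which is what makes this style immutable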
</code></pre><p>The feedback was immediate. Some developers who were sent a preview liked it but others told me: "NEVER I would use this horrible syntax". Ok, fair enough, I can't blame you, it's a bit <a href="http://bit.ly/GFz3Lx">ugly on the right</a>.</p><p>I decided then that I would relax my constraints a little bit and add one, just one, <code class="prettyprint">var</code>. It would only be used to accumulate examples when building the specification, to enable the good old specs1 style. This was also, at least partially, solving the migration problem since a full rewrite of the specifications was not necessary.</p><p>Not everything was as cool as in the original specs1 though. In particular the super-easy <a href="http://code.google.com/p/specs/wiki/DeclareSpecifications#Execution_model">"isolated" execution model</a> was missing. Until...</p><a name="Lightbulb"><h4>Lightbulb</h4></a><p>... Until it struck me, just one month ago, that reintroducing this mode of execution was less than 10 lines of code!</p><p>Because I had chosen the functional paradigm of "executing a Specification" <code class="prettyprint"><=></code> "interpreting a data structure", it was really easy to change the interpretation to something declaring: "for each example, execute the code in a clone of the Specification so that all local variables are seen as fresh variables".</p><p>In terms of product management, it was a "magic" feature almost for free! To me, this is the illustration of a principle of software: "<strong><em>if code is a liability, maximize your return on investment</em></strong>".</p><a name="Maximize+the+ROI"><h4>Maximize the ROI</h4></a><p>I'm sure you've already read something like: "<em>features are an asset, code is a liability</em>". But I haven't yet read the logical consequence of it: "<strong><em>Maximize the ROI: extract as many features as you can from your existing code</em></strong>". Indeed every time we write a piece of code, it's worth wondering:</p>
<ul>
<li>how can this be put to a better use?</li>
<li>can I add something slightly different to provide a new useful feature?</li>
<li>can I generalize it a bit to make sense of it in another context?</li>
<li>can I make it composable with an existing feature to get a new one?</li>
</ul><p>That's exactly what happened with the "isolated" feature above. Here's another example of this principle in action.</p><a name="IO+and+serialization+are+not+cheap"><h4>IO and serialization are not cheap</h4></a><p>A few months ago, I developed a feature to create an <a href="http://bit.ly/spec2_create_an_index_page">index page</a>. On this index page I had to show the status of specifications which had been executed as part of the previous run. This meant that I had to store somewhere, on the file system, the results of each specification execution.</p><p>In terms of feature price, this is not a particularly cheap one. Every time there is some IO interaction and some kind of serialization, the number of possible issues to consider <a href="http://exold.com/article/stupid-interview-questions">is not so small</a>. In a way that was really an "iceberg" feature: not a lot of functionality on the surface, but a lot of machinery under the water. So I wondered how I could improve my ROI on this. Hmm... since I'm writing status information to the disk, there might be a way to reuse it!</p><p>Indeed, I can use this information to selectively re-run only the failed examples, or the skipped ones, or... And so on. From there, a new "feature space" opens, and the initial investment starts making sense.</p><p>It is also possible to maximize the ROI on <em>existing features</em>. A feature is an "asset". Perhaps, but it's not completely free either. You have to explain it, to promote it, to show when and how it interacts with the rest of the software. This is why maximizing the ROI of features makes sense as well.</p><p>That's exactly what happened with the brand-new "isolated" feature.</p><a name="All+expectations"><h4>All expectations</h4></a><p>Year after year I'm looking at the specifications that people are writing with specs/specs2. Kind of my hallway usability test for open-source libraries... 
I usually see several "styles" of specifications and one style is pretty frequent:</p>
<pre><code class="prettyprint">"This is an example of building/getting/processing a datastructure" >> {
  // do stuff
  val data = processBigData
  // check expectations
  data must doThis
  data must haveThat
  data.parts must beLikeThis
  // and so on...
}
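
// note: with most testing libraries, the first failing expectation above
// stops the example, hiding the failures of the expectations that follow it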
</code></pre><p>In that style of specification, there is usually one action and lots of things to check. It is very unfortunate that most of the testing libraries out there will stop at the first failure, when it really makes sense to collect all of them at once instead of having to execute the specification multiple times to get to all the issues.</p><p>Is there a way to "collect" all the failures in specs2-1.8.2? Not really. You can try to use "or" with the expectations:</p>
<pre><code class="prettyprint"> (data must doThis) or
(data must haveThat) or
(data.parts must beLikeThis)
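
// `or` stops at the first success, so any expectations after a succeeding
// one are not executed and their failures are never collected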
</code></pre><p>And that will collect all the failures up to the first success, and then it will stop executing the rest. Besides, the "or" syntax with all the parentheses is not so nice.</p><p>After a while I realized that the only way to make this work was to use yet another mutable variable to silently register each result. But then I'd be back to the old specs1 problem. What about concurrency? How can I prevent each concurrent example from stepping on another example's results?</p><p>Well, that's easy, I have the "isolated" feature now! Each example can run in its own copy of the specification and the mutable variable will only collect the results from <em>that</em> example safely. The result is a feature that's easy to implement but also easy to use because it's just a matter of <a href="http://bit.ly/specs2_allexpectations">mixing a trait into the specification</a>!</p><a name="Coming+full+circle"><h4>Coming full circle</h4></a><p>Can I go further with the same thinking? Why not?!</p><p>From the beginning of specs2, I liked the fact that the so-called <a href="http://bit.ly/specs2_styles">"Acceptance specification" style</a> allowed me to write a lot of text about my system's behavior and then annotate it with the actual code. The price to pay was all the symbols on the right of the screen which appeared utterly cryptic to some people (to say it nicely).</p><p>Then, last week, I realized that I could rely on something which exists in most code editors: code folding! In a classical JUnit TestCase, specs1 specification or specs2 mutable specification, if you fold the code, you're left with only the text, forming a narrative for your system. If you see it like that, any specs2 mutable specification can be turned into an acceptance specification, just a few features are missing:</p>
<ul>
<li>the ability to add an arbitrary piece of text (instead of <code class="prettyprint">should/in</code> blocks) and <a href="http://bit.ly/specs2_formatting_fragments">formatting fragments</a> (for example to give an overview of the system, before going into the details)</li>
<li><a href="http://etorreborre.github.com/specs2/guide-SNAPSHOT/org.specs2.guide.Structure.html#G%2FW%2FT">given/when/then</a> specifications</li>
<li><a href="http://bit.ly/specs2_autoexamples">auto-examples</a></li>
</ul><p>Not a lot to implement really. Most of the machinery is already provided by the <code class="prettyprint">org.specs2.Specification</code> trait. This means that, for a very reasonable "price", you can now use "mutable" specifications in specs2 with a great deal of expressivity (nothing's perfect though, there are small issues due to semicolon inference for example, see <a href="http://bit.ly/GDR2SN">here</a> for variations around the "Stack" theme).</p><a name="Full+circle+and+blurred+lines"><h4>Full circle and blurred lines</h4></a><p>To me, it really feels that I've come full circle:</p>
<ol>
<li>I rebuilt from scratch the "Specification" concept from specs1, throwing away the syntax</li>
<li>I reintroduced some mutability <em>very</em> carefully to get some of that syntax back</li>
<li>I built upon all the immutable code to finally end up with exactly what I would have liked to have in specs1!</li>
</ol><p>My only concern now is that newcomers might feel lost because the library is not really prescribing a specific style: "should I use a 'unit' style or an 'acceptance' style? And what are the differences between the 2 anyway?". I hope that they'll realize that this is actually an opportunity. An opportunity to try out different ways of communicating and then choosing the most efficient or pleasing.</p>Unknownnoreply@blogger.com6tag:blogger.com,1999:blog-5336273.post-50512324358599066402011-12-09T10:29:00.001+09:002011-12-09T10:56:02.995+09:00Pragmatic IO - part 3<p>A follow-up to <a href="http://etorreborre.blogspot.com/2011/12/pragmatic-io-part-2.html">my previous posts about <code class="prettyprint">IO</code></a>.</p><a name="Why+types+and+laws+matter"></a><h4>Why types and laws matter</h4><p>It's embarrassing to write post after post only to find out I keep being wrong :-). While the technique I described in my last post (using <code class="prettyprint">StreamT</code>) works ok, I actually don't have to use it. The <code class="prettyprint">traverse</code> technique explained in <a href="http://etorreborre.blogspot.com/2011/06/essence-of-iterator-pattern.html">EIP</a> works perfectly, provided that:</p>
<ul>
<li><p>we write a proper <code class="prettyprint">Traverse</code> instance using <code class="prettyprint">foldLeft</code> (as explained in the <a href="http://etorreborre.blogspot.com/2011/12/pragmatic-io-part-2.html">first part of this post</a>)</p></li>
<li><p>we <code class="prettyprint">trampoline</code> the <code class="prettyprint">State</code> to avoid stack overflows due to a long chain of <code class="prettyprint">flatMap</code>s during the traversal</p></li>
<li><p>we use the right types!</p></li>
</ul><a name="Get+the+types+right"></a><h5>Get the types right</h5><p>That's the point I want to insist on today. Last evening I read the revisited <a href="https://github.com/scalaz/scalaz/blob/scalaz-seven/example/src/main/scala/scalaz/example/WordCount.scala#L35">WordCount example</a> in Scalaz-seven which is like the canonical example of using the EIP ideas. Two words in the comments struck my mind: "compose" and "fuse". Indeed, one very important thing about EIP is the ability to compose <code class="prettyprint">Applicatives</code> so that their actions should "fuse" during a traversal. As if they were executed in the same "for" loop!</p><p>So, when I wrote in my previous post that I needed to traverse the stream of lines several times to get the results, something had to be wrong somewhere. The types were wrong!</p>
<ol>
<li>instead of traversing with <code class="prettyprint">State[S, *]</code> where <code class="prettyprint">* =:= IO[B]</code>, I should traverse with <code class="prettyprint">State[S, IO[*]]</code></li>
<li>what I get back is a <code class="prettyprint">State[S, IO[Seq[B]]]</code>, instead of a <code class="prettyprint">State[S, Seq[IO[B]]]</code></li>
<li>this matters because passing in an initial state then returns <code class="prettyprint">IO[Seq[B]]</code> instead of <code class="prettyprint">Seq[IO[B]]</code></li>
</ol><p>Exactly what I want, without having to <code class="prettyprint">sequence</code> anything.</p><a name="Use+the+laws"></a><h5>Use the laws</h5><p>Not only do I get what I want, but it is also conceptually right, as a consequence of an important <code class="prettyprint">traverse</code> law:</p>
<pre><code class="prettyprint"> traverse(f compose g) == traverse(f) compose traverse(g)
</code></pre><p>That "fusion" law guarantees that composing 2 effects can be fused, and executed in only one traversal. It's worth instantiating <code class="prettyprint">f</code> and <code class="prettyprint">g</code> to make that more concrete:</p>
<pre><code class="prettyprint">// what to do with the line and line number
def importLine(i: Int, l: String): (Int, IO[Unit]) =
  (i, storeLine(l) >>= (_ => putStrLn("imported line "+i)))

// `g` keeps track of the line number
val g: String => State[Int, String] =
  (s: String) => state((i: Int) => (i+1, s))

// `f` takes the current line/line number and does the import/reporting
val f: State[Int, String] => State[Int, IO[Unit]] =
  st => state((i: Int) => { val (i1, l) = st(i); importLine(i1, l) })

// `f compose g` fuses both actions
val f_g: String => State[Int, IO[Unit]] = f compose g
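
// traversing with the fused `f_g` then yields a single State computation whose
// value is one IO action (a sketch; the traversal syntax is assumed):
// val fused: State[Int, IO[Seq[Unit]]] = lines.traverse(f_g)  // in State[Int, IO[*]]
// fused(0)._2.unsafePerformIO  // thread line numbers and run all imports in one pass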
</code></pre><p>I'm really glad that I was able to converge on established principles. Why couldn't I see this earlier?</p><a name="Scala+and+Scalaz-seven+might+help"></a><h4>Scala and Scalaz-seven might help</h4><p>In retrospect, I remember what led me astray. When I realized that I had to use a State transformer with a <code class="prettyprint">Trampoline</code>, I just got scared by the <code class="prettyprint">Applicative</code> instance I had to provide. Its type is:</p>
<pre><code class="prettyprint">/**
 * Here we want M1 to be `Trampoline` and M2 to be `IO`
 */
implicit def StateTMApplicative[M1[_]: Monad, S, M2[_]: Applicative] =
  Applicative.applicative[({type l[a] = StateT[M1, S, M2[a]]})#l]
</code></pre><p>I also need some <code class="prettyprint">Pure</code> and <code class="prettyprint">Apply</code> instances in scope:</p>
<pre><code class="prettyprint">implicit def StateTMApply[M1[_]: Monad, S, M2[_]: Applicative] =
  new Apply[({type l[a]=StateT[M1, S, M2[a]]})#l] {
    def apply[A, B](f: StateT[M1, S, M2[A => B]], a: StateT[M1, S, M2[A]]) =
      f.flatMap(ff => a.map(aa => aa <*> ff))
  }

implicit def StateTMPure[M1[_]: Pure, S, M2[_]: Pure] =
  new Pure[({type l[a]=StateT[M1, S, M2[a]]})#l] {
    def pure[A](a: => A) = stateT((s: S) => (s, a.pure[M2]).pure[M1])
  }
</code></pre><p>Once it's written, it may not seem so hard but I got very confused trying to get there. How can it be made easier? First, we could have better type annotations for partial type application, like:</p>
<pre><code class="prettyprint">// notice the * instead of the "type l" trick
implicit def StateTMApplicative[M1[_]: Monad, S, M2[_]: Applicative] =
  Applicative.applicative[StateT[M1, S, M2[*]]]

implicit def StateTMApply[M1[_]: Monad, S, M2[_]: Applicative] =
  new Apply[StateT[M1, S, M2[*]]] {
    def apply[A, B](f: StateT[M1, S, M2[A => B]], a: StateT[M1, S, M2[A]]) =
      f.flatMap(ff => a.map(aa => aa <*> ff))
  }

implicit def StateTMPure[M1[_]: Pure, S, M2[_]: Pure] =
  new Pure[StateT[M1, S, M2[*]]] {
    def pure[A](a: => A) = stateT((s: S) => (s, a.pure[M2]).pure[M1])
  }
</code></pre><p>And with better type inference, the first definition could even be (we can always dream :-)):</p>
<pre><code class="prettyprint">implicit def StateTMApplicative[M1[_]: Monad, S, M2[_]: Applicative]:
  Applicative[StateT[M1, S, M2[*]]] = Applicative.applicative
</code></pre><p>Which means that it may even be removed, just import <code class="prettyprint">Applicative.applicative</code>!</p><p>Actually <a href="https://github.com/scalaz/scalaz/tree/scalaz-seven">Scalaz-seven</a> might help by:</p>
<ul>
<li><p>providing those instances out-of-the box</p></li>
<li><p>even better, provide combinators to create those instances easily. That's what the <a href="https://github.com/scalaz/scalaz/blob/scalaz-seven/example/src/main/scala/scalaz/example/WordCount.scala#L36"><code class="prettyprint">compose</code> method</a> does</p></li>
<li><p>give even better type inference. Look at the <code class="prettyprint">traverseU</code> method <a href="https://github.com/scalaz/scalaz/blob/bd79e134e513bc5686db8109006b90eeff771a67/example/src/main/scala/scalaz/example/StateTUsage.scala#L26">here</a>, no type annotations Ma!</p></li>
</ul><p>Now that the fundamentals are working ok for my application, I can go back to adding features, yay!</p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-2021617698850315362011-12-08T15:32:00.001+09:002011-12-09T05:34:09.560+09:00Pragmatic IO - part 2<p>A follow-up to <a href="http://etorreborre.blogspot.com/2011/12/pragmatic-io.html">my previous post about <code class="prettyprint">IO</code></a>.</p><a name="Learning+by+being+wrong%2C+very+wrong"></a><h4>Learning by being wrong, very wrong</h4><p>I didn't think that diving into the functional <code class="prettyprint">IO</code> world would have me think so hard about how to use some FP concepts.</p><a name="Mind+the+law"></a><h5>Mind the law</h5><p>The technique I described at the end of my previous post didn't really work. It was wrong on many counts. First of all, my <code class="prettyprint">Traverse</code> instance was buggy because I was reversing the traversed <code class="prettyprint">Stream</code>. This is why I had inverted messages in the console. During the traversal, starting from the left, I needed to <em>append</em> each traversed element to the result, not <em>prepend</em> it:</p>
<pre><code class="prettyprint">/**
 * WRONG: each new element `b` must be appended to the result,
 * i.e. it should be `bs :+ b` instead of `b +: bs`
 */
def SeqLeftTraverse: Traverse[Seq] = new Traverse[Seq] {
  def traverse[F[_]: Applicative, A, B](f: A => F[B], as: Seq[A]): F[Seq[B]] =
    as.foldl[F[Seq[B]]](Seq[B]().pure) { (ys, x) =>
      implicitly[Apply[F]].apply(f(x) map ((b: B) => (bs: Seq[B]) => b +: bs), ys)
    }
}
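
// the corrected fold appends each new element to preserve the original order:
// implicitly[Apply[F]].apply(f(x) map ((b: B) => (bs: Seq[B]) => bs :+ b), ys)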
</code></pre><p>It is important to mention here that proper testing should have caught this. There are some laws that applicative traversals must satisfy (see <a href="http://www.cs.ox.ac.uk/jeremy.gibbons/publications/iterator.pdf">EIP, section 5</a>). One of them is that <code class="prettyprint">seq.traverse(identity) == seq</code>. This is obviously not the case when I returned a reversed sequence.</p><a name="Not+at+the+right+time"></a><h5>Not at the right time</h5><p>Then I realized something else which was also worrying. On big files, my application would do a lot, <em>then</em> report its activity to the console. This would definitely not have happened in a simple for loop with unrestricted effects!</p><p>So I set out to find a better implementation for my file reading / records saving. The solution I'm going to present here is just one way of doing it, given <em>my</em> way of constraining the problem. There are others, specifically the use of <code class="prettyprint">Iteratees</code>.</p><a name="My+goal"></a><h4>My goal</h4><p>This is what I want to do: I want to read lines from a file, transform them into "records" and store them in a database. Along the way, I want to inform the user of the progress of the task, based on the number of lines already imported.</p><p>More precisely, I want 2 classes with the following interfaces:</p>
<ul>
<li><p>the <code class="prettyprint">Source</code> class is capable of reading lines and do something on each line, possibly based on some state, like the number of lines already read.<br /> The <code class="prettyprint">Source</code> class should also be responsible for closing all the opened resources whether things go wrong or not</p></li>
<li><p>the <code class="prettyprint">Extractor</code> class should provide a function declaring what to do with the current line and state. Ideally it should be possible to create that function by composing independent actions: storing records in the db and / or printing messages on the console</p></li>
<li><p>and, by the way, I don't want this to go out-of-memory or stack overflow at run-time just because I'm importing a big file :-)</p></li>
</ul><p>Mind you, it took me quite a while to arrive there. In retrospect, I could have noticed that:</p>
<ul>
<li><p>all the processing has to take place inside the <code class="prettyprint">Source.readLines</code> method. Otherwise there's no way to close the resources properly (well unless you're using Iteratees, what Paul Chiusano refers to as <a href="https://groups.google.com/d/msg/scalaz/QPUs6TWTAm4/ccLBtRZB0KMJ">"Consumer" processing</a>)</p></li>
<li><p>the signature of the function passed to <code class="prettyprint">Source.readLines</code> has to be <code class="prettyprint">String => State[S, IO[A]]</code>, meaning that for each line, we're doing an <code class="prettyprint">IO</code> action (like saving it to the database) but based on a <code class="prettyprint">State</code></p></li>
<li><p>the end result of <code class="prettyprint">readLines</code> has to be <code class="prettyprint">IO[Seq[A]]</code> which is an action returning the sequence of all the return values when all the <code class="prettyprint">IO</code> actions have been performed</p></li>
</ul><p>Considering all that, I ended up with the following signatures:</p>
<pre><code class="prettyprint">/**
 * @param filePath the file to read from
 * @param f a function doing an IO action for each line, based on a State
 * @param init the initial value of the State
 * @return an IO action reading all the lines and returning the sequence of computed values
 */
def readLinesIO[S, A](filePath: String)(f: String => State[S, IO[A]])(init: S): IO[Seq[A]]
</code></pre><p>and the <code class="prettyprint">Extractor</code> code goes:</p>
<pre><code class="prettyprint"> // the initial state
val counter = new LogarithmicCounter(level = 100, scale = 10)
// a function to notify the user when we've reached a given level
val onLevel = (i: Int) => printTime("imported from "+filePath+": "+i+" lines")
// a function to create and store each line
def onCount(l: String) = (i: Int) => createAndStoreRecord(Line(file, l), creator)
// for each line, apply the actions described as a State[LogarithmicCounter, IO[A]] object
readLinesIO(file.getPath)(l => counter.asState(onLevel, onCount(l)))(counter)
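
// (presumably, given level = 100 and scale = 10, the user is notified at
// 100, 1000, 10000, ... imported lines)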
</code></pre><p>I like that interface because it keeps things separated and generic enough:</p>
<ul>
<li>reading the lines and opening/closing the resources goes to one class: <code class="prettyprint">Source</code></li>
<li>defining how the state evolves when an action is done goes to the <code class="prettyprint">LogarithmicCounter</code> class</li>
<li>defining what to do exactly with each line goes in one class, the <code class="prettyprint">Extractor</code> class</li>
</ul><p>Now, the tricky implementation question is: how do you implement <code class="prettyprint">readLinesIO</code>?</p><a name="What+not+to+do"></a><h4>What not to do</h4><p>Definitely the wrong way to do it is to use a "regular" <code class="prettyprint">traverse</code> on the stream of input lines, even with the trampolining trick described in my previous post.</p><p>Indeed <code class="prettyprint">traverse</code> is going to fold everything once to go from <code class="prettyprint">Stream[String]</code> to <code class="prettyprint">State[S, Seq[IO[B]]]</code> then, with the initial <code class="prettyprint">State</code> value, to <code class="prettyprint">Seq[IO[B]]</code> which then has to be <code class="prettyprint">sequence</code>d into <code class="prettyprint">IO[Seq[B]]</code>. This is guaranteed to do more work than necessary. More than what a single <code class="prettyprint">for</code> loop would do.</p><p>The other thing <em>not to do</em> is so silly that I'm ashamed to write it here. Ok, I'll write it. If only to spare someone else from the same idea. I had an implicit conversion from <code class="prettyprint">A</code> to <code class="prettyprint">IO[A]</code>,...</p><p>That's a very tempting thing to do and very convenient at first glance. Whenever a function was expecting an <code class="prettyprint">IO[A]</code>, I could just pass <code class="prettyprint">a</code> without having to write <code class="prettyprint">a.pure[IO]</code>. The trouble is, some actions were never executed! I don't have a specific example to show, but I'm pretty sure that, at some point, I had values of type <code class="prettyprint">IO[IO[A]]</code>. In that case, doing <code class="prettyprint">unsafePerformIO</code>, i.e. 
"executing" the <code class="prettyprint">IO</code> action only returns <em>another</em> action to execute and doesn't do anything really!</p><a name="Learn+your+FP+classes"></a><h4>Learn your FP classes</h4><p>I must admit I've been reluctant to go into what was advised on the Scalaz mailing-list: "go with StreamT", "use Iteratees". Really? I just want to do <code class="prettyprint">IO</code>! Can I learn all that stuff <em>later</em>? I finally decided to go with the first option and describe it here in detail.</p><p><code class="prettyprint">StreamT</code> is a "Stream transformer". It is the Scala version of the <a href="http://www.haskell.org/haskellwiki/ListT_done_right"><code class="prettyprint">ListT</code> done right</a> in Haskell. That looks a bit scary at first but it is not so hard. It is an infinite list of elements which are not values but computations. So it more or less has the type <code class="prettyprint">Seq[M[A]]</code> where <code class="prettyprint">M</code> is some kind of effect, like changing a <code class="prettyprint">State</code> or doing <code class="prettyprint">IO</code>.</p><p>My first practical question was: "how do I even build that thing from a Stream of elements?"</p><a name="Unfolding+a+data+structure"></a><h5>Unfolding a data structure</h5><p><em>Unfolding</em> is the way. In the Scala world we are used to "folding" data structures into elements: get the sum of the elements of a list, get the maximum element in a tree. We less often do the opposite: generate a data structure from one element and a function. One example of this is generating the list of Fibonacci numbers out of an initial pair <code class="prettyprint">(0, 1)</code> and a function computing the next "Fibonacci step".</p><p>With the <code class="prettyprint">StreamT</code> class in Scalaz, you do this:</p>
<pre><code class="prettyprint">/** our record-creation function */
def f[B]: String => M[B]

/**
 * unfolding the initial Stream `seq` to a StreamT where we apply
 * a `State` function `f` to each element
 */
StreamT.unfoldM[M, B, Seq[A]](seq) { (ss: Seq[A]) =>
  if (ss.isEmpty) (None: Option[(B, Seq[A])]).pure[M]
  else f(ss.head).map((b: B) => Some((b, ss.tail)))
}
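
// the same unfolding idea, effect-free, on a plain Scala Stream
// (the Fibonacci example mentioned above):
// Stream.iterate((0, 1)) { case (a, b) => (b, a + b) }.map(_._1)
// => 0, 1, 1, 2, 3, 5, 8, ...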
</code></pre><p>The code above takes an initial sequence <code class="prettyprint">seq</code> (the Stream of lines to read) and:</p>
<ul>
<li><p>if there are no more elements, you return <code class="prettyprint">None.pure[M]</code>, there's nothing left to do. In my specific use case that'll be <code class="prettyprint">State[LogarithmicCounter, None]</code></p></li>
<li><p>if there is an element in the <code class="prettyprint">Stream</code>, <code class="prettyprint">f</code> is applied to that element, that is, we create a record from the line and store it in the database (that's the <code class="prettyprint">b:B</code> parameter, which is going to be an <code class="prettyprint">IO</code> action)</p></li>
<li><p>because <code class="prettyprint">M</code> is the <code class="prettyprint">State</code> monad, when we apply <code class="prettyprint">f</code>, this also computes the next state to keep track of how many rows we've imported so far</p></li>
<li><p>then the rest of the stream of lines to process, <code class="prettyprint">ss.tail</code>, is also returned and <code class="prettyprint">unfoldM</code> will use that to compute the next "unfolding" step</p></li>
</ul><p>One very important thing to notice here is that we haven't consumed any element at that stage. We've only declared what we <em>plan</em> to do with each element of the input stream.</p><a name="Running+the+StreamT"></a><h5>Running the StreamT</h5><p>In the <code class="prettyprint">StreamT</code> object in Scalaz there is a method called <code class="prettyprint">runStreamT</code>, taking in:</p>
<ul>
<li>a <code class="prettyprint">StreamT[State[S, *], A]</code>, so that's a <code class="prettyprint">StreamT</code> where the state may change with each element of the <code class="prettyprint">Stream</code></li>
<li>an initial state <code class="prettyprint">s0</code></li>
<li>and returning a <code class="prettyprint">StreamT[Id, A]</code></li>
</ul><p><em>Note: <code class="prettyprint">State[S, *]</code> is a non-existing notation for <code class="prettyprint">({type l[a]=State[S, a]})#l</code></em></p><p>That doesn't seem to be game-changing, but it does a useful thing for us. <code class="prettyprint">runStreamT</code> "threads" the <code class="prettyprint">State</code> through all elements, starting with the initial value. In our case that means that we're going to create IO actions, where each created action depends on the current state (the number of lines read so far).</p><p>The cool thing is that so far we've described transformations to apply: <code class="prettyprint">Stream[String]</code> to <code class="prettyprint">StreamT[State, IO[B]]</code> to <code class="prettyprint">StreamT[Id, IO[B]]</code> but we still haven't executed anything!</p><a name="UnsafePerformIO+at+last"></a><h5>UnsafePerformIO at last</h5><p>Then, I coded an <code class="prettyprint">unsafePerformIO</code> method specialized for <code class="prettyprint">StreamT[Id, IO[A]]</code> to execute all the <code class="prettyprint">IO</code> actions. The method itself is not difficult to write but we need to make sure it's tail-recursive:</p>
<pre><code class="prettyprint">/** execute all the IO actions on a StreamT when it only contains IO actions */
def unsafePerformIO: Seq[A] = toSeq(streamT, Seq[A]())

/**
 * execute a StreamT containing `IO` actions.
 *
 * Each IO action is executed and the result goes into a list of results
 */
@tailrec
private def toSeq[A](streamT: StreamT[Id, IO[A]], result: Seq[A]): Seq[A] =
  streamT.uncons match {
    case Some((head, tail)) => toSeq(tail, result :+ head.unsafePerformIO)
    case None               => result
  }
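
// note: with M = Id, `uncons` returns an Option directly, so each recursive
// step runs exactly one IO action and stays in constant stack space (@tailrec)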
</code></pre><a name="Hidding+everything+under+the+rug"></a><h5>Hiding everything under the rug</h5><p>Let's assemble the pieces of the puzzle now. With an implicit conversion, it is possible to transform any <code class="prettyprint">Seq</code> to a <code class="prettyprint">StreamT</code> performing some action on its elements:</p>
<pre><code class="prettyprint"> /**
  * for this to work, M[_] has to be a "PointedFunctor", i.e. have "pure" and
  * "fmap" methods. It is implemented using the `unfoldM` method code seen above
  */
 seq.toStreamT(f: A => M[B]): StreamT[M, B]
</code></pre><p>Then, we can traverse a whole sequence with a composition of a <code class="prettyprint">State</code> and <code class="prettyprint">IO</code> actions:</p>
<pre><code class="prettyprint"> seq.traverseStateIO(f)
</code></pre><p>Where <code class="prettyprint">traverseStateIO</code> is using the <code class="prettyprint">toStreamT</code> transformation and is defined as:</p>
<pre><code class="prettyprint"> /**
  * traverse a stream with a State and IO actions by using a Stream transformer
  */
 def traverseStateIO[S, B](f: A => State[S, IO[B]])(init: S): Seq[B] =
   StreamT.runStreamT(seq.toStreamT[State[S, *], IO[B]](f), init).unsafePerformIO
</code></pre><p><em>[bonus points to anyone providing a <code class="prettyprint">Traverse[Stream]</code> instance based on <code class="prettyprint">StreamT</code> - if that exists]</em></p><p>Finally the <code class="prettyprint">Source.readLines</code> method is implemented as:</p>
<pre><code class="prettyprint"> /**
  * this method reads the lines of a file and applies stateful actions.
  * @return an IO action doing all the reading, returning a value for each processed line
  */
 def readLinesIO[S, A](path: String)(f: String => State[S, IO[A]])(init: S): IO[Seq[A]] =
   readLines(path)(f)(init).pure[IO]

 private
 def readLines[S, A](path: String)(f: String => State[S, IO[A]])(init: S): Seq[A] = {
   // store the opened resource so that it can be closed afterwards
   var source: Option[scala.io.Source] = None
   def getSourceLines(src: scala.io.Source) = { source = Some(src); getLines(src) }
   try {
     // read the lines and execute the actions
     getSourceLines(fromFile(path)).toSeq.traverseStateIO(f)(init)
   } finally { source.map(_.close()) }
 }
</code></pre><p>And the client code, the <code class="prettyprint">Extractor</code> class does:</p>
<pre><code class="prettyprint"> val counter = new LogarithmicCounter(level = 100, scale = 10)

 // notify the user when we've reached a given level
 val onLevel = (i: Int) => printTime("imported from "+filePath+": "+i+" lines")

 // create and store a record for each new line
 def onCount(l: String) = (i: Int) => createAndStoreRecord(Line(file, l), creator)

 // an `IO` action doing the extraction
 readLinesIO(file.getPath)(l => counter.asState(onLevel, onCount(l)))(counter)
</code></pre><a name="Conclusion"></a><h4>Conclusion</h4><p>I've had more than one WTF moment when trying to mix in <code class="prettyprint">State</code>, <code class="prettyprint">IO</code> and resource management together. I've been very tempted to go back to vars and unrestricted effects :-).</p><p>Yet, I got to learn useful abstractions like <code class="prettyprint">StreamT</code> and I've seen that it was indeed possible to define generic, composable and functional interfaces between components. I also got the impression that there are lots of different situations where we manually pass state around, which is error-prone and which can be done generically by <code class="prettyprint">traverse</code>-like methods.</p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-5336273.post-83171621551282326552011-12-05T15:51:00.000+09:002011-12-05T15:51:50.353+09:00Pragmatic IO<p>This post is my exploration of the use of the <code class="prettyprint">IO</code> type in a small Scala application.</p><a name="A+short+IO+introduction"></a><h4>A short <code class="prettyprint">IO</code> introduction</h4><p>Scala is a pragmatic language when you start learning Functional Programming (FP). Indeed there are times when you can't apply the proper FP techniques just because <em>you don't know them yet</em>. In those cases you can still resort to variables and side-effects to your heart's content. One of these situations is IO (input/output).</p><p>Let's take an example: you need to save data in a database (not a rare thing :-)). The signature of your method will certainly look like this:</p>
<pre><code class="prettyprint"> /** @return the unique database id of the saved user */
 def save(user: User): Int
</code></pre><p>This method is not <a href="http://en.wikipedia.org/wiki/Referential_transparency_(computer_science)">referentially transparent</a> since it changes the state of the "outside world": if you call it twice, you'll get a different result. This kind of thing is very troubling for a Functional Programmer and doesn't let him sleep well at night. And anyway, in a language like Haskell, where each function is pure, you <em>can't</em> implement it. So what do you do?</p><p>The <a href="https://groups.google.com/d/msg/scala-debate/xYlUlQAnkmE/hbnx7L_cTDYJ">neat trick</a> is to return an action saying <em>"I'm going to save the user"</em> instead of actually saving it:</p>
<pre><code class="prettyprint"> def save(user: User): IO[Int]
</code></pre><p>At first, it seems difficult to do anything from there because we just have an <code class="prettyprint">IO[Int]</code>, not a real <code class="prettyprint">Int</code>. If we want to display the result on the screen, what do we do?</p><p>We use <code class="prettyprint">IO</code> as a <code class="prettyprint">Monad</code>, with the <code class="prettyprint">flatMap</code> operation, to sequence the 2 actions, into a new one:</p>
<pre><code class="prettyprint"> /** create an IO action to print a line on the screen */
 def printLine(line: Any): IO[Unit] = println(line).pure[IO]

 /** save a user and print the new id on the screen */
 def saveAndPrint(user: User): IO[Unit] = save(user).flatMap(id => printLine(id))
 // or, equivalently: save(user) >>= printLine
</code></pre><p>Then finally when we want to execute the actions for real, we call <code class="prettyprint">unsafePerformIO</code> in the <code class="prettyprint">main</code> method:</p>
<pre><code class="prettyprint"> def main(args: Array[String]) {
   saveAndPrint(User("Eric")).unsafePerformIO
 }
</code></pre><p>That's more or less the Haskell way to do IO in Scala but <em>nothing forces us to do so</em>. The natural question which arises is: what <em>should</em> we do?</p><a name="IO+or+not+IO"></a><h4><code class="prettyprint">IO</code> or not <code class="prettyprint">IO</code></h4><p>Before starting this experiment, I fully re-read <a href="https://groups.google.com/d/topic/scala-debate/xYlUlQAnkmE/discussion">this thread</a> on the Scala mailing-list, and the answer is not clear for many people apparently. Martin Odersky doesn't seem convinced that this is the way to go. He thinks that there should be <a href="https://groups.google.com/d/msg/scala-debate/xYlUlQAnkmE/hbnx7L_cTDYJ">a more lightweight and polymorphic way to get effect checking</a> and Bob Harper <a href="http://existentialtype.wordpress.com/2011/05/01/of-course-ml-has-monads/">doesn't see the point</a>.</p><p>Surely the best way to form my own judgement was to try it out by myself (<a href="https://groups.google.com/d/msg/scala-debate/xYlUlQAnkmE/b4BkUz4nKTIJ">as encouraged by Runar</a>). One thing I know for sure is that, <a href="https://github.com/etorreborre/specs2/blob/1.6/notes/1.6.markdown">the moment I started printing out files in specs2</a>, I got an uneasy feeling that something could/would go wrong. Since then,<br /><code class="prettyprint">IO</code> has been on my radar of things to try out.</p><p>Luckily, these days, I'm developing an application which is just the perfect candidate for that. This application:</p>
<ul>
<li>reads log files</li>
<li>stores the records in a MongoDB database</li>
<li>creates graphs to analyze the data (maximums, average, distribution of some variables)</li>
</ul><p>Perfect but,... where do I start :-)?</p><a name="Theory+%3D%3E+Practice"></a><h4>Theory => Practice</h4><p>That was indeed my first question. Even in a small Swing application like mine the effects were interleaved in many places:</p>
<ul>
<li>when I opened the MongoDB connection, I was printing some messages on the application console to inform the user about the db name, the MongoDB version,...</li>
<li>the class responsible for creating the reports was fetching its own data, effectively doing IO</li>
<li>datastore queries are cached, and getting some records (an IO action) required getting other data (the date of the latest import, another IO action) to decide if we could reuse the cached ones instead</li>
</ul><p>The other obvious question was: "where do I call <a href="http://users.skynet.be/jyp/html/base/System-IO-Unsafe.html"><code class="prettyprint">unsafePerformIO</code></a>?". This is an interactive application, I can't put just one<br /><code class="prettyprint">unsafePerformIO</code> in the main method!</p><p>The last question was: "OMG, I started to put IO in a few places, now it's eating all of my application, what can I do?" :-))</p><a name="Segregating+the+effects"></a><h4>Segregating the effects</h4><p>Here's what I ended up doing. First of all let's draw a diagram of the main components of my application before refactoring:</p>
<pre><code class="prettyprint"> +-------+  extract/getReport  +------ IO +   store    +------ IO +
 +  Gui  + <-----------------> + Process  + ---------> +  Store   +
 +-------+  Query/LogReport    +----------+            +----------+
     |  getReport                   ^
     v                              |
 +------ IO +    getRecords         |
 + Reports  + ---------------------+
 +----------+
</code></pre><p>Simple app, I told you.</p><p>The first issue was that the <code class="prettyprint">Reports</code> class (actually a bit more than just one class) was fetching records and building reports at the same time. So if I decided to put <code class="prettyprint">IO</code> types on the store I would drag them in any subsequent transformation (marked as <code class="prettyprint">IO</code> on the diagram).</p><p>The next issue was that the <code class="prettyprint">Process</code> class was also doing too much: reading files and storing records.</p><p>Those 2 things led me to this refactoring and "architectural" decision, <code class="prettyprint">IO</code> annotations would only happen on the middle layer:</p>
<pre><code class="prettyprint">                               +------- IO +   store
     read                      + Extractor + -----+-------------+
     files                     +-----------+      |             |
                                     ^            |             |
                                     | extract    v             v
 +-------+  extract/getReport  +----------+  +------ IO +  +----------+
 +  Gui  + <-----------------> + Process  +  + Status   +  + Store    +
 +-------+  Query/LogReport    +----------+  +----------+  +----------+
     |  getReport                                 ^             ^
     v                                            |  getRecords |
 +------ IO +                                     |             |
 + Reporter + ------------------------------------+-------------+
 +----------+
     |  createReport
     v
 +----------+
 + Reports  +
 +----------+
</code></pre>
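<p>A minimal sketch of that layering (with illustrative names, not the actual application code): the <code class="prettyprint">Store</code> stays free of any <code class="prettyprint">IO</code> type, and only the middle layer wraps its calls into <code class="prettyprint">IO</code> actions:</p>
<pre><code class="prettyprint"> // the store is effectful but deliberately not marked with IO
 trait Store {
   def storeRecord(line: String): Int
 }

 // the middle layer is where the IO marking happens
 class Extractor(store: Store) {
   /** @return an IO action storing one line; nothing is executed yet */
   def storeLine(line: String): IO[Int] = store.storeRecord(line).pure[IO]
 }
</code></pre>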
<ul>
<li><p>all the <code class="prettyprint">IO</code> actions are segregated into special classes. For example extracting lines from log files creates a big composite <code class="prettyprint">IO</code> action to read the lines, to store the lines as records and to report the status of the extraction. This is handled by the <code class="prettyprint">Extractor</code> class</p></li>
<li><p>the <code class="prettyprint">Status</code> class handles all the printing on the console. It provides methods like:</p>
<pre><code class="prettyprint"> /**
  * print a message with a timestamp, execute an action,
  * and print an end message with a timestamp also.
  *
  * This method is used by the `Extractor`, `Reporter` and `Process` classes.
  */
 def statusIO[T](before: String, action: IO[T], after: String): IO[T]
</code></pre></li>
<li><p>the <code class="prettyprint">Reports</code> class only deals with the aggregation of data from records coming from the <code class="prettyprint">Store</code>. However it is completely ignorant of this fact, so testing becomes easier (true story, that really helped!)</p></li>
<li><p>the <code class="prettyprint">Store</code> does not have any <code class="prettyprint">IO</code> type. In that sense my approach is not very pure, since nothing prevents "someone" from directly calling the store from the middle layer without encapsulating the call in an <code class="prettyprint">IO</code> action. I might argue that this is pragmatic. Instead of sprinkling <code class="prettyprint">IO</code> everywhere I started by defining a layer isolating the <code class="prettyprint">IO</code> world (the data store, the file system) from the non-<code class="prettyprint">IO</code> world (the reports, the GUI). Then, if I want, I can extend the <code class="prettyprint">IO</code> marking to the rest of the application. That being said I didn't do it because I didn't really see the added-value at that stage. A name for this approach could be "gradual effecting" (the equivalent of <a href="http://lambda-the-ultimate.org/node/1707">"gradual typing"</a>)</p></li>
<li><p>the <code class="prettyprint">Process</code> class is not marked as <code class="prettyprint">IO</code> because all the methods provided by that class are actually calling<br /> <code class="prettyprint">unsafePerformIO</code> to really execute the actions. This is the only place in the application where this kind of call occurs and the GUI layer could be left unchanged</p></li>
</ul><p>All of that was not too hard, but was not a walk in the park either. Some details of the implementation were hard to come up with.</p><a name="Under+the+bonnet%3A+syntactic+tricks"></a><h4>Under the bonnet: syntactic tricks</h4><p>First of all, what was easy?</p><a name="%22Monadic%22+sequencing"></a><h6>"Monadic" sequencing</h6><p>Sequencing <code class="prettyprint">IO</code> actions is indeed easy. I've used the <code class="prettyprint">for</code> comprehension:</p>
<pre><code class="prettyprint"> def statusIO[T](before: String, action: IO[T], after: String): IO[T] = {
   for {
     _      <- printTime(before)
     result <- action
     _      <- printTime(after)
   } yield result
 }
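
 // desugared, the for comprehension above is just nested flatMap/map calls:
 //   printTime(before).flatMap(_ =>
 //     action.flatMap(result =>
 //       printTime(after).map(_ => result)))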
</code></pre><a name="%22Semi-column%22+sequencing"></a><h6>"Semi-colon" sequencing</h6><p>Scalaz also provides the <code class="prettyprint">>>=|</code> operator to just sequence 2 actions when you don't care about the result of the first one:</p>
<pre><code class="prettyprint"> def statusIO[T](before: String)(action: IO[T]): IO[T] = printTime(before) >>=| action
</code></pre><a name="%22Conditional%22+sequencing"></a><h6>"Conditional" sequencing</h6><p>And, for the fun of it, I implemented a "conditional flatMap operator", <code class="prettyprint">>>=?</code>:</p>
<pre><code class="prettyprint"> action1 >>=? (condition, action2)
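
 // a possible definition of this operator (hypothetical, not the actual code):
 //   def >>=?(condition: Boolean, action2: IO[A]): IO[A] =
 //     action1.flatMap(a => if (condition) action2 else a.pure[IO])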
</code></pre><p>In the code above the <code class="prettyprint">action1</code> is executed and, if the <code class="prettyprint">condition</code> is true, we execute <code class="prettyprint">action2</code> and forget about the result of <code class="prettyprint">action1</code>, otherwise we keep the result of <code class="prettyprint">action1</code>.</p><a name="%22if+else%22+with+a+twist"></a><h6>"if else" with a twist</h6><p>Not completely related to <code class="prettyprint">IO</code> I also liked the <code class="prettyprint">??</code> operator. Say I want to execute an action only if a file exists:</p>
<pre><code class="prettyprint"> if (file.exists) createRecords(file)
 else ()
</code></pre><p>The <code class="prettyprint">else</code> line really feels ugly. This kind of "default value" should be inferred from somewhere. Indeed! This "default value" is the <code class="prettyprint">Zero</code> of a given type in Scalaz, so it is possible to write:</p>
<pre><code class="prettyprint"> file.exists ?? createRecords(file)
</code></pre><p>It just works (tm) but it's worth knowing where that zero value really comes from:</p>
<ol>
<li><p><code class="prettyprint">createRecords(file)</code> returns <code class="prettyprint">IO[Int]</code> (the last created id - that's a simplification of the real program)</p></li>
<li><p>there is a way to create a <code class="prettyprint">Monoid</code> from another <code class="prettyprint">Monoid</code> + an <code class="prettyprint">Applicative</code>:</p>
<pre><code class="prettyprint">// from scalaz.Monoid
/** A monoid for sequencing Applicative effects. */
def liftMonoid[F[_], M](implicit m: Monoid[M], a: Applicative[F]): Monoid[F[M]] =
  new Monoid[F[M]] {
    val zero: F[M] = a.pure(m.zero)
    def append(x: F[M], y: => F[M]): F[M] =
      a.liftA2(x, y, (m1: M, m2: M) => m.append(m1, m2))
  }
</code></pre></li>
<li><p>in this case <code class="prettyprint">IO</code> has an <code class="prettyprint">Applicative</code>, <code class="prettyprint">M</code> is <code class="prettyprint">Int</code> so it has a <code class="prettyprint">Monoid</code> hence <code class="prettyprint">IO[Int]</code> defines a <code class="prettyprint">Monoid</code> where the zero is <strong><em><code class="prettyprint">IO(0)</code></em></strong>. I had to open up the debugger to check that precisely :-)</p></li>
</ol><a name="Under+the+bonnet%3A+it+just+works%2C...+not"></a><h4>Under the bonnet: it just works,... not</h4><p>The next piece of implementation I was really happy with was the "logarithmic reporting". This is a feature I implemented using vars at first, which I wanted to make pure in my quest for <code class="prettyprint">IO</code>.</p><p>What I want is to extract log lines and notify the user when a bunch of lines have been imported (so that he doesn't get too bored). But I don't know how many lines there are in a given file. It could be 100 but it could be 30.000 or 100.000. So I thought that a "logarithmic" counter would be nice. With that counter, I notify the user every 100, 1000, 10.000, 100.000 lines.</p><p>The <code class="prettyprint">LogarithmicCounter</code> works by creating a <code class="prettyprint">State</code> object encapsulating 2 actions to do, one on each <code class="prettyprint">tick</code>, one when a<br /><code class="prettyprint">level</code> is reached:</p>
<pre><code class="prettyprint"> // create and store a line on each `tick`
 def onCount(line: String) = (i: Int) => createAndStoreRecord(line, creator)

 // and notify the user when we've reached a given level
 val onLevel = (i: Int) => printTime("imported from "+filePath+": "+i+" lines")

 /** @return a State[LogarithmicCounter, IO[Unit]] */
 val readLine = (line: String) => counter.asState(onLevel, onCount(line))
</code></pre><p>The <code class="prettyprint">readLine</code> method is used in a <code class="prettyprint">traversal</code> of the lines returned by a <code class="prettyprint">FileReader</code>:</p>
<pre><code class="prettyprint"> (lines traverseState readLine): State[LogarithmicCounter, Stream[IO[Unit]]]
</code></pre><p>Pass it an initial counter and you get back a <code class="prettyprint">Stream</code> of <code class="prettyprint">IO</code> actions which you can then <code class="prettyprint">sequence</code> to get an <code class="prettyprint">IO[Stream[Unit]]</code>:</p>
<pre><code class="prettyprint"> // initial traversal with a function returning a `State`
 val traversed: State[LogarithmicCounter, Stream[IO[Unit]]] = lines traverseState readLine

 // feed in the initial values: count = 1, level = 100 to get the end `State`
 val traversedEndState: Stream[IO[Unit]] = traversed ! new LogarithmicCounter

 // finally get a global action which will execute a stream of
 // record-storing actions and printing actions
 val toStore: IO[Stream[Unit]] = traversedEndState.sequence
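
 // note: `toStore` is still only a description of the work to do,
 // nothing is read or stored until toStore.unsafePerformIO is called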
</code></pre><p>Some readers of this blog may recognize one usage of the <a href="http://etorreborre.blogspot.com/2011/06/essence-of-iterator-pattern.html">Essence of the Iterator Pattern - EIP</a> and that's indeed what it is (my first real use case, yes!). <code class="prettyprint">traverseState</code> is just a way to use the more traditional <code class="prettyprint">traverse</code> method but hiding the ugly type annotations.</p><p>This kind of type-directed development is nice. You add some type here and there and you let the compiler guide you to which method to apply in order to get the desired results:</p>
<ol>
<li>after the traversal I get back a <code class="prettyprint">State</code></li>
<li>if I want to get the final state, the <code class="prettyprint">Stream</code> of <code class="prettyprint">IO</code>, I need to feed in some initial values, that's the '!' method</li>
<li>if I want to get an action equivalent to the stream of actions, I need the <code class="prettyprint">sequence</code> method which has exactly the type signature doing what I want</li>
</ol><p>I was about to call it a day when I actually tried my application,... StackOverflow! What?!?</p><a name="Trampolining+to+the+rescue"></a><h5>Trampolining to the rescue</h5><p>It turns out that traversing a "big" sequence with a <code class="prettyprint">State</code> is not so easy. First of all, <code class="prettyprint">State</code> is an <code class="prettyprint">Applicative</code> because it is also a <code class="prettyprint">Monad</code> (please read the earlier EIP post for the details). So basically this amounts to chaining a <em>lot</em> of <code class="prettyprint">flatMap</code> operations which blows up the stack.</p><p>Fortunately for me, Runar has implemented a generic solution for this kind of issue, like, <a href="http://apocalisp.wordpress.com/2011/10/26/tail-call-elimination-in-scala-monads/"><em>less than 2 months ago!</em></a> I leave you to his excellent post for a detailed explanation but the gist of it is to use continuations to describe computations and store them on the heap instead of letting calls happen on the stack. So instead of using <code class="prettyprint">State[S, A]</code> I use <code class="prettyprint">StateT[Trampoline, S, A]</code> where each computation <code class="prettyprint">(S, A)</code> returned by the <code class="prettyprint">State</code> monad is actually encapsulated in a <code class="prettyprint">Trampoline</code> to be executed on the heap.</p><p>The application of this idea was not too easy at first and Runar helped me with a code snippet (thanks!). Eventually I managed to keep everything well hidden behind the <code class="prettyprint">traverseState</code> function. The first thing I did was to "trampoline" the function passed to <code class="prettyprint">traverseState</code>:</p>
<pre><code class="prettyprint"> /**
  * transform a function into its "trampolined" version
  *   val f: T => State[S, A] = (t: T) => State[S, A]
  *   val f_trampolined: T => StateT[Trampoline, S, A] = f.trampolined
  */
 implicit def liftToTrampoline[T, S, A](f: T => State[S, A]) = new LiftToTrampoline(f)

 class LiftToTrampoline[T, S, A](f: T => State[S, A]) {
   def trampolined = (t: T) => stateT((s: S) => suspend(f(t).apply(s)))
 }
</code></pre><p>So the <code class="prettyprint">traverseState</code> function definition becomes:</p>
<pre><code class="prettyprint"> // with the full ugly type annotation
 def traverseState(f: T => State[S, B]) =
   seq.traverse[({type l[a]=StateT[Trampoline, S, a]})#l, B](f.trampolined)
</code></pre><p>However I can't leave things like that because <code class="prettyprint">traverseState</code> then returns a <code class="prettyprint">StateT[Trampoline, S, B]</code> when the client of the function expects a <code class="prettyprint">State[S, B]</code>. So I added an <code class="prettyprint">untrampolined</code> method to recover a <code class="prettyprint">State</code> from a "trampolined" one:</p>
<pre><code class="prettyprint"> /** @return a normal State from a "trampolined" one */
 implicit def fromTrampoline[S, A](st: StateT[Trampoline, S, A]) = new FromTrampoline(st)

 class FromTrampoline[S, A](st: StateT[Trampoline, S, A]) {
   def untrampolined: State[S, A] = state((s: S) => st(s).run)
 }
</code></pre><p>The end result is not so bad. The "trampoline" trick is hidden as an implementation detail and I don't get StackOverflows anymore. Really? Not really,...</p><a name="The+subtleties+of+foldRight+and+foldLeft"></a><h5>The subtleties of <code class="prettyprint">foldRight</code> and <code class="prettyprint">foldLeft</code></h5><p>I was still getting a StackOverflow error but not in the same place as before (<code class="prettyprint">#&$^@!!!</code>). It was in the traversal function itself, not in the chaining of <code class="prettyprint">flatMaps</code>. The reason for that one was that the <code class="prettyprint">Traverse</code> instance for a <code class="prettyprint">Stream</code> in Scalaz is using <code class="prettyprint">foldRight</code> (or <code class="prettyprint">foldr</code>):</p>
<pre><code class="prettyprint"> implicit def StreamTraverse: Traverse[Stream] = new Traverse[Stream] {
   def traverse[F[_]: Applicative, A, B](f: A => F[B], as: Stream[A]): F[Stream[B]] =
     as.foldr[F[Stream[B]]](Stream.empty.pure) { (x, ys) =>
       implicitly[Apply[F]].apply(f(x) map ((a: B) => (b: Stream[B]) => a #:: b), ys)
     }
 }
</code></pre><p>and <code class="prettyprint">foldr</code> is heavily recursive. It basically says: <code class="prettyprint">foldRight(n) = f(n, foldRight(n-1))</code> whereas <code class="prettyprint">foldl</code> is implemented with a for loop and a variable to accumulate the result.</p><p>The workaround for this situation is simple: just provide a <code class="prettyprint">Traverse</code> instance using <code class="prettyprint">foldLeft</code>. But then you can wonder: "why is <code class="prettyprint">traverse</code> even using <code class="prettyprint">foldRight</code> in the first place?". The answer is in my next bug! After doing the modifications above I didn't get a SOE anymore but the output in the console was like:</p>
<pre><code class="prettyprint"> imported from test.log: 10000 lines [12:33:30]
imported from test.log: 1000 lines [12:33:30]
imported from test.log: 100 lines [12:33:30]
</code></pre><p>Cool, I have invented a Time Machine, Marty! That one left me puzzled for a while but I found the solution if not the explanation. The "left-folding" <code class="prettyprint">Traverse</code> instance I had left in scope was being used by the <code class="prettyprint">sequence</code> method to transform a <code class="prettyprint">Stream[IO]</code> into an <code class="prettyprint">IO[Stream]</code>. Changing that to the standard "right-folding" behaviour for a <code class="prettyprint">Stream</code> traversal was ok. So there <em>is</em> a difference (meaning that something is not <a href="http://en.wikipedia.org/wiki/Associative_property">associative</a> somewhere,...)</p><a name="Conclusion"></a><h4>Conclusion</h4><p>The main conclusion from this experiment is that tagging methods with <code class="prettyprint">IO</code> made me really <strong><em>think</em></strong> about where the effects of my application are. It also encouraged functional programming techniques such as <code class="prettyprint">traverse</code>, <code class="prettyprint">sequence</code> and the like.</p><p>I must however say that I was surprised on more than one account:</p>
<ul>
<li><p>I stumbled upon a whole new class of bugs: non-execution. My effects were not executed improperly, they were not executed at all because I forgot to call <code class="prettyprint">unsafePerformIO</code>!</p></li>
<li><p>I was not expecting to need an optimisation which had just been added to Scalaz, for something which I thought was a casual traversal</p></li>
<li><p>there are still some FP mysteries to me. For example I don't know yet <a href="https://groups.google.com/d/msg/scalaz/QPUs6TWTAm4/Srypc61Agg0J">how to traverse a full <code class="prettyprint">Stream</code></a></p></li>
<li><p>I also don't get why my messages were inverted on the console. I tried different experiments, with a <code class="prettyprint">Seq</code> instead of a <code class="prettyprint">Stream</code>, with <code class="prettyprint">Identity</code> or <code class="prettyprint">Option</code> as an <code class="prettyprint">Applicative</code> instead of <code class="prettyprint">IO</code> and I could only reproduce this specific behavior with <code class="prettyprint">Stream</code> and <code class="prettyprint">IO</code></p></li>
</ul><p>Anyway, I would say that it was overall worth using the <code class="prettyprint">IO</code> type, at least for the architectural clarification and the better testing. It also brought back the <a href="http://bit.ly/saCAyu">warm, fuzzy feeling</a> that things are under control. On the other hand refactoring to <code class="prettyprint">IO</code> took me longer than expected and required more knowledge than just using vars and regular I/O. But that should really be accounted for in the 'investment for the future' column.</p><a name="PS"></a><h6>PS</h6><p>There are certainly many stones left unturned in that post for programmers new to Functional Programming and Scalaz. I do apologize for that (this post is certainly long enough :-)) and I encourage those readers to ask questions on the <a href="https://groups.google.com/forum/#!forum/scalaz">Scalaz mailing-list</a>.</p>Unknownnoreply@blogger.com13