<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>python &amp;mdash; Sebastian Wiesner</title>
    <link>https://swsnr.writeas.com/tag:python</link>
    <description>System engineer for satellite mission planning. Gnome. Rust. Arch.  </description>
    <pubDate>Sat, 09 May 2026 13:26:03 +0000</pubDate>
    <image>
      <url>https://i.snap.as/9knB2j11.jpg</url>
      <title>python &amp;mdash; Sebastian Wiesner</title>
      <link>https://swsnr.writeas.com/tag:python</link>
    </image>
    <item>
      <title>Calling Python from Haskell</title>
      <link>https://swsnr.writeas.com/calling-python-from-haskell?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[#haskell #python&#xA;&#xA;In a past version of this blog I used Pandoc to convert Markdown to HTML. It&#39;s by far the best and most powerful Markdown converter, but it has one small weakness: its syntax highlighting is based on highlighting-kate, which produces weaker results and supports fewer languages than the Python library Pygments, the de facto standard highlighter used by GitHub and others.&#xA;&#xA;It&#39;s easy to implement custom highlighting thanks to the great API of Pandoc, with just two functions in Text.Highlighting.Pygments.Pandoc.&#xA;]]&gt;</description>
      <content:encoded><![CDATA[<p><a href="https://swsnr.writeas.com/tag:haskell" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">haskell</span></a> <a href="https://swsnr.writeas.com/tag:python" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">python</span></a></p>

<p>In a past version of this blog I used <a href="https://johnmacfarlane.net/pandoc/" rel="nofollow">Pandoc</a> to convert Markdown to HTML. It&#39;s by far the best and most powerful Markdown converter, but it has one small weakness: its syntax highlighting is based on <a href="https://hackage.haskell.org/package/highlighting-kate" rel="nofollow">highlighting-kate</a>, which produces weaker results and supports fewer languages than the Python library <a href="https://pygments.org/" rel="nofollow">Pygments</a>, the de facto standard highlighter used by GitHub and others.</p>

<p>It&#39;s easy to implement custom highlighting thanks to the great API of Pandoc, with just two functions in <strong>Text.Highlighting.Pygments.Pandoc</strong>:</p>



<pre><code class="language-haskell">import Text.Pandoc.Definition
import Text.Pandoc.Walk (walkM)
import Text.Highlighting.Pygments (toHtml)

blockToHtml :: Block -&gt; IO Block
blockToHtml x@(CodeBlock attr _) | attr == nullAttr = return x
blockToHtml x@(CodeBlock (_,[],_) _) = return x
blockToHtml (CodeBlock (_,language:_,_) text) = do
  colored &lt;- toHtml text language
  return (RawBlock (Format &#34;html&#34;) colored)
blockToHtml x = return x

codeBlocksToHtml :: Pandoc -&gt; IO Pandoc
codeBlocksToHtml = walkM blockToHtml
</code></pre>

<p>This code transforms every code block that names a language into a raw HTML block containing the code highlighted by Pygments.  The language is taken from the first class of the code block (its first unnamed attribute), just like in GitHub&#39;s Markdown dialect.  Code blocks which do not specify a language are not touched.</p>

<p>So far I just went the easy way, and called the <a href="https://pygments.org/docs/cmdline/" rel="nofollow"><code>pygmentize</code> script</a> in <code>toHtml</code>, passing the code to be highlighted on stdin, and reading the result from stdout.  While this is easy to implement with just a few lines, it also slows down the build considerably, because every highlighted code block starts a new Python process.</p>

<p>Last weekend I sat down and tried to call Pygments directly via Python&#39;s C API through Haskell&#39;s FFI.  This is what came out of this adventure.</p>

<h2 id="native-wrappers">Native wrappers</h2>

<p><strong>Foreign.Python.Native</strong> is an <a href="https://www.haskell.org/ghc/docs/7.6.3/html/users_guide/hsc2hs.html" rel="nofollow">hsc2hs</a> module which imports the required functions from Python&#39;s C API and declares corresponding Haskell signatures.</p>

<p>The module also declares the necessary types, using a special <code>hsc2hs</code> feature to automatically derive the right Haskell type for a given C type.  For instance, the following declaration declares an appropriate Haskell alias for Python&#39;s <code>Py_ssize_t</code>, so I didn&#39;t need to grok the header files for the typedef:</p>

<pre><code class="language-haskell">type PySSizeT = #type Py_ssize_t
</code></pre>

<p>I also use the <code>CApiFFI</code> extension to avoid the hassle of finding out whether to import the UCS2 or the UCS4 API of CPython.  Instead, I just import the macro API and let GHC figure out the rest:</p>

<pre><code class="language-haskell">foreign import capi unsafe &#34;Python.h PyUnicode_AsUTF8String&#34;
  pyUnicode_AsUTF8String :: RawPyObject -&gt; IO RawPyObject

foreign import capi unsafe &#34;Python.h PyUnicode_FromStringAndSize&#34;
  pyUnicode_FromStringAndSize :: CString -&gt; PySSizeT -&gt; IO RawPyObject
</code></pre>

<p>GHC automatically generates wrapper C functions for these macros, and lets the C compiler figure out whether to link <code>PyUnicodeUCS2_AsUTF8String</code> or <code>PyUnicodeUCS4_AsUTF8String</code>.</p>
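
<p>The same mechanism can be tried without Python.  The following is only a minimal sketch, assuming nothing but a working C toolchain: it imports <code>toupper</code> from <code>ctype.h</code>, which a C library may provide as a macro or as a real function.  The Haskell name <code>c_toupper</code> is chosen for illustration only:</p>

<pre><code class="language-haskell">{-# LANGUAGE CApiFFI #-}
module Main (main) where

import Foreign.C.Types (CInt(..))

-- With capi, GHC emits a small C wrapper that includes ctype.h and calls
-- toupper by name, so it works whether toupper is a macro or a function.
foreign import capi unsafe &#34;ctype.h toupper&#34;
  c_toupper :: CInt -&gt; CInt

main :: IO ()
main = print (c_toupper 97)  -- 97 is &#39;a&#39;; prints 65, which is &#39;A&#39;
</code></pre>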

<h2 id="convenient-haskell-api">Convenient Haskell API</h2>

<p><strong>Foreign.Python</strong> is the convenient Haskell API around the <a href="#native-wrappers" rel="nofollow">native Python functions</a>.</p>

<hr/>

<p><strong>Update</strong> <em>(April 16, 2014)</em>: I changed <code>toPyObject</code> to return <code>Nothing</code> if given a null pointer, for increased safety.  Previously, <code>toPyObject</code> would simply wrap the given pointer, whether <code>NULL</code> or not.  The definition of <code>toPyObjectChecked</code> was updated too.</p>

<p>While wrapping a <code>NULL</code> pointer in a managed pointer doesn&#39;t do any harm in and of itself, because Python&#39;s <code>Py_XDECREF</code> is safe to call with <code>NULL</code>, it was still possible to try and use the pointer, e.g. by trying to call the underlying Python object, and thus trigger a segfault.</p>

<p>Now it&#39;s impossible to obtain a <code>PyObject</code> from <code>NULL</code>, increasing the safety of my Python API.</p>

<hr/>

<p>I use <code>ForeignPtr</code> to wrap the raw <code>PyObject</code> pointers into an opaque Haskell type which automatically calls <code>Py_XDECREF</code> on the underlying <code>PyObject</code> once the Haskell value is no longer reachable and gets garbage-collected:</p>

<pre><code class="language-haskell">newtype PyObject = PyObject (ForeignPtr ())

toPyObject :: RawPyObject -&gt; IO (Maybe PyObject)
toPyObject raw | raw == nullPtr  = return Nothing
toPyObject raw = liftM (Just . PyObject) (newForeignPtr pyDecRef raw)

withPyObject :: PyObject -&gt; (RawPyObject -&gt; IO a) -&gt; IO a
withPyObject (PyObject ptr) = withForeignPtr ptr
</code></pre>

<p>Only the opaque type is exported from the module, so outside code never has any chance of messing with the underlying C object and bypassing the garbage collector.</p>
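
<p>The wrapping pattern itself does not depend on Python.  As a rough sketch, assuming only <code>base</code>, the following uses <code>malloc</code> and <code>finalizerFree</code> as stand-ins for a Python allocation and <code>Py_XDECREF</code>; the names <code>Managed</code>, <code>toManaged</code> and <code>withManaged</code> are hypothetical and not part of my modules:</p>

<pre><code class="language-haskell">module Main (main) where

import Foreign.C.Types (CInt)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, malloc)
import Foreign.Ptr (Ptr, nullPtr)
import Foreign.Storable (peek, poke)

-- Opaque wrapper: the finalizer runs when the value is garbage-collected.
newtype Managed = Managed (ForeignPtr CInt)

-- Refuse to wrap NULL, so a Managed value always points to real memory.
toManaged :: Ptr CInt -&gt; IO (Maybe Managed)
toManaged raw
  | raw == nullPtr = return Nothing
  | otherwise      = fmap (Just . Managed) (newForeignPtr finalizerFree raw)

-- Hand out the raw pointer only for the duration of the callback.
withManaged :: Managed -&gt; (Ptr CInt -&gt; IO a) -&gt; IO a
withManaged (Managed ptr) = withForeignPtr ptr

main :: IO ()
main = do
  raw &lt;- malloc
  poke raw 42
  wrapped &lt;- toManaged raw
  case wrapped of
    Nothing -&gt; putStrLn &#34;null pointer&#34;
    Just m  -&gt; withManaged m peek &gt;&gt;= print          -- prints 42
  missing &lt;- toManaged nullPtr
  putStrLn (maybe &#34;Nothing&#34; (const &#34;Just&#34;) missing)  -- prints Nothing
</code></pre>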

<p>Many CPython functions return <code>NULL</code> to indicate that the operation failed and a Python exception was raised.  To deal with these situations I use a little helper that throws a Haskell exception from the current Python exception if given a <code>NULL</code> pointer:</p>

<pre><code class="language-haskell">toPyObjectChecked :: RawPyObject -&gt; IO PyObject
toPyObjectChecked = toPyObject &gt;=&gt; maybe throwCurrentPythonException return
</code></pre>
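
<p>To see this composition in isolation, here is a small sketch that assumes only <code>base</code>.  <code>NullResult</code>, <code>nonNull</code> and <code>checked</code> are hypothetical names; <code>throwIO NullResult</code> stands in for <code>throwCurrentPythonException</code>:</p>

<pre><code class="language-haskell">module Main (main) where

import Control.Exception (Exception, throwIO, try)
import Control.Monad ((&gt;=&gt;))
import Foreign.Ptr (Ptr, nullPtr, plusPtr)

-- Stand-in for the exception built from the current Python error.
data NullResult = NullResult deriving Show
instance Exception NullResult

-- First step: turn a possibly-NULL pointer into a Maybe.
nonNull :: Ptr () -&gt; IO (Maybe (Ptr ()))
nonNull p = return (if p == nullPtr then Nothing else Just p)

-- Second step, chained with Kleisli composition: throw on Nothing.
checked :: Ptr () -&gt; IO (Ptr ())
checked = nonNull &gt;=&gt; maybe (throwIO NullResult) return

main :: IO ()
main = do
  -- A dummy non-NULL address; it is compared but never dereferenced.
  ok &lt;- try (checked (nullPtr `plusPtr` 1)) :: IO (Either NullResult (Ptr ()))
  putStrLn (either (const &#34;failed&#34;) (const &#34;ok&#34;) ok)  -- prints ok
  bad &lt;- try (checked nullPtr) :: IO (Either NullResult (Ptr ()))
  putStrLn (either show (const &#34;ok&#34;) bad)             -- prints NullResult
</code></pre>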

<p>To obtain objects from the Python runtime, I define a bunch of functions to import modules, get attributes and call objects.  The implementations are boilerplate code, so I&#39;ll just show you the type signatures:</p>

<pre><code class="language-haskell">importModule :: String -&gt; IO PyObject
getAttr      :: PyObject -&gt; String -&gt; IO PyObject
callObject   :: PyObject -&gt; [PyObject] -&gt; [(PyObject, PyObject)] -&gt; IO PyObject
</code></pre>

<p>To convert these objects to Haskell values, and to pass Haskell values to Python, I use a little type class that converts a type to and from Python:</p>

<pre><code class="language-haskell">class Object a where
  toPy   :: a -&gt; IO PyObject
  fromPy :: PyObject -&gt; IO a
</code></pre>

<p>As I only need strings to call Pygments, there are only two instances, one for <code>ByteString</code> and one for <code>String</code>.</p>

<p>To convert from a <code>ByteString</code>, I just need to obtain a temporary buffer from the byte string and pass that to Python:</p>

<pre><code class="language-haskell">instance Object ByteString where
  toPy s = useAsCStringLen s $ \(buffer, len) -&gt;
    pyString_FromStringAndSize buffer (fromIntegral len) &gt;&gt;= toPyObjectChecked
</code></pre>

<p>Converting back to a <code>ByteString</code> is a little more intricate, because Python needs addressable fields into which it can write the location of the raw bytes of the underlying <code>PyObject</code>. <code>Foreign.Marshal.Alloc.alloca</code> comes to the rescue and conveniently allocates addressable fields which I can then hand down to Python.  Python puts the memory address and size of the underlying string buffer into these fields, which I can then read with <code>Foreign.Storable.peek</code> to copy the entire byte sequence into an independent <code>ByteString</code>:</p>

<pre><code class="language-haskell">  fromPy s =
    alloca $ \s_buffer_ptr -&gt;
    alloca $ \s_len_ptr -&gt;
    withPyObject s $ \raw -&gt; do
      result &lt;- pyString_AsStringAndSize raw s_buffer_ptr s_len_ptr
      unless (result == 0) throwCurrentPythonException
      buffer &lt;- peek s_buffer_ptr
      len &lt;- peek s_len_ptr
      packCStringLen (buffer, fromIntegral len)
</code></pre>
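
<p>The out-parameter pattern with <code>alloca</code> and <code>peek</code> works the same way for any C function.  As an illustration that does not need Python, this sketch calls the standard C function <code>strtol</code>, which writes the address of the first unparsed character into its second argument; <code>parsePrefix</code> is a hypothetical helper name:</p>

<pre><code class="language-haskell">{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.String (CString, peekCString, withCString)
import Foreign.C.Types (CInt(..), CLong(..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek)

-- long strtol(const char *str, char **endptr, int base);
foreign import ccall unsafe &#34;stdlib.h strtol&#34;
  c_strtol :: CString -&gt; Ptr CString -&gt; CInt -&gt; IO CLong

-- Parse a leading decimal number and return it with the unparsed rest.
parsePrefix :: String -&gt; IO (Integer, String)
parsePrefix input =
  withCString input $ \cstr -&gt;
  -- alloca provides the addressable field that C fills in.
  alloca $ \endPtr -&gt; do
    value &lt;- c_strtol cstr endPtr 10
    -- Copy the rest while the temporary C string is still alive.
    rest &lt;- peek endPtr &gt;&gt;= peekCString
    return (toInteger value, rest)

main :: IO ()
main = parsePrefix &#34;42 apples&#34; &gt;&gt;= print  -- prints (42,&#34; apples&#34;)
</code></pre>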

<p>Converting from a <code>String</code> looks almost like converting from a <code>ByteString</code>, except that we need to encode the <code>String</code> to UTF-8 to pass it to <code>PyUnicode_FromStringAndSize</code>, which expects a UTF-8 encoded char array. Converting back is simple as well, because I can build upon the <code>ByteString</code> conversion from above. I just need to turn the Python unicode object into an encoded char array with <code>PyUnicode_AsUTF8String</code>, which I can then convert to a <code>ByteString</code> and decode:</p>

<pre><code class="language-haskell">instance Object String where
  toPy s = useAsCStringLen (UTF8.fromString s) $ \(buffer, len) -&gt;
    pyUnicode_FromStringAndSize buffer (fromIntegral len) &gt;&gt;= toPyObjectChecked
  fromPy o = do
    s &lt;- withPyObject o pyUnicode_AsUTF8String &gt;&gt;= toPyObjectChecked
    liftM UTF8.toString (fromPy s)
</code></pre>
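
<p>The explicit length matters because the number of UTF-8 bytes differs from the number of characters as soon as the text leaves ASCII.  The following sketch shows the round trip without Python; note that it swaps in <code>GHC.Foreign</code> from <code>base</code> in place of the <code>utf8-string</code> functions used above, purely to stay self-contained:</p>

<pre><code class="language-haskell">module Main (main) where

import qualified GHC.Foreign as GHC
import System.IO (utf8)

main :: IO ()
main = do
  let s = &#34;na\239ve&#34;  -- five characters, the third one is U+00EF
  -- Encode to a temporary UTF-8 buffer, then decode that buffer again.
  (byteLen, decoded) &lt;- GHC.withCStringLen utf8 s $ \(buffer, len) -&gt; do
    back &lt;- GHC.peekCStringLen utf8 (buffer, len)
    return (len, back)
  -- Characters, UTF-8 bytes, and whether the round trip preserved the text.
  print (length s, byteLen, decoded == s)  -- prints (5,6,True)
</code></pre>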

<h2 id="pygments-interface">Pygments interface</h2>

<p><strong>Text.Highlighting.Pygments</strong> is the Pygments interface that builds upon this <a href="#convenient-haskell-api" rel="nofollow">Python API</a>.</p>

<p>I start with some type aliases for Pygments types.  They don&#39;t add more type safety, because Python is dynamically typed anyway, but they make the type signatures a little nicer:</p>

<pre><code class="language-haskell">type Lexer     = PyObject
type Formatter = PyObject
</code></pre>

<p>Then I wrap the required functions from Pygments into convenient Haskell functions. <code>getLexerByName</code> gives me the Pygments Lexer for the name of a programming language:</p>

<pre><code class="language-haskell">getLexerByName :: String -&gt; IO Lexer
getLexerByName name = do
  initialize False
  lexers &lt;- importModule &#34;pygments.lexers&#34;
  get_lexer_by_name &lt;- getAttr lexers &#34;get_lexer_by_name&#34;
  pyName &lt;- toPy name
  callObject get_lexer_by_name [pyName] []
</code></pre>

<p>The function</p>
<ol><li>initializes the interpreter,</li>
<li>imports <code>pygments.lexers</code>,</li>
<li>gets a reference to the underlying <code>get_lexer_by_name</code> function,</li>
<li>converts the given <code>name</code> to a Python object,</li>
<li>and ultimately calls <code>get_lexer_by_name</code> with the appropriate arguments.</li></ol>

<p>This function is as safe as it can be when dealing with a dynamically typed language:</p>
<ul><li>It will never try to use invalid objects, because Python operations never fail silently.  If any Python call fails, e.g. because Pygments is not installed, the Python interface throws a Haskell exception.</li>
<li>Even in case of an exception, the function does not leak memory.  All references to Python objects are managed pointers which automatically release the underlying Python object when they are no longer used, whether after a normal return or after an exception.</li></ul>

<p><code>highlight</code> highlights a given piece of code with a lexer and formatter:</p>

<pre><code class="language-haskell">highlight :: String -&gt; Lexer -&gt; Formatter -&gt; IO String
highlight code lexer formatter = do
  initialize False
  pygments &lt;- importModule &#34;pygments&#34;
  py_highlight &lt;- getAttr pygments &#34;highlight&#34;
  codeObj &lt;- toPy code
  callObject py_highlight [codeObj, lexer, formatter] [] &gt;&gt;= fromPy
</code></pre>

<p>With these convenient wrappers I am now able to implement <code>toHtml</code>:</p>

<pre><code class="language-haskell">toHtml :: String -&gt; String -&gt; IO String
toHtml code language = do
  -- Initialize the interpreter before the first import, as the other wrappers do
  initialize False
  formatters &lt;- importModule &#34;pygments.formatters&#34;
  html_formatter &lt;- getAttr formatters &#34;HtmlFormatter&#34;
  cssclass_key &lt;- toPy &#34;cssclass&#34;
  cssclass &lt;- toPy &#34;highlight&#34;
  formatter &lt;- callObject html_formatter [] [(cssclass_key, cssclass)]
  lexer &lt;- getLexerByName language
  highlight code lexer formatter
</code></pre>

<p>This function first creates an instance of the <code>HtmlFormatter</code> class, by importing the <code>pygments.formatters</code> module, obtaining a reference to the class object, and calling the class object with some options to create a new instance.</p>

<p>Then it gets the lexer object, and passes both objects and the code to <code>highlight</code>.  The result is a string containing the given <code>code</code> as highlighted HTML.</p>

<h2 id="building">Building</h2>

<p>I use Cabal to build these modules.  The corresponding cabal file is simple:</p>

<pre><code>executable lunarsite
  […]
  other-modules:       Foreign.Python
                       Foreign.Python.Native
                       Text.Highlighting.Pygments
                       Text.Highlighting.Pygments.Pandoc
  build-depends:       base &gt;=4.6 &amp;&amp; &lt;4.8,
                       bytestring &gt;=0.10 &amp;&amp; &lt; 0.11,
                       utf8-string &gt;=0.3 &amp;&amp; &lt;0.4,
                       pandoc-types &gt;=1.12 &amp;&amp; &lt;1.13,
                       pandoc &gt;=1.12 &amp;&amp; &lt;1.13
  build-tools:         hsc2hs

  if os(darwin)
     extra-libraries:   python2.7
     include-dirs:      /usr/include/python2.7
  else
     pkgconfig-depends: python ==2.7
</code></pre>

<p>I enable <code>hsc2hs</code> in <code>build-tools</code> to compile <a href="#native-wrappers" rel="nofollow">Foreign.Python.Native</a>, and tell Cabal to link against Python 2.7.</p>

<p><code>pkg-config</code> is missing on OS X, but since the layout of the pre-installed system Python is fixed anyway, I just hard-code the library name and the include directory.</p>

<p>On other systems I just rely on Cabal&#39;s built-in support for <code>pkg-config</code> to automatically find the library name and the include directories for Python 2.7.</p>

<h2 id="lessons-learned">Lessons learned</h2>

<p>Calling Python from Haskell was much, much easier than I thought, thanks to Haskell&#39;s good FFI, which takes over all marshaling of primitive types, and provides great utilities and helpers to marshal complex types.</p>

<p>It would have been even easier, however, if the C API of Python 2.7 were a little better and had more consistent reference count semantics, and if Haskell supported varargs functions in its FFI.</p>

<p>While Python functions normally do not steal references and do not return borrowed references, there are some notable exceptions. These defeat the whole idea of a consistent API, since you still need to check every function carefully for how it handles the reference counts of its arguments and return values.</p>

<p>And since Haskell doesn&#39;t support foreign varargs functions, I often had to manually assemble complex Python objects such as argument tuples using the lower-level API, instead of just calling <code>Py_BuildValue</code> to build complex Python objects from C values directly.</p>

<p>Despite these minor nuisances, working with Haskell&#39;s FFI has been a really pleasant experience so far, and I&#39;m truly surprised that a language which is generally renowned for its advancement of computer science also excels at the dirty low-level task of calling C libraries.</p>
]]></content:encoded>
      <guid>https://swsnr.writeas.com/calling-python-from-haskell</guid>
      <pubDate>Mon, 14 Apr 2014 22:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>