This error is reported occasionally, as a Google search will show, but there hasn't been a clear solution.
Cabal is the build tool that everyone here probably loves/hates.
zlib is a compression library, and what it is saying is that it was given some compressed data that stopped unexpectedly.
I dug into this, and what is going on, at least in this particular case, has nothing to do with cabal or zlib specifically.
The same problem can be reproduced without cabal:
We don't even need to talk to hackage to see this problem. Getting any large file demonstrates the problem.
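A sketch of that reproduction, using the HTTP package's Network.HTTP with a lazy bytestring response body. The URL here is a placeholder standing in for any large file, not the one actually used:

```haskell
import Network.HTTP (RequestMethod (GET), getResponseBody, mkRequest, simpleHTTP)
import Network.URI (parseURI)
import qualified Data.ByteString.Lazy as BL
import Data.Maybe (fromJust)

main :: IO ()
main = do
  -- Placeholder URL: any sufficiently large file shows the problem.
  let uri = fromJust (parseURI "http://example.com/some-large-file")
  result <- simpleHTTP (mkRequest GET uri)
  -- The annotation picks the lazy bytestring instance of HStream;
  -- swapping in strict Data.ByteString here is the change that
  -- makes the full body arrive.
  body <- getResponseBody result :: IO BL.ByteString
  putStrLn ("received " ++ show (BL.length body) ++ " bytes")
```

On the troublesome network this prints fewer bytes than the file actually contains.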
So it looks like somewhere in the bowels of Network.HTTP, the response is being truncated.
Changing from lazy bytestrings to strict ones makes this work...
Maybe there's a race condition here? In my mind, laziness in an I/O context is associated with those.
Let's put that to one side now and talk about...
We've got several protocols in a stack. At the bottom is IP, the internet protocol; then TCP the transmission control protocol; and above that HTTP, the Hypertext Transfer Protocol.
Each layer provides services used by the layer above.
At the bottom, IP delivers datagrams, smallish packets of data, across the internet without making many guarantees about how they are going to be delivered.
Above that, TCP provides a reliable, ordered byte stream between two computers.
And HTTP deals with things like retrieving contents from URLs, over TCP.
The internet should just pass IP packets back and forth between the protocol stacks at each end, without knowing anything about the higher level TCP and HTTP protocols.
This is called the end-to-end principle.
On the troublesome network, there is a middlebox between the client and the internet. This middlebox intercepts all of the HTTP traffic by acting as an HTTP server, and making requests to the real target server on your behalf.
Reasons to do this include forcing caching, filtering malware, and censoring undesirable content.
So this is part of the explanation of why the behaviour is different on this specific network: the web server we are talking to is a different web server, and behaves differently.
The HTTP response from the middlebox looks a bit different from the HTTP response from the hackage server.
The relevant part here is this Connection: close header. It triggers a different code path in the Haskell HTTP client library.
Normally a TCP connection can be re-used for many HTTP requests in a row; Connection: close means that a TCP connection should only be used for one HTTP request/response and then closed, and new HTTP requests should happen on new TCP connections.
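For illustration, a response carrying this header might look like the following (the header values here are invented, not captured from either server):

```
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 1048576
Connection: close
```

After sending the body, the server then signals the end of the response by closing the connection, rather than keeping it open for the next request.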
This close option is pretty rare, so I wondered if there was a bug in Network.HTTP related to this.
So I configured my test web server to disable keep-alives and return a Connection: close header, to see if I could reproduce this away from the misbehaving network. I couldn't, but this is still relevant.
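The post doesn't say which web server was used for this test; as one concrete sketch, nginx can be configured to behave this way:

```
# nginx: a zero keep-alive timeout disables connection re-use,
# so every response is sent with "Connection: close".
keepalive_timeout 0;
```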
So let's dig a bit deeper down the stack into TCP behaviour. HTTP uses TCP to provide a reliable stream of data between computers; and TCP does that by sending packets using IP.
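One way to watch this at the packet level (port 80 is an assumption about the setup) is a capture with tcpdump:

```shell
# Print each packet's TCP flags (S = SYN, F = FIN, . = ACK) for
# HTTP traffic; -n keeps addresses numeric instead of resolving names.
sudo tcpdump -n 'tcp port 80'
```

The FIN flags in this output are what matter in the next step.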
So what does that look like?
... but this looks different when we use the lazy bytestring implementation ...
There's a fairly subtle change here: the client to server FIN is sent right around the time the HTTP response is delivered. On most networks, this is fine - the client to server half of the connection is closed, but the server to client half of the connection is still open and our data still arrives.
But in the case of the misbehaving network, the HTTP session gets terminated pretty much as soon as this FIN arrives - the middlebox web server is (mis)interpreting the FIN to mean "close the connection right now, stop stop stop!!!". The connection closes and Network.HTTP assumes that is all the data.
This is, I think, a bug in the middlebox, and the only actual bug in all of this.
There is one other bit of misbehaviour, this time on the part of Network.HTTP:
We've got a Content-Length header telling us how long the response is, in bytes. Network.HTTP could have recognised that this was not the same as the number of bytes it actually received, and thrown an error of some kind.
This wouldn't fix our high level problem but might have given more useful clues for debugging this.
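A sketch of the kind of check meant here - the function name and shape are hypothetical, not Network.HTTP's actual API:

```haskell
import qualified Data.ByteString.Lazy as BL

-- Compare the advertised Content-Length against the bytes that
-- actually arrived; a short read becomes a visible error instead
-- of silently truncated data.
checkContentLength :: Integer -> BL.ByteString -> Either String BL.ByteString
checkContentLength expected body
  | actual == expected = Right body
  | otherwise =
      Left ("short read: Content-Length said " ++ show expected
            ++ " bytes but " ++ show actual ++ " arrived")
  where
    actual = fromIntegral (BL.length body) :: Integer
```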
So in the strict case, everything gets read before we start looking at headers and deciding to close.
But, in the lazy case, we only force as much to be read as we need - asking if there is a Connection: close header forces enough of the headers to decide that, and then we close, leaving the rest to be read lazily. And that manifests as different on-the-wire behaviour.
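The same hazard can be seen away from HTTP entirely, with plain lazy file I/O (the file name is arbitrary): closing a handle truncates whatever part of its lazy contents has not been forced yet.

```haskell
import System.IO

main :: IO ()
main = do
  writeFile "demo.txt" (replicate 100000 'x')
  h <- openFile "demo.txt" ReadMode
  contents <- hGetContents h        -- lazy: nothing is read yet
  print (length (take 1 contents))  -- forces only an initial chunk
  hClose h                          -- unread remainder is silently truncated
  print (length contents)          -- typically far less than 100000
```

This is the documented behaviour of closing a semi-closed handle, and it is exactly the shape of the HTTP bug above: close first, and the data you never forced is gone.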