Zope Care Day

Today has been decreed Zope Care Day. Andy asked me to figure out
why PUT requests would bloat Zope’s memory. We made a bet that if I
fixed the issue before he woke up, he would eat his laptop.

Going to log whatever I find here so I don’t lose track of all the
details.
  • The whole samba starts in ZServer/HTTPServer.py.

    • If the request size (?) is bigger than 524288 bytes (?) it
      uses a TemporaryFile to store the request data. Otherwise, it
      uses a cStringIO.StringIO. Fair enough, though I suppose that
      threshold could be smaller.
  • However, it only uses zhttp_collector if a CONTENT_LENGTH
    header is found! (See zhttp_handler.handle_request.)
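The collector’s choice can be sketched roughly like this. A minimal sketch in modern Python, not Zope’s actual code: Zope 2 used cStringIO.StringIO for the in-memory case, and `make_body_buffer` is a name I made up.

```python
import tempfile
from io import BytesIO

# Hypothetical sketch of the collector's decision: small bodies stay in
# memory, large ones are spooled to a temporary file on disk.
SPOOL_THRESHOLD = 524288  # 512 KiB, the threshold mentioned above

def make_body_buffer(content_length):
    """Pick an in-memory buffer or a temp file based on the declared size."""
    if content_length > SPOOL_THRESHOLD:
        return tempfile.TemporaryFile()
    return BytesIO()
```

Both return values are file-like, so the rest of the request pipeline doesn’t need to care which one it got.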

  • That suggests that if the client (in our case davlib.py) doesn’t
    set Content-Length, a cStringIO.StringIO will be blindly
    used.
  • davlib.py (at least our modified version) seems to set
    Content-Length properly (and so does cadaver), so Zope is
    creating a temporary file as expected.

  • The next thing that happens is that the request is passed through
    cgi.FieldStorage, which creates yet another temporary file
    by reading the one zhttp_collector had created. So far, nothing
    has read the whole file into memory, which is cool.
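That chunk-by-chunk copying between temp files is exactly what keeps memory flat. The same pattern in plain Python, using shutil.copyfileobj (illustrative only, not what Zope does internally):

```python
import shutil
import tempfile

def copy_spooled(src, chunk=64 * 1024):
    """Copy an already-spooled request body into a fresh temp file,
    holding at most one chunk in memory at a time."""
    src.seek(0)
    dst = tempfile.TemporaryFile()
    shutil.copyfileobj(src, dst, length=chunk)
    dst.seek(0)
    return dst
```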

  • The next thing is traversing to a resource and calling its PUT
    method.

    • If the resource doesn’t exist yet:

      • A webdav.NullResource object is created, and its PUT
        method is called.

      • It looks for a Content-Type header on the request. If that’s
        not found, it tries to guess the content type from the
        filename. If that fails too, it does a re.search to figure
        out whether it’s a binary file. Hmm… that seems likely to
        fail if the uploaded file is big, since what it receives
        here is a file object, not a string.
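A bounded version of that sniffing logic, which would also be safe for big uploads, might look like this. Hedged sketch: `guess_content_type` and the exact regex are stand-ins, not Zope’s implementation.

```python
import mimetypes
import re

# Stand-in for the binary check described above: look for control bytes
# in a small prefix instead of scanning the entire body.
_BINARY_RE = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def guess_content_type(filename, body_file, sniff=1024):
    """Guess from the filename first; only sniff a bounded prefix."""
    ctype, _ = mimetypes.guess_type(filename)
    if ctype:
        return ctype
    head = body_file.read(sniff)  # bounded read: safe for huge uploads
    body_file.seek(0)
    if _BINARY_RE.search(head):
        return "application/octet-stream"
    return "text/plain"
```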

      • PUT looks for a PUT_factory method on the parent object,
        and if that’s not found, it uses
        NullResource._default_PUT_factory, which will:

        • Create a ZopePageTemplate for a file ending in .pt
        • Create a DTMLDocument for anything with a content type of
          text/html, text/xml or text/plain
        • Create an OFS.Image for anything with an image/* content type
        • Create an OFS.File for anything else.
      • When inside CMF/Plone, PortalFolder implements
        PUT_factory and delegates to content_type_registry. That
        one may be reading the whole file into memory. Note to
        self: check later.
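The default factory dispatch described above boils down to a short chain of checks. A minimal sketch, with class names returned as strings rather than real Zope objects (`default_put_factory` is a made-up name for illustration):

```python
def default_put_factory(name, content_type):
    """Mirror the decision order described above (illustrative only)."""
    if name.endswith(".pt"):
        return "ZopePageTemplate"
    if content_type in ("text/html", "text/xml", "text/plain"):
        return "DTMLDocument"
    if content_type.startswith("image/"):
        return "OFS.Image"
    return "OFS.File"
```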

    • After PUT_factory is called, everything behaves as if the file
      already existed.

    • The next step is delegating to the PUT method of the existing
      resource or the newly-created object.

    • When using OFS.File, Zope seems to behave exceptionally
      well. Here’s what happens:

      • The request body is read in 64K chunks into a linked list of
        Pdata objects.
      • The Pdata objects get a _p_jar immediately, and a
        subtransaction is triggered.
      • As the subtransaction is triggered, a TmpStore object is
        created to hold the transaction data temporarily.
      • The TmpStore creates yet another temporary file.
      • When the real transaction is committed, all the data in the
        TmpStore is copied over to the real storage.
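The Pdata scheme amounts to reading the body into a singly linked chain of fixed-size chunks. A minimal sketch, with `Chunk` standing in for the real OFS Pdata class:

```python
CHUNK = 64 * 1024  # 64K, the chunk size described above

class Chunk:
    """Stand-in for a Pdata record: raw bytes plus a link to the next chunk."""
    __slots__ = ("data", "next")

    def __init__(self, data):
        self.data = data
        self.next = None

def read_chunked(body_file):
    """Read a file-like body into a linked list of Chunk objects,
    never holding more than one chunk in memory at a time."""
    head = tail = None
    while True:
        data = body_file.read(CHUNK)
        if not data:
            break
        node = Chunk(data)
        if tail is None:
            head = node
        else:
            tail.next = node
        tail = node
    return head
```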

Conclusion so far: Zope seems to be able to handle large files
correctly out of the box. The problem may lie somewhere inside
CMF/Plone.

Update: Found two places where Zope was reading the whole file into
memory:
  • NullResource.PUT does a REQUEST.get('BODY', ''), which reads the
    file into a string, thus loading the whole thing into memory.
  • Still in NullResource.PUT, after the object is created but
    before it is stitched into the storage, the PUT method of the
    object is called. OFS.File, though, reads the whole file
    into a single Pdata object if no _p_jar is found.

Here’s a patch for both problems.

Next step is trying the same thing in the context of a CMF/Plone site.


3 thoughts on “Zope Care Day”

  1. Nope
    BODY there is just used to decide the content type, and even then only if it fails to guess the content type from the filename. The next thing on my task list is to fix that. It really shouldn’t look at BODY unless it’s really, really needed.

  2. Thanks heaps for this post, especially this quote:
    The request body is read in 64K chunks into a linked list of
    Pdata objects.
    Which confirms my observations and accounts for the inability to pickle any object greater than ~31MB (32000 KB / 64 KB = 500 chunks, which requires 1000 stack recursions to pickle with Pickler())

    Now I can write a sensible testcase, yay :)
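The recursion failure described in that comment is easy to reproduce with any singly linked chain (a generic sketch, nothing Zope-specific): pickling descends at least one stack frame per linked node, so a long enough chain exceeds the interpreter’s recursion limit.

```python
import pickle

class Node:
    """One link in a singly linked chain (analogous to a Pdata record)."""
    def __init__(self, nxt=None):
        self.next = nxt

def chain(n):
    """Build a chain of n linked Node objects."""
    head = None
    for _ in range(n):
        head = Node(head)
    return head

pickle.dumps(chain(50))  # a short chain pickles fine

try:
    # Far beyond the default recursion limit: pickle gives up partway down.
    pickle.dumps(chain(100_000))
    deep_failed = False
except RecursionError:
    deep_failed = True
```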

