Friday, October 2, 2020

Azure Files Storage Development Part III

We've switched to using Azure Files for file storage with our ResearchStory web application. I've already written a couple of posts (here and here) about some of the things I've figured out along the way. The most recent issue I sadly discovered only after pushing code from development into production: there is a 4MB limit on each upload call. Files in the cloud can be much larger, but you need to transfer them up in chunks of 4MB or less. Of course, all of the files I tested with during dev were smaller than this.

Once in production I started getting RequestBodyTooLarge exceptions:

The request body is too large and exceeds the maximum permissible limit.

Searching on that exception turns up this page and explanation:

Cause - There's a 4-MB limit for each call to the Azure Storage service. If your file is larger than 4 MB, you must break it in chunks.

Trying to track down possible fixes, I started with the Azure Files documentation, specifically the v12.x client libraries. The overall documentation is pretty sparse. There are some basic examples but nothing complex. The Azure.Storage.Files.Shares reference documentation enumerates all of the classes, methods and parameters but gives only simple descriptions for each of them.

For example the first two parameters for the UploadRangeAsync method are commented as follows:

range   HttpRange
Specifies the range of bytes to be written. Both the start and end of the range must be specified.

content Stream
A Stream containing the content of the range to upload.

The rest of the descriptions are also this useless and there is no mention on this page of the 4MB limit (at least at the time of this blog post). This limit is documented in the REST API put-range reference but it is not documented for any of the SDK Upload* methods. There are also no examples of working around this.

After much trial and error I was able to create the following method to work around the file upload limits. In the code below _dirClient is an already initialized ShareDirectoryClient set to the folder I'm uploading to.

If the incoming stream is larger than 4MB the code reads 4MB chunks from it and uploads them until done. CreateFileAsync creates the Azure file at its full size up front, and the HttpRange parameter specifies the offset and length of the bytes that each call writes into it. The index has to be advanced by the number of bytes already written so that each new chunk lands directly after the previous one.
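The offset arithmetic described above can be tried out on its own, with no Azure calls involved. The following is a minimal sketch; ChunkPlanner and PlanRanges are hypothetical names invented for this illustration, not part of the Azure SDK. It lists the (offset, length) pair that each upload call would pass to HttpRange for a file of a given size.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkPlanner
{
    // 4MB per-call limit discussed in this post (4 x 1024 x 1024 = 4194304).
    public const int UploadLimit = 4 * 1024 * 1024;

    // Hypothetical helper: yields one (offset, length) pair per upload call
    // needed to cover totalLength bytes in chunks of at most chunkSize bytes.
    // Being an iterator, it validates its arguments when enumeration starts.
    public static IEnumerable<(long Offset, long Length)> PlanRanges(
        long totalLength, int chunkSize = UploadLimit)
    {
        if (totalLength < 0) throw new ArgumentOutOfRangeException(nameof(totalLength));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        for (long offset = 0; offset < totalLength; offset += chunkSize)
        {
            // The last chunk is whatever remains, which may be shorter.
            long length = Math.Min(chunkSize, totalLength - offset);
            yield return (offset, length);
        }
    }
}
```

For a 10MB file (10485760 bytes) this yields three ranges: (0, 4194304), (4194304, 4194304) and (8388608, 2097152). A file of exactly 4MB yields a single range, and an empty file yields none.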

// Namespaces this method relies on:
// using System.IO;
// using System.Threading.Tasks;
// using Azure;                        // HttpRange
// using Azure.Storage.Files.Shares;   // ShareDirectoryClient, ShareFileClient

public async Task WriteFileAsync(string filename, Stream stream) {

    //  Azure allows at most 4MB per upload call  (4 x 1024 x 1024 = 4194304)
    const int uploadLimit = 4 * 1024 * 1024;

    stream.Seek(0, SeekOrigin.Begin);   // ensure stream is at the beginning

    // Create the file at its final size; the uploads below fill in its contents
    var fileClient = (await _dirClient.CreateFileAsync(filename, stream.Length)).Value;

    // Nothing to upload for an empty stream (a zero-length range is not valid)
    if (stream.Length == 0) {
        return;
    }

    // If stream is at or below the limit upload it in a single call
    if (stream.Length <= uploadLimit) {
        await fileClient.UploadRangeAsync(new HttpRange(0, stream.Length), stream);
        return;
    }

    int bytesRead;
    long index = 0;
    byte[] buffer = new byte[uploadLimit];

    // Stream is larger than the limit so we need to upload in chunks
    while ((bytesRead = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0) {
        // Wrap only the bytes actually read in a memory stream to upload
        using MemoryStream ms = new MemoryStream(buffer, 0, bytesRead);
        await fileClient.UploadRangeAsync(new HttpRange(index, ms.Length), ms);
        index += ms.Length; // advance the index to account for bytes already written
    }
}
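One detail the loop above leaves to the stream: a single read is allowed to return fewer bytes than requested even when more data is still to come. The uploaded file still comes out correct, because the index only advances by the bytes actually read, but it can mean more and smaller upload calls than necessary. Below is a minimal sketch of a helper that fills the buffer before each upload; StreamChunks and FillBufferAsync are names made up for this illustration and are not part of the SDK.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class StreamChunks
{
    // Hypothetical helper: keeps reading until the buffer is full or the
    // stream ends, and returns how many bytes were placed in the buffer.
    // A return value of 0 means the stream had no more data.
    public static async Task<int> FillBufferAsync(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = await stream.ReadAsync(buffer, total, buffer.Length - total);
            if (read == 0)
            {
                break; // end of stream
            }
            total += read;
        }
        return total;
    }
}
```

In the upload loop this would take the place of the direct read call, with the returned count used exactly like bytesRead. For streams that already return full reads, such as a MemoryStream, the behavior is unchanged.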

Azure Files seems like a pretty good product but the SDK documentation and examples are lacking. Hopefully this improves in the future. Until then maybe this post can help someone else.
