Friday, October 2, 2020

Azure Files Storage Development Part III

We've switched to using Azure Files for file storage with our ResearchStory web application. I've already written a couple of posts (here and here) about some of the things I've figured out along the way. Sadly, I discovered the most recent issue only after pushing code from development into production. It turns out there is a 4MB limit on each upload call. Files in the cloud can be much larger, but you need to transfer them up in chunks of 4MB or less. Of course, all of the files I tested with during dev were smaller than this.

Once in production I started getting RequestBodyTooLarge exceptions:

The request body is too large and exceeds the maximum permissible limit.

Searching on that exception you get this page and explanation:

Cause - There's a 4-MB limit for each call to the Azure Storage service. If your file is larger than 4 MB, you must break it in chunks.
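To put that limit in numbers, here is a minimal sketch of the arithmetic, assuming the limit is exactly 4 x 1024 x 1024 bytes per call. The ChunkMath class and its CallsNeeded method are hypothetical helpers for illustration, not part of the Azure SDK.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Azure SDK.
public static class ChunkMath
{
    // Assumed per-call limit: 4 x 1024 x 1024 = 4194304 bytes.
    public const long UploadLimit = 4L * 1024 * 1024;

    // Number of upload calls needed for a file of the given size (ceiling division).
    public static long CallsNeeded(long fileSize)
    {
        if (fileSize < 0) throw new ArgumentOutOfRangeException(nameof(fileSize));
        return (fileSize + UploadLimit - 1) / UploadLimit;
    }
}
```

A 10MB file (10485760 bytes) works out to three calls: two full 4MB chunks and one 2MB remainder.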

Trying to track down possible fixes, I started with the Azure Files documentation and specifically the v12.x client libraries. The overall documentation is pretty sparse. There are some basic examples but nothing complex. The Azure.Storage.Files.Shares reference documentation enumerates all of the classes, methods and parameters but just gives simple descriptions for each of them.

For example the first two parameters for the UploadRangeAsync method are commented as follows:

range   HttpRange
Specifies the range of bytes to be written. Both the start and end of the range must be specified.

content Stream
A Stream containing the content of the range to upload.

The rest of the descriptions are also this useless and there is no mention on this page of the 4MB limit (at least at the time of this blog post). This limit is documented in the REST API put-range reference but it is not documented for any of the SDK Upload* methods. There are also no examples of working around this.

After much trial and error I was able to create the following method to work around the file upload limits. In the code below _dirClient is an already initialized ShareDirectoryClient set to the folder I'm uploading to.

If the incoming stream is larger than 4MB, the code reads 4MB chunks from it and uploads them until done. The file is first created in Azure at its full length, and the HttpRange parameter specifies the offset and length of the bytes that each call writes into it. The index has to be advanced by the number of bytes already written so that each new chunk lands directly after the previous one.

using System.IO;
using System.Threading.Tasks;
using Azure;                        // HttpRange
using Azure.Storage.Files.Shares;   // ShareDirectoryClient, ShareFileClient

public async Task WriteFileAsync(string filename, Stream stream) {

    //  Azure allows for 4MB max uploads  (4 x 1024 x 1024 = 4194304)
    const int uploadLimit = 4194304;

    stream.Seek(0, SeekOrigin.Begin);   // ensure stream is at the beginning (stream must be seekable)

    // Create the file at its full length; the uploads below fill in its contents
    var response = await _dirClient.CreateFileAsync(filename, stream.Length);
    ShareFileClient fileClient = response.Value;

    // An empty stream has no ranges to upload
    if (stream.Length == 0) {
        return;
    }

    // If stream is at or below the limit upload directly
    if (stream.Length <= uploadLimit) {
        await fileClient.UploadRangeAsync(new HttpRange(0, stream.Length), stream);
        return;
    }

    int bytesRead;
    long index = 0;
    byte[] buffer = new byte[uploadLimit];

    // Stream is larger than the limit so we need to upload in chunks
    while ((bytesRead = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0) {
        // Create a memory stream over just the bytes that were read
        using MemoryStream ms = new MemoryStream(buffer, 0, bytesRead);
        await fileClient.UploadRangeAsync(new HttpRange(index, ms.Length), ms);
        index += bytesRead; // advance the index to account for bytes already written
    }
}
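The chunking loop above can also be pulled out so it can be exercised without touching Azure. The following is a rough sketch under that assumption: ChunkedUploader and UploadInChunksAsync are hypothetical names, and the uploadRange delegate stands in for whatever actually sends the bytes.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of the Azure SDK.
public static class ChunkedUploader
{
    // Reads source in pieces of at most chunkSize bytes and hands each piece,
    // together with its offset in the destination file, to uploadRange.
    // Returns the total number of bytes handed off.
    public static async Task<long> UploadInChunksAsync(
        Stream source, int chunkSize, Func<long, MemoryStream, Task> uploadRange)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var buffer = new byte[chunkSize];
        long offset = 0;
        int bytesRead;

        while ((bytesRead = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            // Wrap only the bytes that were actually read
            using var chunk = new MemoryStream(buffer, 0, bytesRead, writable: false);
            await uploadRange(offset, chunk);
            offset += bytesRead;   // the next chunk starts where this one ended
        }
        return offset;
    }
}
```

With the real client, the delegate would presumably be something like (offset, chunk) => fileClient.UploadRangeAsync(new HttpRange(offset, chunk.Length), chunk). In a unit test it can simply copy each chunk into a byte array and record the offsets.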

Azure Files seems like a pretty good product but the SDK documentation and examples are lacking. Hopefully this improves in the future. Until then maybe this post can help someone else.

Thursday, October 1, 2020

Azure Files Storage Development Part II

Since posting last time about using two separate subfolders for development and production, I've decided to modify the approach somewhat. I still want to be able to develop and test directly against Azure, but I realize that there are times when I might not want to use the cloud for development. One scenario is that I might be disconnected from the internet, and I still want to be able to develop and debug code. The other consideration is cost. While file storage is fairly cheap on Azure, it still costs more than just using my local hard drive.

So (with some inspiration from this blog post) I decided to create an abstraction layer. I created a generic storage interface for the basic storage functionality I needed (list files/create/delete/read/write) and then specific wrappers for Azure Files and local files.
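As a rough idea of what such an interface could look like, here is a sketch. Only CreateSubfolderAsync and WriteFileAsync appear in these posts; the interface name IStorageFolderSketch and the other member names are guesses at the list/read/delete operations, and InMemoryFolder is a hypothetical third implementation included only so the sketch runs without Azure or a disk.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Sketch only: member names other than CreateSubfolderAsync and WriteFileAsync are assumptions.
public interface IStorageFolderSketch
{
    Task<IStorageFolderSketch> CreateSubfolderAsync(string folderName);
    Task<IReadOnlyList<string>> ListFilesAsync();
    Task WriteFileAsync(string filename, Stream stream);
    Task<Stream> ReadFileAsync(string filename);
    Task DeleteFileAsync(string filename);
}

// Hypothetical in-memory implementation, useful for tests that avoid Azure and the disk.
public class InMemoryFolder : IStorageFolderSketch
{
    private readonly Dictionary<string, byte[]> _files = new Dictionary<string, byte[]>();
    private readonly Dictionary<string, InMemoryFolder> _folders = new Dictionary<string, InMemoryFolder>();

    public Task<IStorageFolderSketch> CreateSubfolderAsync(string folderName)
    {
        // Reuse the subfolder if it already exists
        if (!_folders.TryGetValue(folderName, out var folder))
        {
            folder = new InMemoryFolder();
            _folders[folderName] = folder;
        }
        return Task.FromResult<IStorageFolderSketch>(folder);
    }

    public Task<IReadOnlyList<string>> ListFilesAsync() =>
        Task.FromResult<IReadOnlyList<string>>(new List<string>(_files.Keys));

    public async Task WriteFileAsync(string filename, Stream stream)
    {
        // Copy from the stream's current position to its end
        using var copy = new MemoryStream();
        await stream.CopyToAsync(copy);
        _files[filename] = copy.ToArray();
    }

    public Task<Stream> ReadFileAsync(string filename) =>
        Task.FromResult<Stream>(new MemoryStream(_files[filename], writable: false));

    public Task DeleteFileAsync(string filename)
    {
        _files.Remove(filename);
        return Task.CompletedTask;
    }
}
```

Calling code written against the interface never needs to know which implementation it was handed.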

Here is a simple UML diagram for the IStorageFolder interface:

[Figure: IStorageFolder UML diagram]

In Startup.cs I can instantiate one of the concrete implementations based on the environment and then use dependency injection with the interface.

services.AddScoped<IStorageClient, AzureStorageClient>(provider => {
	// Development and production use separate shares in the same storage account
	var shareName = WebEnvironment.IsDevelopment() ? "dev" : "prod";
	var connectionString = Configuration.GetConnectionString("StorageConnection");
	var shareClient = new ShareClient(connectionString, shareName);
	return new AzureStorageClient(shareClient);
});
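The registration above always builds the Azure client and only switches the share name. Choosing between local and cloud storage could be sketched as a small decision function. Everything here is an assumption for illustration: the StorageBackend enum, the StorageSelector class, and the preferLocal flag (imagined as a developer-controlled setting) are hypothetical names.

```csharp
using System;

// Hypothetical names for illustration only.
public enum StorageBackend { Local, Azure }

public static class StorageSelector
{
    // Local storage is only ever chosen in development, and only when the
    // developer opts in (for example while offline or to avoid cloud costs).
    public static StorageBackend Choose(bool isDevelopment, bool preferLocal) =>
        isDevelopment && preferLocal ? StorageBackend.Local : StorageBackend.Azure;

    // Mirrors the share-name choice in the registration above.
    public static string ShareName(bool isDevelopment) =>
        isDevelopment ? "dev" : "prod";
}
```

The returned value could then decide which concrete class the AddScoped factory constructs, so production always ends up on Azure.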

I've uploaded the full implementation to this GitHub Gist, but here are the two implementations for creating a subfolder.

Here is the Azure Files implementation. Each AzureFolder instance points to a specific folder, so creating a subfolder is relative to that directory.

public class AzureFolder : IStorageFolder {
	private readonly ShareDirectoryClient _dirClient;

	public AzureFolder(ShareDirectoryClient directoryClient) {
		_dirClient = directoryClient;
	}

	public async Task<IStorageFolder> CreateSubfolderAsync(string folderName) {
		var directoryClient = await _dirClient.CreateSubdirectoryAsync(folderName);
		return new AzureFolder(directoryClient.Value);
	}
}

Here is the local storage implementation. Since most of the calling code is from a website, I wanted to use async where available. Directory.CreateDirectory() is synchronous, but wrapping the call in Task.Run() simulates async behavior and allows for a common interface.

public class LocalFolder : IStorageFolder {
	private readonly DirectoryInfo _dirInfo;

	public LocalFolder(DirectoryInfo dirInfo) {
		_dirInfo = dirInfo;
	}

	public async Task<IStorageFolder> CreateSubfolderAsync(string folderName) {
		var path = Path.Combine(_dirInfo.FullName, folderName);
		await Task.Run(() => Directory.CreateDirectory(path));
		return new LocalFolder(new DirectoryInfo(path));
	}
}
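To show how calling code might use the common interface, here is a small self-contained sketch. It repeats a cut-down IStorageFolder (only the CreateSubfolderAsync member shown in this post) and the LocalFolder class from above. FolderDemo and CreateNestedAsync are hypothetical names added for the example.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Cut-down interface: only the member shown in this post.
public interface IStorageFolder
{
    Task<IStorageFolder> CreateSubfolderAsync(string folderName);
}

public class LocalFolder : IStorageFolder
{
    private readonly DirectoryInfo _dirInfo;

    public LocalFolder(DirectoryInfo dirInfo)
    {
        _dirInfo = dirInfo;
    }

    public async Task<IStorageFolder> CreateSubfolderAsync(string folderName)
    {
        var path = Path.Combine(_dirInfo.FullName, folderName);
        await Task.Run(() => Directory.CreateDirectory(path));
        return new LocalFolder(new DirectoryInfo(path));
    }
}

// Hypothetical caller: it only sees IStorageFolder, so it works with either backend.
public static class FolderDemo
{
    // Creates each named folder inside the previous one and returns the innermost folder.
    public static async Task<IStorageFolder> CreateNestedAsync(IStorageFolder root, params string[] names)
    {
        var current = root;
        foreach (var name in names)
        {
            current = await current.CreateSubfolderAsync(name);
        }
        return current;
    }
}
```

Passing an AzureFolder instead of a LocalFolder as the root would leave CreateNestedAsync unchanged, which is the point of the abstraction.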