Friday, October 2, 2020

Azure Files Storage Development Part III

We've switched to using Azure Files for file storage with our ResearchStory web application. I've already written a couple of posts (here and here) about some of the things I've figured out along the way. The most recent issue I sadly discovered only after pushing code from development into production: it turns out there is a 4MB upload limit. Files can be much larger in the cloud, but you need to transfer them up in 4MB (or smaller) chunks. Of course, all of the files I tested with during dev were smaller than this.

Once in production I started getting RequestBodyTooLarge exceptions:

The request body is too large and exceeds the maximum permissible limit.

Searching on that exception you get this page and explanation:

Cause - There's a 4-MB limit for each call to the Azure Storage service. If your file is larger than 4 MB, you must break it in chunks.

Trying to track down possible fixes, I started with the Azure Files documentation, specifically the v12.x client libraries. The overall documentation is pretty sparse. There are some basic examples but nothing complex. The Azure.Storage.Files.Shares reference documentation enumerates all of the classes, methods and parameters but just gives simple descriptions for each of them.

For example, the first two parameters for the UploadRangeAsync method are commented as follows:

range   HttpRange
Specifies the range of bytes to be written. Both the start and end of the range must be specified.

content Stream
A Stream containing the content of the range to upload.

The rest of the descriptions are just as unhelpful, and there is no mention on this page of the 4MB limit (at least at the time of this blog post). The limit is documented in the REST API put-range reference, but it is not documented for any of the SDK Upload* methods. There are also no examples of working around it.

After much trial and error I was able to create the following method to work around the file upload limits. In the code below _dirClient is an already initialized ShareDirectoryClient set to the folder I'm uploading to.

If the incoming stream is larger than 4MB, the code reads 4MB chunks from it and uploads them until done. The HttpRange parameter specifies where the bytes will be written within the file created in Azure; the index has to be incremented after each upload so that the next chunk is appended after the bytes already written.

public async Task WriteFileAsync(string filename, Stream stream) {

    //  Azure allows for 4MB max uploads  (4 x 1024 x 1024 = 4194304)
    const int uploadLimit = 4194304;

    stream.Seek(0, SeekOrigin.Begin);   // ensure stream is at the beginning
    var fileClient = await _dirClient.CreateFileAsync(filename, stream.Length);

    // If stream is below the limit upload directly
    if (stream.Length <= uploadLimit) {
        await fileClient.Value.UploadRangeAsync(new HttpRange(0, stream.Length), stream);
        return;
    }

    int bytesRead;
    long index = 0;
    byte[] buffer = new byte[uploadLimit];

    // Stream is larger than the limit so we need to upload in chunks
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0) {
        // Create a memory stream for the buffer to upload
        using MemoryStream ms = new MemoryStream(buffer, 0, bytesRead);
        await fileClient.Value.UploadRangeAsync(new HttpRange(index, ms.Length), ms);
        index += ms.Length; // increment the index to account for the bytes already written
    }
}
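
Calling it is then straightforward; here's a quick usage sketch from elsewhere in the same class (the local path and file name are purely illustrative):

// Usage sketch: upload a local file of any size (path and name are hypothetical)
using var source = File.OpenRead(@"C:\temp\large-report.pdf");
await WriteFileAsync("large-report.pdf", source);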

Azure Files seems like a pretty good product but the SDK documentation and examples are lacking. Hopefully this improves in the future. Until then maybe this post can help someone else.

Thursday, October 1, 2020

Azure Files Storage Development Part II

Since posting last time about separating development and production data in Azure Files, I've decided to modify the approach somewhat. I still want to be able to develop and test directly against Azure, but I realize there are times when I might not want to use the cloud for development. One scenario is being disconnected from the internet; I don't want that to stop me from developing and debugging code. The other consideration is cost. While file storage is fairly cheap on Azure, it still costs more than just using my local hard drive.

So (with some inspiration from this blog post) I decided to create an abstraction layer. I created a generic storage interface for the basic storage functionality I needed (list files/create/delete/read/write) and then specific wrappers for Azure Files and local files.

Here is a simple UML diagram for the IStorageFolder interface:

IStorageFolder UML
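
Since the diagram is just an image, here is roughly what the interface looks like; a sketch based on the functionality listed above (the exact member names and signatures in the Gist may differ):

public interface IStorageFolder {
	// List the names of the files in this folder
	Task<IEnumerable<string>> ListFilesAsync();
	Task<IStorageFolder> CreateSubfolderAsync(string folderName);
	Task DeleteFileAsync(string filename);
	Task<Stream> ReadFileAsync(string filename);
	Task WriteFileAsync(string filename, Stream stream);
}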

In Startup.cs I instantiate one of the concrete implementations based on the environment and register it with Dependency Injection against the interface.

services.AddScoped<IStorageClient, AzureStorageClient>(client => {
	var shareName = WebEnvironment.IsDevelopment() ? "dev" : "prod";
	var connectionString = Configuration.GetConnectionString("StorageConnection");
	var shareClient = new ShareClient(connectionString, shareName);
	return new AzureStorageClient(shareClient);
});
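
For the offline scenario, the same registration point can hand out the local implementation instead. Here's a sketch using the folder-level classes shown below (the "Storage:UseLocal" flag and "Storage:LocalPath" config keys are hypothetical names):

services.AddScoped<IStorageFolder>(provider => {
	// Hypothetical config keys: fall back to the local disk when developing offline
	if (WebEnvironment.IsDevelopment() && Configuration.GetValue<bool>("Storage:UseLocal")) {
		var root = Directory.CreateDirectory(Configuration["Storage:LocalPath"]);
		return new LocalFolder(root);
	}
	var shareName = WebEnvironment.IsDevelopment() ? "dev" : "prod";
	var connectionString = Configuration.GetConnectionString("StorageConnection");
	var shareClient = new ShareClient(connectionString, shareName);
	return new AzureFolder(shareClient.GetRootDirectoryClient());
});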

I've uploaded the full implementation as a GitHub Gist, but here are the two implementations for creating a subfolder.

Here is the Azure Files implementation. The folder class is instantiated pointing at a specific directory, so creating a subfolder is relative to that directory.

public class AzureFolder : IStorageFolder {
	private readonly ShareDirectoryClient _dirClient;

	public AzureFolder(ShareDirectoryClient directoryClient) {
		_dirClient = directoryClient;
	}

	public async Task<IStorageFolder> CreateSubfolderAsync(string folderName) {
		var directoryClient = await _dirClient.CreateSubdirectoryAsync(folderName);
		return new AzureFolder(directoryClient.Value);
	}
}

Here is the local storage implementation. Since most of the calling code runs in a website, I wanted to use async where available. The local file API isn't async, but wrapping the calls in Task.Run() simulates that behavior and allows for a common interface.

public class LocalFolder : IStorageFolder {
	private readonly DirectoryInfo _dirInfo;

	public LocalFolder(DirectoryInfo dirInfo) {
		_dirInfo = dirInfo;
	}

	public async Task<IStorageFolder> CreateSubfolderAsync(string folderName) {
		var path = Path.Combine(_dirInfo.FullName, folderName);
		await Task.Run(() => Directory.CreateDirectory(path));
		return new LocalFolder(new DirectoryInfo(path));
	}
}
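
With one of these registered against the interface, the calling code never knows (or cares) which backend it's talking to. Here's a hedged sketch of a Razor Pages handler using it (handler name is illustrative, and it assumes the interface includes the members sketched above):

public class UploadReportModel : PageModel {
	private readonly IStorageFolder _storage;

	public UploadReportModel(IStorageFolder storage) {
		_storage = storage;
	}

	public async Task<IActionResult> OnPostAsync(IFormFile upload) {
		// "reports" is an illustrative subfolder name
		var reports = await _storage.CreateSubfolderAsync("reports");
		using var stream = upload.OpenReadStream();
		await reports.WriteFileAsync(upload.FileName, stream);
		return RedirectToPage();
	}
}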

Wednesday, September 23, 2020

Azure Files Storage Development

We've been using an App Service in Azure since first developing our site (ResearchStory). An App Service allows you to run your code in the cloud without having to maintain a full server or virtual machine. You get local storage, but it's not unlimited (there are different storage levels by tier; see Azure App Service Tiers). It's also not easy to manage remotely. You can use Kudu and extensions (like Azure Web Apps Disk Usage) to view the data in a web browser, but it's still a bit disconnected.

So we've decided to add Azure Files for storage of logs and generated reports. There are some samples and basic documentation, but not a lot of extended examples. In particular, I tried to find an example of how others develop locally against it while also using it in production, keeping the data separated.

I had first considered two storage accounts with separate connection strings, like we do for database development, but that seemed overly complicated. In the end, I created two file shares (one 'dev' and one 'prod') in the storage account. During startup the code picks the share based on the environment and adds a scoped ShareClient to DI.


services.AddScoped(client => {
	var shareName = WebEnvironment.IsDevelopment() ? "dev" : "prod";
	var connectionString = Configuration.GetConnectionString("StorageConnection");
	return new ShareClient(connectionString, shareName);
});

Then any code that needs to interact with the storage system gets a client injected that is already pointed at the right share.


public async Task OnGetAsync(int id, [FromServices] ShareClient shareClient) {
	...
	var dirClient = shareClient.GetDirectoryClient("reports");
	await dirClient.CreateIfNotExistsAsync();
	...
}
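
From there the handler can create and write files through the directory client without caring which environment it is in. For small files a single range upload is enough; here's a quick sketch (file name and content are illustrative, and anything over 4MB has to be uploaded in chunks, which is the subject of the Part III post above):

// Sketch: write a small generated report through the environment-scoped client
var bytes = Encoding.UTF8.GetBytes("<html>report</html>");
using var reportStream = new MemoryStream(bytes);
var fileClient = dirClient.GetFileClient("report-42.html");
await fileClient.CreateAsync(reportStream.Length);
await fileClient.UploadRangeAsync(new HttpRange(0, reportStream.Length), reportStream);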

Thursday, August 6, 2020

Displaying Git Commit and Build Info in an ASP.Net website

Scott Hanselman wrote a really good blog post on March 6, 2020 called Adding a git commit hash and Azure DevOps Build Number and Build ID to an ASP.NET website. In the post he covered passing build information into an ASP.NET web app so that it could be displayed on a web page to confirm which version of your code is running in the cloud. If you work with multiple machines or deployment slots, or even if you just forget when you last pushed your code up, you know how important it can be to confirm which version is running.

His article was well written and contained lots of great information and steps to follow, but could it be improved? As I set out to implement it, I found a couple of ways it could be.

My build process on Azure DevOps was the classic (legacy) UI build pipeline using MSBuild. I've wanted to update it to use dotnet for a while, so I figured this was a good reason to do so. I created a new pipeline using the ASP.NET template. That created a new YAML file, but it was still using MSBuild. After some googling I was able to replace that with a simple dotnet-driven pipeline.

The YAML pipeline shown below is about as basic as it gets (side note: I had never used YAML before, but it's pretty straightforward once you start working with it). This pipeline is triggered when a new commit happens on 'master'. It gets a Windows VM, sets some variables, and then starts the build. 'dotnet build' also does a NuGet restore first, but you can break that out separately if you have packages from different sources. It then runs 'dotnet test', which runs through all of the unit tests, then 'dotnet publish' to zip up the results, and finally publishes the build artifact (the website zip). This artifact can be used in a Release deploy.


trigger:
- master

pool:
  vmImage: 'windows-latest'

variables:
  solution: '**/*.sln'
  buildPlatform: 'Any CPU'
  buildConfiguration: 'Release'

steps:
- task: DotNetCoreCLI@2
  displayName: 'dotnet build $(buildConfiguration)'
  inputs:
    command: 'build'
    arguments: '--configuration $(buildConfiguration)'

# Run Unit Tests
- task: DotNetCoreCLI@2
  displayName: 'dotnet test'
  inputs:
    command: 'test'
    projects: '**/*Tests/*.csproj'
	
# Prepare output to send to the Release pipeline
- task: DotNetCoreCLI@2
  displayName: 'dotnet publish'
  inputs:
    command: publish
    publishWebProjects: True
    arguments: '--configuration $(BuildConfiguration) --output $(Build.ArtifactStagingDirectory)'
    zipAfterPublish: True

# Take all the files in $(Build.ArtifactStagingDirectory) and upload them as an artifact of the build.
- task: PublishBuildArtifacts@1
  inputs:
    PathtoPublish: '$(Build.ArtifactStagingDirectory)'
    ArtifactName: 'drop'

Once that was all working I started to add the pieces Scott talks about in his article. I added '/p:SourceRevisionId=$(Build.SourceVersion)' to the build command to pass in the git commit hash as an assembly attribute. Using the code he provided I was able to read this value back out and display it on a webpage. Unfortunately this is the only variable that works this way. For the build number and ID you can pass them in, but you have to create custom attributes for each one along with specialized code to read them out. Scott doesn't include the code to read them back out, instead preferring to create a file containing each of these values.
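
For completeness, here's roughly how the commit hash comes back out: with SourceRevisionId set, the SDK appends '+<hash>' to the assembly's informational version, so it can be pulled from that attribute. This is a sketch along the lines of Scott's code, not a copy of it:

using System.Reflection;

// Sketch: pull the commit hash back out of the informational version ("1.0.0+<hash>")
var infoVersion = Assembly.GetEntryAssembly()?
    .GetCustomAttribute<AssemblyInformationalVersionAttribute>()?
    .InformationalVersion ?? "";
var plusIndex = infoVersion.IndexOf('+');
var gitHash = plusIndex >= 0 ? infoVersion.Substring(plusIndex + 1) : "local";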

As I was working on implementing his code to output the build number and ID into a file, it occurred to me that it would probably be simpler to place all of the values in this buildinfo file. If I also format it as JSON, it would be super easy to read these values back in my application code. So starting with Scott's code I made the following changes.

In the build YAML I added the following task. It uses the echo command to create a minimal JSON file with the build number, build ID, and commit hash. I also wanted to include the build date as a separate field, but after much searching (and many build runs) I was unable to figure out how to accomplish that.


- script: 'echo {"buildNumber":"$(Build.BuildNumber)","buildId":"$(Build.BuildId)","sourceVersion":"$(Build.SourceVersion)"} > .buildinfo.json'
  displayName: "Emit build info"
  workingDirectory: '$(Build.SourcesDirectory)/Neptune.Web'
  failOnStderr: true

I created the following small class to match the buildinfo JSON:


    public class BuildInformation {
        public string BuildNumber { get; set; }
        public string BuildId { get; set; }
        public string SourceVersion { get; set; }
    }

I then simplified his 'AppVersionInfo' class into the following. It reads in the JSON on creation:


    public class ApplicationInfo {

        private const string BuildFileName = ".buildinfo.json";
        private BuildInformation BuildInfo { get; set; }

        public ApplicationInfo(IHostEnvironment hostEnvironment) {
            var buildFilePath = Path.Combine(hostEnvironment.ContentRootPath, BuildFileName);
            if (File.Exists(buildFilePath)) {
                var fileContents = File.ReadAllText(buildFilePath);
                BuildInfo = JsonConvert.DeserializeObject<BuildInformation>(fileContents);
            }
        }

        /// <summary>
        /// Return the Build Id
        /// </summary>
        public string BuildId {
            get {
                return BuildInfo == null ? "123" : BuildInfo.BuildId;
            }
        }

        /// <summary>
        /// Return the Build Number
        /// </summary>
        public string BuildNumber {
            get {
                return BuildInfo == null ? DateTime.UtcNow.ToString("yyyyMMdd") + ".0" : BuildInfo.BuildNumber;
            }
        }

        /// <summary>
        /// Return the git hash of the commit that triggered the build
        /// </summary>
        public string GitHash {
            get {
                return BuildInfo == null ? "" : BuildInfo.SourceVersion;
            }
        }

        /// <summary>
        /// Return a short version (6 chars) of the git hash (or local)
        /// </summary>
        public string ShortGitHash {
            get {
                return GitHash.Length >= 6 ? GitHash.Substring(0, 6) : "local";
            }
        }
    }

As Scott does, you add this class to the services in Startup.cs: 'services.AddSingleton<ApplicationInfo>();'

Then rather than displaying it in the footer I added it to an admin sys info page in a table. Note: the ApplicationInfo class is injected into the page.


@page
@inject ApplicationInfo appInfo

<h1>System Info</h1>
<table class="table">
    <tr>
        <td>Commit:</td>
        <td><a href="https://vs.com/commit/@appInfo.GitHash" target="_blank"><i class="fal fa-code-branch"></i> @appInfo.ShortGitHash</a></td>
    </tr>
    <tr>
        <td>Build:</td>
        <td><a href="https://vs.com/_build/results?buildId=@appInfo.BuildId&view=results" target="_blank"><i class="fab fa-simplybuilt"></i> @appInfo.BuildNumber</a></td>
    </tr>
    <tr>
        <td>Powered by:</td>
        <td>@System.Runtime.InteropServices.RuntimeInformation.FrameworkDescription</td>
    </tr>
</table>

Friday, December 6, 2019

Enumerating a Collection with an Index

There is often a case where you want to loop over a collection of objects along with an indicator of where you are in the loop. While some languages have built-in syntax for expressing this, C# does not (yet). Using a bit of LINQ you can easily accomplish it.

Normally, starting with a collection of things, you might use a for loop to iterate over the list and provide your index.

For example:


 var items = new List<string>();

 for (int i = 0; i < items.Count; i++) {
  var item = items[i];
  Console.WriteLine($"Item {i}: {item}");
 }

You can also use a foreach and a temp variable to track the index.


 int i = 0;
 foreach (var item in items) {
  Console.WriteLine($"Item {i}: {item}");
  i++;
 }

Recently I discovered this gem. The following code uses a LINQ Select statement to transform the list into tuples containing each item and its index as named elements. It's not quite as nice as if this were built into the language, but it's still nice to be able to reduce the code down.


 foreach (var (item, index) in items.Select((v, i) => (v, i))) {
  Console.WriteLine($"Item {index}: {item}");
 }

You can also add an extension method to encapsulate the Select and make it a little cleaner.


  /// <summary>
  /// Return the list as an enumerable of tuples with each item and its index in the list.
  /// </summary>
  /// <typeparam name="T">The type of the items in the list</typeparam>
  /// <param name="list">The list to be converted</param>
  /// <returns>Enumerable of tuples with the item and its index</returns>
  public static IEnumerable<(T, int)> WithIndex<T>(this IEnumerable<T> list) => 
    list.Select((value, index) => (value, index));

And then use it like this:


 foreach (var (item, index) in items.WithIndex()) {
  Console.WriteLine($"Item {index}: {item}");
 }