Intro
As software developers, we frequently encounter scenarios where reading from or writing to streams is an essential part of our daily work. Whether it’s processing data from files, communicating over the network, or manipulating memory buffers, streams lie at the heart of many programming tasks. Mastering the art of handling file streams efficiently is not just a skill but a necessity.
This article does not aim to cover every stream-related scenario; instead, the focus will be on how to use file streams in the following scenario:
- we have our own blob service
- we have multiple layers above this blob service that will have to forward the stream
Hypothetical Setup
Services
Imagine the following scenario:
- We have a Gateway
- The Gateway forwards the request to Service A
- Service A forwards the request to ServiceBlobs
ServiceBlobs can leverage any storage medium; in my example I have used MongoDB GridFS to store files.
Requirements
- the user can send large files
- the memory footprint must be small
- files should be verified.
About Streams
What are streams?
.NET streams are a fundamental abstraction for handling input and output operations, providing a uniform way to read from and write to various data sources and destinations.
In .NET, streams represent a sequence of bytes and are commonly used for tasks such as reading from files, network communication, memory operations, and more.
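As a quick illustration of that uniform API, here is a minimal sketch (the file name is just a placeholder) that copies a file into an in-memory buffer using the same CopyToAsync call you would use for a network or HTTP stream:
using System.IO;

// Open a file as a read-only stream and copy it into an in-memory stream.
// The same CopyToAsync call works for network streams, HTTP bodies, and so on.
await using FileStream source = File.OpenRead("input.bin");
using MemoryStream destination = new MemoryStream();
await source.CopyToAsync(destination);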
One specific stream in .NET is the HttpRequestStream, which is used in the context of HTTP requests. When you make an HTTP request in .NET, such as with the HttpClient class, the request body (if any) can be accessed through the HttpRequestStream. This stream allows you to write the data that you want to send in the body of the HTTP request.
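For example, a minimal sketch of streaming a file into the body of an outgoing request could look like this (the URL and file name are placeholders, not part of the original setup):
using System.IO;
using System.Net.Http;

using HttpClient client = new HttpClient();
await using FileStream file = File.OpenRead("upload.bin");

using HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Post, "https://example.com/files");
// StreamContent streams the file into the request body instead of buffering it first.
request.Content = new StreamContent(file);

HttpResponseMessage response = await client.SendAsync(request);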
Api Design
When it comes to transmitting files over the web, developers often face the decision between sending the file as binary content in the request body or using multipart form data. Each approach has its own advantages and considerations, depending on factors such as file size, network constraints, and server capabilities.
Sending File as Binary Content:
The file is sent directly as binary content in the body of the HTTP request. This method is straightforward and efficient. The file is typically encoded as raw binary data or base64-encoded to ensure compatibility with HTTP protocols.
Using Multipart Form Data:
It involves packaging the file along with any additional form fields into a multipart message, allowing for structured data transmission. This approach is commonly used when submitting forms with file uploads in HTML web applications.
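For comparison, a minimal multipart upload with HttpClient could look like the sketch below (the field name and file name are illustrative):
using System.IO;
using System.Net.Http;

using HttpClient client = new HttpClient();
await using FileStream file = File.OpenRead("photo.jpg");

using MultipartFormDataContent form = new MultipartFormDataContent();
form.Add(new StringContent("holiday"), "album");        // a regular form field
form.Add(new StreamContent(file), "file", "photo.jpg"); // the file part

HttpResponseMessage response = await client.PostAsync("https://example.com/upload", form);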
Considerations:
While both approaches have their merits, using multipart form data for large files can present challenges. Here’s why:
- Memory Consumption: Multipart form data requires the entire file to be loaded into memory or temporary storage before transmission. For large files, this can lead to significant memory consumption and potential performance issues, especially on servers with limited resources.
- Network Overhead: Multipart form data adds overhead to the HTTP request due to the additional headers and formatting required for multipart messages. This can result in larger request sizes and increased network traffic, impacting performance and latency, particularly in low-bandwidth environments.
- Server Processing: Handling multipart form data on the server side often involves parsing and extracting the file content from the multipart message. For large files, this process can be resource-intensive and may strain server resources, leading to slower request processing times and potential scalability issues.
In summary, while multipart form data is suitable for smaller file uploads and scenarios where structured form data is required, it may not be the optimal choice for transmitting large files over the web. For large files, sending the file as binary content in the request body offers a more efficient and lightweight alternative, minimizing memory consumption, network overhead, and server processing requirements.
Analysis
[Gateway] Api Endpoint
The choice between sending the file as multipart form data or as binary content in the request body was clear: to avoid large memory consumption and to enable simple stream buffering, the latter was used.
This choice is efficient but it also has its challenges:
- the HTTP request body is a non-seekable stream by default, for efficiency reasons, and I leveraged this behavior by simply passing the stream forward.
- This meant that data was not piling up in memory, the content was buffered in small chunks, and overall the endpoint stayed lightweight, as sketched below.
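A minimal sketch of what such a pass-through endpoint could look like (the controller, route, and client interface are hypothetical, not the exact code from the project):
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Hypothetical client used by the gateway to forward the upload to Service A.
public interface IServiceAClient
{
    Task UploadAsync(string fileName, Stream content, CancellationToken cancellationToken);
}

[ApiController]
[Route("api/files")]
public class FilesController : ControllerBase
{
    private readonly IServiceAClient _serviceAClient;

    public FilesController(IServiceAClient serviceAClient) => _serviceAClient = serviceAClient;

    [HttpPost("{fileName}")]
    public async Task<IActionResult> Upload(string fileName, CancellationToken cancellationToken)
    {
        // Request.Body is forwarded as-is: the gateway never loads the file into memory.
        await _serviceAClient.UploadAsync(fileName, Request.Body, cancellationToken);
        return Ok();
    }
}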
[Gateway -> Service A] ServiceClient
The client also had to be lightweight, which meant we should no longer try to read the stream to extract any kind of information; that was simply inefficient. We needed to pass the input stream straight into the HTTP request, and that’s that.
HttpRequestMessage request = new(...);
request.Content = new StreamContent(httpContext.Request.Body);
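Put together, a forwarding client could look roughly like the sketch below (it reuses the hypothetical interface and route from the gateway sketch above; ResponseHeadersRead is used so the downstream response is not buffered either):
using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class ServiceAClient : IServiceAClient
{
    private readonly HttpClient _httpClient;

    public ServiceAClient(HttpClient httpClient) => _httpClient = httpClient;

    public async Task UploadAsync(string fileName, Stream content, CancellationToken cancellationToken)
    {
        using HttpRequestMessage request = new(HttpMethod.Post, $"api/blobs/{Uri.EscapeDataString(fileName)}");
        // The incoming body is forwarded without buffering it in the client.
        request.Content = new StreamContent(content);

        using HttpResponseMessage response = await _httpClient.SendAsync(
            request, HttpCompletionOption.ResponseHeadersRead, cancellationToken);
        response.EnsureSuccessStatusCode();
    }
}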
[Service A -> ServiceBlobs] ServiceClient
I determined that, to forward the stream efficiently, I needed to avoid accessing or reading it where possible. File content validation was pushed down to a lower, more specialized level, and for now I was happy to forward the stream, similar to what was done in the gateway.
[Blobs Service]
The blob service uploaded the content to MongoDB using the GridFS functionality. Similar to the other services, I wanted this to be as fast as possible, which meant that, even though this was a specialized service and some form of content validation had to be executed, I didn’t want to read the entire content into memory.
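Under the hood, the GridFS write itself can stream the content without materializing it. A minimal sketch with the MongoDB.Driver.GridFS package (the class and member names are assumptions, not the project’s actual storage code):
using System.IO;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.GridFS;

public class GridFsBlobStorage
{
    private readonly GridFSBucket _bucket;

    public GridFsBlobStorage(IMongoDatabase database) => _bucket = new GridFSBucket(database);

    // UploadFromStreamAsync reads the source stream chunk by chunk,
    // so the file is never loaded into memory as a whole.
    public Task<ObjectId> WriteAsync(string fileName, Stream content) =>
        _bucket.UploadFromStreamAsync(fileName, content);
}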
- First, I wanted to make the request stream seekable:
HttpContext.Request.EnableBuffering();
This meant that our code could now do the following:
// Copy the (now buffered) request stream so its content can be inspected,
// then rewind it before handing it to the blob storage.
using MemoryStream memoryStream = new MemoryStream();
await data.Stream.CopyToAsync(memoryStream, 1024);
data.Stream.Seek(0, SeekOrigin.Begin);

await _blobContentStorage.WriteAsync(storeId, new BlobWriteStream
{
    FileName = fileName,
    Content = data.Stream,
});
Reading the first couple of bytes was important to me because I wanted to check the files’ signatures, but at this stage any form of content validation can be performed. Just bear in mind your API’s performance when you are designing this part.
There is a lot of information on the internet that provides the signature bytes for each file type; I leveraged this information to check each file and validate that its content matches the file name extension provided.
Example of JPEG signatures:
new byte[] { 0xFF, 0xD8, 0xFF, 0xE0 },
new byte[] { 0xFF, 0xD8, 0xFF, 0xE1 },
new byte[] { 0xFF, 0xD8, 0xFF, 0xE8 },
new byte[] { 0xFF, 0xD8, 0xFF, 0xE2 },
new byte[] { 0xFF, 0xD8, 0xFF, 0xE3 }
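A minimal sketch of how such a check could be wired up, assuming the known signatures are kept in a dictionary keyed by file extension (the class, names, and the exact signature set shown are illustrative):
using System;
using System.Collections.Generic;
using System.IO;

public static class FileSignatureValidator
{
    private static readonly Dictionary<string, byte[][]> KnownSignatures = new()
    {
        [".jpg"] = new[]
        {
            new byte[] { 0xFF, 0xD8, 0xFF, 0xE0 },
            new byte[] { 0xFF, 0xD8, 0xFF, 0xE1 },
            new byte[] { 0xFF, 0xD8, 0xFF, 0xE8 },
            new byte[] { 0xFF, 0xD8, 0xFF, 0xE2 },
            new byte[] { 0xFF, 0xD8, 0xFF, 0xE3 },
        },
    };

    // Returns true when the first bytes of the content match one of the
    // signatures registered for the file's extension.
    public static bool Matches(string fileName, ReadOnlySpan<byte> header)
    {
        string extension = Path.GetExtension(fileName).ToLowerInvariant();
        if (!KnownSignatures.TryGetValue(extension, out var signatures))
            return false;

        foreach (byte[] signature in signatures)
        {
            if (header.Length >= signature.Length &&
                header.Slice(0, signature.Length).SequenceEqual(signature))
                return true;
        }

        return false;
    }
}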