Brian Richardson's Blog

  • Putting it all together – From Monolith to Cloud-Native

    September 9th, 2022

There’s an underlying theme to the posts I’ve been making lately, namely that I am beginning the process of migrating an existing application to an Event-Driven Architecture (EDA). This has posed some interesting technical challenges, and the overall process can be used to break down an existing monolithic application into component microservices based on Domain-Driven Design. Let’s start at the beginning and look at the challenges that will be faced, and the required work.

    1. The Domain Model

    As with any DDD project, you’ll begin with the domain model for a given domain. Follow DDD best practices to build this domain model.

    2. Data Migration

    There’s a fundamental difference between the static state stored by a traditional application in a relational database, and the aggregate state stored by an event-sourced system in (e.g.) Event Store DB. Storing the static state is like storing a document – you only get the latest version of the document. However, if you turn on “Track Changes”, you get a different document, one with a history of revisions. This document with the revision history is what is stored in the Event Store DB.

    This is challenge #1: how do we convert the static state of an existing persisted entity into the required aggregate state of a new event stream?

    The migration is doubly difficult, because the legacy application uses Entity Framework 6.0 as its persistence layer. The new application will run on .NET 6, using EF Core 6. So, we can’t just copy the entity model verbatim: any fluent customizations will have to be rewritten for EF Core 6. The good news is that the existing attributes are recognized by EF Core 6 as well. Still, copying the model verbatim is a good start, and we can tweak it as we go along.

    With the copy of the EF6 model, we can rewrite any fluent customizations that are necessary (e.g. many-many relationships). This is also a good time to replace unnecessary fluent customizations with the corresponding attributes. Test your new .NET 6 EF Core model and ensure you can iterate through data.
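
    To make that concrete, here is a minimal sketch (with invented entity names) of what a rewritten many-to-many configuration might look like in EF Core 6. EF6’s explicit Map()/join-table setup becomes a single UsingEntity() call, and the join table can be named to match the existing schema.

    using System.Collections.Generic;
    using Microsoft.EntityFrameworkCore;

    // Invented entities for illustration only; substitute your own model.
    public class Author
    {
        public int Id { get; set; }
        public string Name { get; set; } = string.Empty;
        public ICollection<Book> Books { get; set; } = new List<Book>();
    }

    public class Book
    {
        public int Id { get; set; }
        public string Title { get; set; } = string.Empty;
        public ICollection<Author> Authors { get; set; } = new List<Author>();
    }

    public class AppDbContext : DbContext
    {
        public DbSet<Author> Authors => Set<Author>();
        public DbSet<Book> Books => Set<Book>();

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            // EF Core 6 understands many-to-many natively; we only need to
            // point it at the join table the EF6 model already uses.
            modelBuilder.Entity<Author>()
                .HasMany(a => a.Books)
                .WithMany(b => b.Authors)
                .UsingEntity(j => j.ToTable("AuthorBooks"));
        }
    }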

    Now that we can read in our static entities, the challenge is met by essentially reverse engineering the current state of the entity into calls to the methods of the domain model. That is, we load the entity into memory and construct a domain object, using the methods we defined during DDD, that has the same “value” as the entity. See the example below:

    var entities = db.Entities.ToList();
    foreach (var entity in entities)
    {
        var aggregate = new Aggregate();
        aggregate.SetProperty1(entity.Property1);
        foreach (var item in entity.Collection1)
            aggregate.AddToCollection1(item);

        // aggregate's current state now equals entity;
        // write the aggregate to the aggregate store here
    }

    Note that we use aggregate.SetProperty1() instead of aggregate.Property1 = . This is the important part of the migration – we must use the methods defined in the domain model to achieve the desired state. When we do so, we will have created an aggregate suitable for storing in Event Store DB. Repeat the process for all aggregates identified during DDD. You have now migrated your data to an event store.
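
    As a minimal sketch of that write step (the part of the loop marked above), here is roughly what appending the rebuilt aggregate to Event Store DB could look like with the EventStore.Client package. The connection string, the stream name, and the GetChanges() method (a stand-in for however your aggregate base class exposes its uncommitted events) are assumptions to adapt to your own model.

    // Requires the EventStore.Client, System.Linq and System.Text.Json namespaces.
    // Assumed connection string; the stream is new, so we expect NoStream.
    var settings = EventStoreClientSettings.Create("esdb://localhost:2113?tls=false");
    var client = new EventStoreClient(settings);

    // GetChanges() is a placeholder for however your aggregate base class
    // exposes the uncommitted events raised by SetProperty1(), AddToCollection1(), etc.
    var eventData = aggregate.GetChanges().Select(e => new EventData(
        Uuid.NewUuid(),
        e.GetType().Name,
        JsonSerializer.SerializeToUtf8Bytes(e, e.GetType())));

    await client.AppendToStreamAsync(
        $"Aggregate-{aggregate.Id}",
        StreamState.NoStream,
        eventData);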

    Important: You will probably not use the same ID values in the new system. You should create a property in your legacy entity to store the ID value from the new system. This will become necessary later on when we must build a “bridge” back to the existing application.

    Of course, the event store is not suitable for querying data. For that, we need to project the event store data into a form suitable for reading.

    3. Projections

    Projections are code that is called in response to an incoming event. Any service can subscribe to the stream of events being written to the event store. The service will then choose which events to respond to, usually by writing data to another database that can be used for queries. Good choices here are SQL Server and CosmosDB. This is one of the big performance gains we get by separating the “read side” and “write side” of the database: instead of doing join queries on a relational database in SQL Server, we can write materialized query results in JSON documents to CosmosDB instead. While not as space-efficient, the performance gains in read speed far exceed the additional space required.
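
    As a rough illustration only (the event names, read model, and collection are invented), a projection for the aggregate above might boil down to something like this: react to the events you care about and upsert a read-model document, ignoring everything else.

    using System.Threading.Tasks;
    using MongoDB.Driver;

    // Invented events and read model, for illustration.
    public record AggregateCreated(string Id, string Name);
    public record NameChanged(string Id, string NewName);

    public class AggregateReadModel
    {
        public string Id { get; set; } = string.Empty;
        public string Name { get; set; } = string.Empty;
    }

    public class AggregateProjection
    {
        private readonly IMongoCollection<AggregateReadModel> _collection;

        public AggregateProjection(IMongoDatabase database) =>
            _collection = database.GetCollection<AggregateReadModel>("aggregates");

        // Called by the persistent subscription for every event appended to the
        // event store; events this projection doesn't care about fall through.
        public Task Project(object @event) => @event switch
        {
            AggregateCreated e => _collection.InsertOneAsync(
                new AggregateReadModel { Id = e.Id, Name = e.Name }),
            NameChanged e => _collection.UpdateOneAsync(
                Builders<AggregateReadModel>.Filter.Eq(d => d.Id, e.Id),
                Builders<AggregateReadModel>.Update.Set(d => d.Name, e.NewName)),
            _ => Task.CompletedTask
        };
    }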

    4. The “Bridge”

    Challenge #2: How to incrementally release a monolithic application?

    We do not want to wait for the entire new application to be written before we start using the new system. Therefore, we need a method of being able to run both applications in parallel and migrate users to the new application as new features become available for their team. This requires that both applications work on the same data. Of course, at this point, the data is in two separate databases! However, this is not an insurmountable difficulty: we can write a projection to deal with writing back to SQL Server. Since we already created our entity model for .NET 6 when we migrated the data to the event store, it is actually a rather simple task to write a projection that writes to SQL Server instead of CosmosDB. And, you can simply copy the code that writes to SQL Server from your old application, since it uses the exact same entity model (albeit for different platforms).

    These projections back to SQL Server are the key. With these projections in place, it is a simple matter of fetching an entity by v2 ID and performing the desired updates as per the incoming event.
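
    A sketch of such a bridge projection is below. The NameChanged event, LegacyDbContext, LegacyEntity and the V2Id property are all placeholders: V2Id stands for wherever you stored the new system’s ID on the legacy entity (see the note in section 2).

    using System.Threading.Tasks;
    using Microsoft.EntityFrameworkCore;

    // All names here are placeholders for your own events and ported EF Core model.
    public record NameChanged(string Id, string NewName);

    public class LegacyEntity
    {
        public int Id { get; set; }
        public string V2Id { get; set; } = string.Empty;   // ID assigned by the new system
        public string Name { get; set; } = string.Empty;
    }

    public class LegacyDbContext : DbContext
    {
        public LegacyDbContext(DbContextOptions<LegacyDbContext> options) : base(options) { }
        public DbSet<LegacyEntity> LegacyEntities => Set<LegacyEntity>();
    }

    public class SqlServerBridgeProjection
    {
        private readonly IDbContextFactory<LegacyDbContext> _contextFactory;

        public SqlServerBridgeProjection(IDbContextFactory<LegacyDbContext> contextFactory) =>
            _contextFactory = contextFactory;

        public async Task Project(NameChanged @event)
        {
            await using var db = await _contextFactory.CreateDbContextAsync();

            // Find the legacy row by the v2 ID and apply the change described by the event.
            var entity = await db.LegacyEntities
                .SingleOrDefaultAsync(e => e.V2Id == @event.Id);
            if (entity is null) return;

            entity.Name = @event.NewName;
            await db.SaveChangesAsync();
        }
    }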

    The interesting thing is that this bridge need not be temporary. While it is certainly possible to remove the bridge once the new application is complete, there are cases when it may be desirable to keep some or all of the bridge to facilitate external processes that may prefer SQL Server to CosmosDB.

    5. Microservices and Kubernetes

    The goal is to run the microservices on Kubernetes (or some other as-yet unwritten container orchestrator). This requires that the application run on Linux, which is what necessitates the upgrade to .NET 6. We expect that the infrastructure savings over time from using an orchestrator should result in a reduced core count overall (Microsoft reports a 50% reduction in core count when moving the AAD gateway from .NET Framework to .NET Core) and more efficient use of resources.

    6. Conclusions

    This is, in a nutshell, the process used to break down the monolithic legacy application based on Entity Framework. .NET 6 is a stable, multi-platform framework target that will allow you to containerize your .NET applications. The upgrade to .NET 6 should take advantage of all the platform and language features available, and use modern design techniques to build a lasting architecture that minimizes maintenance and maximizes readability and understanding for the reader.

  • The Importance of Naming Conventions

    August 31st, 2022

    I recently migrated an Entity Framework model from EF6 to EF Core/.NET 6. This was made considerably more difficult by the fact that the original developers had not taken advantage of Entity Framework’s conventions. Not only that, but a few small conventions changed between the two versions. And it would have been so easy had properties been named correctly.

    It got me to thinking about just how important naming conventions have become. Automation such as Infrastructure-as-Code, code generation tools, and your own automation based on reflection: all of these are based on having predictable names. But simply being predictable is not enough. It is also necessary that you be able to reconstruct the name the same way every time. And finally, you should probably be able to type it, or fit more than one identifier on a line of code. This was the main flaw of the model I was working with: the names were so descriptive that following a naming convention led to excessively long names, so abbreviations were made that broke the convention.

    It seems that a middle ground needs to be taken. I grew up on C, where identifiers were typically one- and two-letter abbreviations that probably only meant anything in the mind of the author. Calling C code a “technical specification” is probably pretty generous given the lack of descriptive nouns and verbs within the spec. A webcomic I saw a long time ago, and alas cannot find now, makes the point that well-written code _is_ a technical specification. The characters were discussing how great it would be if it were possible to create a specification detailed enough that a computer could write a program. And yet, that is what software developers do every day – write higher-level constructs that specify the software to the point that a compiler can create executable machine language. That sounds like a specification to me.

    So let’s treat it like one. Instead of the glory days of C where programmers competed with each other to create the most obfuscated code possible, let’s use the code also as a document that can clearly trace back to business requirements written in business language. Optimizing for readability should be a thing! Or at very least demanding that developers create readable specs that can be easily reviewed by an architect or other technically-capable businessperson.

    That’s where the Ubiquitous Language of Domain-Driven Design comes in. If the business calls something by a name, we should also use that name in our specification. It will allow us to clearly trace our code back to our business requirements. It will allow us to desk check business rules without wondering what certain variables refer to. Indeed, a modern language like C# is fluid enough that you can express thoughts in very readable fashion. A well-organized, well-named code base is easy to work with. You can use your IDE’s code completion features much more easily if you don’t have to guess multiple names.

    Other important naming conventions are the Entity Framework ones I mentioned above. For example, a property named Id is always going to be considered the primary key for an entity. Additionally, a property named <Entity>Id is also considered the primary key if Id doesn’t exist. Foreign keys can be inferred in much the same way. While annotations are provided in EF Core to allow you to override the default behavior, it’s so much easier if you work _with_ the conventions rather than _against_ them.
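
    As a quick illustration (entity names invented), the following model needs no annotations or fluent configuration at all, because every key can be inferred by convention:

    using System.Collections.Generic;

    // Order.Id is the primary key by convention; OrderLine.OrderLineId works
    // too, since <Entity>Id is also accepted. OrderLine.OrderId plus the Order
    // navigation property gives EF Core the foreign key and the relationship
    // without any attributes or fluent code.
    public class Order
    {
        public int Id { get; set; }
        public ICollection<OrderLine> Lines { get; set; } = new List<OrderLine>();
    }

    public class OrderLine
    {
        public int OrderLineId { get; set; }
        public int OrderId { get; set; }
        public Order? Order { get; set; }
    }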

    Terraform will get you thinking about names too, in terms of input variables. All of our names are going to look programmatic, because they are concatenations of input variables. Still, we want to be careful that names don’t get too long, since our cloud provider is going to have limits. So, what many have adopted is a variant of Hungarian notation (made popular during those glory days of C/C++!) where common abbreviations are adopted by those working with the technology. For example, a resource group always gets an -rg suffix, a container registry -acr, and a Kubernetes cluster -aks.

    So, I guess I’d make the following recommendations about naming in your own programs:

    1. Descriptive is good. Long is bad. Find a balance between the two. Hungarian notation can really help with this.
    2. It’s still OK to use one- and two-letter abbreviations for temporary/local variables.
    3. Think of names in terms of input variables and how you can combine them programmatically, as Terraform does.
    4. Use the Ubiquitous Language from business requirements when available.
    5. Take the time to do Domain-Driven Design when it is not.

    Remember that you are not just writing software for an end-user. You are also writing documentation of business processes and rules for an organization. When you win the lottery, the organization still needs to be able to understand what has been written and what the current state of the business is. So, it is important that some conventions be put in place and enforced via peer review.

  • Review – Steam Deck

    August 24th, 2022

    It was a long wait, but worth it. I reserved my Steam Deck something like six months ago, and finally got the notification to complete the purchase at the beginning of the month. It is a rather nice piece of hardware. Both graphics and CPU are sufficient to play plenty of modern games at 1280×800. The battery life is pretty good for a portable device with 3D graphics and a decent CPU. I was going through about 10% battery per 30 minutes, giving a total battery life of about 5 hours per charge. Charging is done via a USB-C connection.

    That is the first point to make: it is not a 1080p display. This may be a bit of a surprise to people who are used to gaming on their phone at 1080p, but the graphics do not disappoint. Some games are too difficult to play at this resolution (most of my strategy games require too much screen real estate to play well at such a low resolution), but there are plenty of good ones to keep me going. Stellaris, for example, is surprisingly playable at lower resolutions, and the community control scheme works very well.

    Funnily enough, I ditched the Steam Controller I bought when I first picked up a Steam Link. I found the mouse controls too finicky to use well. However, I am back to the Steam Controller, because that’s what the Deck uses. I’ve gotten reasonably proficient with the mouse, and I don’t find that it’s particularly prohibitive in deciding what to play.

    A nice touch on the Deck is that it can boot into Desktop mode, giving you a KDE shell on the underlying Linux OS. Most people won’t use this, but there are plenty who will find it appealing.

    I guess the final point to make is that because it is a Linux-based device, it won’t run all your games. Windows games are run using Proton, Valve’s Wine-based compatibility layer. The Steam Store now has Steam Deck compatibility notes for all titles (though not all have been tested yet, and those have a compatibility rating of “Unknown”). Many titles are “verified”, meaning that they are ideal games to play on the Deck in terms of control scheme, hardware requirements, and the lower resolution. Most titles will not be “verified”, but upon reading the notes, you can decide to install them anyway.

    Common concerns include: small text, requirement to invoke on-screen keyboard, no official control scheme. Most of these concerns are trivial: there’s a magnifier available, the on-screen keyboard is easily invoked, and there are many good community control schemes. It’s worth going through your library to see what works.

    Overall, I was skeptical of the lower resolution, but I have found many titles that run well under this constraint. I had some cooling issues during a recent heat wave, but since the heat wave has subsided, I’ve had no further issues. The Deck is maybe a little larger than I would have expected, but it’s still a good size for a handheld device. It’s very usable, and it’s seen some use since I’ve gotten it. I’d definitely recommend it at the current price; it’s hard to see how it could be much cheaper and still make Valve any money.

  • Initial Thoughts on .NET MAUI

    August 18th, 2022

    I had to run into it sometime. .NET MAUI has generated a bit of buzz in the community. For React people, it’s nothing new, I know. But the Microsoft answer to React Native is really quite nice. It allows use of both XAML and Blazor (!) pages, and produces an application that runs on Windows, MacOS, Android and iOS. This puts mobile and utility development in a whole new light. Now we can:

    • Create utility programs as easily as writing a Blazor WASM app. MAUI allows me to write a desktop app as a hosted SPA within the application itself. This is so much nicer than a console program for no extra effort.
    • Create native applications that run on tablets, smartphones and desktop PCs
    • Learn one framework for web, desktop and mobile

    A new, practical use for Blazor WASM! You can use the exact same controls you use in your web application inside a desktop app as well. And, since I’m on the subject, I’ll also add a plug for Radzen Blazor, a wonderful free UI library for use with Blazor WASM. These controls are beautiful and easy to use, and are completely free!

    If you haven’t tried out .NET MAUI yet, try writing your next utility using it. I think you’ll never go back to console again. I am hoping that Linux support will be added as well at some point in the future, though I can see why it might be more of a challenge than Windows or MacOS.

  • Full-Text Searching w/CosmosDB (cont…)

    July 26th, 2022

    It turns out that full-text searching requires that you enable the “Accept connections from within public Azure datacenters” option in the CosmosDB networking blade. The Cognitive Search service is not hosted in the VNET (although you can enable the private endpoint for security purposes – it doesn’t use this as its outgoing network). This presents a slight security risk that may not be tolerable for sensitive data. That said, the ability to find the exact CosmosDB instance you are looking for, if you were so inclined, is practically non-existent. Trying to brute-force multiple CosmosDB services is likely to set off some alarms in the datacenter, and still won’t get you in (the keys are really quite difficult to break).

    So, practically speaking, I don’t feel like this represents any significant risk in terms of organizational data. But I hate checking off boxes that allow more unsolicited traffic.

  • Distributed Tracing with OpenTelemetry

    July 23rd, 2022

    I’ve spent a fair bit of time lately working on observability. To date, I’ve been using Application Insights with a fair bit of success, though it takes some work to make the traces correlate properly and give you related spans correctly. I started looking into OpenTelemetry, and found that it is a suitable replacement for Application Insights, and seems to be a little easier to use. However, as it is currently only just releasing version 1.0, it’s a little difficult to piece together things that don’t appear in the demo.

    Most notably, I’m using MassTransit/RabbitMQ for messaging, and the messaging example uses the RabbitMQ.Client libraries instead. I’ve also put all the pieces in one place with commentary to try and make things a bit more convenient. First, you define your ActivitySources in a central location so they can be shared between components:

    public static class Telemetry
    {
        public static readonly ActivitySource Service1Source = new ActivitySource("Service1");
        // ...
    }
    

    Now, you can configure OpenTelemetry in the Web API producer:

        protected async Task<IActionResult> HandleCommand<TCommand>(
            TCommand command,
            Action<TCommand>? commandModifier = null)
            where TCommand : class
        {
            using var activity = Telemetry.ApiActivitySource
                .StartActivity($"{typeof(T).Name}.{typeof(TCommand).Name}", ActivityKind.Producer);
            try
            {
                commandModifier?.Invoke(command);
                await _bus.Request<TCommand, CommandResponse>(command, callback: c =>
                {
                    var contextToInject = default(ActivityContext);
                    if (activity != null)
                        contextToInject = activity.Context;
                    else if (Activity.Current != null)
                        contextToInject = Activity.Current.Context;
                    Propagator.Inject(new PropagationContext(contextToInject, Baggage.Current), c, InjectContext);
                });
                activity?.SetStatus(ActivityStatusCode.Ok);
            }
            catch (Exception ex)
            {
                activity?.SetTag("exception", ex.StackTrace);
                activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            }
            finally
            {
                activity?.Stop();
            }
            return Ok();
        }
    
    

    This HandleCommand method belongs to a generic web API controller; the aggregate type and command type are concatenated to form the activity name. The key part here is the Propagator.Inject() call, so let’s look at the method it uses to inject the ActivityContext into the RabbitMQ headers:

    private void InjectContext<TCommand>(SendContext<TCommand> context, string key, string value) where TCommand : class
        {
            context.Headers.Set(key, value);
        }
    

    It’s only a one-liner, and the Propagator does all the work for us by calling InjectContext with the key and value arguments it needs. The corresponding ExtractContext consumer looks like the following:

        private IEnumerable<string> ExtractContext<TCommand>(ConsumeContext<TCommand> context, string key) where TCommand : class
        {
            if (context.Headers.TryGetHeader(key, out var value))
            {
                return new[] { value?.ToString() ?? string.Empty };
            }
            return Enumerable.Empty<string>();
        }
    

    This method needs to return an IEnumerable<string> with all values for the given key. In our case, there will only ever be one value, but the signature is set for us by the Propagator. The service itself uses the Propagator as follows:

    public class ConsumerBase<TAggregate, TCommand> : IConsumer<TCommand>
        where TAggregate : AggregateRoot<TAggregate>
        where TCommand : class
    {
        private static readonly TextMapPropagator Propagator = Propagators.DefaultTextMapPropagator;
    
        protected readonly ApplicationService<TAggregate> Service;
        protected readonly ActivitySource ActivitySource;
    
        public ConsumerBase(ApplicationService<TAggregate> service, ActivitySource source)
        {
            Service = service;
            ActivitySource = source;
        }
    
        public virtual async Task Consume(ConsumeContext<TCommand> context)
        {
            var parentContext = Propagator.Extract(default, context, ExtractContext);
            using var activity =
                ActivitySource.StartActivity($"{typeof(TAggregate).Name}.{typeof(TCommand).Name}", ActivityKind.Consumer, parentContext.ActivityContext);
            try
            {
                await Service.Handle(context.Message);
                await context.RespondAsync(new CommandResponse { Success = true });
                activity?.SetStatus(Status.Ok);
            }
            catch (Exception ex)
            {
                await context.RespondAsync(new CommandResponse
                    { Error = ex.Message, StackTrace = ex.StackTrace, Success = false });
                activity?.SetTag("exception", ex.StackTrace);
                activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            }
        }
    
        private IEnumerable<string> ExtractContext<TCommand>(ConsumeContext<TCommand> context, string key) where TCommand : class
        {
            if (context.Headers.TryGetHeader(key, out var value))
            {
                return new[] { value?.ToString() ?? string.Empty };
            }
            return Enumerable.Empty<string>();
        }
    }
    

    The complete consumer base class is shown above.

    Finally, in each process that requires OpenTelemetry, the tracer must be initialized:

            services.AddOpenTelemetryTracing(ot => ot
                .AddSource("MyNamespace.MyService")
                .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("MyNamespace.MyService", serviceVersion: "1.0.0"))
                .AddMassTransitInstrumentation()
                .AddGrpcClientInstrumentation()
                .AddOtlpExporter(otlp =>
                {
                    otlp.Endpoint = new Uri(context.Configuration["Telemetry:OpenTelemetry"]);
                    otlp.Protocol = OtlpExportProtocol.Grpc;
                }));
    
    

    That should be it! OpenTelemetry is a very promising replacement for proprietary tracing protocols such as Application Insights and New Relic. Indeed, Application Insights now supports the use of OpenTelemetry instead of its proprietary protocol. OpenTelemetry allows for a choice of UIs for examining the telemetry. I am currently using Jaeger (jaegertracing.io), but have also looked at SigNoz (signoz.io). Both are very capable UIs, and OpenTelemetry seems very flexible and easy to use (once you figure out how!).

  • Bearer token too long?

    July 22nd, 2022

    I recently switched to Auth0 as an OAuth2 provider, and was a little surprised to find how little data was stored in the bearer token. I’d previously shoved pretty much everything in there: profile and roles as well. Some of you may be snickering since you already know better. Access tokens should be short. But we still need all the user claims. This is available on the Auth0 server as /userinfo. The access token provided will also have access to the /userinfo endpoint, which contains the profile and claims that we are looking for. If you do the obvious thing, and load the claims and profile on demand when receiving the access token, you’ll find out something else about Auth0 – they rate limit the userinfo endpoint.

    I put the following singleton into all of my web APIs and microservices:

    public class ClaimsHolder
    {
        private readonly ConcurrentDictionary<string, ConcurrentDictionary<string, object>> _claims = new();
    
        public void AddClaim(string userid, string name, object value)
        {
            _claims.AddOrUpdate(userid,
                k =>
                {
                    var v = new ConcurrentDictionary<string, object>();
                    v[name] = value;
                    return v;
                },
                (k, v) =>
                {
                    v[name] = value;
                    return v;
                }); 
        }
    
        public IList<Claim> this[string userid]
        {
            get
            {
                return _claims.GetOrAdd(userid, new ConcurrentDictionary<string, object>())
                    .Select(c => new Claim(c.Key, c.Value.ToString() ?? throw new Exception("null claim??")))
                    .ToList();
            }
        set
        {
            // Overwrite (or add) each incoming claim for this user in one pass.
            var userClaims = _claims.GetOrAdd(userid, _ => new ConcurrentDictionary<string, object>());
            foreach (var claim in value)
                userClaims[claim.Type] = claim.Value;
        }
        }
    }
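
    Since the holder is shared, mutable state, it should exist exactly once per process. One way to wire that up (a sketch, assuming the minimal-hosting Program.cs style used below) is to create it before configuring authentication so the JWT events can capture it, and register the same instance for injection everywhere else:

    // Create the holder up front so the OnTokenValidated handler below can
    // capture it, and register that same instance for controllers/services.
    var builder = WebApplication.CreateBuilder(args);
    var claimsHolder = new ClaimsHolder();
    builder.Services.AddSingleton(claimsHolder);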
    

    A web API can use this claims holder as follows. First, add JWT bearer authentication to your Program.cs similar to the following:

    builder.Services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
        .AddJwtBearer(JwtBearerDefaults.AuthenticationScheme, c =>
        {
            c.Authority = $"https://{auth0Domain}";
            c.TokenValidationParameters = new()
            {
                ValidAudience = auth0Audience,
                ValidIssuer = $"https://{auth0Domain}"
            };
            c.Events = new()
            {
                OnTokenValidated = async context =>
                {
                    if (context.SecurityToken is not JwtSecurityToken accessToken) return;
                // 'token' is a holder defined elsewhere in the application
                // (not shown in this snippet); it stores the raw access token.
                token.Value = accessToken.RawData;
                    if (context.Principal?.Identity is ClaimsIdentity identity)
                    {
                        var userid = identity.Claims.Single(c => c.Type == ClaimTypes.NameIdentifier).Value;
                        var claims = claimsHolder[userid ?? throw new Exception("null user!")];
                        if (!claims.Any())
                        {
                            var httpClient = new HttpClient();
                            httpClient.DefaultRequestHeaders.Add("Authorization", $"Bearer {accessToken.RawData}");
                            claims = (await
                                    httpClient.GetFromJsonAsync<Dictionary<string, object>>(
                                        $"https://{auth0Domain}/userinfo")
                                )?.Select(x =>
                                    new Claim(x.Key, x.Value?.ToString() ?? throw new Exception("null claim??"))).ToList();
                            if (claims != null)
                                foreach (var claim in claims.ToList())
                                    claimsHolder.AddClaim(userid, claim.Type, claim.Value);
                        }
    
                        identity.AddClaims(claims ?? Enumerable.Empty<Claim>().ToList());
                        identity.AddClaim(new Claim("access_token", accessToken.RawData));
                    }
                }
            };
        });
    
    

    What we end up with, then, is a wrapper around a ConcurrentDictionary that holds a per-user ConcurrentDictionary containing all of the claims. This provides a convenient singleton that minimizes the number of times the user info must be retrieved from Auth0. There is a problem with this code, however, in that the claims are not refreshed when the token is refreshed. We should allow the access token to work until the end of its lifetime using the cached claims, but once the token is refreshed we should refresh the cached claims. I don’t currently know how to do this, however. For many low-security scenarios, the code above would work as-is; the claims just don’t change that often. Still, this is a problem that must be solved eventually.

  • Data Architecture – Too much normalizing?

    July 13th, 2022

    I’m slowly breaking out of the mentality that all data must be normalized as much as possible. As I work through creating an event sourced system from an existing application, I find that a lot of data ends up in the same place even though it’s not the same data. Let’s look at a simple example, a country with provinces. In a relational database, you’d probably have a lookup table for each of these, with the province table holding a foreign key to the country table so we know which province belongs to which country. And then we’d have an address table that has foreign keys to the country and province tables. Compare this to a JSON document that simply stores all the provinces as an array of objects:

    {
      "Country": {
        "Id": "CA",
        "Name": "Canada",
        "Provinces": [
          {
            "Id": "AB",
            "Name": "Alberta"
          },
          {
            // ...
          }
        ]
      }
    }
    

    The address, then, can’t even store any kind of reference id to the province; it must store some minimal amount of information about the province within the address document itself:

    {
      "StreetAddress": "123 Fake Street",
      "City": "Calgary",
      "Province": {
        "Abbreviation": "AB",
        "Name": "Alberta"
      },
      "PostalCode": "H0H 0H0"
    }
    

    So, really, what’s wrong with this? I guess the main criticism is that we don’t have a central location to update the name of the province. But that’s easy enough to deal with:

        public static async Task UpdateMultipleItems<T>(
            this IMongoCollection<T> collection,
            Expression<Func<T, bool>> query,
            Func<T, Task> update) where T : IId
        {
            var writes = new List<ReplaceOneModel<T>>();
            foreach (var item in collection.AsQueryable().Where(query))
            {
                await update(item);
                writes.Add(new ReplaceOneModel<T>(Builders<T>.Filter.Eq(e => e.Id, item.Id), item));
            }

            // BulkWriteAsync throws on an empty request list, so skip the call
            // when the query matched nothing.
            if (writes.Count > 0)
                await collection.BulkWriteAsync(writes);
        }
    
    

    When we receive an event that a province name was updated, we simply update all documents that contain that province. If this is something we plan to do regularly, an index would certainly help.
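
    For example, a projection handler for a hypothetical ProvinceRenamed event could use the extension method above along these lines (the addresses collection, the Address read model and the event itself are placeholders):

    // 'addresses' is the IMongoCollection<Address> behind the read model;
    // Address is assumed to implement IId, as the extension method requires.
    public Task On(ProvinceRenamed renamed) =>
        addresses.UpdateMultipleItems(
            a => a.Province.Abbreviation == renamed.Abbreviation,
            a =>
            {
                a.Province.Name = renamed.NewName;
                return Task.CompletedTask;
            });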

    I think it’s important to stick with DDD principles here. If something isn’t an aggregate root, it shouldn’t have its own documents. The province, here, is not an aggregate root as it doesn’t even exist outside the context of its containing country. So, we’ll never see provinces as more than a property of some other aggregate root. Given the size of a province, it seems easiest just to store its value inline.

    Ok, so most small entities can be dealt with in this fashion. We do have _some_ need for normalizing, though. Consider the case where there is a relationship between two aggregate roots. In this case, we can simply store a reference id for this property, and use a lookup to get the associated document. But why not take some of what we’ve learned above? For example, instead of merely storing the id value, also store a name or description value as well. And you’re not limited to a single field. Perhaps there’s a LastUpdated field or similar that you’d want to retrieve without loading the entire linked document. Yes, you will have to use the same technique as above to update that field when it changes, but in a lot of cases, you won’t need anything more than a text identifier until a user actually triggers loading of the entire document.
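
    In code, such a reference between aggregate roots might look something like this (names are illustrative): instead of a bare ID, the owning document carries a small summary of the linked aggregate.

    using System;

    // A denormalized reference to another aggregate root: enough to render a
    // list or a link without loading the full linked document.
    public class CustomerReference
    {
        public Guid Id { get; set; }
        public string Name { get; set; } = string.Empty;
        public DateTimeOffset LastUpdated { get; set; }
    }

    public class InvoiceReadModel
    {
        public Guid Id { get; set; }
        public CustomerReference Customer { get; set; } = new();
        // ... other fields of the owning aggregate
    }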

    I believe this to be a sound approach. We are already working with eventually consistent databases, so a slight delay in updating subdocuments shouldn’t have a profound impact. I’ll need to work with much larger datasets before I have any basis for comparison, but there are benefits to working this way:

    • Documents remain logically separated. We don’t link things together simply because one entity has the text we want to use. An address belonging to one type of entity is not necessarily the same thing as an address belonging to another type of entity, and indeed may have different business rules.
    • We gain speed at the expense of space. This is generally a tradeoff most people are willing to make these days.
    • The document for any given aggregate root is human-readable. It is not necessary to perform multiple lookups to obtain the necessary information.

    The flip side of this is that we do repeat ourselves, at least superficially. In the address example, there are two sets of POCOs that represent addresses. That is not itself an indication that it’s wrong, but you may need to further consider whether those addresses are, in fact, aggregate roots themselves. However, if they’re not, then I’d continue to argue that the values should be stored inline. I’ll look into the performance implications of this position and follow up. For now, though, it would seem that we are normalizing too much, and much clarity is to be gained by duplicating storage of similar information.

  • Full-text Searching with CosmosDB

    July 7th, 2022

    While I settled on CosmosDB as the final destination for the document database in my solution, I did early work on the application using the MongoDB docker container. I was happy with how easy it was to write a search method for MongoDB by defining some full-text search indexes and querying those indexes. It was an easy search, since MongoDB did all the hard work. Something like:

    public async Task<List<T>> Search<T>(string search)
    {
        // _collection is the IMongoCollection<T> for the model being searched,
        // and this relies on a full-text index having been defined on it.
        var query = Builders<T>.Filter.Text(search);
        var results = await _collection.FindAsync(query);
        return await results.ToListAsync();
    }
    

    However, when creating those same full-text search indexes on CosmosDB, you will find that it is not supported. So, what’s the analogous solution in Azure?

    Azure does support the idea of full-text indexes, but at a larger scale. Azure Cognitive Search allows you to index a CosmosDB collection for full-text search, as well as filtering, sorting, suggestions and facets. The concept is much the same: a full-text index is defined in Cognitive Search, and it is applied to a specific data source. An indexer process is configured and triggered every time a change is projected to CosmosDB. This isn’t quite automatic, so let’s look at the process:

    Define the data source, index, and an indexer in Cognitive Search. The process of creating data sources, indexes and indexers is well-documented.

    Ensure that the defined data source includes the high watermark change detection. We are going to disable periodic indexing and use an on-demand approach instead, and need to ensure that the indexing is incremental.

    "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts"
        },
    

    I should note that the instructions given in the documentation use REST API calls from Postman or VS Code. This is not currently necessary, as the Azure Portal supports all the necessary interface elements for defining a data source, index, and indexer for use with CosmosDB MongoDB.

    Once the search components have been defined, it will now be possible to glue it all together:

    Run the indexer and verify that your documents appear as intended in the search index. Note that the search index doesn’t need to contain the whole document. Only the fields marked as retrievable will be transferred to the index. I was able to use the same read model to query the search index, with the understanding that not all fields would be available for use when retrieving search results.

    Verify that your searches work. The Cognitive Search resource has a search explorer where you can choose an index and run queries against it.

    Update your write activities on searchable collections to also run the indexer. The code below is from an Event Sourced system which has a distinct read and write side. A projection is a persistent subscription to the event store that duplicates changes into the document database. I’ve updated the Projection constructor to optionally allow passing a search endpoint and indexer to run.

        public Projection(
            IMongoClient mongo, 
            Projector projector, 
            SecretClient secrets, 
            string? searchEndpoint = null, 
            string? indexer = null)
        {
            _mongo = mongo;
            _projector = projector;
            _secrets = secrets;
            _searchEndpoint = searchEndpoint;
            _indexer = indexer;
        }
    
    

    Then, after completing the projection to the document database:

            if (!string.IsNullOrWhiteSpace(_searchEndpoint))
            {
                // run the indexer if it's been provided
                var key = await _secrets.GetSecretAsync("CognitiveSearch-ApiKey");
                var indexClient = new SearchIndexerClient(new Uri(_searchEndpoint), 
                    new AzureKeyCredential(key.Value.Value));
                await indexClient.RunIndexerAsync(_indexer);
            }
    
    

    That’s the read side taken care of. Every projection to the read side will result in the new or updated document being indexed and very quickly available to search. “Very quickly” in this case means that the human delay between triggering persistence of the document, and providing the search query is more than sufficient to allow the indexer to work. There is _some_ delay, but in practical terms, it is real-time.

    I’m not fond of having to pull the API key out of the secret vault every time, but RBAC access to the search endpoint is in public preview and not yet supported by the SDK. At some point we will presumably be able to provide an access token instead of the administrative key and reduce the permissions allowed by the application to the search service.

    Now we simply need to replace the MongoDB search code with something that queries the Cognitive Search index:

    [HttpGet("search")]
    public async Task<IReadOnlyList<ReadModels.MyModel>> Search([FromQuery] string search)
    {
        var endpoint = _configuration["Azure:Cognitive:SearchEndpoint"];
        var index = _configuration["Azure:Cognitive:SearchIndexName:MyModel"];
        var key = await _secrets.GetSecretAsync("CognitiveSearch-ApiKey");
        var credential = new AzureKeyCredential(key.Value.Value);
        var searchClient = new SearchClient(new Uri(endpoint), index, credential);
        var results = await searchClient.SearchAsync<ReadModels.MyModel>(search);
        var list = await results.Value.GetResultsAsync().Select(result => result.Document).ToListAsync();
        return list.AsReadOnly();
    }
    

    There is the end-to-end solution. Any time an aggregate is persisted to the event store, the resulting projection will also run the indexer and index the newly updated document. Because our search index schema is a subset of our read model, we can use the same model classes with the caveat that not all fields will be available from the results of a search.

    Some final notes:

    While the Azure Portal API for importing data supports a connection string using ApiKind=MongoDb, this is not enabled by default. It is necessary to join the public preview (which is linked under the Cognitive Search documentation referring to CosmosDB MongoDB API) for now to enable ApiKind in the connection string. Once it is enabled, however, you should be able to translate the instructions provided in the Cognitive Search documentation for use in the Azure Portal.

    This is obviously a more involved solution than having the full-text index stored directly in the database, but I think the benefits outweigh the costs. A search index is a relatively inexpensive thing (you get 15 of them for $100USD/mo) and provides full-text search of documents of arbitrary complexity. Depending on your search tier, you may also have unlimited numbers of documents in your index. This is an extremely scalable, customizable, and easy-to-use search that is useful in many applications. There are many features you will find useful in your own application, and the initial setup is really very easy.

  • Service Location on Kubernetes using System Environment

    July 5th, 2022

    I unexpectedly came across the environment variables table in the AKS documentation.

    Apologies for this tangent, but I found this very difficult to troubleshoot, and I want to put it here in case it helps someone. I first noticed this when I was attempting to start my EventStore DB container, and kept getting this very strange error:

    Error while parsing options: The option ServicePortEventstore is not a known option. (Parameter ‘ServicePortEventstore’)

    This was incredibly confusing to me, since nowhere was I setting this option, as far as I could see. After examining the EventStore documentation, I noticed that EventStore was able to configure itself from the environment, by simply turning any environment variable prefixed with EVENTSTORE_ into an option. So, for example, EVENTSTORE_HTTP_PORT would translate to the command line option --http-port.

    I concluded, then, that there must be an environment variable EVENTSTORE_SERVICE_PORT_EVENTSTORE being set somewhere. What a strange variable to be set! But, the AKS documentation mentioned above says that a service named ‘eventstore’ would cause an environment variable EVENTSTORE_SERVICE_PORT to be created. Since there is no --service-port option, the existence of this environment variable causes EventStore to be unable to start up. The answer, then, is to not use the name “eventstore” in your deployments or services. I replaced the string eventstore with esdb, and the error went away.

    I’d note, though, that these variables could potentially be quite useful for service discovery. You can enumerate through all the environment variables to determine the service names defined in the cluster, and then use the same environment variables to find endpoint information. The example given in the documentation shows construction of a URL from these environment variables, but I’m sure your imagination can think of better 🙂
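
    For instance, here is a rough sketch of resolving a service endpoint from those variables in C#. The esdb service name and the fallback values are assumptions; Kubernetes upper-cases the service name and replaces dashes with underscores to form <NAME>_SERVICE_HOST and <NAME>_SERVICE_PORT.

    using System;

    // Build a service URI from the environment variables Kubernetes injects,
    // falling back to localhost for local development.
    static Uri ResolveService(string serviceName, string scheme = "http")
    {
        var prefix = serviceName.ToUpperInvariant().Replace("-", "_");
        var host = Environment.GetEnvironmentVariable($"{prefix}_SERVICE_HOST") ?? "localhost";
        var port = Environment.GetEnvironmentVariable($"{prefix}_SERVICE_PORT") ?? "80";
        return new Uri($"{scheme}://{host}:{port}");
    }

    var esdbEndpoint = ResolveService("esdb");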
