Description
Performance Driven Design – This post is about defining performance criteria when handling large data operations and driving performance driven solutions.
Many performance issues really derive on how well your code is written, sometimes it doesn’t matter how much computing power you have (although it can help), time equation for solving your equation will much depend on how you write your algorithm.
Consider that you have a list of 10000000 objects. How would you go about into searching for a set of elements within this list?
I will show you one approach (may not be the best, but it serves it’s purpose).
Requirement
Design a new approach to reduce time needed to find an element in a list of many elements. The search is based on multiple properties of that element.
Problem
Problem : The linear search in the list takes too much time to process
Usefull in many situtations
Using Dynamo’s native “PruneDuplicates” I counted 4 min and then I just forced closing the application as it became not responsive
Using a not optimized custom node : 261 seconds
Using an optimized custom node : 24 seconds (92% reduction time)
Input Data
For the following class we will want to search the set of elements meeting the matching criteria for Layer, Category, Type
public class MyElement { public string Layer { get; set; } public string Category { get; set; } public string Type { get; set; } public string Value { get; set; } public MyElement(string n, string c, string t, string v) { this.Layer = n; this.Category = c; this.Type = t; this.Value = v; } }
For the following class we will want to search the set of elements meeting the matching criteria for Layer, Category, Type. We will assume we have the following criteria defined:
static List<string> Layers = new List<string> { "Layer1", "Layer2", "Layer3", "Layer4", "Layer5" }; static List<string> Categories = new List<string> { "Category1", "Category2", "Category3", "Category4", "Category5" }; static List<string> Types = new List<string> { "Type1", "Type2", "Type3", "Type4", "Type5" };
Classical Solution
Let’s first define and populate our list with 10000000 items, and for the Category, Layer and Type we will set it to be a random value within the available values:
static int noOfItems = 10000000; //initial data list static List<MyElement> DataList = new List<MyElement>(); public static void PopulateDataList() { for (int i = 0; i < noOfItems; i++) { DataList.Add(new MyElement(Layers[i % 5], Categories[i % 5], Types[i % 5], i.ToString())); } }
Our first approach into finding the list of items matching the requirements would be:
"> public static List SearchInList(string layer, string cat, string type) { List elements = new List(); foreach (var item in DataList) { if (item.Layer == layer && item.Category == cat && item.Type == type) { elements.Add(item); } } return elements; }
You might think this is efficient, because in the end it’s a linear loop, you check each item once and get your results.
What could be the problem? Well…what if you have to make many of these checks?
Let’s see how is the performance of this method


Optimal Solution
Now let’s optimize our structure for searching purpose. This optimization is specific to this matching criteria.
We will create a dictionary for each criteria and sort items based on that key.
static Dictionary<string, Dictionary<string, Dictionary<string, List>>> CategorizedDataList = new Dictionary<string, Dictionary<string, Dictionary<string, List>>>();
This does not look pretty, right? We can take care of that later.
For now let’s define the methods for populating the dictionaries (to build the dictionary we will take the elements from the already built list, since this is not a method of eliminating the original data structure but to locally optimize it)
public static void AddToDictionary(MyElement el) { if (!CategorizedDataList.ContainsKey(el.Layer)) { CategorizedDataList.Add(el.Layer, new Dictionary<string, Dictionary<string, List>>()); } if (!CategorizedDataList[el.Layer].ContainsKey(el.Category)) { CategorizedDataList[el.Layer].Add(el.Category, new Dictionary<string, List>()); } if (!CategorizedDataList[el.Layer][el.Category].ContainsKey(el.Type)) { CategorizedDataList[el.Layer][el.Category].Add(el.Type, new List()); } CategorizedDataList[el.Layer][el.Category][el.Type].Add(el); } public static void PopulateDictionary() { for (int i = 0; i < noOfItems; i++) { AddToDictionary(DataList[i]); } }
The key is in the “keys”. First we grouped them according to some keys, now we will search in the same manner:
public static List SearchInDict(string layer, string cat, string type) { if (!CategorizedDataList.ContainsKey(layer)) { return new List(); } if (!CategorizedDataList[layer].ContainsKey(cat)) { return new List(); } if (!CategorizedDataList[layer][cat].ContainsKey(type)) { return new List(); } return CategorizedDataList[layer][cat][type]; }
Of course we will need to sacrifice some execution time just to create our dictionaries , but let’s check how are we now performing:
In the following image you will see both Simple and Optimized approach to have a clear comparison:


static void Main(string[] args) { //Populating DateTime popSt = DateTime.Now; PopulateDataList(); DateTime popEnd = DateTime.Now; Console.WriteLine($"Populating List finished in {popEnd.Subtract(popSt).TotalMilliseconds} ms"); // Populating dictionary PopulateDictionary(); DateTime catEnd = DateTime.Now; Console.WriteLine($"(Optimizing)Creating the Dictionaries finished in {catEnd.Subtract(popEnd).TotalMilliseconds} ms"); //Determining the data we need to find in the list int halfWay = noOfItems / 2; string layer = DataList[halfWay].Layer; string category = DataList[halfWay].Category; string type = DataList[halfWay].Type; //finding the data iterating in the simple list DateTime searchStart = DateTime.Now; SearchInList(layer, category, type); DateTime search1 = DateTime.Now; Console.WriteLine($"(Not Optimized) Search in List finished in {search1.Subtract(searchStart).TotalMilliseconds} ms"); //finding the same item in the dicitonary searchStart = DateTime.Now; SearchInDict(layer, category, type); DateTime search2 = DateTime.Now; Console.WriteLine($"(Optimized)Searching in Dictionaries finished in {search2.Subtract(searchStart).TotalMilliseconds} ms"); }
Finally, these dictionaries might give you a headache, so let’s do something about that. We will create a new class and encapsulate this logic inside:
public class CategorizedElements : Dictionary<string, Dictionary<string, Dictionary<string, List>>> { public void AddToDictionary(MyElement el) { if (!this.ContainsKey(el.Layer)) { this.Add(el.Layer, new Dictionary<string, Dictionary<string, List>>()); } if (!this[el.Layer].ContainsKey(el.Category)) { this[el.Layer].Add(el.Category, new Dictionary<string, List>()); } if (!this[el.Layer][el.Category].ContainsKey(el.Type)) { this[el.Layer][el.Category].Add(el.Type, new List()); } this[el.Layer][el.Category][el.Type].Add(el); } public List SearchInDict(string layer, string cat, string type) { if (!this.ContainsKey(layer)) { return new List(); } if (!this[layer].ContainsKey(cat)) { return new List(); } if (!this[layer][cat].ContainsKey(type)) { return new List(); } return this[layer][cat][type]; } }