Description
Performance-Driven Design: this post is about defining performance criteria for large data operations and letting those criteria drive the solution.
Requirement
Define an algorithm that quickly iterates through a set of points and eliminates the duplicates. The original purpose of this algorithm was to detect duplicate elements (walls, beams, or point-placed families) in Revit.
Problem
Dynamo’s native “PruneDuplicates” node does not perform acceptably on large amounts of data.
Solution Metrics for 10000 points
Using Dynamo’s native “PruneDuplicates”: I waited 4 minutes, then force-closed the application because it had become unresponsive
Using an unoptimized custom node: 261 seconds
Using an optimized custom node: 24 seconds (about a 91% reduction in time)
Input Data
A set of 10000 points made from two identical sets of 5000 points (to make sure there are duplicates)
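For readers who want to reproduce a comparable input outside Dynamo, here is a minimal sketch in plain Python. It uses (x, y, z) tuples as stand-ins for Dynamo points; the function name `make_test_points`, the coordinate range, and the fixed seed are assumptions made for illustration and are not taken from the original graph.

```python
import random

def make_test_points(unique_count=5000, seed=42):
    """Return 2 * unique_count points: one random set followed by an identical copy."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable (assumption)
    base = [(rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(0, 100))
            for _ in range(unique_count)]
    # Append a copy of the same coordinates so every point has a duplicate.
    return base + list(base)

points = make_test_points()
print(len(points))
```

Building the input as "set plus copy" means at least half of the points are duplicates, which makes it easy to check that a pruning routine removed what it should.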


Classical Solution
The 260s Solution: An easy solution is to create an empty list to which you add points. A point is added only if no point already in the list has identical X, Y, Z within a tolerance value. In this case X, Y and Z are compared numerically after rounding them to a fixed number of decimal places (the tolerance).


import re
import time
import clr
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import Point as DsPoint

#
# isIterable : determines if an object is a list
#
def isIterable(unit):
    t1 = t2 = False
    if(isinstance(unit, (tuple, set, list, dict))):
        t1 = True
    if(hasattr(unit, "__iter__")):
        iterstr = str(unit.__iter__())
        if(re.search("System.Array", iterstr)):
            t2 = True
    return (t1 or t2)

#
# isPoint : determines if an object is a point
#
def isPoint(unit):
    if(hasattr(unit, "GetType")):
        return unit.GetType().Equals(DsPoint)
    else:
        return False

tol = 3

#
# isSamePoint : determines if 2 points are identical within a certain tolerance
#
def isSamePoint(L1, L2):
    l1x = round(L1.X, tol)
    l1y = round(L1.Y, tol)
    l1z = round(L1.Z, tol)
    l2x = round(L2.X, tol)
    l2y = round(L2.Y, tol)
    l2z = round(L2.Z, tol)
    if((l1x == l2x) and (l1y == l2y) and (l1z == l2z)):
        return True
    return False

#
# UL : the exported list
#
UL = []

#
# checkPoint : determines if an element exists in a list or not
#
def checkPoint(element):
    #hashing
    found = False
    for item in UL:
        if(isSamePoint(item, element)):
            return False
    return True #True means the point is unique

#
# preCheckPoint : checks if the element is a point. Returns False if not
#
def preCheckPoint(element):
    if(isPoint(element)):
        return element
    return False

#
# preiterateAndDo : keeps only points in the list. The check is recursive so that multidimensional lists can be passed
#
def preiterateAndDo(elements):
    if(isIterable(elements)):
        retList = []
        for element in elements:
            retList.append(preiterateAndDo(element))
        return retList
    else:
        if(isPoint(elements)):
            return elements
        return False

#
# iterateAndDo : sets duplicate elements to False.
# The operations are recursive so that multidimensional lists can be passed
#
def iterateAndDo(elements):
    if (isIterable(elements)):
        retList = []
        for element in elements:
            retList.append(iterateAndDo(element))
        return retList
    else:
        if(elements != False):
            checkedPoint = checkPoint(elements)
            if(checkedPoint != False):
                UL.append(elements)
                return elements
        return False

tt = time.time()
filtered = preiterateAndDo(IN[0])
uniquePts = iterateAndDo(filtered)
tt2 = time.time()
OUT = uniquePts, tt2-tt
Optimal Solution
The 24s Solution: One way to increase performance is similar to the JavaScript practice of caching objects or functions: a direct reference is faster than navigating through the namespaces.
In this case the solution is to find a way around the repeated X, Y, Z property calls, so that three separate properties do not have to be queried on each point for every comparison.
Each point is replaced by a string representation, and two points are considered identical if their string representations are identical.
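To see why replacing each point with one precomputed value helps, the sketch below counts coordinate reads for both strategies. `CountingPoint` is a hypothetical stand-in for a Dynamo point (it is not part of any Dynamo API), and both functions are simplified to flat lists, so treat it as an illustration rather than a benchmark.

```python
class CountingPoint(object):
    """Hypothetical stand-in for a Dynamo point that counts coordinate reads."""
    reads = 0  # counter shared by all instances

    def __init__(self, x, y, z):
        self._xyz = (x, y, z)

    def _get(self, i):
        CountingPoint.reads += 1
        return self._xyz[i]

    X = property(lambda self: self._get(0))
    Y = property(lambda self: self._get(1))
    Z = property(lambda self: self._get(2))


def unique_by_pairwise_compare(points, tol=3):
    """Re-read and round the coordinates of both points on every comparison."""
    kept = []
    for p in points:
        is_duplicate = False
        for q in kept:
            if (round(p.X, tol) == round(q.X, tol) and
                    round(p.Y, tol) == round(q.Y, tol) and
                    round(p.Z, tol) == round(q.Z, tol)):
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append(p)
    return kept


def unique_by_string_key(points, tol=3):
    """Read X, Y, Z once per point, then compare plain strings."""
    fmt = '{:.' + str(tol) + 'f}'
    kept, keys = [], []
    for p in points:
        key = fmt.format(p.X) + fmt.format(p.Y) + fmt.format(p.Z)
        if key not in keys:
            keys.append(key)
            kept.append(p)
    return kept


def count_reads(func, points):
    """Run func on points and return (number of unique points, coordinate reads)."""
    CountingPoint.reads = 0
    result = func(points)
    return len(result), CountingPoint.reads


# 200 distinct points, each appearing twice.
pts = [CountingPoint(float(i), i * 2.0, 0.0) for i in range(200)] * 2
pairwise = count_reads(unique_by_pairwise_compare, pts)
keyed = count_reads(unique_by_string_key, pts)
print(pairwise, keyed)
```

The string-key version reads each coordinate exactly once per point, while the pairwise version reads them again for every comparison, so its read count grows much faster than the input size.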


import re
import time
import clr
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import Point as DsPoint

#
# isIterable : determines if an object is an enumerable
#
def isIterable(unit):
    t1 = t2 = False
    if(isinstance(unit, (tuple, set, list, dict))):
        t1 = True
    if(hasattr(unit, "__iter__")):
        iterstr = str(unit.__iter__())
        if(re.search("System.Array", iterstr)):
            t2 = True
    return (t1 or t2)

#
# isPoint : determines if the object is a Point
#
def isPoint(unit):
    if(hasattr(unit, "GetType")):
        return unit.GetType().Equals(DsPoint)
    else:
        return False

def isSameHash(L1, L2):
    if(L1 == L2):
        return True
    return False

UL_HASH = []

def checkPoint(elementHash):
    #hashing
    found = False
    for item in UL_HASH:
        if(isSameHash(item, elementHash)):
            return False
    return True #True means the point is unique

tol = 3 #tol is the precision of coordinates
hashingFormat = '{:.' + str(tol) + 'f}'

#
# hashPoint : function that reads X Y Z of a point and creates a string representation of the point
# for tol = 3 and Point A(4.1234, 5.1234, 6.1234) => 4.1235.1236.123
#
def hashPoint(element):
    L1X = hashingFormat.format(element.X)
    L1Y = hashingFormat.format(element.Y)
    L1Z = hashingFormat.format(element.Z)
    return L1X + L1Y + L1Z

#
# preiterateAndDo : a recursive check of multidimensional lists that replaces each non-point with False
#
def preiterateAndDo(elements):
    if(isIterable(elements)):
        retList = []
        for element in elements:
            retList.append(preiterateAndDo(element))
        return retList
    else:
        if(isPoint(elements)):
            return elements
        return False

#
# iterateAndDo : determines if the point is in the list. The comparison is made between the string representations of the points.
# The check is made recursively
#
def iterateAndDo(elements):
    if (isIterable(elements)):
        retList = []
        for element in elements:
            retList.append(iterateAndDo(element))
        return retList
    else:
        if(elements != False):
            hashedPoint = hashPoint(elements)
            if(hashedPoint != False):
                checkedPoint = checkPoint(hashedPoint)
                if(checkedPoint != False):
                    #UL_HASH is a one-dimensional array that stores the string representations of the unique points.
                    UL_HASH.append(hashedPoint)
                    return elements
        return False

tt = time.time()
filteredPoints = preiterateAndDo(IN[0])
uniquePoints = iterateAndDo(filteredPoints)
tt2 = time.time()
OUT = uniquePoints, tt2-tt
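The lookup in `checkPoint` still scans `UL_HASH` once for every point, so the work keeps growing with the number of unique points already stored. One possible further step, not measured in this post, is to keep the keys in a Python `set`, where a membership test takes roughly constant time on average. The sketch below is a variation under stated assumptions: it handles flat lists only, `Pt` is a hypothetical namedtuple standing in for a Dynamo point, and `prune_duplicates` is an illustrative name.

```python
from collections import namedtuple

# Hypothetical stand-in for a Dynamo point; only X, Y, Z are used.
Pt = namedtuple('Pt', ['X', 'Y', 'Z'])

def prune_duplicates(points, tol=3):
    """Keep the first occurrence of each point, comparing formatted coordinates."""
    fmt = '{:.' + str(tol) + 'f}'
    seen = set()   # string keys of the points kept so far
    unique = []
    for p in points:
        key = fmt.format(p.X) + fmt.format(p.Y) + fmt.format(p.Z)
        if key not in seen:  # set lookup instead of scanning a list
            seen.add(key)
            unique.append(p)
    return unique

pts = [Pt(4.1234, 5.1234, 6.1234), Pt(4.12341, 5.12339, 6.1234), Pt(1.0, 2.0, 3.0)]
result = prune_duplicates(pts)
print(len(result))
```

Inside a Dynamo Python node the same idea would mean replacing the `UL_HASH` list with a set and `checkPoint` with an `in` test; whether that pays off for a given graph would need to be timed.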
Hi,
Thank you for sharing this solution.
How can I run this code? Do I copy and paste the code into a Python script?
I tried, but it didn’t work.
Copy-pasting doesn’t work because of the indentation. You could re-indent the code, or use the Dynamo package manager to download the Caribou package; you will find this node there.