Need advice with optimizing the running speed of OR-Tools Routing #4172
Replies: 4 comments 6 replies
-
Hi, I thought I replied via the mailing list interface, but I guess it did not post. If it comes through later, apologies in advance for the double posting. I'm addressing your first issue only here.

Caveat: I'm not exactly certain what your problem is, but I think you mean the delay is just for the call to … Assuming this is true, my guess is that you have turned on caching in your solver, or else the solver internals are (finally) automatically caching all dimensions up front. What the solver is doing is calling your callback to get the result of the function, for every node-to-node pair, for every possible worker. That is a lot of individual calls, and it is necessarily single threaded, because that is how the C++-to-Python interface works in the OR-Tools API. (And if you haven't explicitly turned on caching of dimensions, you should. But I can't tell without seeing how you create the routing object.)

Hmm, as I look at your code, the clue phone is ringing and it is for me. My original thought was to suggest the vector API that has no callback:
But that (and the matching …) won't work here. It seems to me one simple improvement might be to skip the check for the staff number. You already know that on the Python side of the fence, and you've already set it in your dimension creation call. At the very least it saves a zero-op if statement. Here's what I suggest (modifying your input code a little bit). It won't be significantly faster, but it might speed things up a little.

I'm sure you already know this, but just in case you're missing the obvious: with the "vehicle transit" type API calls, each staff has their own evaluator (because you pass a vector of callbacks, one per staff), so your original callback's test of the staff number is obviated. Also, to speed up the call further, I am going to pre-seed the vector of values for each staff based on prior knowledge of which node is their depot node. To do that I am going to swap around your resistance matrix, to have staff first, then node.

```python
reversed_resistance_matrix = {}

# pre-seed the zero values for each staff's depot
for staff_no in staffs_no_list:
    reversed_resistance_matrix[staff_no] = {}
    for from_node in nodes:
        if from_node == depot_nodes[staff_no]:  # fixes the mystery use of depot_no in your call
            reversed_resistance_matrix[staff_no][from_node] = 0
        else:
            reversed_resistance_matrix[staff_no][from_node] = resistance_matrix[from_node][staff_no]

# essentially the same function as yours, just no more check of
# the staff number, no more check of the depot node
def resistance_callback(from_index, staff_no):
    from_no = manager.IndexToNode(from_index)
    return reversed_resistance_matrix[staff_no][from_no]

resistance_callback_indices = []
for staff_no in staffs_no_list:
    # expanding this to clarify things:
    # generate a callback that will ONLY be used for staff_no
    staff_no_callback_fn = partial(resistance_callback, staff_no=staff_no)
    # register that callback and get its index for use by the solver
    staff_no_callback_fn_idx = routing.RegisterUnaryTransitCallback(staff_no_callback_fn)
    # stash the staff-specific callback index for use in the dimension creation
    resistance_callback_indices.append(staff_no_callback_fn_idx)

# create a single dimension, but each vehicle has its own private
# callback function to get the cost of each node
routing.AddDimensionWithVehicleTransits(resistance_callback_indices, 0, 999999999, True, "resistance")
resistance_dimension = routing.GetDimensionOrDie("resistance")
for staff_no in staffs_no_list:
    index = routing.End(staff_no)
    resistance_dimension.SetCumulVarSoftUpperBound(index, 0, 1)
```

Hope that helps. I don't anticipate a lot of speedup there, but at least the function has less work to do.

James
-
having some …
-
I think the difference shows up when the lookup is invoked from within C++.
I'm just reporting my experience when embedding these structures into the OR-Tools callbacks.
I completely agree that numpy is faster within Python itself... I mean, that's the whole point of numpy, right? But my guess is that the speedup I see from dicts comes from the extra language overhead numpy incurs when the call comes out from C++.
If your tests prove otherwise, that would be great.
On the other hand, the fastest of all is to use the matrix and vector dimensions where you just shove all the data at once to OR Tools.
Regards,
James
…On Sat, May 18, 2024 at 05:55:13AM -0700, sschnug wrote:
> Dicts are more primitive, faster than numpy structures.
No way a dict is faster than numpy. Everything, and I mean everything, is performance-improving with dense numpy arrays: index computation (strides vs. hashing), cache-friendliness / compactness, cache-friendliness / locality, prefetching prediction (more linear access).
If you really see speedups by switching from numpy to dicts, I would immediately double-check my experiments. Also... check the internal node-speed.
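The disagreement hinges on the access pattern. A routing callback performs one scalar lookup per call, not a vectorized operation, and scalar indexing into a numpy array creates a numpy scalar object on every access. A quick sketch timing that pattern (sizes here are arbitrary):

```python
import timeit
import numpy as np

# arbitrary problem size
n = 200
rng = np.random.default_rng(0)
mat = rng.integers(0, 100, size=(n, n))
# equivalent dict-of-dicts holding plain Python ints
dmat = {i: {j: int(mat[i, j]) for j in range(n)} for i in range(n)}

pairs = [(i, j) for i in range(n) for j in range(n)]

def lookup_numpy():
    # one scalar index per access, like a per-arc callback would do
    return sum(int(mat[i, j]) for i, j in pairs)

def lookup_dict():
    return sum(dmat[i][j] for i, j in pairs)

# both structures hold the same data
assert lookup_numpy() == lookup_dict()

t_np = timeit.timeit(lookup_numpy, number=5)
t_d = timeit.timeit(lookup_dict, number=5)
print(f"numpy scalar lookups: {t_np:.3f}s, dict lookups: {t_d:.3f}s")
```

For vectorized whole-array operations numpy wins decisively; this sketch only measures the one-element-at-a-time pattern that a registered callback forces.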
--
James E. Marca
Activimetrics LLC
-
I also found that C++ is by far the fastest for model construction. I had one client who used C++, and the speedup was so great that I started to re-learn C++. But most of the people I work with use Python.
Did you try the matrix and vector dimensions? From my understanding, data is passed just once, when the dimensions are created.
James
…On Sun, May 19, 2024 at 07:09:10PM -0700, Rabbids wrote:
The largest performance overhead comes from callback functions. During model construction, data is constantly passed between Python and C++. From my experiments, using dictionaries is about 15% - 20% faster than using NumPy arrays.
To maximize performance, I have rewritten my code in C++. After testing with my own benchmark function, the performance of the C++ version has improved by 50-100 times.
-
Python 3.10, OR-Tools 9.8.3296
I have built a dispatching system for airport operations using Routing, and it is running well now. However, there are currently two efficiency issues bothering me.
In my system, since it involves dispatching tasks and staff, a "task" here essentially corresponds to a "node" in Routing, and a "staff" corresponds to a "vehicle" in Routing. So I will use "task" and "staff" to describe the issues.
The first issue is that I need different priorities for each task with respect to each staff, based on their position, qualifications, or other subjective reasons. I have built many dimensions similar to the following code:
For a given task, the higher a staff's value in the dimension, the higher the penalty, and the more the task resists being assigned to that staff.
But now I found that in a dispatching involving hundreds of tasks and hundreds of staffs, building such dimensions consumes the vast majority of the time. Building such a dimension takes over a dozen seconds, while data processing or other dimension building only takes a few milliseconds, which is a difference of several thousand times.
I further found that the most time-consuming part is

```python
resistance_callback_indices.append(routing.RegisterUnaryTransitCallback(partial(resistance_callback, staff_no=staff_no)))
```

where the callback function will be called C = (T + 2S)² * S times, where T is the number of tasks and S is the number of staff. Later, based on this, I added a benchmark-like feature to my system to evaluate server performance and estimate dispatching runtime, using the number of callbacks per millisecond as a metric, like the following example code:
I want to know whether any part of building this kind of dimension can be optimized.
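For a sense of scale, the call-count formula above grows quickly. A quick sketch with hypothetical sizes (the formula is the poster's; the example sizes are made up):

```python
# C = (T + 2S)^2 * S, where T is the number of tasks
# and S is the number of staff
def callback_count(T, S):
    return (T + 2 * S) ** 2 * S

# a few hundred tasks and staff already means millions of
# single-threaded Python callback invocations
print(callback_count(100, 100))  # (100 + 200)^2 * 100 = 9_000_000
print(callback_count(300, 200))  # (300 + 400)^2 * 200 = 98_000_000
```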
The second issue is that, because I'm not sure which strategies to choose, I'm combining 11 FIRST_SOLUTION_STRATEGIES with 5 LOCAL_SEARCH_STRATEGIES in pairs, resulting in 55 sets of parameters. I then create a multiprocessing pool and pass all 55 sets of data and parameters for computation via pool.starmap_async(), and finally select the solution with the smallest penalty value, solution.ObjectiveValue(), as the final result.

Throughout the computation, each subprocess runs through every step, including data preprocessing, model building, and computation; the only difference between subprocesses is the choice of strategies. As mentioned in the first issue, model building is actually the most time-consuming part. In the current production environment the SEARCH_TIME_LIMIT typically ranges from 1 to 5 seconds to obtain a decent result, yet the overall run usually takes from 30+ seconds to several minutes.
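The 55 parameter sets described above can be generated with itertools.product. A minimal sketch with placeholder strategy names (the real values are enum members of routing_enums_pb2, and `solve` and `data` below are hypothetical):

```python
from itertools import product

# placeholder strategy names standing in for the members of
# routing_enums_pb2.FirstSolutionStrategy and
# routing_enums_pb2.LocalSearchMetaheuristic
first_solution_strategies = [f"FIRST_SOLUTION_{i}" for i in range(11)]
local_search_strategies = [f"LOCAL_SEARCH_{i}" for i in range(5)]

# every pairing, one per subprocess
param_sets = list(product(first_solution_strategies, local_search_strategies))
print(len(param_sets))  # 11 * 5 = 55

# each pair would then be expanded into one worker call, e.g.:
# pool.starmap_async(solve, [(data, fss, lsm) for fss, lsm in param_sets])
```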
I once attempted to merge the data processing and model building steps into a single execution and then distribute the result to subprocesses for computation, but I couldn't accomplish it. I also tried saving a computed solution to a local file and then reading that file back to switch strategy parameters for further computation, but I couldn't achieve that either, because for some strategies routing.WriteAssignment() returns True but routing.ReadAssignment() returns None.

I want to optimize the overall process time. Can you give me some advice on the above situation and issues? Thank you.
Crossposted at https://stackoverflow.com/questions/78296596/need-advice-with-optimizing-the-running-speed-of-or-tools-routing