Replies: 18 comments
-
Hello @weypro, do you have any implementation details in mind? Should we lazily generate image patches while performing sliced inference, or should we export slices to disk and read them back while iterating?
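For context, a minimal sketch of what the first (lazy) option could look like; `iter_slices` and its parameters are illustrative placeholders, not sahi's API:

```python
from typing import Iterator, Tuple
from PIL import Image

def iter_slices(
    image_path: str,
    slice_height: int = 512,
    slice_width: int = 512,
    overlap_ratio: float = 0.2,
) -> Iterator[Tuple[Image.Image, Tuple[int, int]]]:
    """Yield (patch, (x_offset, y_offset)) one tile at a time."""
    image = Image.open(image_path)
    width, height = image.size
    step_x = max(1, int(slice_width * (1 - overlap_ratio)))
    step_y = max(1, int(slice_height * (1 - overlap_ratio)))
    for y in range(0, height, step_y):
        for x in range(0, width, step_x):
            box = (x, y, min(x + slice_width, width), min(y + slice_height, height))
            # Only the current crop lives in memory; nothing is accumulated.
            yield image.crop(box), (x, y)
```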
-
The latter seems to be a more general solution and is easier to implement. To be frank, before I skimmed the code
In the function
The full code is (
Well, it works, and I can use the default setting
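As a rough illustration of the latter option (exporting slices to disk and reading them back while iterating), assuming hypothetical helper names, a placeholder image path, and PIL for cropping:

```python
import os
import tempfile
from typing import List
from PIL import Image

def export_slices(image_path: str, out_dir: str, size: int = 512) -> List[str]:
    """Crop an image into size*size tiles and write each tile to out_dir."""
    image = Image.open(image_path)
    width, height = image.size
    paths = []
    for y in range(0, height, size):
        for x in range(0, width, size):
            tile = image.crop((x, y, min(x + size, width), min(y + size, height)))
            path = os.path.join(out_dir, f"slice_{x}_{y}.png")
            tile.save(path)
            paths.append(path)
    return paths

# Only one tile is held in memory while iterating over the exported files.
with tempfile.TemporaryDirectory() as tmp_dir:
    for slice_path in export_slices("sample_image.png", tmp_dir):
        patch = Image.open(slice_path)
        # ... run the detection model on `patch` here ...
```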
-
Hi there, sorry for jumping in late. It sounds like a really nice improvement; my only concern was the mutation on
-
To verify the effect, I tried watching the memory usage after slicing. The result is that the gap is only tens of megabytes, which means the earlier improvement was presumably a coincidence and can be ignored.
With pympler, I have confirmed that the memory of
If you don't mind, I can draft the PR by adding a Boolean parameter
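For reference, the kind of pympler measurement mentioned above could look like this sketch; the list of dummy arrays only stands in for the per-slice prediction objects being profiled:

```python
import numpy as np
from pympler import asizeof

def report_size(name: str, obj) -> None:
    """Print the deep (recursive) size of an object in megabytes."""
    print(f"{name}: {asizeof.asizeof(obj) / 2**20:.1f} MB")

# Stand-in for the accumulated per-slice predictions being profiled.
slice_outputs = [np.zeros((100, 6), dtype=np.float32) for _ in range(486)]
report_size("slice_outputs", slice_outputs)
```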
-
@weypro Thanks for the detailed comment and analysis. My only question here is: can't we just refactor and update the code without preserving
-
OK. We can replace
Or we can even remove
The time difference is nearly one second. We can take this solution if the extra time cost is acceptable. @devrimcavusoglu
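To make the comparison concrete, a hedged sketch of the difference between merging once at the end and merging as each slice is predicted; `run_model` and `merge` are placeholders for the actual prediction and postprocess steps, not sahi functions:

```python
from typing import Callable, Iterable, List

def merge_after(slices: Iterable, run_model: Callable, merge: Callable) -> List:
    """Keep every slice's predictions in memory and merge once at the end."""
    all_predictions: List = []
    for patch in slices:
        all_predictions.extend(run_model(patch))  # everything stays in memory
    return merge(all_predictions)

def merge_during(slices: Iterable, run_model: Callable, merge: Callable) -> List:
    """Fold each slice's predictions into a running result as they arrive."""
    merged: List = []
    for patch in slices:
        merged = merge(merged + run_model(patch))  # merge on every slice
    return merged  # peak memory stays close to one slice's worth of predictions
```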
-
@weypro Yes, indeed I meant the latter: removing the current merging (after all predictions) and replacing it completely with merging during prediction, without making it parametric. I do wonder whether this holds for all prediction sizes, but for now I will assume that it is never worse than the current approach in terms of memory and that the time trade-off is negligible (which seems to be the case). As for the former, I'll leave the other related discussions to the PR for now (we may want to test some additional/edge cases).
-
Which implementation is faster by 1 second?
-
The former is faster.
-
Applying the postprocess at each prediction would affect the overall accuracy (since the merged/suppressed boxes will be different). Has @weypro evaluated the effect of this change in terms of AP?
-
Then maybe we can set the default of the buffer as
-
No. Further research is indeed needed.
-
We can conduct the related research in the PR 👍
-
My question is how to express
-
I am not supportive of
-
I want the default to stay the same as the current implementation (merge after all predictions), at least until we are sure the AP does not decrease with buffering. To manage that, the default value can be
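As an illustration, a sketch of how a buffer parameter could preserve the current behaviour by default; `merge_buffer_length` and the other names here are hypothetical, not the final API:

```python
from typing import Callable, Iterable, List, Optional

def sliced_predict(
    slices: Iterable,
    run_model: Callable,
    merge: Callable,
    merge_buffer_length: Optional[int] = None,
) -> List:
    """merge_buffer_length=None keeps today's behaviour (merge only at the end);
    a positive value merges every N slices to cap the buffer's memory."""
    predictions: List = []
    for i, patch in enumerate(slices, start=1):
        predictions.extend(run_model(patch))
        if merge_buffer_length and i % merge_buffer_length == 0:
            predictions = merge(predictions)  # shrink the buffer periodically
    return merge(predictions)  # final merge, identical to the current default
```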
-
Yes, I'm not opposed to that either; that's why I am also in favor of the buffer implementation. I was just pointing out that
-
The PR is here: #445
-
When I try to perform sliced prediction with the default setting 256*256 (486 slices), the error occurs. If I change 256*256 to 512*512 (243 slices), it works. The size of the sample image is 5500*4000.
My total memory is only 16 GB, and I think it is necessary to add the slices cache; that way the script can deal with even bigger images without hitting memory limitations.
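For reproducing the above, the call below is roughly how a sliced prediction with 512*512 slices can be run with recent sahi versions; the model path, image path, and thresholds are placeholders:

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Placeholder model; any supported model_type/model_path works the same way.
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov5",
    model_path="yolov5s.pt",
    confidence_threshold=0.4,
    device="cpu",
)

result = get_sliced_prediction(
    "sample_image.png",   # e.g. the 5500*4000 image mentioned above
    detection_model,
    slice_height=512,     # 512*512 slices fit in 16 GB here; 256*256 did not
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
```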