Skip to content
This repository has been archived by the owner on Dec 9, 2024. It is now read-only.

Considerations on memory management #287

Closed
3 tasks done
VourMa opened this issue May 10, 2023 · 11 comments · Fixed by #292
Closed
3 tasks done

Considerations on memory management #287

VourMa opened this issue May 10, 2023 · 11 comments · Fixed by #292

Comments

@VourMa
Copy link
Contributor

VourMa commented May 10, 2023

While investigating issue #285, I came across a couple of points in our code where the memory allocations seemed a bit weird/bug-prone. Let me mention them below, so that more people can take a look and let me know if I am missing something:

  • 1. The allocation of the indicesOfEligibleT5Modules happens in Event.cu

    TrackLooper/SDL/Event.cu

    Lines 1257 to 1261 in 4d0158d

    #ifdef CACHE_ALLOC
    rangesInGPU->indicesOfEligibleT5Modules = (uint16_t*)cms::cuda::allocate_device(dev, nLowerModules * sizeof(uint16_t), stream);
    #else
    cudaMalloc(&(rangesInGPU->indicesOfEligibleT5Modules), nLowerModules * sizeof(uint16_t));
    #endif
    instead of in Module.cu
    void SDL::createRangesInExplicitMemory(struct objectRanges& rangesInGPU,unsigned int nModules,cudaStream_t stream, unsigned int nLowerModules)
    where the relevant memory is then freed
    void SDL::objectRanges::freeMemoryCache()//struct objectRanges& rangesInGPU)
    I see no reason for this, and I would propose to move the memory allocation to Module.cu.

  • 2. We have only one way to allocate the modulesInGPU

    TrackLooper/SDL/Module.cu

    Lines 76 to 106 in 4d0158d

    void SDL::createModulesInExplicitMemory(struct modules& modulesInGPU,unsigned int nModules,cudaStream_t stream)
    {
    /* modules stucture object will be created in Event.cu*/
    cudaMalloc(&(modulesInGPU.detIds),nModules * sizeof(unsigned int));
    cudaMalloc(&modulesInGPU.moduleMap,nModules * 40 * sizeof(uint16_t));
    cudaMalloc(&modulesInGPU.mapIdx, nModules*sizeof(uint16_t));
    cudaMalloc(&modulesInGPU.mapdetId, nModules*sizeof(unsigned int));
    cudaMalloc(&modulesInGPU.nConnectedModules,nModules * sizeof(uint16_t));
    cudaMalloc(&modulesInGPU.drdzs,nModules * sizeof(float));
    cudaMalloc(&modulesInGPU.slopes,nModules * sizeof(float));
    cudaMalloc(&modulesInGPU.nModules,sizeof(uint16_t));
    cudaMalloc(&modulesInGPU.nLowerModules,sizeof(uint16_t));
    cudaMalloc(&modulesInGPU.partnerModuleIndices, nModules * sizeof(uint16_t));
    cudaMalloc(&modulesInGPU.layers,nModules * sizeof(short));
    cudaMalloc(&modulesInGPU.rings,nModules * sizeof(short));
    cudaMalloc(&modulesInGPU.modules,nModules * sizeof(short));
    cudaMalloc(&modulesInGPU.rods,nModules * sizeof(short));
    cudaMalloc(&modulesInGPU.subdets,nModules * sizeof(short));
    cudaMalloc(&modulesInGPU.sides,nModules * sizeof(short));
    cudaMalloc(&modulesInGPU.eta,nModules * sizeof(float));
    cudaMalloc(&modulesInGPU.r,nModules * sizeof(float));
    cudaMalloc(&modulesInGPU.isInverted, nModules * sizeof(bool));
    cudaMalloc(&modulesInGPU.isLower, nModules * sizeof(bool));
    cudaMalloc(&modulesInGPU.isAnchor, nModules * sizeof(bool));
    cudaMalloc(&modulesInGPU.moduleType,nModules * sizeof(ModuleType));
    cudaMalloc(&modulesInGPU.moduleLayerType,nModules * sizeof(ModuleLayerType));
    cudaMemcpyAsync(modulesInGPU.nModules,&nModules,sizeof(uint16_t),cudaMemcpyHostToDevice,stream);
    cudaStreamSynchronize(stream);
    }
    but two ways to free the memory

    TrackLooper/SDL/Module.cu

    Lines 166 to 203 in 4d0158d

    void SDL::freeModulesCache(struct modules& modulesInGPU,struct pixelMap& pixelMapping)
    {
    int dev;
    cudaGetDevice(&dev);
    cms::cuda::free_device(dev,modulesInGPU.detIds);
    cms::cuda::free_device(dev,modulesInGPU.moduleMap);
    cms::cuda::free_device(dev,modulesInGPU.mapIdx);
    cms::cuda::free_device(dev,modulesInGPU.mapdetId);
    cms::cuda::free_device(dev,modulesInGPU.nConnectedModules);
    cms::cuda::free_device(dev,modulesInGPU.drdzs);
    cms::cuda::free_device(dev,modulesInGPU.slopes);
    cms::cuda::free_device(dev,modulesInGPU.nModules);
    cms::cuda::free_device(dev,modulesInGPU.nLowerModules);
    cms::cuda::free_device(dev,modulesInGPU.layers);
    cms::cuda::free_device(dev,modulesInGPU.rings);
    cms::cuda::free_device(dev,modulesInGPU.modules);
    cms::cuda::free_device(dev,modulesInGPU.rods);
    cms::cuda::free_device(dev,modulesInGPU.subdets);
    cms::cuda::free_device(dev,modulesInGPU.sides);
    cms::cuda::free_device(dev,modulesInGPU.isInverted);
    cms::cuda::free_device(dev,modulesInGPU.isLower);
    cms::cuda::free_device(dev,modulesInGPU.isAnchor);
    cms::cuda::free_device(dev,modulesInGPU.moduleType);
    cms::cuda::free_device(dev,modulesInGPU.moduleLayerType);
    cms::cuda::free_device(dev,modulesInGPU.connectedPixels);
    cudaFreeHost(pixelMapping.connectedPixelsSizes);
    cudaFreeHost(pixelMapping.connectedPixelsSizesPos);
    cudaFreeHost(pixelMapping.connectedPixelsSizesNeg);
    cudaFreeHost(pixelMapping.connectedPixelsIndex);
    cudaFreeHost(pixelMapping.connectedPixelsIndexPos);
    cudaFreeHost(pixelMapping.connectedPixelsIndexNeg);
    //cms::cuda::free_host(pixelMapping.connectedPixelsSizes);
    //cms::cuda::free_host(pixelMapping.connectedPixelsSizesPos);
    //cms::cuda::free_host(pixelMapping.connectedPixelsSizesNeg);
    //cms::cuda::free_host(pixelMapping.connectedPixelsIndex);
    //cms::cuda::free_host(pixelMapping.connectedPixelsIndexPos);
    //cms::cuda::free_host(pixelMapping.connectedPixelsIndexNeg);
    }
    and

    TrackLooper/SDL/Module.cu

    Lines 204 to 244 in 4d0158d

    void SDL::freeModules(struct modules& modulesInGPU, struct pixelMap& pixelMapping)
    {
    cudaFree(modulesInGPU.detIds);
    cudaFree(modulesInGPU.moduleMap);
    cudaFree(modulesInGPU.mapIdx);
    cudaFree(modulesInGPU.mapdetId);
    cudaFree(modulesInGPU.nConnectedModules);
    cudaFree(modulesInGPU.drdzs);
    cudaFree(modulesInGPU.slopes);
    cudaFree(modulesInGPU.nModules);
    cudaFree(modulesInGPU.nLowerModules);
    cudaFree(modulesInGPU.layers);
    cudaFree(modulesInGPU.rings);
    cudaFree(modulesInGPU.modules);
    cudaFree(modulesInGPU.rods);
    cudaFree(modulesInGPU.subdets);
    cudaFree(modulesInGPU.sides);
    cudaFree(modulesInGPU.eta);
    cudaFree(modulesInGPU.r);
    cudaFree(modulesInGPU.isInverted);
    cudaFree(modulesInGPU.isLower);
    cudaFree(modulesInGPU.isAnchor);
    cudaFree(modulesInGPU.moduleType);
    cudaFree(modulesInGPU.moduleLayerType);
    cudaFree(modulesInGPU.connectedPixels);
    cudaFree(modulesInGPU.partnerModuleIndices);
    cudaFreeHost(pixelMapping.connectedPixelsSizes);
    cudaFreeHost(pixelMapping.connectedPixelsSizesPos);
    cudaFreeHost(pixelMapping.connectedPixelsSizesNeg);
    cudaFreeHost(pixelMapping.connectedPixelsIndex);
    cudaFreeHost(pixelMapping.connectedPixelsIndexPos);
    cudaFreeHost(pixelMapping.connectedPixelsIndexNeg);
    //cms::cuda::free_host(pixelMapping.connectedPixelsSizes);
    //cms::cuda::free_host(pixelMapping.connectedPixelsSizesPos);
    //cms::cuda::free_host(pixelMapping.connectedPixelsSizesNeg);
    //cms::cuda::free_host(pixelMapping.connectedPixelsIndex);
    //cms::cuda::free_host(pixelMapping.connectedPixelsIndexPos);
    //cms::cuda::free_host(pixelMapping.connectedPixelsIndexNeg);
    }
    Any reason for this that I am missing? Otherwise, I think we should just remove the extra memory-freeing functions.

  • 3. Finally, I would like to ask why these variables

    struct SDL::modules* SDL::modulesInGPU = nullptr;
    struct SDL::pixelMap* SDL::pixelMapping = nullptr;
    uint16_t SDL::nModules;
    uint16_t SDL::nLowerModules;
    are defined in the global scope. I would expect them to be inside the Event class or defined in the appropriate file where they are relevant.

@GNiendorf
Copy link
Member

GNiendorf commented Jun 2, 2023

#292 Deals with issue 2 by deleting the cached version since we've agreed previously to not use the caching allocator here. Both will actually be deleted when that area of the code moves to Alpaka since the buffers are freed automatically. It also deals with issue 1, moving the allocation to module.cu.

@VourMa
Copy link
Contributor Author

VourMa commented Jul 13, 2023

More instances of variables in global scope (sub-issue 3 above):

std::map <unsigned int, uint16_t> *SDL::detIdToIndex;
std::map <unsigned int, float> *SDL::module_x;
std::map <unsigned int, float> *SDL::module_y;
std::map <unsigned int, float> *SDL::module_z;
std::map <unsigned int, unsigned int> *SDL::module_type; // 23 : Ph2PSP, 24 : Ph2PSS, 25 : Ph2SS

SDL::ModuleConnectionMap SDL::moduleConnectionMap;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer1Subdet5;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer2Subdet5;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer3Subdet5;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer1Subdet4;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer2Subdet4;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer3Subdet4;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer4Subdet4;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer1Subdet5_pos;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer2Subdet5_pos;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer3Subdet5_pos;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer1Subdet4_pos;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer2Subdet4_pos;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer3Subdet4_pos;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer4Subdet4_pos;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer1Subdet5_neg;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer2Subdet5_neg;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer3Subdet5_neg;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer1Subdet4_neg;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer2Subdet4_neg;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer3Subdet4_neg;
SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer4Subdet4_neg;

@GNiendorf
Copy link
Member

GNiendorf commented Jul 16, 2023

One more instance of variables in global scope (sub issue 3 like mentioned above):

SDL::TiltedGeometry SDL::tiltedGeometry;

@VourMa
Copy link
Contributor Author

VourMa commented Jul 28, 2023

@GNiendorf subissue 3 is still open but was subissue 1 dealt with during the Alpaka migration?

@GNiendorf
Copy link
Member

@VourMa Yup, subissue 1 is fixed!

@VourMa
Copy link
Contributor Author

VourMa commented Aug 24, 2023

Given that we found a work-around for thread-safety of global variables, should we close this issue?

@slava77
Copy link
Contributor

slava77 commented Aug 24, 2023

Given that we found a work-around for thread-safety of global variables, should we close this issue?

I think that call_once is a hack and we need to get a proper solution eventually.

@VourMa
Copy link
Contributor Author

VourMa commented Apr 21, 2024

@slava77 @GNiendorf Is subissue 3 of this issue closed by the #377?

@slava77
Copy link
Contributor

slava77 commented Apr 21, 2024

@slava77 @GNiendorf Is subissue 3 of this issue closed by the #377?

do you mean

uint16_t SDL::nModules; 
 uint16_t SDL::nLowerModules; 

?

Well, they are still in a "global" space in SDL::Gloabals.
I'm not sure though what the text in subissue 3 implied.

@GNiendorf
Copy link
Member

I think this can be closed now, right @VourMa?

@VourMa
Copy link
Contributor Author

VourMa commented May 29, 2024

I would say so, yes. Closing and we can reopen a new, more dedicated one if we see anymore problems.

@VourMa VourMa closed this as completed May 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants