Considerations on memory management #287

VourMa · 2023-05-10T18:26:32Z

While investigating issue #285, I came across a couple of points in our code where the memory allocations seemed a bit weird/bug-prone. Let me mention them below, so that more people can take a look and let me know if I am missing something:

1. The allocation of the indicesOfEligibleT5Modules happens in Event.cu

TrackLooper/SDL/Event.cu

Lines 1257 to 1261 in 4d0158d

    
           #ifdef CACHE_ALLOC 
        
                   rangesInGPU->indicesOfEligibleT5Modules = (uint16_t*)cms::cuda::allocate_device(dev, nLowerModules * sizeof(uint16_t), stream); 
        
           #else 
        
                   cudaMalloc(&(rangesInGPU->indicesOfEligibleT5Modules), nLowerModules * sizeof(uint16_t)); 
        
           #endif

instead of in Module.cu

TrackLooper/SDL/Module.cu

Line 12 in 4d0158d

    
           void SDL::createRangesInExplicitMemory(struct objectRanges& rangesInGPU,unsigned int nModules,cudaStream_t stream, unsigned int nLowerModules)

where the relevant memory is then freed

TrackLooper/SDL/Module.cu

Line 108 in 4d0158d

void SDL::objectRanges::freeMemoryCache()//struct objectRanges& rangesInGPU)

I see no reason for this, and I would propose to move the memory allocation to Module.cu.

2. We have only one way to allocate the modulesInGPU

TrackLooper/SDL/Module.cu

Lines 76 to 106 in 4d0158d

    
           void SDL::createModulesInExplicitMemory(struct modules& modulesInGPU,unsigned int nModules,cudaStream_t stream) 
        
           { 
        
               /* modules stucture object will be created in Event.cu*/ 
        
               cudaMalloc(&(modulesInGPU.detIds),nModules * sizeof(unsigned int)); 
        
               cudaMalloc(&modulesInGPU.moduleMap,nModules * 40 * sizeof(uint16_t)); 
        
               cudaMalloc(&modulesInGPU.mapIdx, nModules*sizeof(uint16_t)); 
        
               cudaMalloc(&modulesInGPU.mapdetId, nModules*sizeof(unsigned int)); 
        
               cudaMalloc(&modulesInGPU.nConnectedModules,nModules * sizeof(uint16_t)); 
        
               cudaMalloc(&modulesInGPU.drdzs,nModules * sizeof(float)); 
        
               cudaMalloc(&modulesInGPU.slopes,nModules * sizeof(float)); 
        
               cudaMalloc(&modulesInGPU.nModules,sizeof(uint16_t)); 
        
               cudaMalloc(&modulesInGPU.nLowerModules,sizeof(uint16_t)); 
        
               cudaMalloc(&modulesInGPU.partnerModuleIndices, nModules * sizeof(uint16_t)); 
        
               cudaMalloc(&modulesInGPU.layers,nModules * sizeof(short)); 
        
               cudaMalloc(&modulesInGPU.rings,nModules * sizeof(short)); 
        
               cudaMalloc(&modulesInGPU.modules,nModules * sizeof(short)); 
        
               cudaMalloc(&modulesInGPU.rods,nModules * sizeof(short)); 
        
               cudaMalloc(&modulesInGPU.subdets,nModules * sizeof(short)); 
        
               cudaMalloc(&modulesInGPU.sides,nModules * sizeof(short)); 
        
               cudaMalloc(&modulesInGPU.eta,nModules * sizeof(float)); 
        
               cudaMalloc(&modulesInGPU.r,nModules * sizeof(float)); 
        
               cudaMalloc(&modulesInGPU.isInverted, nModules * sizeof(bool)); 
        
               cudaMalloc(&modulesInGPU.isLower, nModules * sizeof(bool)); 
        
               cudaMalloc(&modulesInGPU.isAnchor, nModules * sizeof(bool)); 
        
               cudaMalloc(&modulesInGPU.moduleType,nModules * sizeof(ModuleType)); 
        
               cudaMalloc(&modulesInGPU.moduleLayerType,nModules * sizeof(ModuleLayerType)); 
        
               cudaMemcpyAsync(modulesInGPU.nModules,&nModules,sizeof(uint16_t),cudaMemcpyHostToDevice,stream); 
        
               cudaStreamSynchronize(stream); 
        
           }

but two ways to free the memory

TrackLooper/SDL/Module.cu

Lines 166 to 203 in 4d0158d

    
           void SDL::freeModulesCache(struct modules& modulesInGPU,struct pixelMap& pixelMapping) 
        
           { 
        
             int dev; 
        
             cudaGetDevice(&dev); 
        
             cms::cuda::free_device(dev,modulesInGPU.detIds); 
        
             cms::cuda::free_device(dev,modulesInGPU.moduleMap); 
        
             cms::cuda::free_device(dev,modulesInGPU.mapIdx); 
        
             cms::cuda::free_device(dev,modulesInGPU.mapdetId); 
        
             cms::cuda::free_device(dev,modulesInGPU.nConnectedModules); 
        
             cms::cuda::free_device(dev,modulesInGPU.drdzs); 
        
             cms::cuda::free_device(dev,modulesInGPU.slopes); 
        
             cms::cuda::free_device(dev,modulesInGPU.nModules); 
        
             cms::cuda::free_device(dev,modulesInGPU.nLowerModules); 
        
             cms::cuda::free_device(dev,modulesInGPU.layers); 
        
             cms::cuda::free_device(dev,modulesInGPU.rings); 
        
             cms::cuda::free_device(dev,modulesInGPU.modules); 
        
             cms::cuda::free_device(dev,modulesInGPU.rods); 
        
             cms::cuda::free_device(dev,modulesInGPU.subdets); 
        
             cms::cuda::free_device(dev,modulesInGPU.sides); 
        
             cms::cuda::free_device(dev,modulesInGPU.isInverted); 
        
             cms::cuda::free_device(dev,modulesInGPU.isLower); 
        
             cms::cuda::free_device(dev,modulesInGPU.isAnchor); 
        
             cms::cuda::free_device(dev,modulesInGPU.moduleType); 
        
             cms::cuda::free_device(dev,modulesInGPU.moduleLayerType); 
        
             cms::cuda::free_device(dev,modulesInGPU.connectedPixels); 
        
             cudaFreeHost(pixelMapping.connectedPixelsSizes); 
        
             cudaFreeHost(pixelMapping.connectedPixelsSizesPos); 
        
             cudaFreeHost(pixelMapping.connectedPixelsSizesNeg); 
        
             cudaFreeHost(pixelMapping.connectedPixelsIndex); 
        
             cudaFreeHost(pixelMapping.connectedPixelsIndexPos); 
        
             cudaFreeHost(pixelMapping.connectedPixelsIndexNeg); 
        
             //cms::cuda::free_host(pixelMapping.connectedPixelsSizes); 
        
             //cms::cuda::free_host(pixelMapping.connectedPixelsSizesPos); 
        
             //cms::cuda::free_host(pixelMapping.connectedPixelsSizesNeg); 
        
             //cms::cuda::free_host(pixelMapping.connectedPixelsIndex); 
        
             //cms::cuda::free_host(pixelMapping.connectedPixelsIndexPos); 
        
             //cms::cuda::free_host(pixelMapping.connectedPixelsIndexNeg); 
        
           }

and

TrackLooper/SDL/Module.cu

Lines 204 to 244 in 4d0158d

    
           void SDL::freeModules(struct modules& modulesInGPU, struct pixelMap& pixelMapping) 
        
           { 
        
             cudaFree(modulesInGPU.detIds); 
        
             cudaFree(modulesInGPU.moduleMap); 
        
             cudaFree(modulesInGPU.mapIdx); 
        
             cudaFree(modulesInGPU.mapdetId); 
        
             cudaFree(modulesInGPU.nConnectedModules); 
        
             cudaFree(modulesInGPU.drdzs); 
        
             cudaFree(modulesInGPU.slopes); 
        
             cudaFree(modulesInGPU.nModules); 
        
             cudaFree(modulesInGPU.nLowerModules); 
        
             cudaFree(modulesInGPU.layers); 
        
             cudaFree(modulesInGPU.rings); 
        
             cudaFree(modulesInGPU.modules); 
        
             cudaFree(modulesInGPU.rods); 
        
             cudaFree(modulesInGPU.subdets); 
        
             cudaFree(modulesInGPU.sides); 
        
             cudaFree(modulesInGPU.eta); 
        
             cudaFree(modulesInGPU.r); 
        
             cudaFree(modulesInGPU.isInverted); 
        
             cudaFree(modulesInGPU.isLower); 
        
             cudaFree(modulesInGPU.isAnchor); 
        
             cudaFree(modulesInGPU.moduleType); 
        
             cudaFree(modulesInGPU.moduleLayerType); 
        
             cudaFree(modulesInGPU.connectedPixels); 
        
             cudaFree(modulesInGPU.partnerModuleIndices); 
        
             cudaFreeHost(pixelMapping.connectedPixelsSizes); 
        
             cudaFreeHost(pixelMapping.connectedPixelsSizesPos); 
        
             cudaFreeHost(pixelMapping.connectedPixelsSizesNeg); 
        
             cudaFreeHost(pixelMapping.connectedPixelsIndex); 
        
             cudaFreeHost(pixelMapping.connectedPixelsIndexPos); 
        
             cudaFreeHost(pixelMapping.connectedPixelsIndexNeg); 
        
             //cms::cuda::free_host(pixelMapping.connectedPixelsSizes); 
        
             //cms::cuda::free_host(pixelMapping.connectedPixelsSizesPos); 
        
             //cms::cuda::free_host(pixelMapping.connectedPixelsSizesNeg); 
        
             //cms::cuda::free_host(pixelMapping.connectedPixelsIndex); 
        
             //cms::cuda::free_host(pixelMapping.connectedPixelsIndexPos); 
        
             //cms::cuda::free_host(pixelMapping.connectedPixelsIndexNeg); 
        
           }

Any reason for this that I am missing? Otherwise, I think we should just remove the extra memory-freeing functions.

3. Finally, I would like to ask why these variables

TrackLooper/SDL/Event.cu

Lines 4 to 7 in 4d0158d

    
           struct SDL::modules* SDL::modulesInGPU = nullptr; 
        
           struct SDL::pixelMap* SDL::pixelMapping = nullptr; 
        
           uint16_t SDL::nModules; 
        
           uint16_t SDL::nLowerModules;

are defined in the global scope. I would expect them to be inside the Event class or defined in the appropriate file where they are relevant.

The text was updated successfully, but these errors were encountered:

GNiendorf · 2023-06-02T23:26:02Z

#292 Deals with issue 2 by deleting the cached version since we've agreed previously to not use the caching allocator here. Both will actually be deleted when that area of the code moves to Alpaka since the buffers are freed automatically. It also deals with issue 1, moving the allocation to module.cu.

VourMa · 2023-07-13T08:23:59Z

More instances of variables in global scope (sub-issue 3 above):

TrackLooper/SDL/Module.cu

Lines 5 to 9 in 6e6e2d0

    
           std::map <unsigned int, uint16_t> *SDL::detIdToIndex; 
        
           std::map <unsigned int, float> *SDL::module_x; 
        
           std::map <unsigned int, float> *SDL::module_y; 
        
           std::map <unsigned int, float> *SDL::module_z; 
        
           std::map <unsigned int, unsigned int> *SDL::module_type; // 23 : Ph2PSP, 24 : Ph2PSS, 25 : Ph2SS

TrackLooper/SDL/ModuleConnectionMap.cc

Lines 3 to 24 in 6e6e2d0

    
           SDL::ModuleConnectionMap SDL::moduleConnectionMap; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer1Subdet5; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer2Subdet5; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer3Subdet5; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer1Subdet4; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer2Subdet4; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer3Subdet4; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer4Subdet4; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer1Subdet5_pos; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer2Subdet5_pos; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer3Subdet5_pos; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer1Subdet4_pos; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer2Subdet4_pos; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer3Subdet4_pos; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer4Subdet4_pos; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer1Subdet5_neg; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer2Subdet5_neg; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer3Subdet5_neg; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer1Subdet4_neg; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer2Subdet4_neg; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer3Subdet4_neg; 
        
           SDL::ModuleConnectionMap SDL::moduleConnectionMap_pLStoLayer4Subdet4_neg;

GNiendorf · 2023-07-16T17:01:23Z

One more instance of variables in global scope (sub issue 3 like mentioned above):

TrackLooper/SDL/TiltedGeometry.cc

Line 3 in c58c36e

SDL::TiltedGeometry SDL::tiltedGeometry;

VourMa · 2023-07-28T18:49:05Z

@GNiendorf subissue 3 is still open but was subissue 1 dealt with during the Alpaka migration?

GNiendorf · 2023-07-28T18:49:24Z

@VourMa Yup, subissue 1 is fixed!

VourMa · 2023-08-24T15:46:09Z

Given that we found a work-around for thread-safety of global variables, should we close this issue?

slava77 · 2023-08-24T16:11:20Z

Given that we found a work-around for thread-safety of global variables, should we close this issue?

I think that call_once is a hack and we need to get a proper solution eventually.

VourMa · 2024-04-21T11:51:22Z

@slava77 @GNiendorf Is subissue 3 of this issue closed by the #377?

slava77 · 2024-04-21T13:44:28Z

@slava77 @GNiendorf Is subissue 3 of this issue closed by the #377?

do you mean

uint16_t SDL::nModules; 
 uint16_t SDL::nLowerModules;

?

Well, they are still in a "global" space in SDL::Gloabals.
I'm not sure though what the text in subissue 3 implied.

GNiendorf · 2024-05-29T02:00:37Z

I think this can be closed now, right @VourMa?

VourMa · 2024-05-29T09:00:53Z

I would say so, yes. Closing and we can reopen a new, more dedicated one if we see anymore problems.

GNiendorf linked a pull request Jun 2, 2023 that will close this issue

Move to Alpaka Memory + Alpaka 0.9 + CMSSW Alpaka Caching Allocator #292

Merged

slava77 mentioned this issue Jul 28, 2023

Segmentation violation when running multithreaded in CMSSW #311

Closed

VourMa closed this as completed May 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Considerations on memory management #287

Considerations on memory management #287

VourMa commented May 10, 2023 •

edited

Loading

GNiendorf commented Jun 2, 2023 •

edited

Loading

VourMa commented Jul 13, 2023

GNiendorf commented Jul 16, 2023 •

edited

Loading

VourMa commented Jul 28, 2023

GNiendorf commented Jul 28, 2023

VourMa commented Aug 24, 2023

slava77 commented Aug 24, 2023

VourMa commented Apr 21, 2024

slava77 commented Apr 21, 2024

GNiendorf commented May 29, 2024

VourMa commented May 29, 2024

Considerations on memory management #287

Considerations on memory management #287

Comments

VourMa commented May 10, 2023 • edited Loading

GNiendorf commented Jun 2, 2023 • edited Loading

VourMa commented Jul 13, 2023

GNiendorf commented Jul 16, 2023 • edited Loading

VourMa commented Jul 28, 2023

GNiendorf commented Jul 28, 2023

VourMa commented Aug 24, 2023

slava77 commented Aug 24, 2023

VourMa commented Apr 21, 2024

slava77 commented Apr 21, 2024

GNiendorf commented May 29, 2024

VourMa commented May 29, 2024

VourMa commented May 10, 2023 •

edited

Loading

GNiendorf commented Jun 2, 2023 •

edited

Loading

GNiendorf commented Jul 16, 2023 •

edited

Loading