Huge page support for composite images loaded on Linux (#37673)
Add support for loading composite R2R images utilizing huge pages on Linux

Support is broken into 3 major portions

- Changes to the compiler to add a switch which can compile the composite image with higher than normal alignment
- Changes to the runtime to make some slight tweaks to PE file loading on Linux to support these images correctly
- Documentation on how to tie these various features together to achieve large page loading on Linux
davidwrighton authored Jun 12, 2020
1 parent cd02b06 commit 579d883
Showing 9 changed files with 302 additions and 23 deletions.
102 changes: 102 additions & 0 deletions docs/design/features/Linux-Hugepage-Crossgen2.md
@@ -0,0 +1,102 @@
Configuring Huge Pages for loading composite binaries using CoreCLR on Linux
----

Huge pages can provide performance benefits to reduce the cost of TLB cache misses when
executing code. In general, the largest available wins may be achieved by enabling huge
pages for use by the GC, which will dominate the memory use in the process, but in some
circumstances, if the application is sufficiently large, there may be a benefit to using
huge pages to map in code.

It is expected that consumers who have these needs have very large applications, and are
able to tolerate somewhat complex solutions. CoreCLR supports loading composite R2R
images using hugetlbfs. Doing so requires several steps.

1. The composite image must be created with a switch such as `--custom-pe-section-alignment=2097152`. This will align the PE sections in the R2R file on 2MB virtual address boundaries, and align the sections in the PE file itself on the same boundaries.
- This will increase the size of the image by up to 5 * the specified alignment. Typical increases will be closer to 3 * the specified alignment.
2. The composite image must be copied into a hugetlbfs filesystem which is visible to the .NET process instead of the composite image being loaded from the normal path.
- IMPORTANT: The composite image must NOT be located in the normal path next to the application binary, or that file will be used instead of the huge page version.
- The environment variable `COMPlus_NativeImageSearchPaths` must be set to point at the location of the hugetlbfs in use. For instance, `COMPlus_NativeImageSearchPaths` might be set to `/var/lib/hugetlbfs/user/USER/pagesize-2MB`
- As the cp command does not support copying into a hugetlbfs due to lack of support for the write syscall in that file system, a custom copy application must be used. A sample application that may be used to perform this task has a source listing in Appendix A.
3. The machine must be configured to have sufficient huge pages available in the appropriate huge page pool. The memory requirements of huge page PE loading are as follows.
- Sufficient pages to hold the unmodified copy of the composite image in the hugetlbfs. These pages will be used by the initial copy which emplaces the composite image into huge pages.
- By default the runtime will map each page of the composite image using a MAP_PRIVATE mapping. This will require that the maximum number of huge pages is large enough to hold a completely separate copy of the image as loaded.
- To reduce that cost, launch the application with the PAL_MAP_READONLY_PE_HUGE_PAGE_AS_SHARED environment variable set to 1. This environment variable changes the way the composite R2R image files are mapped into the process so that read-only sections are mapped as MAP_SHARED mappings. This reduces the extra huge pages needed to only those backing the sections marked as RW in the PE file. On a Windows machine, use the link tool (`link /dump /header compositeimage.dll`) to determine the number of pages needed for the `.data` section of the PE file.
- If PAL_MAP_READONLY_PE_HUGE_PAGE_AS_SHARED is set, the number of huge pages needed is `<Count of huge pages for composite file> + <count of processes to run> * <count of huge pages needed for the .data section of the composite file>`
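
As a rough illustration of the accounting in step 3, the following C sketch rounds a size up to whole huge pages and applies the formula from the last bullet for the shared read-only case. The function names and the sample numbers are hypothetical and are not part of the runtime; only the formula comes from the text above.

```c
#include <stddef.h>

/* Number of huge pages needed to hold `bytes`, rounding up.
   `huge_page_size` must be non-zero, e.g. 2097152 for 2MB pages. */
static size_t huge_pages_for(size_t bytes, size_t huge_page_size)
{
    return bytes / huge_page_size + (bytes % huge_page_size != 0 ? 1 : 0);
}

/* Huge pages needed when read-only sections are mapped MAP_SHARED:
   one copy of the image in the hugetlbfs, plus a private copy of the
   .data section for every process that loads the image. */
static size_t huge_pages_with_shared_readonly(size_t image_pages,
                                              size_t process_count,
                                              size_t data_section_pages)
{
    return image_pages + process_count * data_section_pages;
}
```

Under these assumptions, a composite image that occupies 50 huge pages, with a `.data` section that needs 2 huge pages and 4 processes loading it, would need `huge_pages_with_shared_readonly(50, 4, 2)`, which is 58 huge pages.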

Appendix A - Source for a simple copy into hugetlbfs program.

```
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>

int main(int argc, char** argv)
{
    if (argc != 3)
    {
        printf("Incorrect number of arguments specified. Arguments are <src> <dest>\n");
        return 1;
    }

    void *addrSrc, *addrDest;
    int fdSrc, fdDest;

    // Open and map the source file so that it can be copied with memcpy.
    fdSrc = open(argv[1], O_RDWR);
    if (fdSrc < 0)
    {
        printf("Open src failed\n");
        return 1;
    }

    struct stat st;
    if (fstat(fdSrc, &st) < 0)
    {
        printf("fdSrc fstat failed\n");
        return 1;
    }

    addrSrc = mmap(0, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fdSrc, 0);
    if (addrSrc == MAP_FAILED)
    {
        printf("fdSrc mmap failed\n");
        return 1;
    }

    // Create the destination file in the hugetlbfs. The write syscall is not
    // supported there, so the file is sized with ftruncate and filled through
    // a shared mapping instead.
    fdDest = open(argv[2], O_CREAT | O_RDWR, 0755);
    if (fdDest < 0)
    {
        printf("Open dest failed\n");
        return 1;
    }

    if (ftruncate(fdDest, st.st_size) < 0)
    {
        printf("ftruncate failed\n");
        return 1;
    }

    addrDest = mmap(0, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fdDest, 0);
    if (addrDest == MAP_FAILED)
    {
        printf("fdDest mmap failed\n");
        return 1;
    }

    // Copy the image into the huge page backed mapping, then clean up.
    memcpy(addrDest, addrSrc, st.st_size);

    munmap(addrSrc, st.st_size);
    munmap(addrDest, st.st_size);

    close(fdSrc);
    close(fdDest);
    return 0;
}
```
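
Before copying, it can be useful to confirm that the destination directory really is a hugetlbfs mount, since a copy into an ordinary file system would succeed silently without providing huge pages. The helper below is a sketch that is not part of this commit: `is_on_hugetlbfs` is a hypothetical name, and the check assumes the Linux `statfs` call and the hugetlbfs magic number as defined in `<linux/magic.h>`.

```c
#include <sys/vfs.h>

/* File system magic number for hugetlbfs (HUGETLBFS_MAGIC in <linux/magic.h>),
   repeated here so the sketch does not depend on kernel headers. */
#define HUGETLBFS_MAGIC_VALUE 0x958458f6UL

/* Returns 1 if `path` is on a hugetlbfs mount, 0 if it is on another
   file system, and -1 if statfs fails (for example, the path is missing). */
static int is_on_hugetlbfs(const char *path)
{
    struct statfs fs;
    if (statfs(path, &fs) != 0)
        return -1;
    return ((unsigned long)fs.f_type == HUGETLBFS_MAGIC_VALUE) ? 1 : 0;
}
```

A copy tool could call `is_on_hugetlbfs` on the destination directory and refuse to proceed unless it returns 1.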
40 changes: 34 additions & 6 deletions src/coreclr/src/pal/src/map/map.cpp
@@ -2233,6 +2233,10 @@ void * MAPMapPEFile(HANDLE hFile, off_t offset)
bool forceRelocs = false;
char* envVar;
#endif
SIZE_T reserveSize = 0;
bool forceOveralign = false;
int readWriteFlags = MAP_FILE|MAP_PRIVATE|MAP_FIXED;
int readOnlyFlags = readWriteFlags;

ENTRY("MAPMapPEFile (hFile=%p offset=%zx)\n", hFile, offset);

@@ -2357,13 +2361,20 @@ void * MAPMapPEFile(HANDLE hFile, off_t offset)
// We're going to start adding mappings to the mapping list, so take the critical section
InternalEnterCriticalSection(pThread, &mapping_critsec);

reserveSize = virtualSize;
if ((ntHeader.OptionalHeader.SectionAlignment) > GetVirtualPageSize())
{
reserveSize += ntHeader.OptionalHeader.SectionAlignment;
forceOveralign = true;
}

#ifdef HOST_64BIT
// First try to reserve virtual memory using ExecutableAllocator. This allows all PE images to be
// near each other and close to the coreclr library which also allows the runtime to generate
// more efficient code (by avoiding usage of jump stubs). Alignment to a 64 KB granularity should
// not be necessary (alignment to page size should be sufficient), but see
// ExecutableMemoryAllocator::AllocateMemory() for the reason why it is done.
-loadedBase = ReserveMemoryFromExecutableAllocator(pThread, ALIGN_UP(virtualSize, VIRTUAL_64KB));
+loadedBase = ReserveMemoryFromExecutableAllocator(pThread, ALIGN_UP(reserveSize, VIRTUAL_64KB));
#endif // HOST_64BIT

if (loadedBase == NULL)
@@ -2384,7 +2395,7 @@ void * MAPMapPEFile(HANDLE hFile, off_t offset)
mapFlags |= MAP_JIT;
}
#endif // __APPLE__
-loadedBase = mmap(usedBaseAddr, virtualSize, PROT_NONE, mapFlags, -1, 0);
+loadedBase = mmap(usedBaseAddr, reserveSize, PROT_NONE, mapFlags, -1, 0);
}

if (MAP_FAILED == loadedBase)
@@ -2413,15 +2424,28 @@ void * MAPMapPEFile(HANDLE hFile, off_t offset)
}
#endif // _DEBUG

size_t headerSize;
headerSize = GetVirtualPageSize(); // if there are lots of sections, this could be wrong

if (forceOveralign)
{
loadedBase = ALIGN_UP(loadedBase, ntHeader.OptionalHeader.SectionAlignment);
headerSize = ntHeader.OptionalHeader.SectionAlignment;
char *mapAsShared = EnvironGetenv("PAL_MAP_READONLY_PE_HUGE_PAGE_AS_SHARED");

// If PAL_MAP_READONLY_PE_HUGE_PAGE_AS_SHARED is set to 1. map the readonly sections as shared
// which works well with the behavior of the hugetlbfs
if (mapAsShared != NULL && (strcmp(mapAsShared, "1") == 0))
readOnlyFlags = MAP_FILE|MAP_SHARED|MAP_FIXED;
}

//we have now reserved memory (potentially we got rebased). Walk the PE sections and map each part
//separately.

-size_t headerSize;
-headerSize = GetVirtualPageSize(); // if there are lots of sections, this could be wrong

//first, map the PE header to the first page in the image. Get pointers to the section headers
palError = MAPmmapAndRecord(pFileObject, loadedBase,
-loadedBase, headerSize, PROT_READ, MAP_FILE|MAP_PRIVATE|MAP_FIXED, fd, offset,
+loadedBase, headerSize, PROT_READ, readOnlyFlags, fd, offset,
(void**)&loadedHeader);
if (NO_ERROR != palError)
{
@@ -2501,18 +2525,22 @@ void * MAPMapPEFile(HANDLE hFile, off_t offset)
//Don't discard these sections. We need them to verify PE files
//if (currentHeader.Characteristics & IMAGE_SCN_MEM_DISCARDABLE)
// continue;
int flags = readOnlyFlags;
if (currentHeader.Characteristics & IMAGE_SCN_MEM_EXECUTE)
prot |= PROT_EXEC;
if (currentHeader.Characteristics & IMAGE_SCN_MEM_READ)
prot |= PROT_READ;
if (currentHeader.Characteristics & IMAGE_SCN_MEM_WRITE)
{
prot |= PROT_WRITE;
flags = readWriteFlags;
}

palError = MAPmmapAndRecord(pFileObject, loadedBase,
sectionBase,
currentHeader.SizeOfRawData,
prot,
-MAP_FILE|MAP_PRIVATE|MAP_FIXED,
+flags,
fd,
offset + currentHeader.PointerToRawData,
&sectionData);
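
The flag selection introduced in the hunks above can be summarized in isolation: read-only sections use the shared flags only when `PAL_MAP_READONLY_PE_HUGE_PAGE_AS_SHARED` is exactly `1`, and writable sections always stay private. The following C sketch restates that decision outside the PAL. The helper names are hypothetical, and it uses `getenv` where the PAL uses `EnvironGetenv`. `MAP_FILE`, which the PAL also passes, is left out because it is a compatibility flag that Linux ignores. In the commit itself the environment variable is consulted only when the image is over-aligned.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Returns non-zero when the environment variable is set to exactly "1". */
static int map_readonly_as_shared(void)
{
    const char *value = getenv("PAL_MAP_READONLY_PE_HUGE_PAGE_AS_SHARED");
    return value != NULL && strcmp(value, "1") == 0;
}

/* Chooses the mmap flags for one PE section. Writable sections are always
   mapped privately so that each process gets its own copy; read-only
   sections are shared only when `readonly_as_shared` is non-zero. */
static int section_map_flags(int section_is_writable, int readonly_as_shared)
{
    if (!section_is_writable && readonly_as_shared)
        return MAP_SHARED | MAP_FIXED;
    return MAP_PRIVATE | MAP_FIXED;
}
```

Keeping writable sections private is what makes the per-process `.data` cost in the documentation's huge page formula necessary.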
@@ -54,6 +54,13 @@ internal class ReadyToRunObjectWriter
/// </summary>
private readonly MapFileBuilder _mapFileBuilder;

/// <summary>
/// If non-null, the PE file will be laid out such that it can naturally be mapped with a higher alignment than 4KB
/// This is used to support loading via large pages on Linux
/// </summary>
private readonly int? _customPESectionAlignment;


#if DEBUG
private struct NodeInfo
{
@@ -72,12 +79,13 @@ public NodeInfo(ISymbolNode node, int nodeIndex, int symbolIndex)
Dictionary<string, NodeInfo> _previouslyWrittenNodeNames = new Dictionary<string, NodeInfo>();
#endif

-public ReadyToRunObjectWriter(string objectFilePath, EcmaModule componentModule, IEnumerable<DependencyNode> nodes, NodeFactory factory, bool generateMapFile)
+public ReadyToRunObjectWriter(string objectFilePath, EcmaModule componentModule, IEnumerable<DependencyNode> nodes, NodeFactory factory, bool generateMapFile, int? customPESectionAlignment)
{
_objectFilePath = objectFilePath;
_componentModule = componentModule;
_nodes = nodes;
_nodeFactory = factory;
_customPESectionAlignment = customPESectionAlignment;

if (generateMapFile)
{
@@ -127,7 +135,8 @@ public void EmitPortableExecutable()
headerBuilder,
r2rHeaderExportSymbol,
Path.GetFileName(_objectFilePath),
-getRuntimeFunctionsTable);
+getRuntimeFunctionsTable,
+_customPESectionAlignment);

NativeDebugDirectoryEntryNode nativeDebugDirectoryEntryNode = null;

@@ -270,10 +279,10 @@ private void EmitObjectData(R2RPEBuilder r2rPeBuilder, ObjectData data, int node
r2rPeBuilder.AddObjectData(data, section, name, mapFileBuilder);
}

-public static void EmitObject(string objectFilePath, EcmaModule componentModule, IEnumerable<DependencyNode> nodes, NodeFactory factory, bool generateMapFile)
+public static void EmitObject(string objectFilePath, EcmaModule componentModule, IEnumerable<DependencyNode> nodes, NodeFactory factory, bool generateMapFile, int? customPESectionAlignment)
{
Console.WriteLine($@"Emitting R2R PE file: {objectFilePath}");
-ReadyToRunObjectWriter objectWriter = new ReadyToRunObjectWriter(objectFilePath, componentModule, nodes, factory, generateMapFile);
+ReadyToRunObjectWriter objectWriter = new ReadyToRunObjectWriter(objectFilePath, componentModule, nodes, factory, generateMapFile, customPESectionAlignment);
objectWriter.EmitPortableExecutable();
}
}
@@ -233,6 +233,7 @@ public sealed class ReadyToRunCodegenCompilation : Compilation

public ReadyToRunSymbolNodeFactory SymbolNodeFactory { get; }
public ReadyToRunCompilationModuleGroupBase CompilationModuleGroup { get; }
private readonly int? _customPESectionAlignment;

internal ReadyToRunCodegenCompilation(
DependencyAnalyzerBase<NodeFactory> dependencyGraph,
@@ -248,7 +249,8 @@ internal ReadyToRunCodegenCompilation(
int parallelism,
ProfileDataManager profileData,
ReadyToRunMethodLayoutAlgorithm methodLayoutAlgorithm,
-ReadyToRunFileLayoutAlgorithm fileLayoutAlgorithm)
+ReadyToRunFileLayoutAlgorithm fileLayoutAlgorithm,
+int? customPESectionAlignment)
: base(
dependencyGraph,
nodeFactory,
@@ -262,6 +264,7 @@ internal ReadyToRunCodegenCompilation(
_resilient = resilient;
_parallelism = parallelism;
_generateMapFile = generateMapFile;
_customPESectionAlignment = customPESectionAlignment;
SymbolNodeFactory = new ReadyToRunSymbolNodeFactory(nodeFactory);
_corInfoImpls = new ConditionalWeakTable<Thread, CorInfoImpl>();
_inputFiles = inputFiles;
@@ -290,7 +293,7 @@ public override void Compile(string outputFile)
using (PerfEventSource.StartStopEvents.EmittingEvents())
{
NodeFactory.SetMarkingComplete();
-ReadyToRunObjectWriter.EmitObject(outputFile, componentModule: null, nodes, NodeFactory, _generateMapFile);
+ReadyToRunObjectWriter.EmitObject(outputFile, componentModule: null, nodes, NodeFactory, _generateMapFile, _customPESectionAlignment);
CompilationModuleGroup moduleGroup = _nodeFactory.CompilationModuleGroup;

if (moduleGroup.IsCompositeBuildMode)
@@ -339,7 +342,7 @@ private void RewriteComponentFile(string inputFile, string outputFile, string ow
}
componentGraph.ComputeMarkedNodes();
componentFactory.Header.Add(Internal.Runtime.ReadyToRunSectionType.OwnerCompositeExecutable, ownerExecutableNode, ownerExecutableNode);
-ReadyToRunObjectWriter.EmitObject(outputFile, componentModule: inputModule, componentGraph.MarkedNodeList, componentFactory, generateMapFile: false);
+ReadyToRunObjectWriter.EmitObject(outputFile, componentModule: inputModule, componentGraph.MarkedNodeList, componentFactory, generateMapFile: false, customPESectionAlignment: null);
}

public override void WriteDependencyLog(string outputFileName)
@@ -28,6 +28,7 @@ public sealed class ReadyToRunCodegenCompilationBuilder : CompilationBuilder
private ProfileDataManager _profileData;
private ReadyToRunMethodLayoutAlgorithm _r2rMethodLayoutAlgorithm;
private ReadyToRunFileLayoutAlgorithm _r2rFileLayoutAlgorithm;
private int? _customPESectionAlignment;

private string _jitPath;
private string _outputFile;
@@ -137,6 +138,12 @@ public ReadyToRunCodegenCompilationBuilder GenerateOutputFile(string outputFile)
return this;
}

public ReadyToRunCodegenCompilationBuilder UseCustomPESectionAlignment(int? customPESectionAlignment)
{
_customPESectionAlignment = customPESectionAlignment;
return this;
}

public override ICompilation ToCompilation()
{
// TODO: only copy COR headers for single-assembly build and for composite build with embedded MSIL
@@ -223,7 +230,8 @@ public override ICompilation ToCompilation()
_parallelism,
_profileData,
_r2rMethodLayoutAlgorithm,
-_r2rFileLayoutAlgorithm);
+_r2rFileLayoutAlgorithm,
+_customPESectionAlignment);
}
}
}