
Conversation

simpal01 (Contributor) commented Dec 17, 2025

When targeting architectures that do not support unaligned memory accesses, or when -mno-unaligned-access is passed explicitly, the compiler must expand each unaligned load/store into an inline sequence. For a 32-bit operation this typically involves:

1. 4× LDRB (or 2× LDRH) loads,
2. multiple shift/or instructions to reassemble the value.

These sequences are emitted at every unaligned access site, and therefore contribute significantly to code size in workloads that touch packed or misaligned structures.
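
For illustration, the inline expansion of a single unaligned 32-bit load is roughly equivalent to the following C++ (a sketch only, assuming a little-endian target; each byte access corresponds to an LDRB and each shift/or to a separate instruction):

    #include <cstdint>

    // Sketch of what the inline expansion computes for one unaligned 32-bit
    // load: four byte loads combined with shifts and ORs (little-endian
    // byte order assumed).
    uint32_t unaligned_load32(const uint8_t *p) {
      return static_cast<uint32_t>(p[0]) |
             (static_cast<uint32_t>(p[1]) << 8) |
             (static_cast<uint32_t>(p[2]) << 16) |
             (static_cast<uint32_t>(p[3]) << 24);
    }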

When compiling with -Os or -Oz in combination with -mno-unaligned-access, this patch lowers unaligned 32-bit and 64-bit loads and stores to the following AEABI helper calls:

    __aeabi_uread4
    __aeabi_uread8
    __aeabi_uwrite4
    __aeabi_uwrite8
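
For reference, a rough sketch of the helper interfaces as this patch uses them (argument order taken from the lowering code below; the authoritative declarations are in the ARM RTABI and should be checked there):

    #include <cstdint>

    extern "C" {
    // Unaligned reads: take a possibly unaligned address, return the value.
    uint32_t __aeabi_uread4(void *address);
    uint64_t __aeabi_uread8(void *address);
    // Unaligned writes: take the value and the address; the patch lowers
    // these with a void return type and ignores any result.
    void __aeabi_uwrite4(uint32_t value, void *address);
    void __aeabi_uwrite8(uint64_t value, void *address);
    }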

These helpers provide a way to perform unaligned memory accesses on targets that do not support them, such as ARMv6-M, or when compiling with -mno-unaligned-access. Although each use introduces a function call, making it less straightforward than raw loads and stores, the call itself is often much smaller than the compiler-emitted sequence of multiple ldrb/strb operations. As a result, these helpers can greatly reduce code size, provided they are invoked more than once across a program. In particular:

  1. Functions become smaller in AEABI mode once they contain more than a few unaligned accesses.
  2. The total image .text size becomes smaller whenever multiple functions call the same helpers.

This PR is derived from https://reviews.llvm.org/D57595, with some minor changes.
Co-authored-by: David Green

llvmbot (Member) commented Dec 17, 2025

@llvm/pr-subscribers-backend-arm

Author: Simi Pallipurath (simpal01)

Changes

Patch is 33.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/172672.diff

4 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARMISelLowering.cpp (+174-7)
  • (modified) llvm/lib/Target/ARM/ARMISelLowering.h (+5-1)
  • (modified) llvm/test/CodeGen/ARM/i64_volatile_load_store.ll (+59-63)
  • (added) llvm/test/CodeGen/ARM/unaligned_load_store_aeabi.ll (+425)
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index f28640ce7b107..f9d1c8f451f4c 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -993,6 +993,14 @@ ARMTargetLowering::ARMTargetLowering(const TargetMachine &TM_,
     setIndexedStoreAction(ISD::POST_INC, MVT::i32,  Legal);
   }
 
+  // Custom loads/stores to possible use __aeabi_uread/write*
+  if (Subtarget->isTargetAEABI() && !Subtarget->allowsUnalignedMem()) {
+    setOperationAction(ISD::STORE, MVT::i32, Custom);
+    setOperationAction(ISD::STORE, MVT::i64, Custom);
+    setOperationAction(ISD::LOAD, MVT::i32, Custom);
+    setOperationAction(ISD::LOAD, MVT::i64, Custom);
+  }
+
   setOperationAction(ISD::SADDO, MVT::i32, Custom);
   setOperationAction(ISD::UADDO, MVT::i32, Custom);
   setOperationAction(ISD::SSUBO, MVT::i32, Custom);
@@ -10012,6 +10020,130 @@ void ARMTargetLowering::ExpandDIV_Windows(
   Results.push_back(DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i64, Lower, Upper));
 }
 
+std::pair<SDValue, SDValue>
+ARMTargetLowering::LowerAEABIUnalignedLoad(SDValue Op,
+                                           SelectionDAG &DAG) const {
+  // If we have an unaligned load from a i32 or i64 that would normally be
+  // split into separate ldrb's, we can use the __aeabi_uread4/__aeabi_uread8
+  // functions instead.
+  LoadSDNode *LD = cast<LoadSDNode>(Op.getNode());
+  EVT MemVT = LD->getMemoryVT();
+  if (MemVT != MVT::i32 && MemVT != MVT::i64)
+    return std::make_pair(SDValue(), SDValue());
+
+  const auto &MF = DAG.getMachineFunction();
+  unsigned AS = LD->getAddressSpace();
+  Align Alignment = LD->getAlign();
+  const DataLayout &DL = DAG.getDataLayout();
+  bool AllowsUnaligned = Subtarget->allowsUnalignedMem();
+
+  const char *LibcallName = nullptr;
+  if ((MF.getFunction().hasMinSize() || MF.getFunction().hasOptSize()) &&
+      !AllowsUnaligned) {
+    if (MemVT == MVT::i32 && Alignment <= llvm::Align(2))
+      LibcallName = "__aeabi_uread4";
+    else if (MemVT == MVT::i64 && Alignment <= llvm::Align(2))
+      LibcallName = "__aeabi_uread8";
+  }
+
+  if (LibcallName) {
+    LLVM_DEBUG(dbgs() << "Expanding unsupported unaligned load to "
+                      << LibcallName << "\n");
+    CallingConv::ID CC = CallingConv::ARM_AAPCS;
+    SDValue Callee = DAG.getExternalSymbol(LibcallName, getPointerTy(DL));
+    TargetLowering::ArgListTy Args;
+    TargetLowering::ArgListEntry Entry(
+        LD->getBasePtr(),
+        LD->getBasePtr().getValueType().getTypeForEVT(*DAG.getContext()));
+    SDLoc dl(Op);
+
+    Args.push_back(Entry);
+
+    Type *RetTy = MemVT.getTypeForEVT(*DAG.getContext());
+    TargetLowering::CallLoweringInfo CLI(DAG);
+    CLI.setDebugLoc(dl)
+        .setChain(LD->getChain())
+        .setCallee(CC, RetTy, Callee, std::move(Args));
+    auto Pair = LowerCallTo(CLI);
+
+    // If necessary, extend the node to 64bit
+    if (LD->getExtensionType() != ISD::NON_EXTLOAD) {
+      unsigned ExtType = LD->getExtensionType() == ISD::SEXTLOAD
+                             ? ISD::SIGN_EXTEND
+                             : ISD::ZERO_EXTEND;
+      SDValue EN = DAG.getNode(ExtType, dl, LD->getValueType(0), Pair.first);
+      Pair.first = EN;
+    }
+    return Pair;
+  }
+
+  // Default expand to individual loads
+  if (!allowsMemoryAccess(*DAG.getContext(), DL, MemVT, AS, Alignment))
+    return expandUnalignedLoad(LD, DAG);
+  return std::make_pair(SDValue(), SDValue());
+}
+
+SDValue ARMTargetLowering::LowerAEABIUnalignedStore(SDValue Op,
+                                                    SelectionDAG &DAG) const {
+  // If we have an unaligned store to a i32 or i64 that would normally be
+  // split into separate ldrb's, we can use the __aeabi_uwrite4/__aeabi_uwrite8
+  // functions instead.
+  StoreSDNode *ST = cast<StoreSDNode>(Op.getNode());
+  EVT MemVT = ST->getMemoryVT();
+  if (MemVT != MVT::i32 && MemVT != MVT::i64)
+    return SDValue();
+
+  const auto &MF = DAG.getMachineFunction();
+  unsigned AS = ST->getAddressSpace();
+  Align Alignment = ST->getAlign();
+  const DataLayout &DL = DAG.getDataLayout();
+  bool AllowsUnaligned = Subtarget->allowsUnalignedMem();
+
+  const char *LibcallName = nullptr;
+  if ((MF.getFunction().hasMinSize() || MF.getFunction().hasOptSize()) &&
+      !AllowsUnaligned) {
+    if (MemVT == MVT::i32 && Alignment <= llvm::Align(2))
+      LibcallName = "__aeabi_uwrite4";
+    else if (MemVT == MVT::i64 && Alignment <= llvm::Align(2))
+      LibcallName = "__aeabi_uwrite8";
+  }
+
+  if (LibcallName) {
+    LLVM_DEBUG(dbgs() << "Expanding unsupported unaligned store to "
+                      << LibcallName << "\n");
+    CallingConv::ID CC = CallingConv::ARM_AAPCS;
+    SDValue Callee = DAG.getExternalSymbol(LibcallName, getPointerTy(DL));
+    TargetLowering::ArgListTy Args;
+    SDLoc dl(Op);
+
+    // If necessary, trunc the value to 32bit
+    SDValue StoreVal = ST->getOperand(1);
+    if (ST->isTruncatingStore())
+      StoreVal = DAG.getNode(ISD::TRUNCATE, dl, MemVT, ST->getOperand(1));
+
+    TargetLowering::ArgListEntry Entry(
+        StoreVal, StoreVal.getValueType().getTypeForEVT(*DAG.getContext()));
+    Args.push_back(Entry);
+
+    Entry.Node = ST->getBasePtr();
+    Entry.Ty = ST->getBasePtr().getValueType().getTypeForEVT(*DAG.getContext());
+    Args.push_back(Entry);
+
+    Type *RetTy = Type::getVoidTy(*DAG.getContext());
+    TargetLowering::CallLoweringInfo CLI(DAG);
+    CLI.setDebugLoc(dl)
+        .setChain(ST->getChain())
+        .setCallee(CC, RetTy, Callee, std::move(Args));
+    std::pair<SDValue, SDValue> CallResult = LowerCallTo(CLI);
+    return CallResult.second;
+  }
+
+  // Default expand to individual stores
+  if (!allowsMemoryAccess(*DAG.getContext(), DL, MemVT, AS, Alignment))
+    return expandUnalignedStore(ST, DAG);
+  return SDValue();
+}
+
 static SDValue LowerPredicateLoad(SDValue Op, SelectionDAG &DAG) {
   LoadSDNode *LD = cast<LoadSDNode>(Op.getNode());
   EVT MemVT = LD->getMemoryVT();
@@ -10054,11 +10186,11 @@ void ARMTargetLowering::LowerLOAD(SDNode *N, SmallVectorImpl<SDValue> &Results,
                                   SelectionDAG &DAG) const {
   LoadSDNode *LD = cast<LoadSDNode>(N);
   EVT MemVT = LD->getMemoryVT();
-  assert(LD->isUnindexed() && "Loads should be unindexed at this point.");
 
   if (MemVT == MVT::i64 && Subtarget->hasV5TEOps() &&
       !Subtarget->isThumb1Only() && LD->isVolatile() &&
       LD->getAlign() >= Subtarget->getDualLoadStoreAlignment()) {
+    assert(LD->isUnindexed() && "Loads should be unindexed at this point.");
     SDLoc dl(N);
     SDValue Result = DAG.getMemIntrinsicNode(
         ARMISD::LDRD, dl, DAG.getVTList({MVT::i32, MVT::i32, MVT::Other}),
@@ -10067,6 +10199,12 @@ void ARMTargetLowering::LowerLOAD(SDNode *N, SmallVectorImpl<SDValue> &Results,
     SDValue Hi = Result.getValue(DAG.getDataLayout().isLittleEndian() ? 1 : 0);
     SDValue Pair = DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i64, Lo, Hi);
     Results.append({Pair, Result.getValue(2)});
+  } else if ((MemVT == MVT::i32 || MemVT == MVT::i64)) {
+    auto Pair = LowerAEABIUnalignedLoad(SDValue(N, 0), DAG);
+    if (Pair.first) {
+      Results.push_back(Pair.first);
+      Results.push_back(Pair.second);
+    }
   }
 }
 
@@ -10108,15 +10246,15 @@ static SDValue LowerPredicateStore(SDValue Op, SelectionDAG &DAG) {
       ST->getMemOperand());
 }
 
-static SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG,
-                          const ARMSubtarget *Subtarget) {
+SDValue ARMTargetLowering::LowerSTORE(SDValue Op, SelectionDAG &DAG,
+                                      const ARMSubtarget *Subtarget) const {
   StoreSDNode *ST = cast<StoreSDNode>(Op.getNode());
   EVT MemVT = ST->getMemoryVT();
-  assert(ST->isUnindexed() && "Stores should be unindexed at this point.");
 
   if (MemVT == MVT::i64 && Subtarget->hasV5TEOps() &&
       !Subtarget->isThumb1Only() && ST->isVolatile() &&
       ST->getAlign() >= Subtarget->getDualLoadStoreAlignment()) {
+    assert(ST->isUnindexed() && "Stores should be unindexed at this point.");
     SDNode *N = Op.getNode();
     SDLoc dl(N);
 
@@ -10136,8 +10274,9 @@ static SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG,
              ((MemVT == MVT::v2i1 || MemVT == MVT::v4i1 || MemVT == MVT::v8i1 ||
                MemVT == MVT::v16i1))) {
     return LowerPredicateStore(Op, DAG);
+  } else if ((MemVT == MVT::i32 || MemVT == MVT::i64)) {
+    return LowerAEABIUnalignedStore(Op, DAG);
   }
-
   return SDValue();
 }
 
@@ -10669,8 +10808,19 @@ SDValue ARMTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
   case ISD::UADDSAT:
   case ISD::USUBSAT:
     return LowerADDSUBSAT(Op, DAG, Subtarget);
-  case ISD::LOAD:
-    return LowerPredicateLoad(Op, DAG);
+  case ISD::LOAD: {
+    auto *LD = cast<LoadSDNode>(Op);
+    EVT MemVT = LD->getMemoryVT();
+    if (Subtarget->hasMVEIntegerOps() &&
+        ((MemVT == MVT::v2i1 || MemVT == MVT::v4i1 || MemVT == MVT::v8i1 ||
+          MemVT == MVT::v16i1)))
+      return LowerPredicateLoad(Op, DAG);
+
+    auto Pair = LowerAEABIUnalignedLoad(Op, DAG);
+    if (Pair.first)
+      return DAG.getMergeValues({Pair.first, Pair.second}, SDLoc(Pair.first));
+    return SDValue();
+  }
   case ISD::STORE:
     return LowerSTORE(Op, DAG, Subtarget);
   case ISD::MLOAD:
@@ -10811,6 +10961,9 @@ void ARMTargetLowering::ReplaceNodeResults(SDNode *N,
   case ISD::LOAD:
     LowerLOAD(N, Results, DAG);
     break;
+  case ISD::STORE:
+    Res = LowerAEABIUnalignedStore(SDValue(N, 0), DAG);
+    break;
   case ISD::TRUNCATE:
     Res = LowerTruncate(N, DAG, Subtarget);
     break;
@@ -19859,31 +20012,45 @@ ARMTargetLowering::getPreIndexedAddressParts(SDNode *N, SDValue &Base,
   EVT VT;
   SDValue Ptr;
   Align Alignment;
+  unsigned AS = 0;
   bool isSEXTLoad = false;
   bool IsMasked = false;
   if (LoadSDNode *LD = dyn_cast<LoadSDNode>(N)) {
     Ptr = LD->getBasePtr();
     VT = LD->getMemoryVT();
     Alignment = LD->getAlign();
+    AS = LD->getAddressSpace();
     isSEXTLoad = LD->getExtensionType() == ISD::SEXTLOAD;
   } else if (StoreSDNode *ST = dyn_cast<StoreSDNode>(N)) {
     Ptr = ST->getBasePtr();
     VT = ST->getMemoryVT();
     Alignment = ST->getAlign();
+    AS = ST->getAddressSpace();
   } else if (MaskedLoadSDNode *LD = dyn_cast<MaskedLoadSDNode>(N)) {
     Ptr = LD->getBasePtr();
     VT = LD->getMemoryVT();
     Alignment = LD->getAlign();
+    AS = LD->getAddressSpace();
     isSEXTLoad = LD->getExtensionType() == ISD::SEXTLOAD;
     IsMasked = true;
   } else if (MaskedStoreSDNode *ST = dyn_cast<MaskedStoreSDNode>(N)) {
     Ptr = ST->getBasePtr();
     VT = ST->getMemoryVT();
     Alignment = ST->getAlign();
+    AS = ST->getAddressSpace();
     IsMasked = true;
   } else
     return false;
 
+  unsigned Fast = 0;
+  if (!allowsMisalignedMemoryAccesses(VT, AS, Alignment,
+                                      MachineMemOperand::MONone, &Fast)) {
+    // Only generate post-increment or pre-increment forms when a real
+    // hardware instruction exists for them. Do not emit postinc/preinc
+    // if the operation will end up as a libcall.
+    return false;
+  }
+
   bool isInc;
   bool isLegal = false;
   if (VT.isVector())
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.h b/llvm/lib/Target/ARM/ARMISelLowering.h
index bc2fec3c1bdb5..ae93fdf6d619b 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.h
+++ b/llvm/lib/Target/ARM/ARMISelLowering.h
@@ -919,10 +919,14 @@ class VectorType;
     SDValue LowerSPONENTRY(SDValue Op, SelectionDAG &DAG) const;
     void LowerLOAD(SDNode *N, SmallVectorImpl<SDValue> &Results,
                    SelectionDAG &DAG) const;
+    SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG,
+                       const ARMSubtarget *Subtarget) const;
+    std::pair<SDValue, SDValue>
+    LowerAEABIUnalignedLoad(SDValue Op, SelectionDAG &DAG) const;
+    SDValue LowerAEABIUnalignedStore(SDValue Op, SelectionDAG &DAG) const;
     SDValue LowerFP_TO_BF16(SDValue Op, SelectionDAG &DAG) const;
     SDValue LowerCMP(SDValue Op, SelectionDAG &DAG) const;
     SDValue LowerABS(SDValue Op, SelectionDAG &DAG) const;
-
     Register getRegisterByName(const char* RegName, LLT VT,
                                const MachineFunction &MF) const override;
 
diff --git a/llvm/test/CodeGen/ARM/i64_volatile_load_store.ll b/llvm/test/CodeGen/ARM/i64_volatile_load_store.ll
index ca5fd2bc14f40..125326fd754fa 100644
--- a/llvm/test/CodeGen/ARM/i64_volatile_load_store.ll
+++ b/llvm/test/CodeGen/ARM/i64_volatile_load_store.ll
@@ -121,23 +121,22 @@ define void @test_unaligned() {
 ; CHECK-ARMV5TE-NEXT:    push {r4, r5, r6, lr}
 ; CHECK-ARMV5TE-NEXT:    ldr r0, .LCPI1_0
 ; CHECK-ARMV5TE-NEXT:    ldr r6, .LCPI1_1
-; CHECK-ARMV5TE-NEXT:    mov r1, r0
-; CHECK-ARMV5TE-NEXT:    ldrb lr, [r1, #4]!
-; CHECK-ARMV5TE-NEXT:    ldrb r3, [r1, #2]
-; CHECK-ARMV5TE-NEXT:    ldrb r12, [r1, #3]
-; CHECK-ARMV5TE-NEXT:    ldrb r1, [r0]
-; CHECK-ARMV5TE-NEXT:    ldrb r2, [r0, #1]
-; CHECK-ARMV5TE-NEXT:    ldrb r4, [r0, #2]
-; CHECK-ARMV5TE-NEXT:    ldrb r5, [r0, #3]
-; CHECK-ARMV5TE-NEXT:    ldrb r0, [r0, #5]
-; CHECK-ARMV5TE-NEXT:    strb r0, [r6, #5]
-; CHECK-ARMV5TE-NEXT:    strb r4, [r6, #2]
-; CHECK-ARMV5TE-NEXT:    strb r5, [r6, #3]
-; CHECK-ARMV5TE-NEXT:    strb r1, [r6]
-; CHECK-ARMV5TE-NEXT:    strb r2, [r6, #1]
-; CHECK-ARMV5TE-NEXT:    strb lr, [r6, #4]!
+; CHECK-ARMV5TE-NEXT:    ldrb r12, [r0]
+; CHECK-ARMV5TE-NEXT:    ldrb lr, [r0, #1]
+; CHECK-ARMV5TE-NEXT:    ldrb r3, [r0, #2]
+; CHECK-ARMV5TE-NEXT:    ldrb r1, [r0, #3]
+; CHECK-ARMV5TE-NEXT:    ldrb r2, [r0, #5]
+; CHECK-ARMV5TE-NEXT:    ldrb r4, [r0, #4]
+; CHECK-ARMV5TE-NEXT:    ldrb r5, [r0, #7]
+; CHECK-ARMV5TE-NEXT:    ldrb r0, [r0, #6]
+; CHECK-ARMV5TE-NEXT:    strb r0, [r6, #6]
+; CHECK-ARMV5TE-NEXT:    strb r5, [r6, #7]
+; CHECK-ARMV5TE-NEXT:    strb r4, [r6, #4]
+; CHECK-ARMV5TE-NEXT:    strb r2, [r6, #5]
 ; CHECK-ARMV5TE-NEXT:    strb r3, [r6, #2]
-; CHECK-ARMV5TE-NEXT:    strb r12, [r6, #3]
+; CHECK-ARMV5TE-NEXT:    strb r1, [r6, #3]
+; CHECK-ARMV5TE-NEXT:    strb r12, [r6]
+; CHECK-ARMV5TE-NEXT:    strb lr, [r6, #1]
 ; CHECK-ARMV5TE-NEXT:    pop {r4, r5, r6, pc}
 ; CHECK-ARMV5TE-NEXT:    .p2align 2
 ; CHECK-ARMV5TE-NEXT:  @ %bb.1:
@@ -164,23 +163,22 @@ define void @test_unaligned() {
 ; CHECK-ARMV4T-NEXT:    push {r4, r5, r6, lr}
 ; CHECK-ARMV4T-NEXT:    ldr r0, .LCPI1_0
 ; CHECK-ARMV4T-NEXT:    ldr r6, .LCPI1_1
-; CHECK-ARMV4T-NEXT:    mov r1, r0
-; CHECK-ARMV4T-NEXT:    ldrb lr, [r1, #4]!
-; CHECK-ARMV4T-NEXT:    ldrb r3, [r1, #2]
-; CHECK-ARMV4T-NEXT:    ldrb r12, [r1, #3]
-; CHECK-ARMV4T-NEXT:    ldrb r1, [r0]
-; CHECK-ARMV4T-NEXT:    ldrb r2, [r0, #1]
-; CHECK-ARMV4T-NEXT:    ldrb r4, [r0, #2]
-; CHECK-ARMV4T-NEXT:    ldrb r5, [r0, #3]
-; CHECK-ARMV4T-NEXT:    ldrb r0, [r0, #5]
-; CHECK-ARMV4T-NEXT:    strb r0, [r6, #5]
-; CHECK-ARMV4T-NEXT:    strb r4, [r6, #2]
-; CHECK-ARMV4T-NEXT:    strb r5, [r6, #3]
-; CHECK-ARMV4T-NEXT:    strb r1, [r6]
-; CHECK-ARMV4T-NEXT:    strb r2, [r6, #1]
-; CHECK-ARMV4T-NEXT:    strb lr, [r6, #4]!
+; CHECK-ARMV4T-NEXT:    ldrb r12, [r0]
+; CHECK-ARMV4T-NEXT:    ldrb lr, [r0, #1]
+; CHECK-ARMV4T-NEXT:    ldrb r3, [r0, #2]
+; CHECK-ARMV4T-NEXT:    ldrb r1, [r0, #3]
+; CHECK-ARMV4T-NEXT:    ldrb r2, [r0, #5]
+; CHECK-ARMV4T-NEXT:    ldrb r4, [r0, #4]
+; CHECK-ARMV4T-NEXT:    ldrb r5, [r0, #7]
+; CHECK-ARMV4T-NEXT:    ldrb r0, [r0, #6]
+; CHECK-ARMV4T-NEXT:    strb r0, [r6, #6]
+; CHECK-ARMV4T-NEXT:    strb r5, [r6, #7]
+; CHECK-ARMV4T-NEXT:    strb r4, [r6, #4]
+; CHECK-ARMV4T-NEXT:    strb r2, [r6, #5]
 ; CHECK-ARMV4T-NEXT:    strb r3, [r6, #2]
-; CHECK-ARMV4T-NEXT:    strb r12, [r6, #3]
+; CHECK-ARMV4T-NEXT:    strb r1, [r6, #3]
+; CHECK-ARMV4T-NEXT:    strb r12, [r6]
+; CHECK-ARMV4T-NEXT:    strb lr, [r6, #1]
 ; CHECK-ARMV4T-NEXT:    pop {r4, r5, r6, lr}
 ; CHECK-ARMV4T-NEXT:    bx lr
 ; CHECK-ARMV4T-NEXT:    .p2align 2
@@ -210,23 +208,22 @@ define void @test_unaligned() {
 ; CHECK-ARMV7-STRICT-NEXT:    movw r6, :lower16:y_unaligned
 ; CHECK-ARMV7-STRICT-NEXT:    movt r0, :upper16:x_unaligned
 ; CHECK-ARMV7-STRICT-NEXT:    movt r6, :upper16:y_unaligned
-; CHECK-ARMV7-STRICT-NEXT:    mov r1, r0
-; CHECK-ARMV7-STRICT-NEXT:    ldrb r12, [r1, #4]!
-; CHECK-ARMV7-STRICT-NEXT:    ldrb r3, [r0]
+; CHECK-ARMV7-STRICT-NEXT:    ldrb r12, [r0]
 ; CHECK-ARMV7-STRICT-NEXT:    ldrb lr, [r0, #1]
-; CHECK-ARMV7-STRICT-NEXT:    ldrb r2, [r0, #2]
-; CHECK-ARMV7-STRICT-NEXT:    ldrb r4, [r0, #3]
-; CHECK-ARMV7-STRICT-NEXT:    ldrb r0, [r0, #5]
-; CHECK-ARMV7-STRICT-NEXT:    ldrb r5, [r1, #2]
-; CHECK-ARMV7-STRICT-NEXT:    ldrb r1, [r1, #3]
-; CHECK-ARMV7-STRICT-NEXT:    strb r0, [r6, #5]
-; CHECK-ARMV7-STRICT-NEXT:    strb r2, [r6, #2]
-; CHECK-ARMV7-STRICT-NEXT:    strb r4, [r6, #3]
-; CHECK-ARMV7-STRICT-NEXT:    strb r3, [r6]
-; CHECK-ARMV7-STRICT-NEXT:    strb lr, [r6, #1]
-; CHECK-ARMV7-STRICT-NEXT:    strb r12, [r6, #4]!
-; CHECK-ARMV7-STRICT-NEXT:    strb r5, [r6, #2]
+; CHECK-ARMV7-STRICT-NEXT:    ldrb r3, [r0, #2]
+; CHECK-ARMV7-STRICT-NEXT:    ldrb r1, [r0, #3]
+; CHECK-ARMV7-STRICT-NEXT:    ldrb r2, [r0, #5]
+; CHECK-ARMV7-STRICT-NEXT:    ldrb r4, [r0, #4]
+; CHECK-ARMV7-STRICT-NEXT:    ldrb r5, [r0, #7]
+; CHECK-ARMV7-STRICT-NEXT:    ldrb r0, [r0, #6]
+; CHECK-ARMV7-STRICT-NEXT:    strb r0, [r6, #6]
+; CHECK-ARMV7-STRICT-NEXT:    strb r5, [r6, #7]
+; CHECK-ARMV7-STRICT-NEXT:    strb r4, [r6, #4]
+; CHECK-ARMV7-STRICT-NEXT:    strb r2, [r6, #5]
+; CHECK-ARMV7-STRICT-NEXT:    strb r3, [r6, #2]
 ; CHECK-ARMV7-STRICT-NEXT:    strb r1, [r6, #3]
+; CHECK-ARMV7-STRICT-NEXT:    strb r12, [r6]
+; CHECK-ARMV7-STRICT-NEXT:    strb lr, [r6, #1]
 ; CHECK-ARMV7-STRICT-NEXT:    pop {r4, r5, r6, pc}
 ;
 ; CHECK-ARMV6-LABEL: test_unaligned:
@@ -251,23 +248,22 @@ define void @test_unaligned() {
 ; CHECK-ARMV6-STRICT-NEXT:    push {r4, r5, r6, lr}
 ; CHECK-ARMV6-STRICT-NEXT:    ldr r0, .LCPI1_0
 ; CHECK-ARMV6-STRICT-NEXT:    ldr r6, .LCPI1_1
-; CHECK-ARMV6-STRICT-NEXT:    mov r1, r0
-; CHECK-ARMV6-STRICT-NEXT:    ldrb lr, [r1, #4]!
-; CHECK-ARMV6-STRICT-NEXT:    ldrb r3, [r1, #2]
-; CHECK-ARMV6-STRICT-NEXT:    ldrb r12, [r1, #3]
-; CHECK-ARMV6-STRICT-NEXT:    ldrb r1, [r0]
-; CHECK-ARMV6-STRICT-NEXT:    ldrb r2, [r0, #1]
-; CHECK-ARMV6-STRICT-NEXT:    ldrb r4, [r0, #2]
-; CHECK-ARMV6-STRICT-NEXT:    ldrb r5, [r0, #3]
-; CHECK-ARMV6-STRICT-NEXT:    ldrb r0, [r0, #5]
-; CHECK-ARMV6-STRICT-NEXT:    strb r0, [r6, #5]
-; CHECK-ARMV6-STRICT-NEXT:    strb r4, [r6, #2]
-; CHECK-ARMV6-STRICT-NEXT:    strb r5, [r6, #3]
-; CHECK-ARMV6-STRICT-NEXT:    strb r1, [r6]
-; CHECK-ARMV6-STRICT-NEXT:    strb r2, [r6, #1]
-; CHECK-ARMV6-STRICT-NEXT:    strb lr, [r6, #4]!
+; CHECK-ARMV6-STRICT-NEXT:    ldrb r12, [r0]
+; CHECK-ARMV6-STRICT-NEXT:    ldrb lr, [r0, #1]
+; CHECK-ARMV6-STRICT-NEXT:    ldrb r3, [r0, #2]
+; CHECK-ARMV6-STRICT-NEXT:    ldrb r1, [r0, #3]
+; CHECK-ARMV6-STRICT-NEXT:    ldrb r2, [r0, #5]
+; CHECK-ARMV6-STRICT-NEXT:    ldrb r4, [r0, #4]
+; CHECK-ARMV6-STRICT-NEXT:    ldrb r5, [r0, #7]
+; CHECK-ARMV6-STRICT-NEXT:    ldrb r0, [r0, #6]
+; CHECK-ARMV6-STRICT-NEXT:    strb r0, [r6, #6]
+; CHECK-ARMV6-STRICT-NEXT:    strb r5, [r6, #7]
+; CHECK-ARMV6-STRICT-NEXT:    strb r4, [r6, #4]
+; CHECK-ARMV6-STRICT-NEXT:    strb r2, [r6, #5]
 ; CHECK-ARMV6-STRICT-NEXT:    strb r3, [r6, #2]
-; CHECK-ARMV6-STRICT-NEXT:    strb r12, [r6, #3]
+; CHECK-ARMV6-STRICT-NEXT:    strb r1, [r6, #3]
+; CHECK-ARMV6-STRICT-NEXT:    strb r12, [r6]
+; CHECK-ARMV6-STRICT-NEXT:    strb lr, [r6, #1]
 ; CHECK-ARMV6-STRICT-NEXT:    pop {r4, r5, r6, pc}
 ; CHECK-ARMV6-STRICT-NEXT:    .p2align 2
 ; CHECK-ARMV6-STRICT-NEXT:  @ %bb.1:
diff --git a/llvm/test/CodeGen/ARM/unaligned_load_store_aeabi.ll b/llvm/test/CodeGen/ARM/unaligned_load_store_aeabi.ll
new file mode 100644
index 0000000000000..0f1adc55139c7
--- /dev/null
+++ b/llvm/test/CodeGen/ARM/unaligned_load_store_aeabi.ll
@@ -0,0 +1,425 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=thumbv6m-eabi -mattr=+strict-align %s -o - | FileCheck %s -check-prefix=CHECK-V6M
+; RUN: llc -mtriple=thumbv7m-eabi -mattr=+strict-align %s -o - | FileCheck %s -check-prefix=CHECK-V7M
+; RUN: llc -mtriple=thumbv7m-eabi -mattr=-strict-align %s -o - | FileCheck %s -check-prefix=CHECK-ALIGNED
+
+define void @loadstore4_align1(i32* %a, i32* %b) nounwind optsize minsize {
+; CHECK-V6M-LABEL: loadstore4_a...
[truncated]

A Collaborator commented:

Using getExternalSymbol() directly for libcalls is deprecated; please go through RuntimeLibcalls.
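
For instance, one possible shape for that (a sketch only; it assumes RTLIB entries such as a hypothetical RTLIB::AEABI_UREAD4/AEABI_UREAD8 were first added to RuntimeLibcalls, which they are not today):

    // Hypothetical: RTLIB::AEABI_UREAD4/AEABI_UREAD8 would need to be added
    // to RuntimeLibcalls before this compiles; shown here only as a sketch
    // inside LowerAEABIUnalignedLoad.
    RTLIB::Libcall LC =
        MemVT == MVT::i32 ? RTLIB::AEABI_UREAD4 : RTLIB::AEABI_UREAD8;
    TargetLowering::MakeLibCallOptions CallOptions;
    // makeLibCall resolves the symbol name and calling convention from the
    // libcall info rather than hard-coding "__aeabi_uread4" via
    // getExternalSymbol.
    std::pair<SDValue, SDValue> Result =
        makeLibCall(DAG, LC, MemVT, {LD->getBasePtr()}, CallOptions, SDLoc(Op),
                    LD->getChain());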

A Collaborator commented:

I'm not sure about doing this for optsize (-Os). This has a significant performance penalty on the chips where it's likely to be relevant, and people using -Os do care about performance to some extent.
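
If the -Os case is the main concern, one option (a sketch only, not necessarily the right trade-off) would be to gate the libcall path on minsize alone, so -Oz gets the smaller helper calls while -Os keeps the inline expansion:

    // Sketch: restrict the helper-call lowering to -Oz (minsize) only,
    // mirroring the existing condition in LowerAEABIUnalignedLoad/Store.
    const bool UseAEABIHelpers =
        MF.getFunction().hasMinSize() && !AllowsUnaligned;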
