Skip to content

Change .location, NewLocation, and Location to lazily compute location information#5482

Closed
ashfordium wants to merge 12 commits intojoernio:masterfrom
ashfordium:flanders/lazyloc
Closed

Change .location, NewLocation, and Location to lazily compute location information#5482
ashfordium wants to merge 12 commits intojoernio:masterfrom
ashfordium:flanders/lazyloc

Conversation

@ashfordium
Copy link
Copy Markdown

  • Adds Loc class as replacement for LocationCreator and the location node step. Loc is a replacement that is meant to compute and store less up-front compared to LocationCreator and location. IIUC, scala lazy values also take up space, so I eyeballed the operations that might take more steps to compute and made those lazy and those that are just lookups on the underlying node and made these methods.
  • Adds a duplicate of one of the c2cpg test files to use Loc rather than replacing the other tests for now
  • Add deprecation warnings to the previous location code

@ashfordium ashfordium requested review from bbrehm and ml86 May 12, 2025 17:31
Copy link
Copy Markdown
Contributor

@itsacoderepo itsacoderepo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the short comment. Please don't merge this, we need to discuss it first.

Comment thread semanticcpg/src/main/scala/io/shiftleft/semanticcpg/language/Loc.scala Outdated
@fabsx00
Copy link
Copy Markdown
Contributor

fabsx00 commented May 14, 2025

Why do we need to deprecate the .location step and replace it with .loc? The query language is long-standing public API and .location can be spoken while .loc is more of an abbreviation.

@fabsx00 fabsx00 self-requested a review May 15, 2025 08:44
Copy link
Copy Markdown
Contributor

@fabsx00 fabsx00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should deprecate .location in favour of .loc. Is it not possible to simply adapt the old implementation but keep the name? Also, can you motivate why this change is necessary in the first place?

@ashfordium
Copy link
Copy Markdown
Author

I don't think we should deprecate .location in favour of .loc. Is it not possible to simply adapt the old implementation but keep the name? Also, can you motivate why this change is necessary in the first place?

@fabsx00, sorry, I did not see your earlier mesage.

This is an attempt at a more efficient implementation of LOCATION and the .location node step. These implementations compute and store all of the fields of the Location/NewLocation, but users may not use all of the computed information.

I had an internal issue assigned to me, IIUC deprecating instead of replacing the location implementation is because there is maybe some dislike for defining location in the CPG schema. I will send you a link here and maybe we can all discuss.

@ashfordium
Copy link
Copy Markdown
Author

After some discussion: there are understandably some problems with the name loc and Loc: it could means 'lines of code' or something else and isn't directly related to location unless you know already. Maybe we will change the name to .newLocation or .locationLazy or something besides .loc.

@ashfordium
Copy link
Copy Markdown
Author

It was decided that we will replace Location and NewLocation with the new lazily computed location info (previously Loc and .loc in this PR thread). This also means deleting these from the CPG spec: ShiftLeftSecurity/codepropertygraph#1819.

@ashfordium
Copy link
Copy Markdown
Author

Pushed some changes to make this nicer to use as a consumer of joern that extends the codepropertygraph / location types. It is nicer in that both the joern location types and any extended location types can both be wildcard imported, but the extended location types will be used by the implicits because of the way extending the location information uses subclassing.

Say that you have a package, io.shiftleft.extendedcpg that extends the codepropertygraph and location nodes. For these extended nodes, you subclass LazyLocation with something called like ExtendedLazyLocation and provide the implicit apply. Then, in the consuming packages, you now do this:

package io.shiftleft.extendedcpg;

def doSomething(node: AbstractNode): Unit = {
  ...
  val location = node.location
  ...
}

and everything works as expected even if node is an extended node type.

Similarly, if you import both the location types in joern and the location types in the extending and consuming library, then the implicit location creator in the extending and consuming library will be chosen since it will be more specific with this design. So this works as expected:

package io.shiftleft.someotherpackage;
import io.shiftleft.extendedcpg.*
import import io.shiftleft.semanticcpg.language.*

def doSomething(node: AbstractNode): Unit = {
  ...
  val location = node.location
  ...
}

The location will be ExtendedLazyLocation and the location creator in the extended library will be used, not the one in the joern repo.

@ashfordium ashfordium changed the title Switch to lazy/less storage .loc and Loc over .location and {New}Location Change .location, NewLocation, and Location to lazily compute location information May 27, 2025
Comment on lines +22 to +31
// Previously, the non-stored Location node generated
// from the CPG schema was named NewLocation. The
// addition of lazily computed location info in this
// file brought the deletion of these CPG nodes. This
// is not directly used in joern, but intended to not
// break usage in users of the API. Users directly
// importing NewLocation from the CPG package rather
// will still have a break.
@deprecated("NewLocation is deprecated, prefer LocationInfo")
type NewLocation = LocationInfo
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the scenario here is that people only have to adjust den import but not the other references to NewLocation in there code?

override def location: NewLocation =
LocationCreator.defaultCreateLocation(node)
override def location: LocationInfo = {
node.location
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a recursive call to location, at least my IDE agrees with me.
Same applies to the other XXXMethods.scala files except for NodeMethods.scala.
I think we do not even need all those overrides anymore. It is enough to have location defined in NodeMethods.

Comment on lines +33 to +35
trait HasLocation extends Any {
def location: LocationInfo
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this? I have not seem a usage in Joern or CS repo outside of declarations.

@ashfordium ashfordium closed this by deleting the head repository Jun 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants