Incrementally updated hash differs depending on chunk size #151

@J-Gras

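I ran into an issue where the same data can yield different hashes when using the incremental update API. The deviation seems to occur when the first chunk is too small: in the example below, a 5-byte first chunk matches the one-shot hash, while a 4-byte first chunk does not: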

import tlsh

data = b"Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod"

print(f"One-shot: {tlsh.hash(data)}")

h_long_inc = tlsh.Tlsh()
h_long_inc.update(data[:5])
h_long_inc.update(data[5:])
h_long_inc.final()

print(f"Long:     {h_long_inc.hexdigest()}")

h_short_inc = tlsh.Tlsh()
h_short_inc.update(data[:4])
h_short_inc.update(data[4:])
h_short_inc.final()

print(f"Short:    {h_short_inc.hexdigest()}")

The above yields the following result:

One-shot: T19AA0120D0B41078406C204393AA94058A6082010E26C68420CB6B028112200C8020555
Long:     T19AA0120D0B41078406C204393AA94058A6082010E26C68420CB6B028112200C8020555
Short:    T141A0121D0B41054402C604393AA94058A2082010E36C58410CB5B024112100C8020559

I noticed this when rerunning a test that was baselined about 8 years ago, so this appears to be a regression. In that test, the two hashes differed in only a single character.
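
To narrow down exactly which first-chunk sizes trigger the deviation, a sweep over split points might help. The following is a minimal sketch, not a definitive test: it assumes only the calls already used above (`tlsh.hash`, `tlsh.Tlsh()`, `update`, `final`, `hexdigest`). The helper names `deviating_split_points` and `tlsh_incremental` and the `max_first_chunk` parameter are hypothetical and not part of the library.

```python
from typing import Callable, Iterable, List


def deviating_split_points(
    data: bytes,
    one_shot: Callable[[bytes], str],
    incremental: Callable[[Iterable[bytes]], str],
    max_first_chunk: int = 16,
) -> List[int]:
    """Return every first-chunk size n for which hashing data[:n] and data[n:]
    incrementally gives a different digest than hashing data in one shot."""
    expected = one_shot(data)
    # Keep the second chunk non-empty so each case is a real two-chunk update.
    limit = min(max_first_chunk, len(data) - 1)
    return [
        n
        for n in range(1, limit + 1)
        if incremental([data[:n], data[n:]]) != expected
    ]


def tlsh_incremental(chunks: Iterable[bytes]) -> str:
    """Feed the chunks through the incremental API, as in the report above."""
    import tlsh  # imported lazily so the sweep helper stays library-agnostic

    h = tlsh.Tlsh()
    for chunk in chunks:
        h.update(chunk)
    h.final()
    return h.hexdigest()


if __name__ == "__main__":
    try:
        import tlsh
    except ImportError:
        print("tlsh module not available; skipping the sweep")
    else:
        data = (
            b"Lorem ipsum dolor sit amet, consetetur sadipscing elitr, "
            b"sed diam nonumy eirmod"
        )
        # Prints the first-chunk sizes whose incremental hash deviates.
        print(deviating_split_points(data, tlsh.hash, tlsh_incremental))
```

Based on the output above, I would expect the list to include 4 and exclude 5, but I have not run the sweep for other sizes.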
