Fixing intermittent invalid checksum messages #160

Open
pkhodak wants to merge 1 commit into JanM321:main from pkhodak:main

Conversation

@pkhodak

pkhodak commented Apr 3, 2026

Problem

The ESP32 starts reading the UART mid-message after boot. Since the LG CN-REMO protocol has no start-of-frame marker, the 13-byte framing window stays permanently shifted, which causes intermittent checksum errors. I have 3 LG indoor units with wall controls running in slave mode, and 2 of them were nearly always going into an infinite loop of messages like these:

[10:56:54.029][D][lg-controller:954]: received 00.00.00.00.00.00.00.00.00.00.F9.A8.20 (13)
[10:56:54.029][E][lg-controller:964]: invalid checksum 00.00.00.00.00.00.00.00.00.00.F9.A8.20 (13)
[10:56:54.033][D][lg-controller:954]: received 00.00.00.00.25.14.00.00.00.00.54.A8.20 (13)
[10:56:54.036][E][lg-controller:964]: invalid checksum 00.00.00.00.25.14.00.00.00.00.54.A8.20 (13)
[10:56:54.040][D][lg-controller:954]: received 00.00.00.00.25.14.00.00.00.00.54.AC.00 (13)
[10:56:54.044][E][lg-controller:964]: invalid checksum 00.00.00.00.25.14.00.00.00.00.54.AC.00 (13)
[10:56:54.047][D][lg-controller:954]: received 00.00.00.00.00.00.00.00.00.00.F9.AC.00 (13)
[10:56:54.051][E][lg-controller:964]: invalid checksum 00.00.00.00.00.00.00.00.00.00.F9.AC.00 (13)
[10:57:00.026][D][lg-controller:1360]: update

The fix below resolved it: the messages now typically sync within a few cycles.

Fix: Two changes:

1: Startup re-flush (prevents the problem)

Added a startup_flush_done_ member variable and a guard at the top of update() that re-flushes the UART buffer on the first call. This clears the bytes that accumulated during the 10-second gap between setup and the first update() call (the root cause).
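A minimal sketch of that guard, for illustration only. `FakeUart`, `ControllerSketch`, and the `bool` return value of `update()` are hypothetical stand-ins and not the component's real classes or the ESPHome UART API; only the `startup_flush_done_` flag and the flush-on-first-`update()` idea come from the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <initializer_list>

// Hypothetical stand-in for the UART receive buffer (not the ESPHome UART API).
struct FakeUart {
    std::deque<uint8_t> rx;
    void feed(std::initializer_list<uint8_t> bytes) { rx.insert(rx.end(), bytes); }
    std::size_t available() const { return rx.size(); }
    void flush_rx() { rx.clear(); }
};

// Hypothetical controller showing only the one-time startup flush.
class ControllerSketch {
public:
    explicit ControllerSketch(FakeUart& uart) : uart_(uart) {}

    // Initial flush at setup time.
    void setup() { uart_.flush_rx(); }

    // Returns true if this call performed the one-time startup flush
    // (the return value exists only to make the sketch observable).
    bool update() {
        if (!startup_flush_done_) {
            // Discard whatever arrived between setup() and the first update().
            uart_.flush_rx();
            startup_flush_done_ = true;
            return true;
        }
        // Normal frame processing would go here (omitted in this sketch).
        return false;
    }

private:
    FakeUart& uart_;
    bool startup_flush_done_ = false;
};
```

Bytes that arrive after `setup()` but before the first `update()` are dropped exactly once; later calls leave the buffer alone. A flush can still land in the middle of a message, which is presumably why the second change is needed as well.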

2: Sliding window (recovers from misalignment)

On checksum failure, instead of discarding all 13 bytes, the code drops the oldest byte and slides the remaining 12 bytes left. The next incoming byte completes a new 13-byte window for immediate re-evaluation. This converges within a single update() cycle, since multiple bytes are typically buffered.

This reuses the existing calc_checksum() function, so no checksum logic is duplicated.
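The sliding-window recovery can be sketched roughly as follows. This is not the PR's code: `FrameSync` and `push()` are hypothetical names, and the checksum rule used here (sum of the first 12 bytes, XOR 0x55) is an assumption. That assumption is at least consistent with the frames visible in the log above, for example `A8 20 00 00 00 00 25 14 00 00 00 00 54`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kMsgLen = 13;

// Assumed checksum rule: low byte of the sum of the first 12 bytes, XOR 0x55.
uint8_t calc_checksum(const uint8_t* msg) {
    uint32_t sum = 0;
    for (std::size_t i = 0; i < kMsgLen - 1; ++i) sum += msg[i];
    return static_cast<uint8_t>((sum & 0xFF) ^ 0x55);
}

// Hypothetical helper: accumulates bytes and resynchronises by sliding.
class FrameSync {
public:
    // Feed one byte. Returns true and fills `out` when a valid frame completes.
    bool push(uint8_t byte, std::array<uint8_t, kMsgLen>& out) {
        buf_[len_++] = byte;
        if (len_ < kMsgLen) return false;  // window not full yet

        if (calc_checksum(buf_.data()) == buf_[kMsgLen - 1]) {
            out = buf_;
            len_ = 0;  // aligned: start a fresh window
            return true;
        }
        // Misaligned: drop the oldest byte, keep the other 12, wait for one more.
        std::memmove(buf_.data(), buf_.data() + 1, kMsgLen - 1);
        len_ = kMsgLen - 1;
        return false;
    }

private:
    std::array<uint8_t, kMsgLen> buf_{};
    std::size_t len_ = 0;
};
```

With three stray bytes in front of two frames taken from the log, the window fails three times and then locks onto the first real frame. A checksum this short can in principle match by accident on a misaligned window, so the sketch trades a small false-positive risk for fast recovery.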

@kchen

kchen commented Apr 3, 2026

My LG system is currently broken for unrelated reasons, so I can't test this myself at the moment, but if this works, that's great to see!

Interestingly, I specifically see the checksum problem on my LMN079HVT indoor units, but do not see the problem on my LSN090HSV5 and KNSAL091A indoor units.

@pkhodak

pkhodak commented Apr 3, 2026

Author

I have a few units, but these two, MT11R.NU1 and MT09R.NU1, refused to work until I fixed the buffer start logic. I've tested it for a couple of days so far and it seems to be OK. I hope the code change is generally non-destructive and shouldn't change anything for other unit types.

One other small concern: it would probably be good to fix those deprecation warnings at some point, before things break in about 6-7 months. I've looked into dynamically hiding unused entities, and the easiest fix might be to use set_disabled_by_default for all unsupported device capabilities. It won't hide them as nicely as is done now, but at least they wouldn't be left floating around; they would move to the disabled entities list instead. I haven't tried it myself and wanted to check whether this is something you have considered as a potential fix. Another alternative is to call register_sensor dynamically for every supported capability, but that feels like a lot more work.

@kchen

kchen commented Apr 11, 2026

My LG system has been repaired, so I tried this out. From the description, I expected the "Startup re-flush" to fix the original problem, but I'm seeing lots of "Checksum mismatch" messages after startup. In some cases there are 13 "Checksum mismatch" messages in a row; since the expected messages are 13 bytes long, I don't know whether that's a bug of some sort, although at a quick glance the PR seems reasonable to me.

logs_lg-hvac-office_run.txt

Given these logs, I think perhaps the issue with my system is different than yours, and it may be the case that my system's communication is actually just flaky. (Edit: Unclear whether this is the case, given the padding message issue mentioned in the next comment.)

When there is a checksum mismatch, it may be useful to log the actual message, since in my case, I can't tell what data is being discarded, whereas before, the message with a bad checksum would be logged.
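As an illustration of that suggestion, here is a small sketch of a formatter that renders a discarded window in the same dotted-hex style as the existing "received" log lines. `format_message` is a hypothetical helper name, not a function from the component; the result could be passed to whatever logging call reports the mismatch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: renders bytes as "00.25.14 (3)", matching the log style.
std::string format_message(const uint8_t* data, std::size_t len) {
    std::string out;
    char tmp[4];  // room for ".XX" plus the terminating NUL
    for (std::size_t i = 0; i < len; ++i) {
        std::snprintf(tmp, sizeof(tmp), i == 0 ? "%02X" : ".%02X",
                      static_cast<unsigned>(data[i]));
        out += tmp;
    }
    out += " (" + std::to_string(len) + ")";
    return out;
}
```

Logging the window before sliding would make it possible to tell padding, truncated frames, and genuinely corrupted data apart.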

@kchen

kchen commented Apr 11, 2026

I went back to JanM321:main, and took a look at my logs, and see:

[00:19:58.373][D][lg-controller:954]: received 00.00.00.00.00.00.00.00.00.00.00.00.00 (13)
[00:19:58.373][D][lg-controller:961]: Ignoring padding message sent by unit

I assume that with pkhodak:main, such padding messages are the source of the 13 consecutive "Checksum mismatch" messages, so it would also be useful to account for such messages, rather than warning and shifting 13 times.
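One way to account for this, sketched under assumptions: treat an all-zero 13-byte window as padding and drop it whole before falling back to the one-byte slide. The names `Action`, `is_padding`, and `classify` are hypothetical, and the checksum rule (sum of the first 12 bytes, XOR 0x55) is assumed rather than taken from the repository. Under that rule an all-zero window can never pass the checksum (the expected value would be 0x55), which would explain the 13 consecutive mismatches.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMsgLen = 13;
using Frame = std::array<uint8_t, kMsgLen>;

// Assumed checksum rule: low byte of the sum of the first 12 bytes, XOR 0x55.
uint8_t calc_checksum(const Frame& f) {
    uint32_t sum = 0;
    for (std::size_t i = 0; i < kMsgLen - 1; ++i) sum += f[i];
    return static_cast<uint8_t>((sum & 0xFF) ^ 0x55);
}

enum class Action { Accept, DropPadding, SlideOneByte };

// True when every byte of the window is zero.
bool is_padding(const Frame& f) {
    return std::all_of(f.begin(), f.end(), [](uint8_t b) { return b == 0; });
}

// Decide what to do with a full 13-byte window.
Action classify(const Frame& f) {
    if (is_padding(f)) return Action::DropPadding;  // discard all 13 bytes quietly
    if (calc_checksum(f) == f[kMsgLen - 1]) return Action::Accept;
    return Action::SlideOneByte;  // misaligned or corrupted: resync
}
```

Dropping the whole window is only safe if a run of 13 zero bytes cannot straddle two real messages; that holds for the frames shown in this thread, but it is an assumption about the protocol in general.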

@kchen

kchen commented May 26, 2026

I'm working on https://github.com/kchen/esphome-lg-controller/tree/handle-corrupted-messages, which may be a better fix for the problem in the original description. I'm not quite ready for a PR yet (I'm planning to add one more feature and get some more testing in), but I expect to create one in the next few days.
