Claimtrie build taking too much RAM and causing a crash #104

Open
opened 2023-07-07 22:33:15 +02:00 by niglomeister · 7 comments
niglomeister commented 2023-07-07 22:33:15 +02:00 (Migrated from github.com)

I'm trying to run a full lbcd node and had synced up to height ~1 000 000, but now when I try to run lbcd again I can't get past height ~700 000 in the initial rebuilding of the claimtrie: it eats all my 16GB of RAM + 16GB of swap and crashes.

RAM usage goes up very rapidly from around height 600 000.
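![Screenshot from 2023-07-07 22-26-05](https://github.com/lbryio/lbcd/assets/96954560/74a9b34c-9ec9-47aa-8efb-bcaee8b685ab)

![Screenshot from 2023-07-07 22-26-48](https://github.com/lbryio/lbcd/assets/96954560/82be81b9-426f-4da3-aaa1-29423714695b)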

Considering the blockchain is 1 400 000 blocks high at this moment, how much RAM is needed to run a full lbcd node?
The readme says 8GB are needed, but it's already taking 32GB at height ~700 000, and I'm guessing the remaining blocks contain way more claims than the first ones.

I know the readme says that RAM usage may increase over time, but if this is the expected usage, I think the baseline should be updated to reflect the current state of things more accurately.

roylee17 commented 2023-07-09 01:50:37 +02:00 (Migrated from github.com)

It took me ~50 mins to sync from block 0 to 739,000 with 1.4GB of memory.
I remember that even at 1.3 million blocks back in January, the operational memory required was approximately 7GB.
Chances are your database is corrupted and the sync went rogue.
Remove the ~/.lbcd directory and re-sync to see if that changes anything.

2023-07-08 16:42:08.963 [INF] SYNC: Processed 762 blocks in the last 10.19s (27678 transactions, height 737332, 2020-03-25 02:06:34 -0700 PDT)
2023-07-08 16:42:18.964 [INF] SYNC: Processed 530 blocks in the last 10s (18964 transactions, height 737862, 2020-03-26 01:41:24 -0700 PDT)
2023-07-08 16:42:22.527 [INF] MAIN: RAM: using 1.4 GB with 7.2 available, DISK: using 13.6 GB with 740.9 available
2023-07-08 16:42:28.979 [INF] SYNC: Processed 668 blocks in the last 10.01s (22528 transactions, height 738530, 2020-03-27 07:26:50 -0700 PDT)
2023-07-08 16:42:38.992 [INF] SYNC: Processed 709 blocks in the last 10.01s (24291 transactions, height 739239, 2020-03-28 15:15:09 -0700 PDT)
moodyjon commented 2023-07-09 15:48:04 +02:00 (Migrated from github.com)

You might also try tweaking the environment variables `GOGC` and `GOMEMLIMIT` to get behavior that is friendlier to your 16GB memory + 16GB swap environment. The default settings are `GOGC=100` and `GOMEMLIMIT=math.MaxInt64`. For example, `GOGC=100` means that starting from 1GiB of live memory, the program is allowed to allocate 100% more memory (doubling the heap to 2GiB) before the garbage collector scans the heap and releases unused memory. So out of your 32GB, half of that or more *might* be garbage.

The claimtrie build process makes lots of temporary allocations which become garbage immediately. The live memory needed to store the claimtrie is around 7GB, but you might observe the `lbcd` process using up to 14GB at any given moment (from the OS perspective).
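
The arithmetic behind those numbers can be sketched as below. This is only an approximation, not lbcd code: `heapTarget` is a hypothetical helper, it ignores GC roots (stacks and globals), and it treats the memory limit as a simple cap on the heap, which is a simplification of how the Go runtime behaves.

```go
package main

import "fmt"

const gib = 1 << 30

// heapTarget approximates the heap size at which the Go collector starts its
// next cycle: the live heap plus GOGC percent of the live heap.
// Assumptions: GC roots are ignored, and a positive memLimitBytes is treated
// as a plain cap on the result (0 means "no limit").
func heapTarget(liveBytes, gogcPercent, memLimitBytes int64) int64 {
	target := liveBytes + liveBytes*gogcPercent/100
	if memLimitBytes > 0 && target > memLimitBytes {
		target = memLimitBytes
	}
	return target
}

func main() {
	live := int64(7 * gib) // roughly 7GB of live claimtrie data, as stated above
	for _, gogc := range []int64{100, 50} {
		peak := float64(heapTarget(live, gogc, 0)) / gib
		fmt.Printf("GOGC=%d: heap may grow to ~%.1f GiB before a collection\n", gogc, peak)
	}
}
```

Under these assumptions, a 7GiB live heap can grow to about 14GiB with `GOGC=100` and to about 10.5GiB with `GOGC=50`.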

See:
https://go.dev/doc/gc-guide#GOGC
https://go.dev/doc/gc-guide#Memory_limit

More on GOMEMLIMIT:
https://pkg.go.dev/runtime/debug#SetMemoryLimit

Running `lbcd` with a command like `env GOGC=50 GOMEMLIMIT=16GiB ./lbcd ...` would make the garbage collector more aggressive, especially as memory usage approaches the 16GiB limit. The cost is CPU time spent scanning the heap more often, which is usually not a big deal as long as there is one extra CPU core idle/available.
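
For reference, the same two knobs can also be set from inside a Go program through `runtime/debug`. The sketch below is a standalone illustration, not something lbcd is known to do; `applyGCSettings` is a hypothetical helper, and `debug.SetMemoryLimit` requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// applyGCSettings sets the GC percent (the GOGC equivalent) and the soft
// memory limit (the GOMEMLIMIT equivalent) at run time. It returns the
// values that were in effect before the call.
func applyGCSettings(gcPercent int, memLimitBytes int64) (prevPercent int, prevLimit int64) {
	prevPercent = debug.SetGCPercent(gcPercent)
	prevLimit = debug.SetMemoryLimit(memLimitBytes)
	return prevPercent, prevLimit
}

func main() {
	// Equivalent in spirit to: env GOGC=50 GOMEMLIMIT=16GiB ./program
	prevPercent, prevLimit := applyGCSettings(50, 16<<30)
	fmt.Printf("previous settings: GOGC=%d, limit=%d bytes\n", prevPercent, prevLimit)

	// A negative argument makes SetMemoryLimit report the current limit
	// without changing it.
	fmt.Printf("current limit: %d bytes\n", debug.SetMemoryLimit(-1))
}
```

The environment variables are usually the simpler route because they need no code change or rebuild.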

niglomeister commented 2023-07-09 17:23:55 +02:00 (Migrated from github.com)

Thank you.
I've tried running it with `env GOGC=50 GOMEMLIMIT=16GiB ./lbcd` but I get the same result.
I've deleted the .lbcd folder and am now re-syncing the whole blockchain with the latest release of lbcd. When I'm close to height 1 000 000 like I was before, I'll try building the claimtrie again and report my results.

roylee17 commented 2023-07-10 06:59:29 +02:00 (Migrated from github.com)

FYI: an M2 Max MacBook Pro took about 20 hours to sync from height 0 to 1,388,729 (2023-07-09 11:25:55 -0700 PDT)

2023-07-09 11:21:03.217 [INF] MAIN: RAM: using 8.9 GB with 6.7 available, DISK: using 176.3 GB with 581.5 available
2023-07-09 11:21:23.105 [INF] SYNC: Syncing to block height 1388727 from peer 5.135.140.105:9246
2023-07-09 11:23:00.183 [INF] SYNC: Processed 14 blocks in the last 15m1.95s (2174 transactions, height 1388728, 2023-07-09 11:22:45 -0700 PDT)
2023-07-09 11:23:03.235 [INF] MAIN: RAM: using 8.9 GB with 6.7 available, DISK: using 176.4 GB with 581.5 available
2023-07-09 11:23:43.235 [INF] MAIN: RAM: using 8.9 GB with 6.7 available, DISK: using 176.4 GB with 581.4 available
2023-07-09 11:24:53.106 [INF] SYNC: Syncing to block height 1388728 from peer 149.56.26.199:9246
2023-07-09 11:25:51.329 [INF] SYNC: Processed 1 block in the last 2m51.14s (472 transactions, height 1388729, 2023-07-09 11:25:55 -0700 PDT)
niglomeister commented 2023-07-10 19:14:23 +02:00 (Migrated from github.com)

> FYI: an M2 Max MacBook Pro took about 20 hours to sync from height 0 to 1,388,729 (2023-07-09 11:25:55 -0700 PDT)
>
> 2023-07-09 11:21:03.217 [INF] MAIN: RAM: using 8.9 GB with 6.7 available, DISK: using 176.3 GB with 581.5 available
> 2023-07-09 11:21:23.105 [INF] SYNC: Syncing to block height 1388727 from peer 5.135.140.105:9246
> 2023-07-09 11:23:00.183 [INF] SYNC: Processed 14 blocks in the last 15m1.95s (2174 transactions, height 1388728, 2023-07-09 11:22:45 -0700 PDT)
> 2023-07-09 11:23:03.235 [INF] MAIN: RAM: using 8.9 GB with 6.7 available, DISK: using 176.4 GB with 581.5 available
> 2023-07-09 11:23:43.235 [INF] MAIN: RAM: using 8.9 GB with 6.7 available, DISK: using 176.4 GB with 581.4 available
> 2023-07-09 11:24:53.106 [INF] SYNC: Syncing to block height 1388728 from peer 149.56.26.199:9246
> 2023-07-09 11:25:51.329 [INF] SYNC: Processed 1 block in the last 2m51.14s (472 transactions, height 1388729, 2023-07-09 11:25:55 -0700 PDT)

Just to be clear, my problem happened not when syncing the blockchain but at the "building the full claimtrie in ram" point when restarting the node after it had been synced.

But it looks like you were right. My database must have been corrupted. I deleted the whole .lbcd folder, started with a fresh install of the latest version, and have now synced to the height I was at before.

When I restart lbcd now, the claimtrie build only takes about 7GB of RAM, like you told me.

Thanks everyone for your help.

niglomeister commented 2023-09-17 13:26:51 +02:00 (Migrated from github.com)

Hey. Just to say that the same bug is happening again. When I build the claimtrie it takes all my 16GB of RAM + 16GB of swap. I'll delete the database again and resync, since that solved it last time, but it would be good to look into what's causing this issue.

kaichaosun commented 2023-10-30 15:54:12 +01:00 (Migrated from github.com)

I can confirm the issue also exists on my Linux server with 16GB RAM.

2023-10-30 15:35:52.667 [INF] MAIN: RAM: using 10.4 GB with 4.8 available, DISK: using 56.7 GB with 156.8 available
2023-10-30 15:35:56.161 [INF] CHAN: Rebuilding claim trie data to 936681. At: 650684
2023-10-30 15:36:01.176 [INF] CHAN: Rebuilding claim trie data to 936681. At: 666520
2023-10-30 15:36:06.176 [INF] CHAN: Rebuilding claim trie data to 936681. At: 678459
2023-10-30 15:36:11.178 [INF] CHAN: Rebuilding claim trie data to 936681. At: 689073
2023-10-30 15:36:16.179 [INF] CHAN: Rebuilding claim trie data to 936681. At: 692127
2023-10-30 15:36:21.179 [INF] CHAN: Rebuilding claim trie data to 936681. At: 697614
2023-10-30 15:36:26.180 [INF] CHAN: Rebuilding claim trie data to 936681. At: 703273
2023-10-30 15:36:31.181 [INF] CHAN: Rebuilding claim trie data to 936681. At: 707315
2023-10-30 15:36:32.708 [INF] MAIN: RAM: using 13.1 GB with 2.2 available, DISK: using 56.7 GB with 156.8 available
2023-10-30 15:36:36.182 [INF] CHAN: Rebuilding claim trie data to 936681. At: 711832
2023-10-30 15:36:41.182 [INF] CHAN: Rebuilding claim trie data to 936681. At: 715616
2023-10-30 15:36:46.183 [INF] CHAN: Rebuilding claim trie data to 936681. At: 722001
2023-10-30 15:36:51.183 [INF] CHAN: Rebuilding claim trie data to 936681. At: 727028
2023-10-30 15:36:56.187 [INF] CHAN: Rebuilding claim trie data to 936681. At: 734153
2023-10-30 15:37:01.187 [INF] CHAN: Rebuilding claim trie data to 936681. At: 741013
2023-10-30 15:37:06.559 [INF] CHAN: Rebuilding claim trie data to 936681. At: 743350
2023-10-30 15:37:13.408 [INF] CHAN: Rebuilding claim trie data to 936681. At: 743354
2023-10-30 15:37:18.504 [INF] CHAN: Rebuilding claim trie data to 936681. At: 743356

This is the pprof map:
![pprof001](https://github.com/lbryio/lbcd/assets/10568673/0397f665-ea6b-44e4-a7ab-3c1ecca94f67)

Reference: LBRYCommunity/lbcd#104