Hints on running a Proofgold node

by Brown, Saturday, June 20, 2020, 13:31
edited by Brown, Saturday, June 20, 2020, 14:29

After running proofgold for about 10 days, my assessment is that the software works, but barely. It requires some babysitting. I thought it might help others if I listed a few hints on getting a node running and keeping a node running.

Getting started

The README provides some information about how to compile the code and start it running, but sometimes this isn't enough.

It looks like threading in OCaml is unreliable, so on some systems it will be necessary to compile with the script I posted the other day (makevmbytecode).

Also, on some systems, /dev/random either doesn't work properly or blocks for too long to be useful. The command line argument -randomseed can remedy this to some degree: generate 64 hex characters on a computer with a working /dev/random, then give -randomseed <64hexchars> as a command line argument when starting proofgold. This is sometimes still not enough, since the function strong_rand_256 in sha256.ml will still try to use /dev/random. A (possibly unsafe) way to deal with this is to replace all calls to strong_rand_256 with calls to rand_256. We also tried changing the code to use /dev/urandom instead, but this caused very counterintuitive exceptions (Sys_blocking_io).
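For the -randomseed value, here is one way to produce the 64 hex characters. This is only a sketch in Python, not anything from the proofgold code: it assumes you run it on a machine whose OS random source works, and the function name make_random_seed is made up for illustration.

```python
import secrets


def make_random_seed() -> str:
    """Return 64 hex characters (256 bits) drawn from the OS random source."""
    # token_hex(32) asks for 32 random bytes and encodes them as 64 hex digits.
    return secrets.token_hex(32)


if __name__ == "__main__":
    # Paste the printed value after -randomseed when starting proofgold.
    print(make_random_seed())
```

Generate a fresh value each time rather than reusing an old one, and treat it like any other secret.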

Nodes often have trouble connecting (and staying connected) to other nodes. When getpeerinfo says there are no connections, I found the information in Blake's post helpful: https://proofgold.org/forum/index.php?id=22

Keeping a node running

Nodes often die for unexplained reasons. This might happen roughly once a day, so it's a good idea to occasionally check if your node has died. If it has, restart it by hand. Restarting might require removing the lock file.

Even when nodes don't die, they often get slower as time passes, so I find it helps to restart my nodes once a day anyway. (The exit command stops a node.)

Sometimes nodes have trouble syncing even though they are connected. The command ltcstatus shows the current block chain (from the last week) and may indicate that certain headers or deltas of blocks are missing. If they are still missing after 10 minutes, it might be worth using the command "requestblock <blockid>" to explicitly request the block from peers. If no peer has the block, try to connect to more peers or post about it on the forum.

My database also got corrupted once, and I had to delete everything and resync from scratch. I now make a backup copy of the .proofgold directory every few days in case this happens again.
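The backup can be scripted. The following is a rough sketch in Python, under a few assumptions: that .proofgold lives in your home directory, that a folder called proofgold-backups (a name I made up) is an acceptable place for the copies, and that the helper name backup_dir is only illustrative. It is probably safest to stop the node first so the copy isn't taken in the middle of a write.

```python
import shutil
import time
from pathlib import Path


def backup_dir(src: Path, dest_root: Path) -> Path:
    """Copy src into dest_root under a timestamped name and return the new path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_root / f"{src.name}-{stamp}"
    # copytree creates dest (and missing parents) and fails if dest already exists.
    shutil.copytree(src, dest)
    return dest


if __name__ == "__main__":
    data_dir = Path.home() / ".proofgold"          # assumed location
    backups = Path.home() / "proofgold-backups"    # hypothetical backup folder
    if data_dir.is_dir():
        print(backup_dir(data_dir, backups))
```

Each run makes a full copy, so old backups need to be deleted by hand now and then.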

Other issues

I've noticed that sometimes several blocks pass before transactions I send are confirmed. It's possible transactions aren't being propagated properly. I don't know whether using "sendtx" multiple times after restarting the node helps, but I've done it sometimes, and the transactions tend to confirm eventually.

There were a few transactions where I was trying to publish documents with 200 bounties. These would not confirm for a long time. In the debug log (.proofgold/debug.log) there were messages saying the tx would make the block "too big". I didn't believe that, so I hacked the code to force the txs into a block I staked. The resulting blocks were not too big and were valid. Maybe this is risky, so I am not suggesting it for others, but if you have a tx with a publication that won't confirm, nodes might think the tx is too big. I guess a rule of thumb for now would be to not include more than, say, 40 items in a document.
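To check whether a stuck tx is hitting this, you can search the debug log for the phrase. Here is a small sketch in Python. I don't know the exact wording or format of the log lines, so it just matches the substring "too big"; the location of .proofgold in the home directory and the function name find_too_big are my assumptions.

```python
from pathlib import Path
from typing import Iterable, List


def find_too_big(lines: Iterable[str], needle: str = "too big") -> List[str]:
    """Return the lines that contain needle, without their trailing newlines."""
    return [line.rstrip("\n") for line in lines if needle in line]


if __name__ == "__main__":
    log = Path.home() / ".proofgold" / "debug.log"   # assumed location
    if log.is_file():
        # errors="replace" keeps going even if the log has odd bytes in it.
        with log.open(errors="replace") as f:
            for hit in find_too_big(f):
                print(hit)
```

If this prints lines for your tx, splitting the document into smaller ones is probably the safer route.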
