This commit refactors the entire websocket client code to resolve several
issues with the previous implementation. Note that this commit does not
change the public API for websockets. It only consists of internal
improvements.
The following is the major issues which have been addressed:
- A slow websocket client could impede notifications to all clients
- Long-running operations such as rescans would block all other requests
until it had completed
- The above two points taken together could lead to apparant hangs since
the client doing the rescan would eventually run out of channel buffer
and block the entire group of clients until the rescan completed
- Disconnecting a websocket during certain operations could lead to a hang
- Stopping the rpc server with operations under way could lead to a hang
- There were no limits to the number of websocket clients that could
connect
The following is a summary of the major changes:
- The websocket code has been split into two entities: a
connection/notification manager and a websocket client
- The new connection/notification manager acts as the entry point from
the rest of the subsystems to feed data which potentially needs to
notify clients
- Each websocket client now has its own instance of the new websocket
client type which controls its own lifecycle
- The data flow has been completely redesigned to closely resemble the
peer data flow
- Each websocket now has its own long-lived goroutines for input, output,
and queuing of notifications
- Notifications use the new notification queue goroutine along with
queueing to ensure they dont't block on stalled or slow peers
- There is a new infrastructure for asynchronously executing long-running
commands such as a rescan while still allowing the faster operations to
continue to be serviced by the same client
- Since long-running operations now run asynchronously, they have been
limited to one at a time
- Added a limit of 10 websocket clients. This is hard coded for now, but
will be made configurable in the future
Taken together these changes make the code far easier to reason about and
update as well solve the aforementioned issues.
Further optimizations to improve performance are possible in regards to
the way the connection/notification manager works, however this commit
already contains a ton of changes, so they are being left for another
time.
This commit adds a new configuration option, --rpcmaxclients, to limit the
number of max standard RPC clients that are served concurrently. Note
that this value does not apply to websocket connections. A future commit
will add support for limiting those separately.
Closes#68.
Rather than using a type specifically in btcd for the getrawmempool, this
commit, along with a recent commit to btcjson, changes the code over to
use the type from btcjson. This is more consistent with other RPC results
and provides a few extra benefits such as the ability for btcjson to
automatically unmarshal the results into a concrete type with proper field
types as opposed to a generic interface.
Rather than using a type specifically in btcd for the getpeerinfo, this
commit, along with a recent commit to btcjson, changes the code over to
use the type from btcjson. This is more consistent with other RPC results
and provides a few extra benefits such as the ability for btcjson to
automatically unmarshal the results into a concrete type with proper field
types as opposed to a generic interface.
Recent commits to the reference implementation have changed the syncnode
field to be present in the getpeerinfo RPC even when it is false. This
commit changes btcd to match.
The wsContext was being locked twice when NewBlockNotifyCheckTxIn is
called. Fixed by changing handlers to assume lock is acquired and
renamed methods to not be exported.
This commit changes the server byte counters over to use a mutex instead
of the atomic package. The atomic.AddUint64 function requires the struct
fields to be 64-bit aligned on 32-bit platforms. The byte counts are
fields in the server struct and are not 64-bit aligned. While it would be
possible to arrange the fields to be aligned through various means, it
would make the code too fragile for my tastes. I prefer code that doesn't
depend on platform specific alignment.
Fixes#96.
This commit adds a new option, --logdir, which works in the same fashion
as the --datadir option. Consequently, the logging directory is name
"namespaced" by the network as well. This resolves the issue where two
btcd instances running (one for mainnet and one for testnet) would
overwrite each other's log files by default.
It also provides the user with a method to change the logging location to
non-default locations if they prefer. For example, it enables multiple
btcd instances on the same network to specify unique logging directories
(even though running multiple btcd instances on the same network is not
the most sane configuration).
Closes#95.
Since the websocket handlers run in their own separate goroutines, it's
possible that they execute after the websocket connection has been
closed and cleaned up. This commit add the necessary checks to ensure
stale data isn't added to notification lists and that requests to closed
connections are ignored.
This closes#92.
Access to connections map and associated notification maps in rpcServer
need to be protected with s.ws.Lock to prevent race with add/remove new
clients.
This closes#88.
Changed mempool.MaybeAcceptTransaction to accept an additional parameter
to differentiate betwee new transactions and those added from
disconnected blocks.
Added new fields to requestContexts to indicate which clients want to
receive all new transaction notifications.
Added NotifyForNewTx to rpcServer to deliver approriate transaction
notification.
Previously the getnettotals was just looping through all of the currently
connected peers to sum the byte counts and returning that. However, the
intention of the getnettotals RPC is to get all bytes since the server was
started, so this logic was not correct.
This commit modifies the code to keep an atomic counter on the server for
bytes read/written and has each peer update the server counters as well as
the per-peer counters.
Rather than using a dedicated channel for the sync peer request and reply,
use a single query channel that accepts a query type as well as a reply
channel. This will allow other queries to be added in the future without
the various queries being racy.
This commit adds byte counters to each peer using the new btcwire
ReadMessageN and WriteMessageN functions to obtain the number of bytes
read and written, respectively. It also returns those byte counters via
the PeerInfo struct which is used to populate the RPC getpeerinfo reply.
Closes#83.
This commit moves the connection endpoint for websockets to /ws instead of
/wallet. First, the former is more standard, and second the latter
presumes how the websocket is to be used.
Closes#80.
This commit improves how the headers-first mode works in several ways.
The previous headers-first code was an initial implementation that did not
have all of the bells and whistles and a few less than ideal
characteristics. This commit improves the heaers-first code to resolve
the issues discussed next.
- The previous code only used headers-first mode when starting out from
block height 0 rather than allowing it to work starting at any height
before the final checkpoint. This means if you stopped the chain
download at any point before the final checkpoint and restarted, it
would not resume and you therefore would not have the benefit of the
faster processing offered by headers-first mode.
- Previously all headers (even those after the final checkpoint) were
downloaded and only the final checkpoint was verified. This resulted in
the following issues:
- As the block chain grew, increasingly larger numbers of headers were
downloaded and kept in memory
- If the node the node serving up the headers was serving an invalid
chain, it wouldn't be detected until downloading a large number of
headers
- When an invalid checkpoint was detected, no action was taken to
recover which meant the chain download would essentially be stalled
- The headers were kept in memory even though they didn't need to be as
merely keeping track of the hashes and heights is enough to provde they
properly link together and checkpoints match
- There was no logging when headers were being downloaded so it could
appear like nothing was happening
- Duplicate requests for the same headers weren't being filtered which
meant is was possible to inadvertently download the same headers twice
only to throw them away.
This commit resolves these issues with the following changes:
- The current height is now examined at startup and prior each sync peer
selection to allow it to resume headers-first mode starting from the
known height to the next checkpoint
- All checkpoints are now verified and the headers are only downloaded
from the current known block height up to the next checkpoint. This has
several desirable properties:
- The amount of memory required is bounded by the maximum distance
between to checkpoints rather than the entire length of the chain
- A node serving up an invalid chain is detected very quickly and with
little work
- When an invalid checkpoint is detected, the headers are simply
discarded and the peer is disconnected for serving an invalid chain
- When the sync peer disconnets, all current headers are thrown away
and, due to the new aforementioned resume code, when a new sync peer
is selected, headers-first mode will continue from the last known good
block
- In addition to reduced memory usage from only keeping information about
headers between two checkpoints, the only information now kept in memory
about the headers is the hash and height rather than the entire header
- There is now logging information about what is happening with headers
- Duplicate header requests are now filtered
Previously the logging function which reports on progress was called for
every block, regardless of whether it was an orphan or not. This could be
confusing since it could show a different number of blocks processed as
compared to the old versus new heights reported (orphans do not add to the
block height since they aren't extending the main chain). Further, the
database had to be consulted for the latest block since the block we just
processed might not be the latest one if it was an orphan. This is quite
a bit more time conusming than it should've been for progress reporting.
This commit modifies that to only include non-orphan blocks. As a result,
the latest height shown will match the number of blocks processed (even
when there are orphans) and the additional block lookup from the database
is avoided.