We discussed the idea of adding a livecheck strategy to check crate
versions years ago but decided to put it off because it would have
only applied to one formula at the time (and it wasn't clear that a
crate was necessary in that case). We now have a few formulae that
use a crate in the `stable` URL (`cargo-llvm-cov`, `pngquant`,
`oakc`) and another formula with a crate resource (`deno`), so
there's some value to the idea now.
I established a standard approach for checking crate versions in a
somewhat recent `pngquant` `livecheck` block update and this commit
reworks it into a strategy, so we won't have to duplicate that
`livecheck` block in these cases. With this strategy, we usually
won't even need a `livecheck` block at all.
Under normal circumstances, a regex and/or strategy block shouldn't
be necessary but the strategy supports them when needed. The response
from the crates.io API is a JSON object, so this uses
`Json#versions_from_content` internally and a `strategy` block will
receive the parsed `json` object and a regex (the strategy default or
the regex from the `livecheck` block).
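
As a rough illustration of the flow described above, here is a standalone Ruby sketch that parses a crates.io-style response and applies a regex to each version. The response shape (a `versions` array whose entries have `num` and `yanked` keys), the default regex, and the `crate_versions` helper are assumptions made for illustration; this is not the strategy's actual code.

```ruby
require "json"

# Hypothetical default regex: a plain dotted numeric version.
DEFAULT_REGEX = /^v?(\d+(?:\.\d+)+)$/i

# Returns matching version strings from a crates.io-style JSON body.
# If a block is given, it receives the parsed JSON and the regex
# (mirroring how a `strategy` block is described above).
def crate_versions(content, regex = DEFAULT_REGEX)
  json = JSON.parse(content)
  return Array(yield(json, regex)).compact.uniq if block_given?

  json.fetch("versions", []).filter_map do |version|
    # Assumption: yanked releases should be skipped.
    next if version["yanked"]

    version["num"]&.slice(regex, 1)
  end.uniq
rescue JSON::ParserError
  # Treat unparseable content as "no versions found".
  []
end

SAMPLE = <<~JSON
  {"versions": [
    {"num": "1.2.0", "yanked": false},
    {"num": "1.1.9", "yanked": true},
    {"num": "1.1.0-beta.1", "yanked": false},
    {"num": "1.0.0", "yanked": false}
  ]}
JSON

default_versions = crate_versions(SAMPLE)
newest_only = crate_versions(SAMPLE) { |json, _regex| json["versions"].first["num"] }
```

Without a block, the sketch drops yanked and pre-release entries; with a block, the caller decides what to return.
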
The default curl args in `#curl_headers` cover most of
`Livecheck::Strategy::DEFAULT_CURL_ARGS` but `--max-redirs` was
overlooked. This adds an explicit `--max-redirs` argument to the
`#curl_headers` call in `#page_headers`. It's worth mentioning that
this approach won't benefit from any future changes to
`DEFAULT_CURL_ARGS`, so the two will need to be kept in parity
manually.
`#curl_headers` was recently introduced into `Strategy#page_headers`
but only the call was modified and the method wasn't updated to
correctly work with the new return value, so all `HeaderMatch` checks
immediately started failing with an error.
This commit includes changes that return `#page_headers` to a working
state. I've removed the `result.assert_success!` call because it
prevents a few checks from being retried with `GET` (`firefox-cn`,
`krisp`, `prepros`).
This adds a generic `Yaml` strategy to livecheck that requires a
`strategy` block to operate. The YAML-parsing code is taken from the
existing approach in the `ElectronBuilder` strategy.
We don't currently have any `strategy` blocks in first-party taps
that manually parse YAML. However, creating a generic `Yaml` strategy
allows us to simplify `ElectronBuilder` (and any future strategy
that works with YAML) while making it easy to create custom `Yaml`
`strategy` blocks in formulae/casks as needed.
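
To make the idea of a block-driven YAML strategy more concrete, here is a minimal sketch using Ruby's standard `yaml` library. The `parse_yaml` and `versions_from_yaml` helpers and the sample document are hypothetical and only approximate the shape of the approach; they are not the strategy's real methods.

```ruby
require "yaml"

# Parses YAML content, returning nil if it isn't valid YAML.
def parse_yaml(content)
  YAML.safe_load(content)
rescue Psych::SyntaxError
  nil
end

# Hands the parsed YAML to the block and normalizes the return value
# (a string, an array of strings, or nil) into an array of versions.
def versions_from_yaml(content)
  raise ArgumentError, "a block is required" unless block_given?

  yaml = parse_yaml(content)
  return [] if yaml.nil?

  Array(yield(yaml)).compact.uniq
end

SAMPLE_YAML = <<~YAML
  version: 1.2.3
  files:
    - url: Example-1.2.3.dmg
YAML

yaml_versions = versions_from_yaml(SAMPLE_YAML) { |yaml| yaml["version"] }
```

Keeping the parsing in one place means a block only has to describe where the version lives in the parsed data.
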
This adds a generic `Xml` strategy to livecheck that requires a
`strategy` block to operate. The XML-parsing code is taken from the
existing approach in the `Sparkle` strategy. As such, `Sparkle` has
been updated to use the `Xml#parse_xml` method instead.
Unlike the `Json` strategy, we don't currently have any `strategy`
blocks in first-party taps that manually parse XML. However, we had a
user request support for something like this and I was already working
on an `Xml` strategy (as a way of extracting the XML-parsing code
from `Sparkle` into something general-purpose), so here we are.
Future strategies that parse simple XML data can potentially use the
`Xml#find_versions` method (similar to how we have strategies that
leverage `PageMatch#find_versions`) instead of having to implement
something bespoke like `Sparkle`.
Passing the strategy symbol into the `#from_url` `select` block
means that we can also get rid of a `#from_symbol` call in a
different conditional branch, as we can directly compare the
`livecheck_strategy` symbol to `strategy_symbol`.
When the `Json` strategy was introduced, I forgot to also ensure
that it's only treated as usable (in `Strategy#from_url`) if a
`livecheck` block uses `strategy :json`. As a result, `Json` is
incorrectly treated as a usable strategy for all formulae/casks that
contain a `strategy` block.
Since all of these `livecheck` blocks specify a strategy, this bug
doesn't meaningfully impact livecheck's behavior (i.e., these checks
continue to use their explicitly-specified strategy). The only
practical difference is that `Json` incorrectly appears in the list
of usable strategies in livecheck's verbose JSON output.
This commit modifies `Strategy#from_url` to address this issue. The
easiest way to enforce this rule was to pass the `@strategies` key
(a symbol) into the `select` block, so we can compare it to
`livecheck_strategy` (the strategy symbol specified in the
`livecheck` block).
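
The rule described above can be sketched as follows. The strategy classes here are simplified stand-ins with made-up `match?` logic, and `usable_strategies` is a hypothetical helper, so this only illustrates the symbol comparison inside the `select` block rather than the real `Strategy#from_url`.

```ruby
# Simplified stand-ins for strategy classes (hypothetical matching rules).
class Git
  def self.match?(url)
    url.end_with?(".git")
  end
end

class Json
  def self.match?(url)
    url.start_with?("http")
  end
end

class PageMatch
  def self.match?(url)
    url.start_with?("http")
  end
end

STRATEGIES = { git: Git, json: Json, page_match: PageMatch }.freeze

def usable_strategies(url, livecheck_strategy: nil, block_provided: false)
  STRATEGIES.select do |strategy_symbol, strategy|
    if strategy == Json
      # Only usable when the `livecheck` block names this strategy
      # and also contains a `strategy` block.
      next false if livecheck_strategy != strategy_symbol || !block_provided
    end

    strategy.match?(url)
  end.values
end

url = "https://example.com/releases"
with_page_match = usable_strategies(url, livecheck_strategy: :page_match, block_provided: true)
with_json = usable_strategies(url, livecheck_strategy: :json, block_provided: true)
```

Because the hash key is available in the block, no lookup from symbol to class is needed to make the comparison.
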
This adds a generic `Json` strategy to livecheck that requires a
`strategy` block to operate. This is primarily intended as a
replacement for existing `strategy` blocks in formulae/casks that
use `JSON#parse`, as it allows us to internalize/standardize that
boilerplate while improving error-handling.
Additionally, future strategies that parse JSON data can use the
`Json#find_versions` method instead of having to reinvent the wheel
(similar to how we currently have a number of strategies that
leverage `PageMatch#find_versions`).
The default redirection maximum for `curl` is 50 but we should use
something more reasonable in livecheck. It's rare but a misconfigured
server with an endless redirection loop will hit the 50 redirection
limit. Unfortunately, we've encountered this in the wild (e.g., the
server for `getmail` and `memtester` endlessly redirects), so it's
not an idle concern. This commit adds `--max-redirs 5` to
`Livecheck::Strategy::DEFAULT_CURL_ARGS` to enforce a more reasonable
redirection maximum.
To be clear, the `max_iterations` logic in `#parse_curl_output`
(which was previously found in `Strategy#page_content`) doesn't
restrict the number of redirections that `curl` follows. At the point
the `curl` output is being parsed, the requests have already been
made and `max_iterations` simply restricts the number of responses
`#parse_curl_output` is willing to parse. If we use `--max-redirs`
and properly set `max_iterations` to `max-redirs + 1`, we shouldn't
encounter the "Too many redirects" error in `#parse_curl_output`.
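
The relationship between the two limits can be illustrated with a simplified parser. This is a hypothetical sketch (the constant names, `parse_responses`, and the sample output are assumptions), not the actual `#parse_curl_output` implementation; it only shows why the parsing limit should be one more than the redirection limit.

```ruby
MAX_REDIRECTIONS = 5
# One response per followed redirection, plus the final response.
MAX_PARSE_ITERATIONS = MAX_REDIRECTIONS + 1

# Splits output containing response heads (as produced when curl
# includes headers) into head blocks and a final body.
def parse_responses(output, max_iterations: MAX_PARSE_ITERATIONS)
  heads = []
  iterations = 0

  while output.start_with?("HTTP/")
    iterations += 1
    raise "Too many redirects (max = #{max_iterations})" if iterations > max_iterations

    # Each head ends at the first blank line.
    head, _, output = output.partition("\r\n\r\n")
    heads << head
  end

  { heads: heads, body: output }
end

redirect = "HTTP/1.1 301 Moved Permanently\r\nLocation: /next\r\n\r\n"
final = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
parsed = parse_responses((redirect * 2) + final)
```

With at most five redirections followed, the output contains at most six responses, so a limit of six is never exceeded by well-formed output.
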
Currently, only `Livecheck::Strategy::PAGE_HEADERS_CURL_ARGS` uses
the `--silent` option and `PAGE_CONTENT_CURL_ARGS` does not (though
there's no intention behind this omission). However, the
`#page_content` method should also use the `--silent` flag, to
prevent progress bar text (`#=#=#`, etc.) from appearing in output.
This is an issue because the regex that's used to identify `curl`
error messages in `stderr` (`/^curl:.+$/`) will fail if leading
progress bar text is present. This leads to an ambiguous "cURL
failed without a detectable error" message instead of the actual
error message(s) from `curl`.
This commit addresses the issue by adding `--silent` to
`Livecheck::Strategy::DEFAULT_CURL_ARGS`, which both
`PAGE_HEADERS_CURL_ARGS` and `PAGE_CONTENT_CURL_ARGS` inherit.
The existing regex wasn't able to match errors like
`curl: option --something: is unknown`.
Additionally, the existing approach wouldn't capture multi-line
errors, whereas this captures all the `curl:` lines from `stderr`.
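
A small sketch of capturing every `curl:` line, assuming the `/^curl:.+$/` pattern mentioned earlier and a hypothetical `curl_errors` helper. The sample `stderr` text is made up for illustration.

```ruby
# Collects every line in stderr that starts with `curl:`.
def curl_errors(stderr)
  stderr.scan(/^curl:.+$/)
end

SAMPLE_STDERR = <<~TEXT
  curl: option --something: is unknown
  curl: try 'curl --help' for more information
TEXT

errors = curl_errors(SAMPLE_STDERR)
```

Since `^` anchors to the start of a line, a line that begins with progress bar text isn't matched, which is why suppressing that text matters for error detection.
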
Valid `strategy` block return types currently vary between
strategies. Some only accept a string whereas others accept a string
or array of strings. [`strategy` blocks also accept a `nil` return
(to simplify early returns) but this was already standardized across
strategies.]
While some strategies only identify one version by default (where a
string is an appropriate return type), it could be that a strategy
block identifies more than one version. In this situation, the
strategy would need to be modified to accept (and work with) an
array from a `strategy` block.
Rather than waiting for this to become a problem, this modifies all
strategies to standardize on allowing `strategy` blocks to return a
string or array of strings (even if only one of these is currently
used in practice). Standardizing valid return types helps to further
simplify the mental model for `strategy` blocks and reduce cognitive
load.
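
One way to picture the standardized behavior is a single helper that normalizes whatever a `strategy` block returns. The helper name, constant name, and error message below are assumptions for illustration, not necessarily what the strategies use.

```ruby
# Hypothetical error message for unsupported return types.
INVALID_BLOCK_RETURN_VALUE_MSG =
  "Return value of a strategy block must be a string or array of strings."

# Normalizes a `strategy` block return value into an array of strings.
def handle_block_return(value)
  case value
  when String
    [value]
  when Array
    value.compact.uniq
  when nil
    # Allows early returns from a block.
    []
  else
    raise TypeError, INVALID_BLOCK_RETURN_VALUE_MSG
  end
end

single = handle_block_return("1.2.3")
multiple = handle_block_return(["1.0", nil, "1.0", "2.0"])
```

With every strategy routing block results through the same normalization, a block author doesn't need to know which strategy they are writing for.
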
This commit extracts related logic from `#find_versions` into
methods like `#versions_from_content`, which is conceptually similar
to `PageMatch#page_matches` (renamed to `#versions_from_content`
for consistency). This allows us to write tests for the related code
without having to make network requests (or stub them) at this point.
In general, this also helps to better align the structure of
strategies and how the various `#find_versions` methods work with
versions.
There's still more planned work to be done here but this is a step
in the right direction.
Up to this point, we've had to rely on making `Strategy` constants
private to ensure that the only available constants are strategies.
With the current setup, the existence of a constant that's not a
strategy would break `Strategy#strategies` and
`Livecheck#livecheck_strategy_names`.
Instead, we can achieve the same goal by skipping over constants
that aren't a class. Other than saving us from having to make these
constants private, this is necessary to be able to create a
`Strategy` constant that can be used in all strategies.
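
The idea of skipping non-class constants can be sketched like this. The module contents (including the `DEFAULT_PRIORITY` constant and its value) are hypothetical, and the snake_case conversion is a simplified stand-in.

```ruby
module Strategy
  # A shared, non-strategy constant (hypothetical name and value).
  DEFAULT_PRIORITY = 5

  class Git; end
  class PageMatch; end

  # Maps snake_case symbols to strategy classes, skipping any constant
  # that isn't a class.
  def self.strategies
    constants.sort.each_with_object({}) do |const_symbol, hash|
      constant = const_get(const_symbol)
      next unless constant.is_a?(Class)

      key = const_symbol.to_s.gsub(/([a-z])([A-Z])/, '\1_\2').downcase.to_sym
      hash[key] = constant
    end
  end
end

found = Strategy.strategies
```

Here `DEFAULT_PRIORITY` stays public and is simply ignored when building the list of strategies.
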
The simple approach here caches all header or body content from
responses, so memory usage grows continually with each fetch. This
becomes a more notable issue with long livecheck runs (e.g.,
`--tap homebrew/core`).
Instead, we should only cache the header/body for URLs that we know
will be fetched more than once in a given run. Being able to
determine which URLs will be fetched more than once requires
structural changes within livecheck strategies, so this will take a
bit of work to implement.
I've been working on this off and on and I'll introduce a more
sophisticated method of livecheck-wide caching in a later PR. In the
interim, it's best to remove this caching behavior until I've
finished working on an approach that provides benefits (reducing
duplicate fetches) while minimizing detriments (increased memory
usage).
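
Purely as an illustration of the trade-off described above, the sketch below caches content only for URLs that appear more than once in a list known up front. The `SelectiveCache` class and its interface are hypothetical, and it sidesteps the hard part (working out that list within livecheck), so it shouldn't be read as the planned design.

```ruby
# Caches fetched content only for URLs expected to be fetched more
# than once. The class name and interface are hypothetical.
class SelectiveCache
  def initialize(planned_urls)
    # URLs that appear more than once in the planned fetches.
    @repeated = planned_urls.tally.select { |_url, count| count > 1 }.keys
    @cache = {}
  end

  # Returns cached content if present; otherwise calls the block to
  # fetch it and stores the result only for repeated URLs.
  def fetch(url)
    return @cache[url] if @cache.key?(url)

    content = yield(url)
    @cache[url] = content if @repeated.include?(url)
    content
  end

  def size
    @cache.size
  end
end

planned = %w[https://a.example https://b.example https://a.example]
cache = SelectiveCache.new(planned)
```

Memory use is then bounded by the number of repeated URLs rather than by the total number of fetches in a run.
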