`#curl_headers` was recently introduced into `Strategy#page_headers`
but only the call was modified and the method wasn't updated to
correctly work with the new return value, so all `HeaderMatch` checks
immediately started failing with an error.
This commit includes changes that return `#page_headers` to a working
state. I've removed the `result.assert_success!` call because it
prevents a few checks from being retried with `GET` (`firefox-cn`,
`krisp`, `prepros`).
- Some casks have URL arguments like "referer" (spelled wrong, that's
intentional in the HTTP spec).
- The audit for one such cask, `iThoughtsX`, was failing because the
"referer" wasn't getting passed through to cURL so the access would
404.
----
Before:
```
❯ brew audit --cask --online --appcast --signing 'ithoughtsx'
[...]
audit for ithoughtsx: failed
- The binary URL https://cdn.toketaware.com?download=iThoughtsX.zip is not reachable (HTTP status code 404)
- Version '9.2.0' differs from '9.3.0' retrieved by livecheck.
- Version '9.2.0' differs from '9.3.0' retrieved by livecheck.
Error: 2 problems in 1 cask detected
```
After:
```
❯ brew audit --cask --online --appcast --signing 'ithoughtsx'
[...]
audit for ithoughtsx: failed
- Version '9.2.0' differs from '9.3.0' retrieved by livecheck.
- Version '9.2.0' differs from '9.3.0' retrieved by livecheck.
Error: 1 problem in 1 cask detected
```
Update base URL when there is an absolute location, so that following
relative locations are considered relative to the new base.
Consider below cURL output for https://example_one.com:
HTTP/1.1 302 Moved Temporarily
Location: https://example_two.com
HTTP/1.1 302 Moved Temporarily
Location: /foo/
HTTP/1.1 200 OK
The final URL should be https://example_two.com/foo/ rather than
https://example_one.com/foo/.
The response from a URL protected by Cloudflare may only provide a
relevant cookie on the first response but
`#curl_http_content_headers_and_checksum` only returns the headers of
the final response. In this scenario, `#curl_check_http_content` isn't
able to properly detect the protected URL and this is surfaced as an
error instead of skipping the URL.
This resolves the issue by including the array of response hashes in
the return value from `#curl_http_content_headers_and_checksum`, so
we can check all the responses in `#curl_check_http_content`.
The return hash from `#curl_http_content_headers_and_checksum`
contains a `:status`, which is the status code of the last response.
This string value comes from `#parse_curl_response`, where the key is
`:status_code` instead.
Aligning these keys technically allows us to pass either of these
hashes to the `#url_protected_by_*` methods, as both contain
`:status_code` and `:headers` in the expected format.
Before `#parse_curl_output` was introduced and related methods were
updated to use it, `#url_protected_by_cloudflare?` and
`#url_protected_by_incapsula?` were checking a string of all the
headers from a response and using a regex to check related header
values.
However, when `#curl_http_content_headers_and_checksum` was updated
to use `#parse_curl_output` internally, the `:headers` value became
a hash generated by `#parse_curl_response`. The `#url_protected_by_*`
methods were updated to work with the hash value but this wasn't able
to fully replicate the previous behavior because
`#parse_curl_response` was only keeping the last instance of a given
header (maintaining pre-existing behavior). This is an issue for
these methods because they check `Set-Cookie` headers and there can
be multiple instances of this header in a response.
This commit updates these methods to handle an array of strings in
addition to the existing string support. This change ensures that
these methods properly check all `Set-Cookie` headers, effectively
reinstating the previous behavior.
Past that, this updates one of the early return values in
`#url_protected_by_cloudflare?` to be `false` instead of an implicit
`nil`. After adding a type signature to this method, it became clear
that it wasn't always returning a boolean value and this fixes it.
`Curl#parse_curl_response` only includes the last instance of a given
header in its `:headers` hash (replicating pre-existing behavior).
This is a problem for headers like `Set-Cookie`, which can appear more
than once in a response.
This commit addresses the issue by collecting duplicate headers into
an array instead. Headers that only appear once in the response will
still have a string value but headers that appear more than once will
be an array of strings. Whenever headers from `#parse_curl_response`
are used (directly or indirectly), it's important to conditionally
handle the expected types.
The `max_iterations` value in `#parse_curl_output` is only intended
to prevent its `while` loop from potentially turning into an endless
loop. This should only come into play in exceptional circumstances
but the current default value (5) is low enough that we're seeing it
under normal circumstances.
`#parse_curl_output` isn't intended to restrict the number of
redirections (this should be done using the `--max-redirs` option in
`curl) but it's effectively doing this in rare cases due to the low
`max_iterations` default. This is a problem because `curl` can
successfully return a response only to have `#parse_curl_output`
error in relation to `max_iterations`.
Originally the code in `#parse_curl_output` was used in the context
of livecheck, where it's not a huge issue if a check fails. However,
now the `#parse_curl_output` method is used in important parts of
brew like `#curl_download`. We've received a report of a download
failing with the "Too many redirects (max = 5)` error, effectively
preventing the user from installing a formula [from a third-party
tap].
Until we can come up with a more adaptive way of bounding this
`while` loop, I think we should simply raise the default to something
that's less likely to be encountered under normal circumstances
(e.g., 25).
The `#curl_http_content_headers_and_checksum` method previously
parsed responses from `curl` output even if `status.success?` wasn't
`true`. A recent commit of mine moved the parsing logic behind this
guard but it's now leading to a "...is not reachable" error when a URL
involves a large download that takes longer than 25 seconds to finish
and hits the timeout.
This commit resolves the issue for the time being by moving related
logic back to its previous location, where it isn't guarded by
`status.success?`.
When its `try_partial` argument is `true`, `#curl_download` makes a
`HEAD` request before downloading the file using `#curl`. Currently
`try_partial` defaults to `true`, so any `#curl_download` call that
doesn't explicitly specify `try_partial: false` will make a `HEAD`
request first. This can potentially involve several requests if the
URL redirects, so it can be a bit of unnecessary overhead when a
partial download isn't needed.
Partial downloads are generally only useful when we're working with
larger files, however there's currently only one place in brew where
`#curl_download` is used and this is the case:
`CurlDownloadStrategy`. The other `#curl_download` calls are fetching
smaller [text] files and don't need to support partial downloads.
This commit changes the default `try_partial` value to `false`,
making partial downloads opt-in rather than opt-out.
We want `try_partial` to continue to default to `true` in
`CurlDownloadStrategy` and there are various ways to accomplish this.
In this commit, I've chosen to update its `#initialize` method to
accept a `try_partial` argument that defaults to `true`, as this
value can also be used in classes that inherit from
`CurlDownloadStrategy` (e.g., `HomebrewCurlDownloadStrategy`). This
instance variable is passed to `#curl_download` in related methods,
effectively maintaining the previous `try_partial: true` value, while
also allowing this value to be overridden when necessary.
Other uses of `#curl_download` in brew are
`Formulary::FromUrlLoader#load_file` and
`Cask::CaskLoader::FromURILoader#load`, which did not provide a
`try_partial` argument but should have been using
`try_partial: false`. With the `try_partial: false` default in this
commit, these calls are now fine without a `try_partial` argument.
The only other use of `#curl_download` in brew is
`SPDX#download_latest_license_data!`. These calls were previously
using `try_partial: false` but we can now omit this argument with
the new `false` default (aligning with the above).
In cases where there may be more than five responses in `curl`
output to parse, we need to be able to control the `max_iterations`
of the `while` loop in `#parse_curl_output` to properly parse all
the responses.
For example, if we pass `--max-redirs 5` to `curl` and there are
exactly five redirections before the final response, the output would
contain a total of six responses and `#parse_curl_output` wouldn't
properly handle this (it would give a `Too many redirects` error).
`max_iterations` should be the maximum number of redirections + 1
(to account for any final response after the redirections), so we
need to be able to override this value when necessary.