Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 support #426

Closed
wants to merge 101 commits into from
Closed
Show file tree
Hide file tree
Changes from 100 commits
Commits
Show all changes
101 commits
Select commit Hold shift + click to select a range
32130c2
clean up
madsbk Jul 30, 2024
4a5884c
benchmark: use cudf.set_option
madsbk Aug 13, 2024
804fd99
revert some minor changes
madsbk Aug 13, 2024
bbbe637
doc
madsbk Aug 13, 2024
a752eee
doc
madsbk Aug 13, 2024
ce4218a
doc
madsbk Aug 13, 2024
5ba191d
doc
madsbk Aug 13, 2024
5126b44
--bundled-server-lifetime
madsbk Aug 13, 2024
185e687
clean up
madsbk Aug 13, 2024
502f7cf
doc
madsbk Aug 13, 2024
740a15d
cleanup
madsbk Aug 13, 2024
7fbabdf
doc
madsbk Aug 13, 2024
5b41e74
doc
madsbk Aug 13, 2024
555903d
doc
madsbk Aug 16, 2024
69f7fd9
Re-run CI
KyleFromNVIDIA Aug 16, 2024
39ca408
Fix CMake code
KyleFromNVIDIA Aug 16, 2024
cce5b1b
Merge branch 'branch-24.10' into s3_support
KyleFromNVIDIA Aug 16, 2024
1c601ad
Style
KyleFromNVIDIA Aug 16, 2024
551fe20
Install moto server support for testing
KyleFromNVIDIA Aug 16, 2024
b0ff3d1
Install cudf for testing
KyleFromNVIDIA Aug 16, 2024
afb2dae
Merge branch 'branch-24.10' of https://github.com/rapidsai/kvikio int…
madsbk Aug 19, 2024
06c0311
doc
madsbk Aug 19, 2024
504f15d
Temporarily remove cudf test dependency
KyleFromNVIDIA Aug 19, 2024
2de3486
Merge branch 'branch-24.10' of https://github.com/rapidsai/kvikio int…
madsbk Aug 20, 2024
5f07f2e
test: import boto3 and moto
madsbk Aug 20, 2024
f391ad1
cleanup
madsbk Aug 20, 2024
247ab4d
clean up
madsbk Aug 20, 2024
9adffc1
Apply suggestions from code review
madsbk Aug 20, 2024
8b4436d
client(): not shared pointer
madsbk Aug 20, 2024
2287c9b
Merge branch 's3_support' of github.com:madsbk/kvikio into s3_support
madsbk Aug 20, 2024
e830f62
style
madsbk Aug 20, 2024
bff52a4
S3Context::get_file_size
madsbk Aug 20, 2024
f6e7ebc
move S3Context out of detail
madsbk Aug 20, 2024
3f8291a
cpp_RemoteHandle
madsbk Aug 21, 2024
2607ca1
_get_remote_module
madsbk Aug 21, 2024
6f3c25d
make S3Context public
madsbk Aug 21, 2024
149d6be
call Aws::ShutdownAPI()
madsbk Aug 21, 2024
98991b3
clean up
madsbk Aug 21, 2024
daac52f
doc
madsbk Aug 21, 2024
ba6925c
doc
madsbk Aug 21, 2024
c635519
S3Context: endpoint_override
madsbk Aug 22, 2024
b8bfef2
examples/aws_s3.py
madsbk Aug 22, 2024
4c4b36e
doc
madsbk Aug 22, 2024
e489bbe
doc
madsbk Aug 22, 2024
73d9104
RemoteHandle: no default ctor
madsbk Aug 22, 2024
b4ac1d7
cleanup
madsbk Aug 22, 2024
d9ba372
Merge branch 'branch-24.10' of https://github.com/rapidsai/kvikio int…
madsbk Aug 22, 2024
146efc0
KVIKIO_AWS_SDK_FOUND
madsbk Aug 22, 2024
b19a146
cmake: --no-s3
madsbk Aug 23, 2024
4cb54a9
move KVIKIO_AWS_SDK_FOUND up before aws include
madsbk Aug 23, 2024
379f521
parse_s3_path tests
madsbk Aug 23, 2024
8849d11
S3Context::client(): mark const
madsbk Aug 23, 2024
8c54a6c
east const
madsbk Aug 23, 2024
bb0ad24
typo
madsbk Aug 23, 2024
33b9227
Merge branch 'branch-24.10' of https://github.com/rapidsai/kvikio int…
madsbk Aug 23, 2024
2e5ff0b
aws-sdk-cpp>=1.11.267, we need <https://github.com/aws/aws-sdk-cpp/pu…
madsbk Aug 23, 2024
378e9b6
doc
madsbk Aug 23, 2024
496fe8d
doc
madsbk Aug 23, 2024
ea2d93c
CI: disable devcontainer until #446 has been fixed.
madsbk Aug 23, 2024
3bb3ce1
yaml: remove some aws-sdk-cpp
madsbk Aug 26, 2024
b9b2c44
ci: comment out devcontainer
madsbk Aug 26, 2024
fa4f298
Merge branch 'branch-24.10' of https://github.com/rapidsai/kvikio int…
madsbk Aug 27, 2024
c483b68
AllocRetain::ensure_alloc_size()
madsbk Aug 27, 2024
ce0c59a
add task_size argument
madsbk Aug 27, 2024
bbb4042
harmonic_mean
madsbk Aug 28, 2024
f363f15
use MonkeyPatch
madsbk Aug 28, 2024
8129d17
Apply suggestions from code review
madsbk Aug 28, 2024
37b294c
Merge branch 's3_support' of github.com:madsbk/kvikio into s3_support
madsbk Aug 28, 2024
fb95490
doc
madsbk Aug 28, 2024
2729dbf
test: 10mins timeout
madsbk Aug 28, 2024
ce08365
doc
madsbk Aug 28, 2024
e800233
Merge branch 'branch-24.10' of https://github.com/rapidsai/kvikio int…
madsbk Aug 28, 2024
a60011a
Get aws-sdk-cpp through CPM
KyleFromNVIDIA Aug 28, 2024
0d68e49
Enable devcontainers
KyleFromNVIDIA Aug 28, 2024
72812cc
Formatting
KyleFromNVIDIA Aug 28, 2024
c4ccd9e
pr-builder
KyleFromNVIDIA Aug 28, 2024
4f765d0
Style
KyleFromNVIDIA Aug 28, 2024
355d192
Argument ordering
KyleFromNVIDIA Aug 28, 2024
500681a
Style
KyleFromNVIDIA Aug 28, 2024
d2a49c9
Relax aws-sdk-cpp version
KyleFromNVIDIA Aug 28, 2024
bd83655
Find debug
KyleFromNVIDIA Aug 29, 2024
2687cc6
CPM_USE_LOCAL_PACKAGES
KyleFromNVIDIA Aug 29, 2024
8fe82b9
No debug find
KyleFromNVIDIA Aug 29, 2024
fdb9a59
Fix AWS linking
KyleFromNVIDIA Aug 29, 2024
4e4389f
Re-run CI
KyleFromNVIDIA Aug 29, 2024
cebc0a8
Install libcurl4-openssl-dev for pip devcontainers
KyleFromNVIDIA Aug 29, 2024
61a5d50
Link against aws-cpp-sdk-core
KyleFromNVIDIA Aug 29, 2024
d55030a
Try excluding curl
KyleFromNVIDIA Aug 29, 2024
2123666
Style
KyleFromNVIDIA Aug 29, 2024
0c2f09b
Merge branch 'branch-24.10' into s3_support
KyleFromNVIDIA Aug 30, 2024
bb445a3
Try AWS's HTTP client
KyleFromNVIDIA Aug 30, 2024
4701b52
No need to install libcurl
KyleFromNVIDIA Aug 30, 2024
2bd411e
libcudf_s3_io
madsbk Sep 3, 2024
b40a0ad
Merge branch 'branch-24.10' of https://github.com/rapidsai/kvikio int…
madsbk Sep 3, 2024
7bcfe5b
Merge branch 'branch-24.10' into s3_support
madsbk Sep 4, 2024
2a9f934
Merge branch 'branch-24.10' of https://github.com/rapidsai/kvikio int…
madsbk Sep 6, 2024
b978323
test: ignore deprecation warning
madsbk Sep 6, 2024
00304e7
test: remove the kvikio._lib.remote_handle trigger
madsbk Sep 6, 2024
f9b49f2
Revert "test: ignore deprecation warning"
madsbk Sep 6, 2024
0403686
pytest: ignore deprecation warning in botocore
madsbk Sep 6, 2024
0001265
Apply suggestions from code review
madsbk Sep 9, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
82 changes: 81 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,89 @@ The C++ library is header-only making it easy to include in [existing projects](
* A Python [Zarr](https://zarr.readthedocs.io/en/stable/) backend for reading and writing GPU data to file seamlessly.
* Concurrent reads and writes using an internal thread pool.
* Non-blocking API.
* Handle both host and device IO seamlessly.
* Read/write to both host and device memory seamlessly.
* Provides compile-time optional remote read from AWS S3 storage seamlessly, using the [AWS SDK](https://docs.aws.amazon.com/sdkref/latest/guide/overview.html).
* Provides Python bindings to [nvCOMP](https://github.com/NVIDIA/nvcomp).


### Documentation
* Python: <https://docs.rapids.ai/api/kvikio/nightly/>
* C++: <https://docs.rapids.ai/api/libkvikio/nightly/>


### Examples

#### Python
```python
import cupy
import kvikio

def main(path):
a = cupy.arange(100)
f = kvikio.CuFile(path, "w")
# Write whole array to file
f.write(a)
f.close()

b = cupy.empty_like(a)
f = kvikio.CuFile(path, "r")
# Read whole array from file
f.read(b)
assert all(a == b)

# Use contexmanager
c = cupy.empty_like(a)
with kvikio.CuFile(path, "r") as f:
f.read(c)
assert all(a == c)

# Non-blocking read
d = cupy.empty_like(a)
with kvikio.CuFile(path, "r") as f:
future1 = f.pread(d[:50])
future2 = f.pread(d[50:], file_offset=d[:50].nbytes)
future1.get() # Wait for first read
future2.get() # Wait for second read
assert all(a == d)


if __name__ == "__main__":
main("/tmp/kvikio-hello-world-file")
```

#### C++
```c++
#include <cstddef>
#include <cuda_runtime.h>
#include <kvikio/file_handle.hpp>
using namespace std;

int main()
{
// Create two arrays `a` and `b`
constexpr std::size_t size = 100;
void *a = nullptr;
void *b = nullptr;
cudaMalloc(&a, size);
cudaMalloc(&b, size);

// Write `a` to file
kvikio::FileHandle fw("test-file", "w");
size_t written = fw.write(a, size);
fw.close();

// Read file into `b`
kvikio::FileHandle fr("test-file", "r");
size_t read = fr.read(b, size);
fr.close();

// Read file into `b` in parallel using 16 threads
kvikio::default_thread_pool::reset(16);
{
kvikio::FileHandle f("test-file", "r");
future<size_t> future = f.pread(b_dev, sizeof(a), 0); // Non-blocking
size_t read = future.get(); // Blocking
// Notice, `f` closes automatically on destruction.
}
}
```
12 changes: 9 additions & 3 deletions build.sh
Original file line number Diff line number Diff line change
Expand Up @@ -18,15 +18,16 @@ ARGS=$*
# script, and that this script resides in the repo dir!
REPODIR=$(cd $(dirname $0); pwd)

VALIDARGS="clean libkvikio kvikio -v -g -n --pydevelop -h"
HELP="$0 [clean] [libkvikio] [kvikio] [-v] [-g] [-n] [--cmake-args=\"<args>\"] [-h]
VALIDARGS="clean libkvikio kvikio -v -g -n --pydevelop --no-s3 -h"
HELP="$0 [clean] [libkvikio] [kvikio] [--no-s3] [-v] [-g] [-n] [--pydevelop] [--cmake-args=\"<args>\"] [-h]
clean - remove all existing build artifacts and configuration (start over)
libkvikio - build and install the libkvikio C++ code
kvikio - build and install the kvikio Python package (requires libkvikio)
--no-s3 - build with no AWS S3 support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be defined as a part of --cmake-args, with no special --no-s3 flag? It is possible for users to specify the option via --cmake-args. If the user gives a value there, we probably want to respect that with higher priority than defaulting to "ON" based on the absence of a --no-s3 flag.

-v - verbose build mode
-g - build for debug
-n - no install step
--pydevelop - Install Python packages in editable mode
--pydevelop - install Python packages in editable mode
--cmake-args=\\\"<args>\\\" - pass arbitrary list of CMake configuration options (escape all quotes in argument)
-h - print this text
default action (no args) is to build and install 'libkvikio' and 'kvikio' targets
Expand All @@ -36,6 +37,7 @@ KVIKIO_BUILD_DIR="${REPODIR}/python/kvikio/build/"
BUILD_DIRS="${LIBKVIKIO_BUILD_DIR} ${KVIKIO_BUILD_DIR}"

# Set defaults for vars modified by flags to this script
ENABLE_S3_SUPPORT="-DKvikIO_AWSSDK_SUPPORT=ON"
VERBOSE_FLAG=""
BUILD_TYPE=Release
INSTALL_TARGET=install
Expand Down Expand Up @@ -86,6 +88,7 @@ function ensureCMakeRan {
cmake -B "${LIBKVIKIO_BUILD_DIR}" -S . \
-DCMAKE_INSTALL_PREFIX="${INSTALL_PREFIX}" \
-DCMAKE_BUILD_TYPE=${BUILD_TYPE} \
${ENABLE_S3_SUPPORT} \
${EXTRA_CMAKE_ARGS}
RAN_CMAKE=1
fi
Expand All @@ -109,6 +112,9 @@ if (( ${NUMARGS} != 0 )); then
fi

# Process flags
if hasArg --no-s3; then
ENABLE_S3_SUPPORT="-DKvikIO_AWSSDK_SUPPORT=OFF"
fi
if hasArg -v; then
VERBOSE_FLAG=-v
export SKBUILD_BUILD_VERBOSE=true
Expand Down
3 changes: 3 additions & 0 deletions conda/environments/all_cuda-118_arch-aarch64.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@ channels:
- conda-forge
- nvidia
dependencies:
- aws-sdk-cpp>=1.11.267
- boto3>=1.21.21
- c-compiler
- cmake>=3.26.4,!=3.30.0
- cuda-python>=11.7.1,<12.0a0
Expand All @@ -17,6 +19,7 @@ dependencies:
- dask>=2022.05.2
- doxygen=1.9.1
- gcc_linux-aarch64=11.*
- moto>=4.0.8
- ninja
- numcodecs !=0.12.0
- numpy>=1.23,<3.0a0
Expand Down
3 changes: 3 additions & 0 deletions conda/environments/all_cuda-118_arch-x86_64.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@ channels:
- conda-forge
- nvidia
dependencies:
- aws-sdk-cpp>=1.11.267
- boto3>=1.21.21
- c-compiler
- cmake>=3.26.4,!=3.30.0
- cuda-python>=11.7.1,<12.0a0
Expand All @@ -19,6 +21,7 @@ dependencies:
- gcc_linux-64=11.*
- libcufile-dev=1.4.0.31
- libcufile=1.4.0.31
- moto>=4.0.8
- ninja
- numcodecs !=0.12.0
- numpy>=1.23,<3.0a0
Expand Down
3 changes: 3 additions & 0 deletions conda/environments/all_cuda-125_arch-aarch64.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@ channels:
- conda-forge
- nvidia
dependencies:
- aws-sdk-cpp>=1.11.267
- boto3>=1.21.21
- c-compiler
- cmake>=3.26.4,!=3.30.0
- cuda-nvcc
Expand All @@ -18,6 +20,7 @@ dependencies:
- doxygen=1.9.1
- gcc_linux-aarch64=11.*
- libcufile-dev
- moto>=4.0.8
- ninja
- numcodecs !=0.12.0
- numpy>=1.23,<3.0a0
Expand Down
3 changes: 3 additions & 0 deletions conda/environments/all_cuda-125_arch-x86_64.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@ channels:
- conda-forge
- nvidia
dependencies:
- aws-sdk-cpp>=1.11.267
- boto3>=1.21.21
- c-compiler
- cmake>=3.26.4,!=3.30.0
- cuda-nvcc
Expand All @@ -18,6 +20,7 @@ dependencies:
- doxygen=1.9.1
- gcc_linux-64=11.*
- libcufile-dev
- moto>=4.0.8
- ninja
- numcodecs !=0.12.0
- numpy>=1.23,<3.0a0
Expand Down
2 changes: 2 additions & 0 deletions conda/recipes/kvikio/meta.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -64,11 +64,13 @@ requirements:
- rapids-build-backend >=0.3.0,<0.4.0.dev0
- scikit-build-core >=0.10.0
- libkvikio ={{ version }}
- aws-sdk-cpp>=1.11.267
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ We will have to be very careful with this pinning. We must match conda-forge exactly because aws-sdk-cpp requires a nearly-exact match at runtime (version x.x.x) and we need compatibility with conda-forge. Note that we will need to maintain this pinning very actively and align with conda-forge for every RAPIDS release, or we risk breaking environment compatibility with conda-forge. This is a large part of why I am hesitant on adding this dependency -- it constrains us tightly, and we do not benefit from the automated migrations that are used for conda-forge recipes to "rebuild the world" when pinnings like aws-sdk-cpp are updated.

We always need to be matching this line: https://github.com/conda-forge/conda-forge-pinning-feedstock/blob/2e592a612d925988747cd6daa9e271328ceb3bfc/recipe/conda_build_config.yaml#L268

Suggested change
- aws-sdk-cpp>=1.11.267
- aws-sdk-cpp==1.11.379

Copy link
Member Author

@madsbk madsbk Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my ignorance, but why do we need an exact match? Naively, I would have thought that a lower-limit would be more flexible? Do all packages in conda-forge needs to maintain this pinning if they use aws-sdk-cpp?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdice, would it help if we remove the version requirement?

Copy link
Contributor

@bdice bdice Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First I'll explain the migration logic. Then I'll answer the questions you posed about other options.

Migrations and why they're needed

The aws-sdk-cpp release cadence is very frequent. Some of those releases (there's a huge number of them) are picked up by conda-forge, but taking the new version requires "rebuilding the world" because aws-sdk-cpp is a core C++ dependency of many packages (see mamba repoquery whoneeds -c conda-forge aws-sdk-cpp). Updates happen via a conda-forge migration (tracked here) which rebuild everything that depends on aws-sdk-cpp.

See the number of recent AWS-related rebuilds of libarrow, for instance:
https://github.com/conda-forge/arrow-cpp-feedstock/commits/main/recipe/meta.yaml

I'll ignore the aws-crt-cpp and other rebuilds, and only focus on aws-sdk-cpp because that's the dependency we are introducing here.

In the past 12 months, there have been 6 rebuilds of libarrow for new versions of aws-sdk-cpp (1.11.379, 1.11.329, 1.11.267, 1.11.242, 1.11.210, 1.11.182).

Every time that conda-forge migrates to a new version, we must use the same version so that RAPIDS (KvikIO specifically) can be installed alongside the latest conda-forge packages.

Try this:

mamba create -n test --dry-run libarrow

Note that aws-sdk-cpp 1.11.379 is included in the environment because there has been a recent rebuild of libarrow.

Can we use a lower limit (or no version pinning)?

If we were to use a lower limit (or no version pinning), kvikio would use the latest aws-sdk-cpp when it is built. That would impose a runtime pinning on the latest aws-sdk-cpp. However, if a conda-forge migration has not happened yet (or is incomplete), those kvikio packages would be incompatible with the latest conda-forge packages of things like libarrow. We need to match the migrators so we don't get "latest aws-sdk-cpp" and instead align with "current version used by conda-forge's ecosystem-wide pinnings". Typically conda-forge maintainers will start a migration as soon as a new aws-sdk-cpp is released, but that's not guaranteed to happen immediately. Sometimes migrations are delayed or skipped, which would leave our packages unusable if we had picked up an aws-sdk-cpp version that conda-forge did not adopt with a migration.

So what do we do?

RAPIDS is not built with conda-forge infrastructure, but aims to be compatible with the conda-forge ecosystem. We do not benefit from conda-forge's automation for version migrations but we must track them in order to be compatible. (The conda-forge bots that "rebuild the world" do not rebuild RAPIDS recipes, but maybe we could create such a tool.)

To remain compatible with conda-forge, we need to match the exact pinnings for core packages like fmt, spdlog, aws-sdk-cpp, and others. We have a lot of pain from matching conda-forge migrations for fmt and spdlog (see Keith's warning and this problem in cuspatial that comes from us being behind on spdlog/fmt migrations)

Our best course of action for now is to:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened rapidsai/build-planning#100 to think more about conda-forge migration support in RAPIDS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thanks for the explanation @bdice! I now see why you were hesitant from the get-go 😊

What if we move to use libcurl instead? They take API and ABI stability very serious: https://curl.se/libcurl/features.html#stableapi

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking of using libcurl from the start since we only need the very basic S3 operations. I don’t think it will be too hard to implement and it will make Azure Blob Storage support straightforward.

Copy link
Contributor

@bdice bdice Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be a better option than aws-sdk-cpp if we can use libcurl. It seems like libcurl is already a dependency of some packages we depend on in RAPIDS, so that gives me more confidence in it. See $ mamba repoquery whoneeds -c conda-forge libcurl 2>&1 | awk '{print $1;}' | sort | uniq.

Copy link
Member Author

@madsbk madsbk Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I will give libcurl a try

run:
- python
- numpy >=1.23,<3.0a0
- cupy >=12.0.0
- zarr
- aws-sdk-cpp>=1.11.267
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not list a run dependency, this will be handled by run-exports: https://github.com/conda-forge/aws-sdk-cpp-feedstock/blob/f82d968670bdb9939ed7604a5fb7bb4885e2e6ba/recipe/meta.yaml#L14

Suggested change
- aws-sdk-cpp>=1.11.267

# See https://github.com/zarr-developers/numcodecs/pull/475
- numcodecs !=0.12.0
- packaging
Expand Down
5 changes: 5 additions & 0 deletions conda/recipes/libkvikio/meta.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ requirements:
{% else %}
- libcufile-dev # [linux]
{% endif %}
- aws-sdk-cpp>=1.11.267
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Match conda-forge.

Suggested change
- aws-sdk-cpp>=1.11.267
- aws-sdk-cpp==1.11.379


outputs:
- name: libkvikio
Expand Down Expand Up @@ -83,6 +84,7 @@ outputs:
{% else %}
- libcufile-dev # [linux]
{% endif %}
- aws-sdk-cpp>=1.11.267
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this and add a host dependency on aws-sdk-cpp==1.11.379 here so that we can have compatible runtime pinnings inserted automatically by the run-exports.

test:
commands:
- test -f $PREFIX/include/kvikio/file_handle.hpp
Expand All @@ -106,6 +108,7 @@ outputs:
- cuda-cudart-dev
- libcufile-dev # [linux]
{% endif %}
- aws-sdk-cpp>=1.11.267
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this ignoring run exports? I'm confused overall by the changes in libkvikio-tests. Does libkvikio-tests not depend on libkvikio, even as a host dependency to build the standalone tests?

Perhaps this needs to use aws-sdk-cpp==1.11.379 in host, with nothing in run, and don't ignore the run-exports of aws-sdk-cpp.

requirements:
build:
- cmake {{ cmake_version }}
Expand All @@ -118,6 +121,7 @@ outputs:
- cuda-cudart-dev
- libcufile-dev # [linux]
{% endif %}
- aws-sdk-cpp>=1.11.267
run:
- {{ pin_compatible('cuda-version', max_pin='x', min_pin='x') }}
{% if cuda_major == "11" %}
Expand All @@ -127,6 +131,7 @@ outputs:
- cuda-cudart
- libcufile # [linux]
{% endif %}
- aws-sdk-cpp>=1.11.267
about:
home: https://rapids.ai
license: Apache-2.0
Expand Down
34 changes: 34 additions & 0 deletions cpp/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ rapids_cmake_build_type(Release)

# build options
option(KvikIO_BUILD_EXAMPLES "Configure CMake to build examples" ON)
option(KvikIO_AWSSDK_SUPPORT "Configure CMake to build with AWS S3 support" ON)

rapids_cmake_support_conda_env(conda_env MODIFY_PREFIX_PATH)

Expand All @@ -55,6 +56,27 @@ rapids_find_package(
INSTALL_EXPORT_SET kvikio-exports
)

# If AWSSDK isn't found, the Cython module remote_handle.pyx isn't built and C++ users shouldn't
# include <kvikio/remote_handle.hpp>
if(KvikIO_AWSSDK_SUPPORT)
include(cmake/thirdparty/get_aws_sdk_cpp.cmake)
endif()

if(TARGET aws-cpp-sdk-s3)
get_property(
_lib_type
TARGET aws-cpp-sdk-s3
PROPERTY TYPE
)
if(_lib_type STREQUAL "STATIC_LIBRARY")
rapids_find_package(
ZLIB
BUILD_EXPORT_SET kvikio-exports
INSTALL_EXPORT_SET kvikio-exports
)
endif()
endif()

rapids_find_package(
cuFile
BUILD_EXPORT_SET kvikio-exports
Expand Down Expand Up @@ -131,6 +153,10 @@ target_include_directories(
target_link_libraries(
kvikio INTERFACE Threads::Threads ${CMAKE_DL_LIBS} nvtx3::nvtx3-cpp BS::thread_pool
)
if(TARGET aws-cpp-sdk-s3)
target_link_libraries(kvikio INTERFACE $<BUILD_LOCAL_INTERFACE:aws-cpp-sdk-s3>)
target_compile_definitions(kvikio INTERFACE $<BUILD_LOCAL_INTERFACE:KVIKIO_AWS_SDK_FOUND>)
endif()
target_compile_features(kvikio INTERFACE cxx_std_17)

# optionally build examples
Expand Down Expand Up @@ -217,6 +243,14 @@ if(NOT already_set_kvikio)
target_compile_definitions(kvikio::kvikio INTERFACE KVIKIO_CUFILE_STREAM_API_FOUND)
endif()
endif()

if(KvikIO_AWSSDK_SUPPORT)
find_package(AWSSDK COMPONENTS s3 QUIET)
endif()
if(TARGET aws-cpp-sdk-s3)
target_link_libraries(kvikio::kvikio INTERFACE aws-cpp-sdk-s3)
target_compile_definitions(kvikio::kvikio INTERFACE $<BUILD_LOCAL_INTERFACE:KVIKIO_AWS_SDK_FOUND>)
endif()
endif()
]=]
)
Expand Down
45 changes: 45 additions & 0 deletions cpp/cmake/thirdparty/get_aws_sdk_cpp.cmake
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
# =============================================================================
# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing permissions and limitations under
# the License.
# =============================================================================

# This function finds aws-sdk-cpp and sets any additional necessary environment variables.
function(find_and_configure_aws_sdk_cpp)
include(${rapids-cmake-dir}/cpm/find.cmake)

# Attempt to use find_package() - the patch is only needed if building from source
set(CPM_USE_LOCAL_PACKAGES ON)

rapids_cpm_find(
AWSSDK 1.11.267
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must match conda-forge.

Suggested change
AWSSDK 1.11.267
AWSSDK 1.11.379

GLOBAL_TARGETS aws-cpp-sdk-s3 COMPONENTS S3
BUILD_EXPORT_SET kvikio-exports
INSTALL_EXPORT_SET kvikio-exports
CPM_ARGS
GIT_REPOSITORY https://github.com/aws/aws-sdk-cpp.git
GIT_TAG 1.11.393
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably needs to match the version above, too?

Suggested change
GIT_TAG 1.11.393
GIT_TAG 1.11.379

PATCH_COMMAND
${CMAKE_COMMAND}
-E
env
GIT_COMMITTER_NAME=rapids-cmake
GIT_COMMITTER_EMAIL=rapids.cmake@rapids.ai
git
am
--no-gpg-sign
${CMAKE_CURRENT_LIST_DIR}/patches/aws-sdk-cpp/0001-Don-t-set-CMP0077-to-OLD.patch
OPTIONS "BUILD_ONLY s3" "BUILD_SHARED_LIBS OFF" "ENABLE_TESTING OFF" "ENABLE_UNITY_BUILD ON"
"USE_CRT_HTTP_CLIENT ON"
)
endfunction()

find_and_configure_aws_sdk_cpp()
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
From 7b24166a73e422e65b725ffcb0acd20ab493fac0 Mon Sep 17 00:00:00 2001
From: Kyle Edwards <kyedwards@nvidia.com>
Date: Wed, 28 Aug 2024 15:32:07 -0400
Subject: [PATCH] Don't set CMP0077 to OLD

---
CMakeLists.txt | 4 ----
1 file changed, 4 deletions(-)

diff --git a/CMakeLists.txt b/CMakeLists.txt
index c17ff8a07b1..b30bc81b6df 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -13,10 +13,6 @@ if (LEGACY_BUILD)
"update the build flags as mentioned in README.md and set -DLEGACY_BUILD=OFF. "
"The legacy support will be removed at 1.12.0 release.")

- if (POLICY CMP0077)
- cmake_policy(SET CMP0077 OLD) # CMP0077: option() honors normal variables. Introduced in 3.13
- endif ()
-
get_filename_component(AWS_NATIVE_SDK_ROOT "${CMAKE_CURRENT_SOURCE_DIR}" ABSOLUTE)

# Cmake invocation variables:
--
2.34.1
Loading