Releases: Bears-R-Us/arkouda
v2023.06.16
Bug Fixes
- Issue #2481 - Fixes Multi-Column Parquet not handling Empty Files Properly
- Issue #2506 - Fixes `Categorical` Optional Components Required Bug
- Issue #2414 - Fixes `overMemLimit` calc error
Major Updates
- Issues #2424 and #2432 - Adds `Strings` value support for `SegArray`
- Issue #2443 - Read/Write `SegArray` of `Strings` for HDF5
- Issue #2444 - Adds `SegArray` with `Strings` Values Parquet Support (Does not include Multi-Column)
- Issue #2386 - Read/Write support for `GroupBy` objects in HDF5
- Issues #2434, #2459, #2462, and #2463 - Adds hashing support for `SegArray`, `Strings`, `Categorical`, `BigInt`
- Issues #2006, #2032, #2416, and #2431 - `BigInt` Support Improvements
- Issue #2304 - Adds `inner_join` on `Strings` and `Categorical`
- Issue #2417 - `Filename_Codes` match `Categorical.codes`
- Issue #2425 - Import/Export lists from/to pandas
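Several items above concern `SegArray` objects holding `Strings` values. As a rough illustration of the segmented layout, the pure-Python sketch below stores variable-length rows as a flat `values` list plus a `segments` list of start offsets. The helper names `build_segments` and `get_row` are hypothetical and are not part of the Arkouda API; this only shows the idea, not how Arkouda implements it.

```python
# Hypothetical helpers illustrating a segments/values layout.
# These are NOT Arkouda functions.

def build_segments(rows):
    """Flatten a list of rows into (segments, values).

    segments[i] is the offset in `values` where row i starts.
    """
    segments, values = [], []
    for row in rows:
        segments.append(len(values))  # start offset of this row
        values.extend(row)
    return segments, values


def get_row(segments, values, i):
    """Recover row i from the flattened representation."""
    start = segments[i]
    # The last row runs to the end of `values`.
    end = segments[i + 1] if i + 1 < len(segments) else len(values)
    return values[start:end]


rows = [["a", "bc"], [], ["def"]]
segments, values = build_segments(rows)
```

An empty row appears as two consecutive equal offsets, which is why empty segments need explicit handling when reading or writing such data.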
Minor Updates
- Issue #2454 - Updates `SegArray.__getitem__` to Always Return `pdarray`
- Issue #2418 - Adds instructions to set max per-locale CPU cores and memory
- Issue #2433 - Updates `GroupBy` Object to only be client side
Auto-Generated Release Notes
- Adjust these modules to avoid deprecation warnings from non-default Math symbols by @lydia-duncan in #2415
- Closes #2412: Update quickstart to v2023.05.05 by @pierce314159 in #2413
- Fix overMemLimit calc error by @hokiegeek2 in #2421
- Closes #2427 - Deprecation updates related to Memory and Memory.Diagnostics by @jabraham17 in #2422
- Closes #2425 - Import/Export lists from/to pandas by @Ethan-DeBandi99 in #2428
- Closes #2386 - `GroupBy.to_hdf` & `GroupBy.update_hdf` by @Ethan-DeBandi99 in #2426
- add instructions to set max per-locale CPU cores and memory by @hokiegeek2 in #2429
- Closes #2416 and #2006: bigint shift performance by @pierce314159 in #2423
- Closes #2431: Add bigint broadcast by @pierce314159 in #2437
- Closes #2441: Adds missing `use Biginteger` in gt-130 bigint compat by @pierce314159 in #2442
- fixed typo by @hokiegeek2 in #2448
- Closes #2417 - `Filename_Codes` match `Categorical.codes` by @Ethan-DeBandi99 in #2440
- Closes #2445 - Deprecation updates related to string and byte factory functions by @jabraham17 in #2446
- Closes #2451: Remove pragma no doc instances in Chapel code by @bmcdonald3 in #2452
- Updates for Chapel `list.append` deprecation by @jeremiah-corrado in #2450
- Closes #2432 - Revert SegArray to Client Side by @Ethan-DeBandi99 in #2439
- Closes #2304: `inner_join` on `Strings` and `Categorical` by @pierce314159 in #2453
- Closes #2467: Arrow compilation can fail with clang 15 upgrade changes default PIE by @bmcdonald3 in #2468
- Closes #2032 - BigInt Support for HDF5 by @Ethan-DeBandi99 in #2460
- Closes #2454 - Update `SegArray.__getitem__` to Always Return `pdarray` by @Ethan-DeBandi99 in #2466
- Closes #2424 - Adds `SegArray` support for `Strings` Values by @Ethan-DeBandi99 in #2469
- Closes #1211 - Remove TaskErrors Workaround by @Ethan-DeBandi99 in #2470
- Closes #2433: GroupBy back to client only by @pierce314159 in #2456
- Changes for Chapel `c_memcpy` replacement with `OS.POSIX.memcpy` by @jeremiah-corrado in #2479
- Closes #2482 - Fix capitalization of POSIX compatibility module by @jeremiah-corrado in #2483
- Closes #2443 - Read/Write SegArray of Strings HDF5 by @Ethan-DeBandi99 in #2478
- Closes #2459 and #2434: `ak.hash` for `Segarray` and `Strings` by @pierce314159 in #2475
- Fixes #2481 - Multi-Column Parquet does not handle Empty Files Properly by @Ethan-DeBandi99 in #2484
- Closes #2436 - Updates `_buildReadAllJSON` to use `ObjType` Enum by @Ethan-DeBandi99 in #2486
- Closes #2490: Change `checkInstall` path to be relative to script, not Arkouda by @bmcdonald3 in #2491
- Closes #2462: Categorical hashing by @pierce314159 in #2487
- Closes #2476: Updates Chapel Tutorial by @pierce314159 in #2494
- Design and implement client Channel class hierarchy by @hokiegeek2 in #2496
- Closes #2463: Hashing for bigint pdarrays by @pierce314159 in #2497
- Closes #2500 - Remove Old Test Prototype by @Ethan-DeBandi99 in #2501
- Closes #2502 - Remove ArkoudaWeeklyCall References by @Ethan-DeBandi99 in #2503
- Closes #2488: Quiet deprecation warnings in prep for Chapel 1.31 by @bmcdonald3 in #2489
- Fixes #2506 - Categorical Optional Components Required Bug by @Ethan-DeBandi99 in #2507
- Closes #2444 - SegArray with String Values Parquet Support by @Ethan-DeBandi99 in #2492
Full Changelog: v2023.05.05...v2023.06.17
Release Notes v2023.05.05
Bug Fixes
- Issue #2398 - Fixes parquet error on list columns containing nested lists
- Issue #2380 - Fixes SegArray register bug
- Issue #2396 - Fixes server crash caused by nested parquet fields
- Issue #2300 - Improves Strings strip performance
New Features
- Issue #2296 - Adds bitops support for bigint pdarrays
- Issue #474 - Adds HDF5 overwrite dataset
- Issues #2372 and #2373 - Update Categorical HDF5 format and add `update_hdf` method
- Issue #2377 - Adds groupby aggregations that require min/max on bigint
- Issue #1855 - Adds `divmod` support
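For readers unfamiliar with `divmod`, the sketch below shows an element-wise version built on Python's built-in `divmod`, which returns the floor-division quotient and the matching remainder. The helper `elementwise_divmod` is hypothetical, and whether Arkouda follows exactly the same sign conventions is an assumption here; consult the Arkouda documentation for the actual behavior.

```python
# Hypothetical pure-Python illustration; not the Arkouda implementation.

def elementwise_divmod(xs, ys):
    """Apply Python's built-in divmod pairwise over two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    quotients, remainders = [], []
    for x, y in zip(xs, ys):
        q, r = divmod(x, y)  # q = floor(x / y), r = x - q * y
        quotients.append(q)
        remainders.append(r)
    return quotients, remainders


quotients, remainders = elementwise_divmod([7, -7, 7], [2, 2, -2])
```

With floor division the remainder takes the sign of the divisor, so each pair satisfies `x == q * y + r`.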
Minor Updates
- Issue #2355 - Drops support for Chapel 1.28
- Issue #2138 - Updates messaging overview docs
- Issue #2368 - Cleans up Strings references in SegArray
Auto-Generated Release Notes
- Closes #2370 - Fixes Deprecation Warnings during `make` by @Ethan-DeBandi99 in #2371
- Closes #2368 - Cleans up Strings references in SegArray by @Ethan-DeBandi99 in #2369
- Closes #1855: Implement `divmod` by @jaketrookman in #2356
- Closes #2374: Pin hdf5 version to 1.12.2 by @pierce314159 in #2375
- Closes #2355: Drop support for 1.28 by @bmcdonald3 in #2383
- Closes #2377: Add groupby aggregations that require min/max on bigint by @pierce314159 in #2378
- Closes #474 - HDF5 Overwrite Dataset by @Ethan-DeBandi99 in #2382
- Closes #2300 - `segString` Strip performance issue by @joshmarshall1 in #2379
- Fixes #2380: Segarray register bug by @pierce314159 in #2392
- Closes #2341: Quiet 131 deprecations by @bmcdonald3 in #2342
- Closes #2372 & #2373 - Categorical HDF5 Format Update & `update_hdf` method by @Ethan-DeBandi99 in #2394
- Closes #2396 - Fixes Nested parquet fields causing server crash by @Ethan-DeBandi99 in #2399
- Closes #2398 - Fixes Parquet List Columns with Nested Lists Error by @Ethan-DeBandi99 in #2401
- Closes #2403: Update c_getDatasetNames calls to match new behavior by @bmcdonald3 in #2404
- Add a use of the Math module to avoid the anticipated `pi` deprecation warning by @lydia-duncan in #2407
- Deprecation updates for `BitOps.popcount` and `bigint.mod` by @jeremiah-corrado in #2409
- Closes #2296: bitops support for bigint by @pierce314159 in #2408
- Closes #2138 Update messaging overview docs by @jaketrookman in #2391
- Deprecation updates for `datetime` and `fromTimestamp` by @bmcdonald3 in #2410
Full Changelog: v2023.04.07...v2023.05.05
Release Notes v2023.04.07
Bug Fixes
- Issue #2329 - Fixes continued SegArray read performance issues
- Issue #2299 - Fixes file writes reporting success when directory does not exist
- Issue #2297 - Fixes HDF5 single file write stall
- Issue #2327 - Fixes issue loading 16-bit and 32-bit from Parquet
- Issue #2337 - Fixes `ak.DataFrame.to_parquet` with IPv4 columns
- Issue #2348 - Fixes `IPv4` removal from DataFrame
- Issue #2350 - Fixes DataFrame column subset access
- Issues #2306, #2317 and PR #2354 - Fix `Datetime` component scaling
- Issue #2328 - Fixes bug when printing Dataframes containing bigint
- Issue #2307 - Fixes reported memory usage exceeding 100 percent
- Issue #2309 - Fixes error in Groupby.unique on Categorical or Strings
- Issue #2240 - Fixes `ak.coargsort` empty String and Categorical bug
- Issue #2347 - Fixes broken links in `README.md`
New Features
- Issue #2050 - Adds File of Origin when loading data from Parquet or HDF5. Use `ak.read_tagged_data` to return the file origin information. See the function's documentation for more information.
- Issue #2295 - Adds SegArray filters. `ak.SegArray.filter()` allows values to be removed from a SegArray.
- Issue #2293 - Adds `ak.where` support for Strings and Categoricals
- Issue #2324 - Adds benchmark documentation
- Issue #2209 - Enhances Arkouda metrics
Minor Updates
- Issue #2015 - Updates Parquet NaN detection on float/double columns to be more efficient
- Issue #2280 - Improves `max_bits` handling
- Issues #2319, #2320, #2339 - Update test framework to `pytest` configuration
Auto-Generated Release Notes
- Recommend Chapel 1.30.0 and use it for CI testing by @ronawho in #2291
- Closes #2299 - Fixes file writes reporting success when directory does not exist by @Ethan-DeBandi99 in #2305
- Closes #2297 - Fixes HDF5 Single File Write Stall by @Ethan-DeBandi99 in #2310
- Closes #2280: Improve `max_bits` handling by @pierce314159 in #2303
- Closes #2311: Reenable bulk transfer for developer builds by @bmcdonald3 in #2312
- Closes #2240: `ak.coargsort` empty String and Categorical bug fix by @jaketrookman in #2308
- Closes #2309: Error in Groupby.unique on Categorical or Strings by @pierce314159 in #2315
- Fixes #2306: `Datetime.date` scaling by @pierce314159 in #2316
- Closes #2023 - Added all compression types to legacy benchmarks by @joshmarshall1 in #2302
- Bump bigint groupby benchmark problem size by @ronawho in #2326
- Closes #2295 - SegArray Filters by @Ethan-DeBandi99 in #2325
- Closes #2329 - Fixes Continued SegArray Read Performance Issues by @Ethan-DeBandi99 in #2330
- Closes #2335 - Update Quickstart to point to v2023.03.24 by @Ethan-DeBandi99 in #2336
- Closes #2050 - File of Origin on Read by @Ethan-DeBandi99 in #2323
- Closes #2327 - Fixes Issue Loading 16-bit and 32-bit from Parquet by @Ethan-DeBandi99 in #2333
- Fixes #2328: Bug when printing Dataframes containing bigint by @pierce314159 in #2331
- Closes #2319 - PyTest Test Suite Restructure by @joshmarshall1 in #2314
- Closes #2320 - Adding Pytest structure to `array_view_test.py` by @joshmarshall1 in #2338
- Closes #2339 - Adding PyTest Structure to `bigint_agg_test.py` by @joshmarshall1 in #2340
- 2307 reported mem used greater than 100 percent by @hokiegeek2 in #2346
- Eliminate some communication from datetime methods by @ronawho in #2354
- Closes #2317: Add ArkoudaTimeShim modules to improve DateTimeMsg performance by @bmcdonald3 in #2353
- Closes #2015 - Efficient NaN read from Parquet by @Ethan-DeBandi99 in #2351
- Closes #2337 - Fixes `ak.DataFrame.to_parquet` with IPv4 Columns by @Ethan-DeBandi99 in #2352
- Closes #2348 - Fixes `IPv4` Removal from DataFrame by @Ethan-DeBandi99 in #2358
- Closes #2293: `ak.where` support Strings and Categoricals by @pierce314159 in #2349
- Closes #2362: Start using Chapel 1.30 for building gh-pages by @bmcdonald3 in #2363
- Closes #2347 - Fixes broken links in `README.md` by @Ethan-DeBandi99 in #2360
- Closes #2350 - Fixes DataFrame Column Subset by @Ethan-DeBandi99 in #2359
- Setting pages to v3 checkout by @Ethan-DeBandi99 in #2364
- Updated to deployment v4 by @Ethan-DeBandi99 in #2365
- Closes #2366: Update graph files after IO benchmark changes by @bmcdonald3 in #2367
- Closes #2324 - Benchmark documentation by @joshmarshall1 in #2361
- 2209 enhance Arkouda metrics by @hokiegeek2 in #2332
Full Changelog: v2023.03.24...v2023.04.07
Release Notes v2023.03.24
Bug Fixes
- Issue #2265 - Fixes bug in `ak.DataFrame.to_parquet` with empty `Strings` Column
- Issue #2263 - Fixes bug which caused slow reads of large SegArrays and Strings
- Issue #2214 - Fixes bigint rotate by more than max_bits bug
- Issue #2183 - Fixes index reset in dataframe `get_head_tail`
- Issues #2179 and #2199 - Fix OOB error when writing SegArray to HDF5 when locales exceed number of segments
New Features
- Issues #1994, #2212, #2215, #2207 - Add improvements to bigint performance, the most important being aggregation support
- Issues #2119 and #2186 - Add Parquet single-column and multi-column write support for SegArray
- Issue #2216 - Adds simple groupby aggregations on bigint values
- Issue #2083 - Adds `get_filetype` support for CSV
- Issue #2292 - Adds ability to write to logs from client
- Issues #2213, #2219, #2220, #2223, #2226, #2231, #2234, #2235, #2239, #2243, #2244, #2246, #2248, #2250, #2252, #2254, #2258, #2261, #2262, #2270, #2272, #2277, #2278 - Transition benchmarks to use pytest framework
Minor Updates
- Issue #2110 - Updates `_buildReadAllJSON` to use Map
- Issue #2077 - Remove duplicated `bigint` logic in `IndexingMsg.chpl`
Auto-Generated Release Notes
- Have code avoid using the "single-statement return exception" by @bradcray in #2155
- Closes #2184 - Updates Quick Start download to v2023.03.01 by @Ethan-DeBandi99 in #2185
- Closes #2119 - Parquet Write SegArray by @Ethan-DeBandi99 in #2187
- Closes #2110 - Updates `_buildReadAllJSON` to use Map by @Ethan-DeBandi99 in #2188
- Fixes #2183: Index reset in dataframe get_head_tail by @pierce314159 in #2189
- Closes #2086 - Cleans up extra `SymEntry` creation calls in `SegmentedMsg.chpl` by @joshmarshall1 in #2192
- Closes #2083 - Adds `get_filetype` support for CSV by @Ethan-DeBandi99 in #2190
- Closes #2179 - Fixes OOB Error when writing SegArray to HDF5 by @Ethan-DeBandi99 in #2197
- Closes #2203: Fix bigint shifting negative values for 1.30 release by @bmcdonald3 in #2204
- Closes #2191: Bigint Groupby Benchmark by @pierce314159 in #2198
- Closes #2077 Remove duplicated `bigint` logic in `IndexingMsg.chpl` by @jaketrookman in #2081
- Closes #2181: Bigint array transfer benchmark by @pierce314159 in #2205
- Closes #2199 - Fixes `fixupSegBoundaries` Index Bug by @Ethan-DeBandi99 in #2200
- Closes #2207: Create task private copies of values for pdarray=value by @bmcdonald3 in #2208
- Closes #2186 - Adds Parquet Multi-Column Write SegArray Support by @Ethan-DeBandi99 in #2210
- Fixes #2214: bigint rotate by more than max_bits bug by @pierce314159 in #2217
- Part of #2213 - Configuration & Example of PyTest Benchmark by @Ethan-DeBandi99 in #2218
- Closes #2215: Optimize bigint opequals by @pierce314159 in #2222
- Closes #2220 - PyTest Benchmark for `ak.argsort` by @Ethan-DeBandi99 in #2224
- Closes #2219 - Implementation of v2 aggregation benchmark by @joshmarshall1 in #2221
- Closes #2226 - Array Transfer Pytest Benchmark by @Ethan-DeBandi99 in #2230
- Closes #2231 - PyTest Benchmark for BigInt Conversion by @Ethan-DeBandi99 in #2232
- Closes #2216: Simple groupby aggregations on bigint by @pierce314159 in #2237
- Closes #2234 - PyTest Benchmark for GroupBy by @Ethan-DeBandi99 in #2236
- Closes #2228 - Updates bigint_bitwise_binops benchmark by @joshmarshall1 in #2233
- Closes #2223 - `array-create.py` Benchmark updates by @joshmarshall1 in #2227
- Closes #2239 - PyTest Benchmark for CoArgSort by @Ethan-DeBandi99 in #2241
- Closes #2235 - PyTest Benchmark for stream by @joshmarshall1 in #2238
- Closes #2246 - Pytest Benchmark for Flatten by @Ethan-DeBandi99 in #2247
- Closes #2244 - PyTest Benchmark for Encodings by @Ethan-DeBandi99 in #2245
- Closes #2256 - Add `workflow_dispatch` to CI by @Ethan-DeBandi99 in #2257
- Closes #2248 - Pytest Benchmark for Gather by @Ethan-DeBandi99 in #2259
- Closes #2263 - Fixes Locality of Segment Computation by @Ethan-DeBandi99 in #2267
- Closes #2262 - Pytest Benchmark for Scatter by @Ethan-DeBandi99 in #2264
- Closes 2201: Quiet deprecations in prep for Chapel 1.30 by @bmcdonald3 in #2202
- Closes #2265 - Fixes `ak.DataFrame.to_parquet` with empty `Strings` Column by @Ethan-DeBandi99 in #2269
- Closes #2243 - Initial implementation of dataframe benchmark updates by @joshmarshall1 in #2249
- Closes #2274: Separate bigintInitThrows flag from CHPL_FLAGS for deps by @bmcdonald3 in #2275
- Closes #2258 - Pytest benchmark updates for `reduce.py` by @joshmarshall1 in #2260
- Closes #2250 - PyTest Benchmark for `in1d.py` & `str-in1d.py` by @joshmarshall1 in #2251
- Part of #1994: Add some bigint aggregation tests by @ronawho in #2287
- Closes #2284: Remove 129 deprecation warnings introduced by 130 changes by @bmcdonald3 in #2285
- Closes #2252 - No_Op PyTest Benchmark by @joshmarshall1 in #2253
- Closes #2270 - setops pytest benchmark by @joshmarshall1 in #2271
- Add initial aggregation support for `bigint` by @ronawho in #2290
- Closes #2278 - Pytest Benchmark for Substring Search by @Ethan-DeBandi99 in #2281
- Closes #2261 - Implementation of pytest `scan` benchmark by @joshmarshall1 in #2286
- Closes #2132 - Updates HDF5 Code to use Globals by @Ethan-DeBandi99 in #2283
- Closes #2277 - Pytest Benchmark for String Locality by @Ethan-DeBandi99 in #2279
- Closes #2292 - Write Log from Client by @Ethan-DeBandi99 in #2294
- Closes #2272 - sort-cases PyTest benchmark by @joshmarshall1 in #2276
- Closes #2254 - PyTest benchmark upgrade for IO benchmarks by @joshmarshall1 in #2255
Full Changelog: v2023.03.01...v2023.03.24
Release Notes v2023.03.01
Bug Fixes
- Issue #2163 - Resolves issue with SymEntry destruction for GroupBy objects.
- Issue #2173 - Resolves SymEntry destruction bug for Strings and SegArray.
- PR #2164 - Updates memory checks to use Chapel runtime view of allocatable memory when available
- Issue #1987 - Resolves issues with AutoAPI documentation.
- Issue #2129 - Resolves a periodic 403 error for integration and metrics-enabled Arkouda
New Features
- Issues #2118, #2141, #2145, #2156 - Enable reading of Parquet files with columns containing `SegArray` objects.
- PRs #2178, #2131 (part of Issue #2088), Issues #2139, #1961 - Provides updates speeding up BigInt pdarray creation using `bigint_from_uint_arrays`
- Issue #2147 - Adds API for hashArrays
- Issue #1835 - Updates SegArray HDF5 save format. Backwards compatibility maintained.
- Issue #1939 - Adds `%` and `%=` for floats
- Issue #1522 - Adds `Index` & `MultiIndex` Support for `key` in `Series.locate()`
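The `bigint_from_uint_arrays` item above refers to building large integers from 64-bit unsigned pieces. The pure-Python sketch below shows the underlying arithmetic for a single value. The helper names `combine_limbs` and `split_limbs` are hypothetical, and the most-significant-limb-first ordering is an assumption made for this illustration rather than a statement about Arkouda's API.

```python
# Hypothetical illustration of combining 64-bit limbs into one big integer.
# Not the Arkouda implementation; limb order (high bits first) is assumed.

LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def combine_limbs(limbs):
    """Combine unsigned 64-bit limbs, most significant first, into one int."""
    value = 0
    for limb in limbs:
        if not 0 <= limb <= LIMB_MASK:
            raise ValueError("each limb must fit in an unsigned 64-bit integer")
        value = (value << LIMB_BITS) | limb
    return value


def split_limbs(value, n_limbs):
    """Split a non-negative int into n_limbs 64-bit limbs, most significant first."""
    return [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in reversed(range(n_limbs))]


big = combine_limbs([1, 5])       # 1 * 2**64 + 5
limbs = split_limbs(big, 2)       # back to the original limbs
```

In an array setting the same shift-and-or step would be applied element-wise across whole arrays of limbs.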
Minor Updates
- Issue #2152 - Enhances memory management logging and metrics
- Issue #2177 - Updates to `ak.client.maxTransferBytes`
Auto-Generated Release Notes
- Closes #2126 - Update `quickstart` to v2023.02.08 by @Ethan-DeBandi99 in #2127
- Part of #2088: Speedup bigint_from_uint_arrays by @ronawho in #2131
- Closes #2129: Periodic 403 error for integration and metrics-enabled Arkouda by @hokiegeek2 in #2130
- Closes #2118 Parquet Read Columns containing Array Elements by @Ethan-DeBandi99 in #2123
- Closes #1939 - Adds `%` and `%=` for floats by @Ethan-DeBandi99 in #2136
- Close #1522 - Add `Index` & `MultiIndex` Support for `key` in `Series.locate()` by @Ethan-DeBandi99 in #2135
- Closes #2139: Add benchmark for bigint conversion by @pierce314159 in #2140
- Closes #1961: `bigint` bug hotfix by @pierce314159 in #2144
- Closes #2141 - Fixes Parquet SegArray Read Index Bug by @Ethan-DeBandi99 in #2142
- Closes #2150: Update path in IO benchmarks by @pierce314159 in #2151
- 2152 enhance memory management logging and metrics by @hokiegeek2 in #2153
- fixed bug in pct memory used calculation by @hokiegeek2 in #2158
- Closes #1987 - Updates AutoAPI Documentation by @Ethan-DeBandi99 in #2148
- Add some error handling code for upcoming change to map module by @bmcdonald3 in #2162
- Use Chapel runtime view of allocatable memory when available by @ronawho in #2164
- Closes #2166 - Add CI as Protection to Merge Queue by @Ethan-DeBandi99 in #2167
- Closes #2147 - Adding API for hashArrays by @joshmarshall1 in #2159
- Closes #2168 - Remove push from on clause by @Ethan-DeBandi99 in #2169
- Closes #2145 - Enable Parquet Read SegArray w/ Empty Segments by @Ethan-DeBandi99 in #2160
- Closes 1835 - hdf5 Save SegArray segments and values under single group by @joshmarshall1 in #2128
- Closes #2163 - Fixes SymEntry Destruction Issue by @Ethan-DeBandi99 in #2170
- Closes #2156: Bigint stream benchmark by @pierce314159 in #2157
- 2171 enhance arkouda response time metrics by @hokiegeek2 in #2172
- Closes #2175: Add check for local domain size before passing as C pointer by @bmcdonald3 in #2176
- Fixes #2173: Fixes SymEntry destruction bug by @pierce314159 in #2174
- Improve bigint `>>` performance by @ronawho in #2178
- Closes #2177: Update to `ak.client.maxTransferBytes` by @pierce314159 in #2180
Full Changelog: v2023.02.08...v2023.03.01
Release Notes v2023.02.08
Bug Fixes
- Issue #2068 - Fixes dataframe groupby with categorical index bug
- Issue #2076 - Fixes integer overflow in `groupby.mean`
- Issue #2105 - Fixes bug when loading a `dataframe` containing a `segarray` with an `_` in the column name
- Issue #2099 - Fixes bug in left and right shift by >=64 bits for int/uint
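The `groupby.mean` item above mentions integer overflow. These notes do not describe how the fix works, so the sketch below only illustrates the general failure mode: summing large values in a signed 64-bit accumulator wraps around, while accumulating in floating point does not. The helper `wrap_int64` is hypothetical and emulates two's-complement wraparound in pure Python.

```python
# Hypothetical illustration of why a mean computed from an int64 sum can go wrong.
# This is not Arkouda code and does not claim to reproduce its fix.

def wrap_int64(x):
    """Emulate two's-complement wraparound of a signed 64-bit integer."""
    return (x + 2**63) % 2**64 - 2**63


values = [2**62, 2**62, 2**62]

# Accumulate as a fixed-width int64 would: the running sum wraps past 2**63 - 1.
wrapped_sum = 0
for v in values:
    wrapped_sum = wrap_int64(wrapped_sum + v)
naive_mean = wrapped_sum / len(values)

# Accumulate in floating point instead: no wraparound for values of this size.
float_mean = sum(float(v) for v in values) / len(values)
```

Here `naive_mean` comes out negative even though every input is positive, while `float_mean` equals the expected `2**62`.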
New Features
- Issues #2047, #2117 - Add CSV Support
- Issue #2060 - Adds pdarray data type and size to metrics
- Issue #2042 - Adds BigInt support in SegArray
- Issue #2111 - Enables `load_all` and `read` workflows with dataframes containing segarrays
- Issue #1695 - Renames `util` Packages
- Issue #2058 - Enables logging to a variety of channels
Minor Updates
- Issue #1994 - Adds aggregation interface for `bigint`
- Issues #1988, #2073, #2082, #2089, #2094, #2103, #2108, #2124 and PR #2057 - Update documentation
- Issues #2066, #2070, #2092 and PR #2074 - Provide the following updates to compilation:
  - Separates setting optimization level and enabling runtime checks
  - Disables bulk transfer when using `ARKOUDA_QUICK_COMPILE`
  - Improves Makefile errors when dependencies aren't found
  - Updates iconv check during compilation
- Issue #2097 - Adds save-data flag to benchmark script
- Issue #2096 - Updates to prevent numpy overflow deprecation warnings
- Issue #2113 - Corrects naming and bug in segarray bigint test
Auto-Generated Release Notes
- Closes #2058: enable logging to a variety of channels by @hokiegeek2 in #2059
- Part of #1994: Add aggregation interface for `bigint` by @ronawho in #2062
- Fix documentation for encode/decode by @bmcdonald3 in #2057
- Closes #2060 add pdarray data type and size to metrics by @hokiegeek2 in #2067
- Closes #2070: Improve Makefile errors when dependencies aren't found by @bmcdonald3 in #2075
- Closes #2068: Fix dataframe groupby with categorical index bug by @pierce314159 in #2069
- Reformat 1-element arrays to avoid trailing commas and avoid some unnecessary arrays by @bradcray in #2072
- Add some function calls so that iconv check works properly by @bmcdonald3 in #2074
- Closes #2066: Separate setting optimization level and enabling runtime checks by @ronawho in #2078
- Fixes #2076: Fixes integer overflow in `groupby.mean` by @pierce314159 in #2079
- Closes #1695: Rename `util` Packages by @jaketrookman in #2017
- Closes #2042 - BigInt support in SegArray by @joshmarshall1 in #2080
- Closes #2082 - Add Developer Section to GitHub Pages by @Ethan-DeBandi99 in #2087
- Closes #2092: Disable bulk transfer when using quick compile by @bmcdonald3 in #2090
- Closes #1988 Update `pydoc/quickstart.rst` by @Ethan-DeBandi99 in #2091
- Closes #2089 - Update MacOS Install by @Ethan-DeBandi99 in #2095
- Closes #2097: Add save-data flag to benchmark script by @bmcdonald3 in #2098
- Fixes #2105: bug when loading a `dataframe` containing a `segarray` with an `_` in the column name by @pierce314159 in #2106
- Closes #2103: Update messaging overview docs by @pierce314159 in #2104
- Closes #2094: Adds release process to dev docs by @pierce314159 in #2102
- Closes #2047 - CSV Support by @Ethan-DeBandi99 in #2085
- Closes #2108: Update contributing docs by @pierce314159 in #2109
- Closes #2096: numpy overflow deprecations by @pierce314159 in #2101
- Closes #2113 - Fixing naming and bug in segarray bigint test by @joshmarshall1 in #2114
- Fixes #2115: Fixes bug in segarray load by @pierce314159 in #2116
- Closes #2111: Enable `load_all` and `read` workflows with dataframes containing segarrays by @pierce314159 in #2112
- Closes #2117 - CSV Distributed Read Bug by @Ethan-DeBandi99 in #2122
- Closes #2073: Add developer documentation for faster Arkouda compiles by @bmcdonald3 in #2100
- Closes #2099: Bug in left and right shift by >=64 bits for int/uint by @jaketrookman in #2107
- Fixes #2124: Chapel API reference broken link by @pierce314159 in #2125
Full Changelog: v2023.01.11...v2023.02.08
Release Notes v2023.01.11
Major updates:
- Issues #1970, #1989, #1995, #2000, #2009, #2013, #2033, and #2041 - Add `bigint` pdarrays with binary operations and support for `sort`, `in1d`, `search_intervals`, `groupby`, and `dataframe`
- Issue #1876 - 2.5x speed up of multi-column write for parquet
- Issue #2019 - Fixes bug preventing reading strings formatted by older versions of Arkouda
- Issues #1297 and #1983 - Change `ak.array` to prefer `uint` over `float` when containing values `>2**63`
- Issue #2005 - Fixes parquet read error for columns containing NANs
- Issues #1991, #2021, and #2024 - Add additional parquet compression support
- Issues #1962 and #2038 - Add parameters to `ak.get_mem_*` functions and add percentage of memory used to `overmemlimit` logs
- Issues #1965, #1979, #1981, #1986, #1996, and #2026 - Rework and update online documentation
- PR #1966 - Recommends Chapel 1.29.0 and updates CI to use it
- Issue #1850 - Removes legacy HDF5 multi-dim
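The `ak.array` change above prefers `uint` over `float` for values above `2**63`. A plausible motivation, stated here as an assumption rather than taken from the issue, is that a 64-bit float cannot represent every integer of that size exactly, whereas an unsigned 64-bit integer can. The plain-Python sketch below demonstrates the precision loss.

```python
# Illustration of float64 precision loss for integers above 2**63.
# Plain Python only; no Arkouda calls are used.

big = 2**63 + 1

# Python floats are IEEE 754 doubles with a 53-bit significand,
# so 2**63 + 1 is rounded to the nearest representable value.
as_float = float(big)
round_trip = int(as_float)

# The same value fits exactly in an unsigned 64-bit range.
fits_uint64 = 0 <= big < 2**64
```

After the round trip through `float`, the trailing `+ 1` is gone, which is the kind of silent change an unsigned integer dtype avoids.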
Minor fixes:
- Issues #1949, #1951, #2054, and PR #2016 - Add support for conversions between IDNA and non-UTF-8 encodings
- Issue #2011 - Updates version requirement for `h5py` and `numpy`
- Issue #1932 - Fixes bug with `binopvv` between `uint` and `bool`
- Issue #1963 - Fixes message arg failure when List contains `Strings` and `pdarray`
- Issue #1972 - Switches to `allclose` for float comparison in operator test
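The switch to `allclose` for float comparison reflects a general testing practice: exact equality on floating-point results is brittle. The sketch below uses only the standard library, with a hypothetical `all_close` helper built on `math.isclose`. Note that NumPy's `allclose` uses a different tolerance formula, so this is an analogy and not a drop-in equivalent of what the test suite calls.

```python
import math

# Hypothetical stdlib-only helper; NumPy's allclose uses different tolerance rules.

def all_close(xs, ys, rel_tol=1e-9, abs_tol=0.0):
    """Return True if xs and ys have equal length and are pairwise close."""
    return len(xs) == len(ys) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(xs, ys)
    )


computed = [0.1 + 0.2, 1.0 / 3.0 * 3.0]
expected = [0.3, 1.0]

exact_match = computed == expected        # brittle: depends on rounding
tolerant_match = all_close(computed, expected)
```

The exact comparison fails because `0.1 + 0.2` is not bit-identical to `0.3`, while the tolerant comparison passes.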
Auto-generated release notes
- Closes #1963 - Message Arg failure when List contains `Strings` and `pdarray` by @Ethan-DeBandi99 in #1964
- Closes #1951 - Remove `idnaEncodeDecode` by @Ethan-DeBandi99 in #1959
- Recommend Chapel 1.29.0 and use it for CI testing by @ronawho in #1966
- Closes #1972: Switch to allclose for float comparison in operator test by @bmcdonald3 in #1973
- Closes #1979 - Update Sphinx Theme by @Ethan-DeBandi99 in #1980
- Closes #1876 - Multi-Column Write Parquet by @Ethan-DeBandi99 in #1969
- Closes #1965 - Update `pydoc/examples.rst` by @Ethan-DeBandi99 in #1976
- Closes #1850 - Legacy HDF5 Removal by @Ethan-DeBandi99 in #1978
- Closes #1297 and #1983: Change `ak.array` to prefer `uint` over `float` when `>2**63` by @pierce314159 in #1984
- Closes #1970: Basic bigint pdarray by @pierce314159 in #1971
- Closes #1932: binopvv support for uint and bool types by @jaketrookman in #1982
- Closes #1981 - Sphinx Utilization Updates by @Ethan-DeBandi99 in #1985
- Closes #1989: `bigint` indexing by @pierce314159 in #1990
- Closes #1996 - Remove Release from header by @Ethan-DeBandi99 in #1997
- Closes #1962: Update ak.get_mem_* methods by @pierce314159 in #1992
- Closes #1925 - Remove Deprecated IO Functions by @Ethan-DeBandi99 in #1993
- Closes #1949: Allow conversions with IDNA and non-UTF-8 encodings by @bmcdonald3 in #1950
- Closes #2000: `bigint` opequals by @pierce314159 in #2002
- Closes #1995: `bigint` binops by @pierce314159 in #1999
- Closes #1991 - Additional Compression Support for Parquet by @Ethan-DeBandi99 in #2004
- Skip idna utf-16 test to work around bug by @bmcdonald3 in #2016
- Closes #2013: Create local copies of block size for bigint modulus by @bmcdonald3 in #2018
- Closes #2011 - Updating version requirement for `h5py` and `numpy` by @joshmarshall1 in #2014
- Closes #2021 - Compression Fixes for Benchmarks by @Ethan-DeBandi99 in #2022
- Closes #2019 - Read strings formatted for older versions of Arkouda by @Ethan-DeBandi99 in #2020
- Closes #2024: Add flags to Arrow install command for new compressions by @bmcdonald3 in #2025
- Closes #2005 - Parquet Read Error by @Ethan-DeBandi99 in #2012
- Closes #2010 - overflow notification for GroupBy Aggregations by @Ethan-DeBandi99 in #2028
- Closes #2009: Add `bigint` support for `sort`, `in1d`, `search_intervals`, and `groupby` by @pierce314159 in #2027
- Closes #2029 - Undo IO Function Deprecation by @Ethan-DeBandi99 in #2035
- Closes #2033: Initial `bigint.to_ndarray/list` by @pierce314159 in #2036
- Closes #2026: Updating the Mac install doc to add `brew install gmp` by @jaketrookman in #2037
- Closes #2038: Add percentage used to `overmemlimit` by @pierce314159 in #2039
- Closes #2041: `bigint` compatibility for `dataframe` by @pierce314159 in #2043
- Closes #2054: Fix iconv encoding for multi-byte characters and reenable tests by @bmcdonald3 in #2055
- Closes #1986 - File I/O Documentation Updates by @Ethan-DeBandi99 in #2045
Full Changelog: v2022.12.09...v2023.01.11
Release Notes v2022.12.09
Release Notes 2022-12-09
Major updates:
- Issues #1914, #1917, and #1922 - Add `serverInfoNoSplash` and `autoShutdown` flags along with documentation for running arkouda from a script
- Issue #1927 - Quiets HDF5 Errors when ObjType attribute is missing
- Issues #1904 and #1935 - Add Chapel-native encoding/decoding functionality
- Issue #1896 - Separates client IO in anticipation of IO rework
- Issue #1947 - Fixes conversion error when IPv4 is Index
Minor fixes:
- Issues #1796, #1722, and #1894 - Update `SegArray` to only register a single object
- Issues #1926 and #1938 - Fix operation equals for `uint` arrays
- Issue #1901 - Reduces SymEntry creation overheads
- Issue #1941 - Fixes condition for regex edge case
- Issue #1905 - Adds encoding libraries to conda dependencies
- Issue #319 - Moves chpl tests to tests/server directory
- Issue #1943 - Quiets deprecation warnings in preparation for Chapel 1.29
Auto-generated release notes
- Part of #1901: Replace `SymEntry.Ad` with `SymEntry.a.domain` by @ronawho in #1908
- Closes #1914 - Adding `autoShutdown` flag to server to shutdown on client disconnect by @joshmarshall1 in #1916
- Closes #1917 - Adds server config json to stdout by @joshmarshall1 in #1920
- Closes #1901: Reduce SymEntry creation overheads by @ronawho in #1921
- Closes #1905: Add encoding libraries to conda dependencies by @pierce314159 in #1918
- Closes #1796, #1722, and #1894 - Update `SegArray` to only register a single object by @joshmarshall1 in #1900
- Closes #1922 - Running Arkouda from a script example documentation by @joshmarshall1 in #1923
- Closes #1927 - Quiet HDF5 Errors when ObjType attribute is missing by @Ethan-DeBandi99 in #1929
- Closes #319: Move chpl tests to tests/server directory by @bmcdonald3 in #1928
- Closes #1904: Add Chapel-native encoding/decoding functionality by @bmcdonald3 in #1897
- Closes #1896 - Client IO Separation by @Ethan-DeBandi99 in #1924
- Closes #1926: Fix `+=` for `uint` arrays by @pierce314159 in #1930
- Closes #1938: Remove `float %= uint` by @pierce314159 in #1940
- Closes #1935: Increase encode benchmark size by @bmcdonald3 in #1936
- Create local copies of remote strings for encoding by @bmcdonald3 in #1937
- Closes #1941 - Proper pattern search to get the correct error by @jaketrookman in #1942
- Closes 1945 - Remove `src/exec` by @Ethan-DeBandi99 in #1946
- Closes #1947 - Conversion Error when IPv4 is Index by @Ethan-DeBandi99 in #1948
- Closes #1953 CI Testing Error Fix by @jaketrookman in #1954
- Closes 1943: Quiet deprecation warnings in prep for Chapel 1.29 by @ronawho in #1944
Full Changelog: v2022.11.17...v2022.12.09
Release Notes v2022.11.17
Release Notes 2022-11-17
Major updates:
- Issue #1906 - Supports older HDF5 files by assuming `pdarray`/`Strings` when no `ObjType` attribute is set
  - Note: This removes the need to use the `legacyHDF5` flag
- Issue #1909 - Adds support for `__invert__` calls on `uint`
- Issues #1844 and #1912 - Add option for `hierarchical` behavior to `search_intervals`
  - Note: This behavior is the new default. To maintain existing behavior, set `hierarchical=False`
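As a rough illustration of what `__invert__` means for unsigned integers, the sketch below uses NumPy as a stand-in (an assumption; it does not call Arkouda and needs no server). Bitwise NOT on a 64-bit unsigned value flips every bit, so the result equals `2**64 - 1` minus the input.

```python
import numpy as np

# NumPy stand-in for illustration only; not Arkouda's API.
x = np.array([0, 1, 255], dtype=np.uint64)

# The ~ operator dispatches to __invert__ and flips all 64 bits.
inverted = ~x

# Convert to Python ints so large values compare exactly.
inverted_ints = [int(v) for v in inverted]
print(inverted.dtype, inverted_ints)
```

The result keeps the unsigned dtype, which is the behavior one would expect from an inversion that stays within `uint`.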
Minor fixes:
- Issue #1727 - Adds `where` argument to `sqrt` and `power`
- Issue #1800 - Adds Symbol Table Overview documentation
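To show the general idea of a `where` argument on an element-wise function, here is a minimal sketch using NumPy's `where=` keyword as a stand-in. This is an assumption for illustration: the notes above only say the argument was added to `sqrt` and `power`, and whether Arkouda treats unmasked positions the same way is not stated here.

```python
import numpy as np

# NumPy stand-in for illustration only; not Arkouda's API.
values = np.array([4.0, 9.0, 16.0, 25.0])
mask = np.array([True, False, True, False])

# The function is applied only where mask is True. Passing a copy of the
# input as `out` makes the unmasked positions keep their original values.
result = np.sqrt(values, where=mask, out=values.copy())
print(result)
```

Supplying `out` matters in NumPy: without it, positions where the mask is False are left uninitialized rather than copied from the input.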
Auto-generated release notes
- Closes #1906 - HDF5 Assume `pdarray`/`Strings` Object when no ObjType Attribute by @Ethan-DeBandi99 in #1907
- Closes #1909 uint not supported for binop invert by @Ethan-DeBandi99 in #1910
- Closes #1727: Add `where` argument to `sqrt` and `power` by @jaketrookman in #1871
- Closes #1800 - Adding Symbol Table Overview documentation by @joshmarshall1 in #1873
- Closes #1844: Add option for `hierarchical` behavior to `search_intervals` by @pierce314159 in #1911
- Closes #1912: Updates to `search_interval` by @pierce314159 in #1913
Full Changelog: v2022.11.10...v2022.11.17
Release Notes v2022.11.10
Release Notes 2022-11-10
In Memoriam
Mike Merrill (@mhmerrill), one of the co-founders of Arkouda, recently passed away. Without his leadership and contributions, the project would not exist. It's hard to overstate Mike's impact. He was a great person and will be dearly missed. Our deepest condolences to his family.
Major updates:
- Issues #487, #1558, #1559, #1846, #1877, #1887 and PR #1879 - Rework HDF5 structure and schema, enable writing to a single file, and add documentation of schema
  - NOTE: Files written with tag `v2022.10.13` or earlier need to be read with the `legacyHDF5` flag set and re-written with the new format
- Issue #1891 - Fixes bug in IDNA decode
- Issues #1776, #1847, #1852, and #1867 - Optimize `GroupBy` on small strings
Minor fixes:
- PR #1858 and Issue #1859 - Switches to C++17 for Arrow compilation and updates Arrow version to 9.0.0
- Issue #1801 - Reorganizes structure of the symbol table
- Issue #1779 - Adds documentation for creating a new symbol table entry
- Issues #1839 and #1889 - Update `MessageArgs` parameter for `CommandMap` functions
- Issue #1837 - Resolves intermittent failures in `GroupBy` prod aggregate test
- Issue #1868 - Adds name property to `AbstractSymEntry`
- Issue #1854 - Takes advantage of set and generator comprehensions in client code
- Issue #1842 - Updates `COMPARISON.md`
Auto-generated release notes
- Closes #1836 - Removing gasnet from CI by @Ethan-DeBandi99 in #1841
- Closes #1839 - `generic_msg` Parameter Update by @Ethan-DeBandi99 in #1840
- Closes #1837: Intermittent failures in groupby prod aggregate test by @pierce314159 in #1838
- Closes #1842: Update `COMPARISON.md` by @jaketrookman in #1843
- Closes #1847: Small String GroupBy benchmark by @pierce314159 in #1848
- Closes #1852: Add small string groupby perf to `run_benchmarks` by @pierce314159 in #1853
- Closes #1854 - Take advantage of set and generator comprehensions in client code by @sotiriskaragiannis in #1856
- Switch to C++17 for Arrow compilation by @bmcdonald3 in #1858
- Closes #1859 - Updating Arrow version to 9.0.0 by @Ethan-DeBandi99 in #1860
- Closes #1801 - Reorganizes Symbol Table by @Ethan-DeBandi99 in #1861
- Closes #1776: GroupBy Strings Optimization by @pierce314159 in #1851
- Closes #1857 - Remove `OPTIONAL_CHECKS` by @Ethan-DeBandi99 in #1863
- Closes #1865 - Remove unused return value in `MultiTypeSymbolTable` by @jeremiah-corrado in #1866
- Part of #1867: String GroupBy optimizations are timing out nightly tests by @pierce314159 in #1869
- Closes #1559, #1558, & #487 - HDF5 Upgrades by @Ethan-DeBandi99 in #1845
- Closes #1868 - `AbstractSymEntry` Name Property by @Ethan-DeBandi99 in #1870
- Closes #1846 - HDF5 Documentation by @Ethan-DeBandi99 in #1849
- Part of #1404: Add 10/03/22 sort scalability graphs from SGI 8600 by @ronawho in #1872
- Closes #1877 - HDF5 Updates Fail Nightly by @Ethan-DeBandi99 in #1878
- Temporarily drop python 3.11 testing from CI by @ronawho in #1883
- Add missing localize call to dsetname by @bmcdonald3 in #1879
- Closes #1779 - New Symbol Table Entry Generation Documentation by @Ethan-DeBandi99 in #1874
- Swap order of zippered for-loops in which an unbounded range was the leader by @bradcray in #1880
- Closes #1885 - Updating mypy to use >=0.931,<0.990 by @joshmarshall1 in #1886
- Closes #1887 - HDF5 Empty String Set Write by @Ethan-DeBandi99 in #1888
- Closes #1891 - IDNA Bug by @Ethan-DeBandi99 in #1892
- Closes #1889 - MessageArgs Parameter for CommandMap Functions by @Ethan-DeBandi99 in #1890
New Contributors
- @jaketrookman made their first contribution in #1843
- @sotiriskaragiannis made their first contribution in #1856
- @jeremiah-corrado made their first contribution in #1866
Full Changelog: v2022.10.13...v2022.11.10