
Releases: Bears-R-Us/arkouda

v2023.06.16

16 Jun 12:24
eeebe70

Bug Fixes

  • Issue #2481 - Fixes Multi-Column Parquet not handling Empty Files Properly
  • Issue #2506 - Fixes Categorical Optional Components Required Bug
  • Issue #2414 - Fixes overMemLimit calculation error

Major Updates

  • Issues #2424 and #2432 - Adds support for Strings values in SegArray
  • Issue #2443 - Read/Write SegArray of Strings for HDF5
  • Issue #2444 - Adds Parquet support for SegArray with Strings values (does not include multi-column Parquet)
  • Issue #2386 - Read/Write support for GroupBy objects in HDF5
  • Issues #2434, #2459, #2462, and #2463 - Adds hashing support for SegArray, Strings, Categorical, and BigInt
  • Issues #2006, #2032, #2416, and #2431 - BigInt Support Improvements
  • Issue #2304 - Adds inner_join on Strings and Categorical
  • Issue #2417 - Makes Filename_Codes match Categorical.codes
  • Issue #2425 - Import/Export lists from/to pandas
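
Issue #2304 extends `inner_join` to `Strings` and `Categorical` keys. The sketch below is only a rough mental model. It computes inner-join index pairs over two Python lists of string keys on a single machine. The helper name `inner_join_indices` is hypothetical. We also assume, without having confirmed it, that the Arkouda function returns a pair of index arrays (positions in the left input, positions in the right input); check this against the `arkouda.join` documentation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def inner_join_indices(
    left: Sequence[str], right: Sequence[str]
) -> Tuple[List[int], List[int]]:
    """Return (left_idx, right_idx) for every pair of positions whose keys match.

    Hypothetical single-machine model of an inner join on string keys;
    this is not the Arkouda implementation.
    """
    # Index the right-hand keys: key -> every position where it occurs.
    positions: Dict[str, List[int]] = defaultdict(list)
    for j, key in enumerate(right):
        positions[key].append(j)

    left_idx: List[int] = []
    right_idx: List[int] = []
    # Emit one output pair per (left, right) match, in left-input order.
    for i, key in enumerate(left):
        for j in positions.get(key, []):
            left_idx.append(i)
            right_idx.append(j)
    return left_idx, right_idx


if __name__ == "__main__":
    li, ri = inner_join_indices(["a", "b", "c", "b"], ["b", "c", "d"])
    print(li, ri)  # [1, 2, 3] [0, 1, 0]
```

A hash index on one side keeps the model linear in the input size plus the number of matches. A distributed implementation would need a different strategy.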

Minor Updates

  • Issue #2454 - Updates SegArray.__getitem__ to Always Return pdarray
  • Issue #2418 - Adds instructions to set max per-locale CPU cores and memory
  • Issue #2433 - Updates GroupBy objects to be client-side only
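
A SegArray stores many variable-length segments as one flat `values` array plus a `segments` array of start offsets. Issue #2454 makes `SegArray.__getitem__` always return a `pdarray`. The pure-Python class below is a simplified, hypothetical model of that layout: the class name `ToySegArray` is invented, and a plain list stands in for a `pdarray`. It shows how an offset lookup recovers one segment. It does not reproduce Arkouda's code.

```python
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


class ToySegArray(Generic[T]):
    """Hypothetical model: flat values plus the start offset of each segment."""

    def __init__(self, segments: Sequence[int], values: Sequence[T]) -> None:
        # segments[k] is the index in `values` where segment k starts.
        self.segments = list(segments)
        self.values = list(values)

    def __len__(self) -> int:
        return len(self.segments)

    def lengths(self) -> List[int]:
        # Each segment ends where the next one starts (or at the end of values).
        ends = self.segments[1:] + [len(self.values)]
        return [end - start for start, end in zip(self.segments, ends)]

    def __getitem__(self, k: int) -> List[T]:
        # Always return a list (standing in for a pdarray), even for
        # segments with zero or one element.
        if k < 0:
            k += len(self)
        if not 0 <= k < len(self):
            raise IndexError(k)
        end = self.segments[k + 1] if k + 1 < len(self) else len(self.values)
        return self.values[self.segments[k]:end]


if __name__ == "__main__":
    # Three segments of strings: ["a", "b"], [], ["c"]
    sa = ToySegArray([0, 2, 2], ["a", "b", "c"])
    print(sa.lengths())  # [2, 0, 1]
    print(sa[2])         # ['c']
```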
Auto-Generated Release Notes

  • Adjust these modules to avoid deprecation warnings from non-default Math symbols by @lydia-duncan in https://github.com//pull/2415
  • Closes #2412: Update quickstart to v2023.05.05 by @pierce314159 in https://github.com//pull/2413
  • Fix overMemLimit calc error by @hokiegeek2 in https://github.com//pull/2421
  • Closes #2427 - Deprecation updates related to Memory and Memory.Diagnostics by @jabraham17 in https://github.com//pull/2422
  • Closes #2425 - Import/Export lists from/to pandas by @Ethan-DeBandi99 in https://github.com//pull/2428
  • Closes #2386 - `GroupBy.to_hdf` & `GroupBy.update_hdf` by @Ethan-DeBandi99 in https://github.com//pull/2426
  • add instructions to set max per-locale CPU cores and memory by @hokiegeek2 in https://github.com//pull/2429
  • Closes #2416 and #2006: bigint shift performance by @pierce314159 in https://github.com//pull/2423
  • Closes #2431: Add bigint broadcast by @pierce314159 in https://github.com//pull/2437
  • Closes #2441: Adds missing `use Biginteger` in gt-130 bigint compat by @pierce314159 in https://github.com//pull/2442
  • fixed typo by @hokiegeek2 in https://github.com//pull/2448
  • Closes #2417 - `Filename_Codes` match `Categorical.codes` by @Ethan-DeBandi99 in https://github.com//pull/2440
  • Closes #2445 - Deprecation updates related to string and byte factory functions by @jabraham17 in https://github.com//pull/2446
  • Closes #2451: Remove pragma no doc instances in Chapel code by @bmcdonald3 in https://github.com//pull/2452
  • Updates for Chapel `list.append` deprecation by @jeremiah-corrado in https://github.com//pull/2450
  • Closes #2432 - Revert SegArray to Client Side by @Ethan-DeBandi99 in https://github.com//pull/2439
  • Closes #2304: `inner_join` on `Strings` and `Categorical` by @pierce314159 in https://github.com//pull/2453
  • Closes #2467: Arrow compilation can fail with clang 15 upgrade changes default PIE by @bmcdonald3 in https://github.com//pull/2468
  • Closes #2032 - BigInt Support for HDF5 by @Ethan-DeBandi99 in https://github.com//pull/2460
  • Closes #2454 - Update `SegArray.__getitem__` to Always Return `pdarray` by @Ethan-DeBandi99 in https://github.com//pull/2466
  • Closes #2424 - Adds `SegArray` support for `Strings` Values by @Ethan-DeBandi99 in https://github.com//pull/2469
  • Closes #1211 - Remove TaskErrors Workaround by @Ethan-DeBandi99 in https://github.com//pull/2470
  • Closes #2433: GroupBy back to client only by @pierce314159 in https://github.com//pull/2456
  • Changes for Chapel `c_memcpy` replacement with `OS.POSIX.memcpy` by @jeremiah-corrado in https://github.com//pull/2479
  • Closes #2482 - Fix capitalization of POSIX compatibility module by @jeremiah-corrado in https://github.com//pull/2483
  • Closes #2443 - Read/Write SegArray of Strings HDF5 by @Ethan-DeBandi99 in https://github.com//pull/2478
  • Closes #2459 and #2434: `ak.hash` for `Segarray` and `Strings` by @pierce314159 in https://github.com//pull/2475
  • Fixes #2481 - Multi-Column Parquet does not handle Empty Files Properly by @Ethan-DeBandi99 in https://github.com//pull/2484
  • Closes #2436 - Updates `_buildReadAllJSON` to use `ObjType` Enum by @Ethan-DeBandi99 in https://github.com//pull/2486
  • Closes #2490: Change `checkInstall` path to be relative to script, not Arkouda by @bmcdonald3 in https://github.com//pull/2491
  • Closes #2462: Categorical hashing by @pierce314159 in https://github.com//pull/2487
  • Closes #2476: Updates Chapel Tutorial by @pierce314159 in https://github.com//pull/2494
  • Design and implement client Channel class hierarchy by @hokiegeek2 in https://github.com//pull/2496
  • Closes #2463: Hashing for bigint pdarrays by @pierce314159 in https://github.com//pull/2497
  • Closes #2500 - Remove Old Test Prototype by @Ethan-DeBandi99 in https://github.com//pull/2501
  • Closes #2502 - Remove ArkoudaWeeklyCall References by @Ethan-DeBandi99 in https://github.com//pull/2503
  • Closes #2488: Quiet deprecation warnings in prep for Chapel 1.31 by @bmcdonald3 in https://github.com//pull/2489
  • Fixes #2506 - Categorical Optional Components Required Bug by @Ethan-DeBandi99 in https://github.com//pull/2507
  • Closes #2444 - SegArray with String Values Parquet Support by @Ethan-DeBandi99 in https://github.com//pull/2492

Full Changelog: v2023.05.05...v2023.06.17

Release Notes v2023.05.05

05 May 18:03
a6629af

Bug Fixes

  • Issue #2398 - Fixes parquet error on list columns containing nested lists
  • Issue #2380 - Fixes SegArray register bug
  • Issue #2396 - Fixes server crash caused by nested parquet fields
  • Issue #2300 - Improves Strings strip performance

New Features

  • Issue #2296 - Adds bitops support for bigint pdarrays
  • Issue #474 - Adds support for overwriting HDF5 datasets
  • Issues #2372 and #2373 - Update Categorical HDF5 format and add update_hdf method
  • Issue #2377 - Adds groupby aggregations that require min/max on bigint
  • Issue #1855 - Adds divmod support
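
Issue #1855 adds `divmod`. The sketch below models the elementwise semantics on plain Python lists. It assumes that Arkouda follows the NumPy/Python floor-division convention, in which the remainder takes the sign of the divisor. The helper `elementwise_divmod` is hypothetical. Confirm edge cases such as negative operands or division by zero against the Arkouda documentation.

```python
from typing import List, Sequence, Tuple


def elementwise_divmod(
    x: Sequence[int], y: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Return (x // y, x % y) computed element by element.

    Hypothetical model using Python's floor-division semantics.
    """
    if len(x) != len(y):
        raise ValueError("operands must have the same length")
    quotients: List[int] = []
    remainders: List[int] = []
    for a, b in zip(x, y):
        q, r = divmod(a, b)  # guarantees a == q * b + r
        quotients.append(q)
        remainders.append(r)
    return quotients, remainders


if __name__ == "__main__":
    print(elementwise_divmod([7, -7, 9], [2, 2, -4]))
    # ([3, -4, -3], [1, 1, -3])
```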

Minor Updates

  • Issue #2355 - Drops support for 1.28
  • Issue #2138 - Updates messaging overview docs
  • Issue #2368 - Cleans up Strings references in SegArray
Auto-Generated Release Notes

  • Closes #2370 - Fixes Deprecation Warnings during `make` by @Ethan-DeBandi99 in https://github.com//pull/2371
  • Closes #2368 - Cleans up Strings references in SegArray by @Ethan-DeBandi99 in https://github.com//pull/2369
  • Closes #1855: Implement `divmod` by @jaketrookman in https://github.com//pull/2356
  • Closes #2374: Pin hdf5 version to 1.12.2 by @pierce314159 in https://github.com//pull/2375
  • Closes #2355: Drop support for 1.28 by @bmcdonald3 in https://github.com//pull/2383
  • Closes #2377: Add groupby aggregations that require min/max on bigint by @pierce314159 in https://github.com//pull/2378
  • Closes #474 - HDF5 Overwrite Dataset by @Ethan-DeBandi99 in https://github.com//pull/2382
  • Closes #2300 - `segString` Strip performance issue by @joshmarshall1 in https://github.com//pull/2379
  • Fixes #2380: Segarray register bug by @pierce314159 in https://github.com//pull/2392
  • Closes #2341: Quiet 131 deprecations by @bmcdonald3 in https://github.com//pull/2342
  • Closes #2372 & #2373 - Categorical HDF5 Format Update & `update_hdf` method by @Ethan-DeBandi99 in https://github.com//pull/2394
  • Closes #2396 - Fixes Nested parquet fields causing server crash by @Ethan-DeBandi99 in https://github.com//pull/2399
  • Closes #2398 - Fixes Parquet List Columns with Nested Lists Error by @Ethan-DeBandi99 in https://github.com//pull/2401
  • Closes #2403: Update c_getDatasetNames calls to match new behavior by @bmcdonald3 in https://github.com//pull/2404
  • Add a use of the Math module to avoid the anticipated `pi` deprecation warning by @lydia-duncan in https://github.com//pull/2407
  • Deprecation updates for `BitOps.popcount` and `bigint.mod` by @jeremiah-corrado in https://github.com//pull/2409
  • Closes #2296: bitops support for bigint by @pierce314159 in https://github.com//pull/2408
  • Closes #2138 Update messaging overview docs by @jaketrookman in https://github.com//pull/2391
  • Deprecation updates for `datetime` and `fromTimestamp` by @bmcdonald3 in https://github.com//pull/2410

Full Changelog: v2023.04.07...v2023.05.05

Release Notes v2023.04.07

07 Apr 13:30
80c4fe6

Bug Fixes

  • Issue #2329 - Fixes continued SegArray read performance issues
  • Issue #2299 - Fixes file writes reporting success when directory does not exist
  • Issue #2297 - Fixes HDF5 single file write stall
  • Issue #2327 - Fixes issue loading 16-bit and 32-bit from Parquet
  • Issue #2337 - Fixes ak.DataFrame.to_parquet with IPv4 columns
  • Issue #2348 - Fixes IPv4 removal from DataFrame
  • Issue #2350 - Fixes DataFrame column subset access
  • Issues #2306, #2317 and PR #2354 - Fix Datetime component scaling
  • Issue #2328 - Fixes bug when printing DataFrames containing bigint
  • Issue #2307 - Fixes reported memory usage exceeding 100 percent
  • Issue #2309 - Fixes error in GroupBy.unique on Categorical or Strings
  • Issue #2240 - Fixes ak.coargsort empty String and Categorical bug
  • Issue #2347 - Fixes broken links in README.md

New Features

  • Issue #2050 - Adds file-of-origin tracking when loading data from Parquet or HDF5. Use ak.read_tagged_data to return the file origin information.
  • Issue #2295 - Adds SegArray filtering: ak.SegArray.filter() removes values from a SegArray.
  • Issue #2293 - Adds ak.where support for Strings and Categoricals
  • Issue #2324 - Adds benchmark documentation
  • Issue #2209 - Enhances Arkouda metrics
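
Issue #2293 lets `ak.where` choose between `Strings` or `Categorical` inputs. Conceptually, `where(condition, A, B)` takes element i from A where the condition is true and from B otherwise. The stdlib sketch below shows only that selection rule. The helper `where_strings` is hypothetical, lists stand in for Arkouda arrays, and the rule that a scalar string is broadcast to every position is an assumption.

```python
from typing import List, Sequence, Union


def where_strings(
    condition: Sequence[bool],
    a: Union[str, Sequence[str]],
    b: Union[str, Sequence[str]],
) -> List[str]:
    """Pick a[i] where condition[i] is True, else b[i].

    Hypothetical model: a scalar str is broadcast to every position.
    """
    n = len(condition)

    def expand(v: Union[str, Sequence[str]]) -> List[str]:
        # Broadcast a single string; otherwise require a matching length.
        if isinstance(v, str):
            return [v] * n
        if len(v) != n:
            raise ValueError("array arguments must match condition length")
        return list(v)

    first, second = expand(a), expand(b)
    return [x if c else y for c, x, y in zip(condition, first, second)]


if __name__ == "__main__":
    print(where_strings([True, False, True], ["x", "y", "z"], "-"))
    # ['x', '-', 'z']
```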

Minor Updates

  • Issue #2015 - Updates Parquet NaN detection on float/double columns to be more efficient
  • Issue #2280 - Improves max_bits handling
  • Issues #2319, #2320, #2339 - Update the test framework to a pytest configuration
Auto-Generated Release Notes

Full Changelog: v2023.03.24...v2023.04.07

Release Notes v2023.03.24

24 Mar 21:52
424e162

Bug Fixes

  • Issue #2265 - Fixes bug in ak.DataFrame.to_parquet with empty Strings Column
  • Issue #2263 - Fixes bug which caused slow reads of large SegArrays and Strings
  • Issue #2214 - Fixes bug when rotating a bigint by more than max_bits
  • Issue #2183 - Fixes index reset in DataFrame get_head_tail
  • Issues #2179 and #2199 - Fix out-of-bounds error when writing a SegArray to HDF5 when the number of locales exceeds the number of segments
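
Issue #2214 concerns rotating a `bigint` by more than `max_bits`. Within a fixed width of `max_bits` bits, rotating by n positions is equivalent to rotating by n mod max_bits, so any shift amount is well defined. The sketch below illustrates that arithmetic with Python integers. The helper `rotl` is hypothetical and is not Arkouda's implementation.

```python
def rotl(value: int, shift: int, max_bits: int) -> int:
    """Rotate `value` left by `shift` within a field of `max_bits` bits.

    Hypothetical model: the shift is reduced modulo max_bits, so rotating
    by more than the width wraps around instead of losing bits.
    """
    if max_bits <= 0:
        raise ValueError("max_bits must be positive")
    mask = (1 << max_bits) - 1
    value &= mask          # keep only the low max_bits bits
    shift %= max_bits      # a full rotation is the identity
    return ((value << shift) | (value >> (max_bits - shift))) & mask


if __name__ == "__main__":
    print(rotl(0b0001, 1, 4))  # 2
    print(rotl(0b1000, 1, 4))  # 1
    print(rotl(0b0001, 5, 4))  # 2 (same as rotating by 1)
```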

New Features

Minor Updates

  • Issue #2110 - Updates _buildReadAllJSON to use Map
  • Issue #2077 - Remove duplicated bigint logic in IndexingMsg.chpl
Auto-Generated Release Notes

Full Changelog: v2023.03.01...v2023.03.24

Release Notes v2023.03.01

01 Mar 13:23
8ae978b

Bug Fixes

  • Issue #2163 - Resolves issue with SymEntry destruction for GroupBy objects.
  • Issue #2173 - Resolves SymEntry destruction bug for Strings and SegArray.
  • PR #2164 - Updates memory checks to use Chapel runtime view of allocatable memory when available
  • Issue #1987 - Resolves issues with AutoAPI documentation.
  • Issue #2129 - Resolves a periodic 403 error for integration and metrics-enabled Arkouda

New Features

  • Issues #2118, #2141, #2145, #2156 - Enable reading of Parquet files with columns containing SegArray objects.
  • PRs #2178 and #2131 (part of Issue #2088) and Issues #2139 and #1961 - Speed up BigInt pdarray creation using bigint_from_uint_arrays
  • Issue #2147 - Adds API for hashArrays
  • Issue #1835 - Updates SegArray HDF5 save format. Backwards compatibility maintained.
  • Issue #1939 - Adds % and %= for floats
  • Issue #1522 - Adds Index and MultiIndex support for the key in Series.locate()
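
PRs #2178 and #2131 speed up building bigint pdarrays with `bigint_from_uint_arrays`. As we understand the Arkouda docs, the first array supplies the most significant 64 bits and the last array the least significant 64 bits. Treat that ordering as an assumption to verify. The stdlib sketch below shows the limb arithmetic on plain Python integers. The helpers `from_uint_limbs` and `to_uint_limbs` are hypothetical.

```python
from typing import List, Sequence

MASK64 = (1 << 64) - 1


def from_uint_limbs(arrays: Sequence[Sequence[int]]) -> List[int]:
    """Combine parallel uint64 arrays into arbitrary-precision integers.

    Hypothetical model: arrays[0] holds the most significant 64-bit limb.
    """
    if not arrays:
        return []
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all limb arrays must have the same length")
    result = [0] * n
    for limbs in arrays:  # most significant limb first
        for i, limb in enumerate(limbs):
            if not 0 <= limb <= MASK64:
                raise ValueError("limbs must be uint64 values")
            result[i] = (result[i] << 64) | limb
    return result


def to_uint_limbs(values: Sequence[int], num_limbs: int) -> List[List[int]]:
    """Inverse of from_uint_limbs for non-negative values that fit in num_limbs limbs."""
    arrays = [[0] * len(values) for _ in range(num_limbs)]
    for i, v in enumerate(values):
        for k in range(num_limbs - 1, -1, -1):  # peel off least significant first
            arrays[k][i] = v & MASK64
            v >>= 64
    return arrays


if __name__ == "__main__":
    print(from_uint_limbs([[1, 0], [0, 5]]))  # [18446744073709551616, 5]
```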

Minor Updates

  • Issue #2152 - Enhances memory management logging and metrics
  • Issue #2177 - Updates to ak.client.maxTransferBytes
Auto-Generated Release Notes

Full Changelog: v2023.02.08...v2023.03.01

Release Notes v2023.02.08

08 Feb 16:40
5909302

Bug Fixes

  • Issue #2068 - Fixes dataframe groupby with categorical index bug
  • Issue #2076 - Fixes integer overflow in groupby.mean
  • Issue #2105 - Fixes bug when loading a DataFrame containing a SegArray column whose name contains an underscore (_)
  • Issue #2099 - Fixes bug in left and right shift by >=64 bits for int/uint
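
Issue #2076 fixes integer overflow in `groupby.mean`. The sketch below shows why a naive int64 running sum can go wrong. It also shows one common remedy, accumulating in float64 before dividing. That remedy is our assumption for illustration only; we do not know which approach Arkouda adopted. The helpers `wrap_int64` and `groupby_mean` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def wrap_int64(x: int) -> int:
    """Reduce a Python int to the value a wrapping int64 register would hold."""
    return (x + 2**63) % 2**64 - 2**63


def groupby_mean(keys: Sequence[Hashable], values: Sequence[int]) -> Dict[Hashable, float]:
    """Mean of `values` per key, accumulating in float64 to avoid int64 wraparound.

    Hypothetical model; Arkouda's actual fix may differ.
    """
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += float(v)
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


if __name__ == "__main__":
    big = 2**62
    vals = [big, big, big]
    # A wrapping int64 sum of three 2**62 values overflows to a negative number.
    print(wrap_int64(sum(vals)))  # -4611686018427387904
    print(groupby_mean(["a", "a", "a"], vals))
```

The trade-off is precision: float64 has a 53-bit significand, so very large integers are rounded before they are summed.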

New Features

  • Issues #2047, #2117 - Add CSV Support
  • Issue #2060 - Adds pdarray data type and size to metrics
  • Issue #2042 - Adds BigInt support in SegArray
  • Issue #2111 - Enables load_all and read workflows with DataFrames containing SegArrays
  • Issue #1695 - Renames util Packages
  • Issue #2058 - Enables logging to a variety of channels

Minor Updates

  • Issue #1994 - Adds aggregation interface for bigint
  • Issues #1988, #2073, #2082, #2089, #2094, #2103, #2108, #2124 and PR #2057 - Update documentation
  • Issues #2066, #2070, #2092 and PR #2074 - Provide the following updates to compilation:
    • Separates setting optimization level and enabling runtime checks
    • Disable bulk transfer when using ARKOUDA_QUICK_COMPILE
    • Improve Makefile errors when dependencies aren't found
    • Updates iconv check during compilation
  • Issue #2097 - Adds save-data flag to benchmark script
  • Issue #2096 - Updates to prevent numpy overflow deprecation warnings
  • Issue #2113 - Corrects naming and bug in segarray bigint test
Auto-Generated Release Notes

Full Changelog: v2023.01.11...v2023.02.08

Release Notes v2023.01.11

11 Jan 22:26
62b3266

Major updates:

  • Issues #1970, #1989, #1995, #2000, #2009, #2013, #2033, and #2041 - Add bigint pdarrays with binary operations and support for sort, in1d, search_intervals, groupby, and dataframe
  • Issue #1876 - 2.5x speed up of multi-column write for parquet
  • Issue #2019 - Fixes bug preventing reading strings formatted by older versions of Arkouda
  • Issues #1297 and #1983 - Change ak.array to prefer uint over float when the input contains values too large for int64 (>= 2**63)
  • Issue #2005 - Fixes parquet read error for columns containing NaNs
  • Issues #1991, #2021, and #2024 - Add additional parquet compression support
  • Issues #1962 and #2038 - Add parameters to ak.get_mem_* functions and add percentage of memory used to overMemLimit logs
  • Issues #1965, #1979, #1981, #1986, #1996, and #2026 - Rework and update online documentation
  • PR #1966 - Recommends Chapel 1.29.0 and updates CI to use it
  • Issue #1850 - Removes legacy HDF5 multi-dim
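
Issues #1297 and #1983 change `ak.array` to prefer `uint64` over `float64` when the input holds integers too large for `int64`. The function below is a simplified, hypothetical model of that inference rule for lists of Python ints. The name `infer_int_dtype` is invented, and the exact fallbacks are assumptions. In particular, values beyond the uint64 range may be handled differently now that bigint support exists.

```python
from typing import Sequence

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
UINT64_MAX = 2**64 - 1


def infer_int_dtype(values: Sequence[int]) -> str:
    """Return the dtype name a list of Python ints would map to.

    Hypothetical model: prefer int64, then uint64, and only fall back to
    float64 when neither integer type can hold every value.
    """
    if not values:
        raise ValueError("empty input has no inferable integer dtype")
    lo, hi = min(values), max(values)
    if INT64_MIN <= lo and hi <= INT64_MAX:
        return "int64"
    if lo >= 0 and hi <= UINT64_MAX:
        return "uint64"
    return "float64"


if __name__ == "__main__":
    print(infer_int_dtype([1, 2, 3]))    # int64
    print(infer_int_dtype([1, 2**63]))   # uint64
    print(infer_int_dtype([-1, 2**63]))  # float64
```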

Minor fixes:

  • Issues #1949, #1951, #2054, and PR #2016 - Add support for conversions between IDNA and non-UTF-8 encodings
  • Issue #2011 - Updates version requirement for h5py and numpy
  • Issue #1932 - Fixes bug with binopvv between uint and bool
  • Issue #1963 - Fixes message arg failure when List contains Strings and pdarray
  • Issue #1972 - Switches to allclose for float comparison in operator test
Auto-generated release notes

Full Changelog: v2022.12.09...v2023.01.11

Release Notes v2022.12.09

09 Dec 16:50
f5ef366

Release Notes 2022-12-09

Major updates:

  • Issues #1914, #1917, and #1922 - Add serverInfoNoSplash and autoShutdown flags along with documentation for running arkouda from a script
  • Issue #1927 - Quiets HDF5 Errors when ObjType attribute is missing
  • Issues #1904 and #1935 - Add Chapel-native encoding/decoding functionality
  • Issue #1896 - Separates client IO in anticipation of IO rework
  • Issue #1947 - Fixes conversion error when IPv4 is Index
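
Issues #1904 and #1935 add Chapel-native encoding and decoding, including IDNA. Python's built-in `idna` codec (which implements IDNA 2003) is a convenient reference for what an IDNA round trip does to individual strings. The elementwise wrappers `encode_all` and `decode_all` below are hypothetical. They run entirely in the client, so they say nothing about how the Arkouda server performs the conversion.

```python
from typing import List, Sequence


def encode_all(strings: Sequence[str], encoding: str = "idna") -> List[str]:
    """Encode each string with a Python codec and return ASCII text.

    Hypothetical client-side model of an elementwise Strings encode.
    """
    return [s.encode(encoding).decode("ascii") for s in strings]


def decode_all(strings: Sequence[str], encoding: str = "idna") -> List[str]:
    """Inverse of encode_all for ASCII-compatible encodings such as IDNA."""
    return [s.encode("ascii").decode(encoding) for s in strings]


if __name__ == "__main__":
    encoded = encode_all(["münchen.de", "example.com"])
    print(encoded)              # ['xn--mnchen-3ya.de', 'example.com']
    print(decode_all(encoded))  # ['münchen.de', 'example.com']
```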

Minor fixes:

  • Issues #1796, #1722, and #1894 - Update SegArray to only register a single object
  • Issues #1926 and #1938 - Fix operation equals for uint arrays
  • Issue #1901 - Reduces SymEntry creation overheads
  • Issue #1941 - Fixes condition for regex edge case
  • Issue #1905 - Adds encoding libraries to conda dependencies
  • Issue #319 - Moves chpl tests to tests/server directory
  • Issue #1943 - Quiets deprecation warnings in preparation for Chapel 1.29

Auto-generated release notes

Full Changelog: v2022.11.17...v2022.12.09

Release Notes v2022.11.17

17 Nov 22:57
cdeab05

Release Notes 2022-11-17

Major updates:

  • Issue #1906 - Supports older HDF5 files by assuming pdarray/Strings when no ObjType attribute is set
    • Note: This removes the need to use the legacyHDF5 flag
  • Issue #1909 - Adds support for __invert__ calls on uint
  • Issues #1844 and #1912 - Add option for hierarchical behavior to search_intervals
    • Note: This behavior is the new default. To maintain existing behavior, set hierarchical=False
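
Issue #1909 adds `__invert__` (the `~` operator) for `uint` arrays. Python's own `~` on an int returns a negative number (`~x == -x - 1`). To mimic a fixed-width `uint64` bitwise NOT, the sketch below masks the result to 64 bits. The helper name `invert_uint64` is hypothetical.

```python
from typing import List, Sequence

UINT64_MASK = (1 << 64) - 1


def invert_uint64(values: Sequence[int]) -> List[int]:
    """Bitwise NOT of each value, interpreted as an unsigned 64-bit integer."""
    out: List[int] = []
    for v in values:
        if not 0 <= v <= UINT64_MASK:
            raise ValueError(f"{v} is outside the uint64 range")
        out.append(~v & UINT64_MASK)  # equivalently: v ^ UINT64_MASK
    return out


if __name__ == "__main__":
    print(invert_uint64([0, 1, UINT64_MASK]))
    # [18446744073709551615, 18446744073709551614, 0]
```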

Minor fixes:

  • Issue #1727 - Adds where argument to sqrt and power
  • Issue #1800 - Adds Symbol Table Overview documentation

Auto-generated release notes

Full Changelog: v2022.11.10...v2022.11.17

Release Notes v2022.11.10

10 Nov 20:51
1e31ee7

Release Notes 2022-11-10

In Memoriam

Mike Merrill (@mhmerrill), one of the co-founders of Arkouda, recently passed away. Without his leadership and contributions, the project would not exist. It's hard to overstate Mike's impact. He was a great person and will be dearly missed. Our deepest condolences to his family.

Major updates:

  • Issues #487, #1558, #1559, #1846, #1877, #1887 and PR #1879 - Rework HDF5 structure and schema, enable writing to a single file, and add documentation of schema
    • NOTE: Files written with tag v2022.10.13 or earlier need to be read with the legacyHDF5 flag set and re-written with the new format
  • Issue #1891 - Fixes bug in IDNA decode
  • Issues #1776, #1847, #1852, and #1867 - Optimize GroupBy on small strings

Minor fixes:

  • PR #1858 and Issue #1859 - Switches to C++17 for Arrow compilation and updates Arrow version to 9.0.0
  • Issue #1801 - Reorganizes structure of the symbol table
  • Issue #1779 - Adds documentation for creating a new symbol table entry
  • Issues #1839 and #1889 - Update MessageArgs parameter for CommandMap functions
  • Issue #1837 - Resolves intermittent failures in GroupBy prod aggregate test
  • Issue #1868 - Adds name property to AbstractSymEntry
  • Issue #1854 - Takes advantage of set and generator comprehensions in client code
  • Issue #1842 - Updates COMPARISON.md

Auto-generated release notes

New Contributors

Full Changelog: v2022.10.13...v2022.11.10