Release 3.5.0
Bug
HWORKS-224 hopsworks python SDK opensearch link broken
HWORKS-267 After fail checkout, branch is empty on pull
HWORKS-269 Don't show the same in file status and in UI table
HWORKS-309 Bugs in searching
HWORKS-331 Hopsworks metrics not showing after payara restart
HWORKS-374 opensearch_api.get_default_py_config() returns public IP of host, should return consul fqdn
HWORKS-386 Spark job does not accept space in arguments
HWORKS-469 Dependent conda commands not handled correctly in case of failure
HWORKS-475 hopsworks Python library can't connect in sklearn legacy deployments
HWORKS-492 Servings logs are not shown in Kibana
HWORKS-493 Request batching configuration is broken in the deployment creation form
HWORKS-496 Exception not raised if other than duplicated deployment entry
HWORKS-498 Transformer resources in dict format are not properly deserialized
HWORKS-499 Support for jupyter notebooks as predictor scripts
HWORKS-561 Hopsworks Kafka Authorizer does not allow IDEMPOTENT_WRITE operations
HWORKS-569 Filebeat cannot list nodes which is required for scraping container logs
HWORKS-602 HA cluster could not connect to live logs, rm UI, spark UI, monitor, onlineFS
HWORKS-630 Project creation doesn't fail on invalid project name
HWORKS-633 Failing deployments are shown as Starting when Kubernetes stops restarting pods
HWORKS-636 Inference logging default value is overwritten when updating a deployment using .save()
HWORKS-656 NullPointerException when trying to force delete a partially created project
HWORKS-693 Server logs not collected when the predictor class fails to initialize
HWORKS-786 Storage connector specs failure for flag disabled case
HWORKS-787 Git repository current commit and branch not being updated
HWORKS-792 Hopsworks job schedule update doesn't update cron expression
HWORKS-809 Project creation fails when the serving api key k8s secret takes too long to be available
HWORKS-815 Judge service should be restarted when docker daemon restarts
HWORKS-817 Log level in RonDB replication scripts is wrong
HWORKS-819 The request URI port is absent in MultiRegionFilter
HWORKS-820 In MultiRegionFilter the Primary region is null if invoked too early
HWORKS-821 Kubernetes does not accept serving names starting with a number
HWORKS-824 Only Primary MySQLs should be able to update RonDB replication metadata table
HWORKS-833 Consul datacenter is hardcoded in Prometheus configuration
HWORKS-834 Undeploy hopsworks on upgrade
HWORKS-837 RonDB native backup script fails when there is no database to backup
HWORKS-848 AlertManager EJB does not take Datacenter into consideration in Multiregion clusters
HWORKS-851 Check in hops-hadoop-chef for creating RonDB undo and log files is wrong
HWORKS-865 NullPointerException when monitoring execution of a deleted job
Task
HWORKS-135 Models backend should store metadata in tables instead of opensearch
HWORKS-164 Add Airflow documentation
HWORKS-198 Remove hopsworks::image recipe
HWORKS-226 python api should throw informative error if user tries to download a folder
HWORKS-258 Add hopsworks python sdk dataset upload/download to workflow testing
HWORKS-262 Remove user email FK from hopsworks schema.
HWORKS-266 Checking out a branch that exists on remote repo does not work
HWORKS-284 Documentation to export cluster logs
HWORKS-302 Enhancement Request: Ability to disable access to Anaconda package repository and hide the functionality at a system defined level
HWORKS-314 Add support for git fetch and reset
HWORKS-422 Exclude old versions of the documentation from being indexed by search engines
HWORKS-491 Replace hsml with hopsworks in the code snippets shown in the UI
HWORKS-494 Support local script path in model.deploy()
HWORKS-511 Add number of GPUs to Python resource usage
HWORKS-520 Make oauth claims configurable
HWORKS-536 Move master encryption password to KMS
HWORKS-559 Users should be able to specify a directory when using the Hopsworks-api to download logs
HWORKS-565 Hopsworks api dataset upload should support parallel chunk upload
HWORKS-571 Run more integration tests in parallel
HWORKS-590 Make jenkins test use HA cluster
HWORKS-591 Show a better log for when pods are killed by Kubernetes with OOM
HWORKS-617 Make load balancer use https
HWORKS-631 Support info exchange between preprocess and postprocess methods in transformers
HWORKS-635 Add unit tests and loadtest for hsml deployments
HWORKS-641 Add loadtests for hsml and model registry
HWORKS-662 Fix versioning and build automation for hopsify
HWORKS-676 Suppress CVE-2023-33265 in payara-embedded-web-5.2022.5.jar/META-INF/maven/com.hazelcast/hazelcast/pom.xml
HWORKS-677 Suppress multiple severe CVEs in hive-storage-api-2.6.1.2.jar
HWORKS-679 Add option to set python environment
HWORKS-690 Airflow project dags sharing
HWORKS-697 Remove airflow JWT
HWORKS-707 Separate statistics from feature monitoring PR
HWORKS-729 Nightly loadtest cluster should have an admin user that is member of all projects
HWORKS-745 Support python 3.11 for the client APIs
HWORKS-746 Documentation for python environment history and custom commands
HWORKS-751 Add webhook support for Hopsworks alerts
HWORKS-756 Remove cmake dependency in hops-hadoop-chef
HWORKS-759 Show custom commands file that was used to build the environment in the environment history
HWORKS-764 Reduce JWT lifetime in nightly tests
HWORKS-774 Set hopsworks_rest_log_level to DEV in the nightly tests
HWORKS-775 Nightly tests should reload random module
HWORKS-777 Fix integration test
HWORKS-780 Increase number of concurrent load tests
HWORKS-783 Cleanup PIA functionality
HWORKS-784 Remove airflow jdbc metric from payara
HWORKS-789 Fix git tests
HWORKS-791 Add Python API support to schedule jobs
HWORKS-795 Hopsworks CA should generate Java keystores instead of a shell script on the client
HWORKS-798 Add **kwargs to python client libraries
HWORKS-800 Implement dataset.copy and dataset.move API
HWORKS-801 Export onlinefs/default/private_ips in metadata
HWORKS-803 Remove airflow from old UI
HWORKS-804 Upgrade ZooKeeper from 3.8.2 to 3.8.3 (CVE-2023-44981)
HWORKS-805 Environment history compares versions lexicographically instead of numerically per component
HWORKS-807 Add wait_until_finished method to execution object in the hopsworks python apis
HWORKS-829 Register model files without compressing them
HWORKS-830 Configurable glassfish tmpdir property
HWORKS-832 Set size limit in Consul log files
HWORKS-835 Download model artifact without compressing model files
HWORKS-841 Add 'See resource usage' button in the deployment overview page
HWORKS-847 RonDB Global replication alert
HWORKS-849 Check that the DAG's access_control is not None before syncing DAG permissions in Airflow
HWORKS-854 hops-system should use the minimal env definition without pydoop
Feature Store
Epic
FSTORE-473 Data Management
FSTORE-612 Feature Monitoring
FSTORE-1047 Support Similarity Search in the Feature Store
Bug
FSTORE-55 Backfill job gets stuck
FSTORE-537 Explicit provenance ClassCastException thrown
FSTORE-754 Feature Group creation during RonDB reconfig leaves behind broken feature group
FSTORE-755 Uninformative error when deleting feature group during RonDB reconfig
FSTORE-799 HSFS should reinstantiate sqlalchemy connection pool if MySQL node disappears
FSTORE-813 Unclear error when the feature group contains duplicated columns
FSTORE-830 Error fetching feature statistics from feature view UI - but statistics exists with a different timestamp
FSTORE-856 Only one OnlineFS instance running
FSTORE-892 FM jobs are not removed when the corresponding FM configuration is deleted
FSTORE-916 FeatureGroup.insert does not retry in case of a failure
FSTORE-932 Pyarrow dependency is required even if the python profile is not installed
FSTORE-943 NullPointerException in SchematizedTagHelper
FSTORE-944 Nested filter statements are not handled correctly when attached to a feature view
FSTORE-968 Add workflow test for get_batch_data
FSTORE-976 Feature group filter in PIT join gets pushed down on the temporary table that doesn't contain the feature group
FSTORE-987 Failed to read data when there is a self-join
FSTORE-989 GCS connector: encryption fields and secrets update issues
FSTORE-992 create_train_validation_test_split fails with unexpected keyword argument 'pit_query_asof'
FSTORE-998 Can't read from a shared feature store
FSTORE-1000 Tags parameter is missing in TrainingDataset class
FSTORE-1005 Training datasets are written and read using a wrong path
FSTORE-1009 hsfs flink consumer doesn't work with parallelism greater than 1.
FSTORE-1011 Validation message during data ingestion to FG when there are no kafka topics configured
FSTORE-1017 Updating job schedule's cron expression doesn't work
FSTORE-1023 Mix up of subjectId and schemaId
FSTORE-1024 Concurrency can cause SQLIntegrityConstraintViolationException when creating topic
FSTORE-1026 Wrong td file path created by ArrowFlight
FSTORE-1029 Unexplained OnlineFS exceptions in logs
FSTORE-1034 .select() method should not default to empty list
FSTORE-1035 No error if a user tries to create a feature view without features
FSTORE-1037 ArrayIndexOutOfBounds in TrainingDatasetController
FSTORE-1038 Misleading exception for ArrowFlight when not waiting for ingestion job to finish
FSTORE-1044 Insert fails with Feature 'id': dtype 'O' (arrow_type 'null') not supported
FSTORE-1049 FlyingDuck gets NoneType in evaluate_filter_expression
FSTORE-1050 FlyingDuck gets Permission denied trying to read hoodie.properties file
FSTORE-1051 NotImplementedException when querying FeatureView based on external feature group
FSTORE-1053 Never ending materialization job
FSTORE-1057 Schema inconsistency in DuckDB after upsert
FSTORE-1058 If the feature view query doesn't include the event time feature, add it for get_batch_data(start_time, end_time)
FSTORE-1059 ArrowFlight server hangs after some requests
FSTORE-1062 Materialisation job does not work with new Hopsworks scheduler
FSTORE-1068 Fixed failed load test
FSTORE-1071 Flyingduck crashes when joining shared feature stores
FSTORE-1072 Connection timeout not set in the arrow_flight_client.py
FSTORE-1073 OnlineFS getting offsets fails when project name has upper case letters
FSTORE-1074 Expand documentation on filter logic
FSTORE-1075 Cannot get schema from shared project
FSTORE-1084 Cannot run multiple insert_stream queries on the same project by default
FSTORE-1086 Broken query when adding a filter on the non-label feature group without selecting the feature
FSTORE-1087 feature_view.get_batch_data timing out when using ArrowFlight
FSTORE-1089 Remove copying of application code from databricks integration
FSTORE-1094 pyarrow._flight.FlightServerError: 'fileIdAndRelativePaths' when reading from ArrowFlight
FSTORE-1095 JDBC storage connector missing driver option in the documentation
Subtask
FSTORE-1007 Fix training dataset paths on the backend and adjust them on hsfs
Task
FSTORE-26 Print Warning that backfill job had to be started manually, when PySpark engine stream=True and using .insert_stream()
FSTORE-335 Track GIT commit that was used to create/insert into a feature group
FSTORE-362 Replace Deequ statistics with more lightweight module
FSTORE-363 Make HSFS API typesafe
FSTORE-367 Create internal guide for hsfs unit test
FSTORE-382 Feature store v2.5 documentation - broken links
FSTORE-394 Unify metadata update API in hsfs
FSTORE-404 Add rollback action for Fs job creation.
FSTORE-471 Allow transformation functions for label features
FSTORE-550 Ensure type consistency between online/offline/feature view reads in Python and Spark engines
FSTORE-665 Design log: Kafka topic deletion
FSTORE-722 Hopsworks should not retrieve metadata from the Hive metastore
FSTORE-725 Transformation Function unit test should test the default behaviour of the pre-registered function in the backend
FSTORE-738 UI should show commit timestamps in UTC format
FSTORE-753 Improve user error message for feature group creation and append during RonDB reconfig
FSTORE-790 Replace hsfs with hopsworks in the code snippets shown in the UI
FSTORE-823 Statistics Engine - Design Log
FSTORE-825 Deleting an online feature group doesn't revoke mysql privileges
FSTORE-826 Add test support for AWS and GCP connectors
FSTORE-828 Automatically add prefix if there are duplicate columns in the query join.
FSTORE-834 Relative path in absolute URI - External FG
FSTORE-845 Update Apache Hudi version to 0.14.0
FSTORE-870 Issues including None/Nan/Null values in queries on feature views
FSTORE-876 Remove Stats from Write Path
FSTORE-878 Integrate Docsbot into Docs
FSTORE-880 Improve error message when there is no match of primary key in get_feature_vector
FSTORE-881 Appending to feature groups after upgrade from 3.0 to 3.2 seems off
FSTORE-885 Catch no data error when fetching dataframe in feature monitoring
FSTORE-895 Capability Write Up - Online Inference Pipelines
FSTORE-896 Capability Write Up - Batch Inference Pipelines
FSTORE-897 Capability Write Up - Training Pipelines
FSTORE-898 Capability Write Up - Feature Engineering in Beam
FSTORE-900 Capability Write Up - Feature Engineering in SQL
FSTORE-901 Capability Write Up - Feature Engineering in Spark
FSTORE-905 Add tests for training dataset statistics computation
FSTORE-909 Capability Write Up - On-demand Feature Engineering
FSTORE-920 Support for JDBC test connection
FSTORE-925 Support for ADLS test connection
FSTORE-927 Upgrade great expectations to 0.15.12
FSTORE-930 Allow pyarrow object type in pandas 2
FSTORE-933 DBT Scheduling
FSTORE-941 Support versioning of Feature Monitoring configurations
FSTORE-949 FlyingDuck - Pandas 2.0 Pyarrow-backed Types support
FSTORE-950 Split feature descriptive statistics table into three tables
FSTORE-951 Increase test coverage for feature monitoring
FSTORE-954 Investigate how to apply transformation function in REST API
FSTORE-957 FS REST API benchmark
FSTORE-967 Bump intellij plugin version
FSTORE-971 Do not compute statistics on in-memory training datasets
FSTORE-973 JDBC connector - add driver upload and driver name fields
FSTORE-980 Helper, Primary key and event time columns with feature view
FSTORE-990 Incorporate Flying Duck in nightly tests
FSTORE-1003 Build Locust tests benchmarking Vertex AI - FS
FSTORE-1008 Add Java client to hsfs
FSTORE-1015 Add option to add Keytab to Kafka connector
FSTORE-1019 Single Kafka topic per project documentation
FSTORE-1021 Add a warning when using externally managed Kafka
FSTORE-1022 Missing data in featurestore benchmark
FSTORE-1025 Support datetime in training dataset time series split
FSTORE-1036 Tracking usage of hsfs libraries
FSTORE-1040 SageMaker Online benchmarking
FSTORE-1041 Databricks online benchmarking
FSTORE-1042 Investigate increasing time to make hudi commits
FSTORE-1043 Historical Data from Feature View
FSTORE-1046 Improve online feature store metrics
FSTORE-1052 Update the docs to include Kafka config variables for throughput
FSTORE-1055 Support pandas 2.1.*
FSTORE-1056 Increase sleep time in nightly tests after online inserts to handle heavy concurrent load
FSTORE-1060 Add load tests using concurrent clients that read feature groups and feature views
FSTORE-1061 Add feature group id to serving key prefix and documentation in get_feature_vector
FSTORE-1063 Hopsworks data preview should use arrow flight to retrieve data if available
FSTORE-1064 Improve Spine Group docs
FSTORE-1065 Update streaming APIs in hopsworks-tutorials
FSTORE-1066 Shared feature store workflows tests failing with permission denied
FSTORE-1067 Feast online scenarios
FSTORE-1069 Foreign key constraint
FSTORE-1070 Tutorial for external Flink client
FSTORE-1078 Support Similarity Search in the Feature Store v1.5
FSTORE-1080 Upgrade hudi to version 0.12.3
FSTORE-1085 Add profiles to HSFS to support multiple Spark versions
FSTORE-1088 Bump recommended Databricks runtime version to 12.2
FSTORE-1090 Concepts & Guides for helper columns and on-demand features
FSTORE-1091 Feature monitoring Tutorial
FSTORE-1092 Helper Columns Tutorial
FSTORE-1097 Add user id to workflow test and unit test