1 |
HashiCorp Vault |
Tools |
Adopt |
Manage secrets and protect sensitive data. We think you should Adopt it because Vault has become a mature product over several iterations. Suggested by: David |
2 |
CD4ML |
Techniques |
Adopt |
Continuous Delivery for Machine Learning (CD4ML) is a technique for deploying your infrastructure, data, model and code to production easily and fully automatically. We think you should Adopt it. Suggested by: Maarten, Laurens |
3 |
Apache Flink |
Platforms |
Trial |
Unified streaming/batch platform for Kappa architectures. Larger companies (Uber, Spotify, Alibaba) are moving to Flink. We think you should Trial it because Apache Flink scales to trillions of events and petabytes of data, and it handles complex stream-processing concerns such as recovery and watermarks without much effort from the application developer. It is newer than Spark, for example, and therefore has a smaller ecosystem, but its model seems better suited to real-time analytics. Suggested by: Lennard |
4 |
Data Lake table formats |
Languages and Frameworks |
Trial |
There are several "table format" abstractions for data in a data lake, such as Delta Lake, Apache Iceberg and Apache Hudi. These solutions provide an abstraction layer on top of traditional storage formats such as Parquet and Avro, and offer functionality such as ACID transactions, schema evolution and better performance. We think you should Trial it. Suggested by: Thijs |
5 |
Deep Learning Java Library |
Languages and Frameworks |
Trial |
Machine learning wrapper library made by Amazon together with Netflix. We think you should Trial it because it is a handy machine learning framework that is production-ready and easy to incorporate into most architectures. Suggested by: Arno |
6 |
Delta Lake |
Platforms |
Trial |
Delta Lake is an open-source storage layer, implemented by Databricks, that attempts to bring ACID transactions to big data processing. We think you should Trial it. Suggested by: Laurens |
7 |
Okteto |
Platforms |
Trial |
Kubernetes as a Service for developers (KIS&S). We think you should Trial it because it is the new kid on the block if you want to try and explore Kubernetes. Suggested by: Arno |
8 |
Serverless |
Platforms |
Trial |
Serverless architecture. We think you should Trial it. Suggested by: Robbert |
9 |
Temporal |
Tools |
Trial |
Workflow engine that focuses on reliability. https://www.youtube.com/watch?v=f-18XztyN6c We think you should Trial it because it lets you define retries, rollbacks, cleanup, and even human-in-the-loop steps in the case of failure, with end-to-end visibility across multiple services. It offers reliability, consistency, failure compensation, long-running operations, and distributed transactions for your most critical operations. Suggested by: David, Pieter, Arno |
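To make "retries, rollbacks and failure compensation" concrete, here is a minimal sketch of the idea in plain Python. It is not the Temporal SDK: `run_with_retries`, `run_saga` and the step tuples are hypothetical names chosen for illustration, and a real workflow engine would also persist state between steps.

```python
from typing import Callable, List, Tuple

# A step is an (action, compensation) pair; both are hypothetical callables.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_with_retries(action: Callable[[], None], max_attempts: int = 3) -> None:
    """Call `action`, retrying on any exception up to `max_attempts` times."""
    for attempt in range(1, max_attempts + 1):
        try:
            action()
            return
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted, let the caller compensate


def run_saga(steps: List[Step], max_attempts: int = 3) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            run_with_retries(action, max_attempts)
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
    return True
```

A step that fails transiently is retried; a step that keeps failing triggers the compensations of all earlier steps, which is the rollback behaviour described above.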
10 |
TimescaleDB |
Tools |
Trial |
Specialized time-series database on top of PostgreSQL. We think you should Trial it because it combines the stability, features and tooling ecosystem of PostgreSQL with efficient time-series ingestion, storage and querying. It allows combining relational and time-series data in one database. It outguns InfluxDB in performance, features and engineering quality. Suggested by: Mathijs |
11 |
APM for CD4ML |
Techniques |
Assess |
Know the health of your project, even after you have abandoned it. We think you should Assess it. Suggested by: Joep |
12 |
Dapr |
Platforms |
Assess |
Platform/runtime that wraps your existing Python scripts in a Docker sidecar to provide, for example, state management and scaling. We think you should Assess it because it is an easy way to add stateful functionality to your stateless scripts. Suggested by: Steven |
13 |
DataStax Astra (Cassandra as a Service) |
Platforms |
Assess |
Cassandra is a nice database for storing very large amounts of data when it matters that the data is actually persisted. Running and maintaining Cassandra is no fun, which is what makes this service so handy. We think you should Assess it because it is Cassandra as a Service offered by the makers themselves. Suggested by: Arno |
14 |
Federated GraphQL |
Languages and Frameworks |
Assess |
Federation of GraphQL endpoints. We think you should Assess it. Suggested by: Pieter |
15 |
FoundationDB |
Platforms |
Assess |
ACID distributed key-value database with extensions. We think you should Assess it because it is a very well-tested distributed ACID key-value database that supports multiple data models. Its testing methods are really awesome, and it is proven in production: Snowflake, for example, runs on FoundationDB. Suggested by: Mathijs |
16 |
GAN-based data augmentation for a richer data set for CNNs |
Techniques |
Assess |
Using GAN models to generate images for a CNN to train on when only a small data set is available. We think you should Assess it. Suggested by: Jeroen |
17 |
HashiCorp Boundary |
Tools |
Assess |
Secure access to hosts and services. We think you should Assess it. Suggested by: David |
18 |
HashiCorp Nomad |
Platforms |
Assess |
"Alternative to Kubernetes". We think you should Assess it. Suggested by: David |
19 |
Kedro |
Tools |
Assess |
Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering best practice and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning. We think you should Assess it. Suggested by: Tjian |
20 |
Open Policy Agent |
Tools |
Assess |
Open Policy Agent (OPA) is a general-purpose policy engine. We think you should Assess it because OPA enables you to simplify security within your organisation. Where previously you would have to interact with multiple security frameworks to ensure authorization is enforced within your environment, with OPA you can create a single policy decision point to validate authorization rules. It does not matter whether you are securing an API or your database. Suggested by: Lennard |
21 |
Prefect |
Tools |
Assess |
New, lightweight workflow tool that turns Python functions into tasks. Not yet at the forefront. Open core. We think you should Assess it. Suggested by: Jeroen |
22 |
PyOD library for outlier detection |
Tools |
Assess |
Not really new, but a quite up-to-date library for outlier detection (OD) that keeps being extended with state-of-the-art research methods. It ties together multiple existing libraries such as sklearn, scipy, statsmodels, keras and tensorflow. We think you should Assess it because of its pros: ease of use, support for many OD algorithms, and optimization through (numba) JIT and parallelization. Suggested by: Dennis |
23 |
Redash |
Tools |
Assess |
As far as I know, Redash is a dashboard visualizer that runs on queries. You can visualize queries directly and then compose a dashboard from all these query visualizations. We think you should Assess it. Suggested by: Jeroen |
24 |
Rust |
Languages and Frameworks |
Assess |
Rust is a multi-paradigm programming language designed for performance and safety, especially safe concurrency. We think you should Assess it. Suggested by: David |
25 |
SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model |
Tools |
Trial |
This is a Python library that can help with the explainability of ML models. It does so on the basis of Shapley values from game theory (https://en.wikipedia.org/wiki/Shapley_value).
From the 'readthedocs': "This is an introduction to explaining machine learning models with Shapley values. Shapley values are a widely used approach from cooperative game theory that come with desirable properties."
Example: https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainable%20AI%20with%20Shapley%20values.html We think you should Trial it. Suggested by: Olaf |
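For readers new to Shapley values, the sketch below computes them exactly for a two-feature toy game using only the standard library. It does not use the SHAP library (which relies on approximations for real models); `shapley_values` and `toy_model` are hypothetical names, and the scores are made up for illustration.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}


def toy_model(features: FrozenSet[str]) -> float:
    """Hypothetical 'model output' when only `features` are known."""
    score = 0.0
    if "age" in features:
        score += 10.0
    if "income" in features:
        score += 20.0
    if {"age", "income"} <= features:
        score += 6.0  # interaction effect, split evenly by the Shapley rule
    return score


phi = shapley_values(["age", "income"], toy_model)
print(phi)  # {'age': 13.0, 'income': 23.0}
```

The attributions add up to the full model output (36.0), which is one of the "desirable properties" mentioned in the quote. Enumerating all join orders grows factorially with the number of features, so this only works for tiny examples.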
26 |
Singer |
Languages and Frameworks |
Assess |
Singer is a specification for transmitting data over stdout. It allows you to create interoperable Taps and Targets for fetching data from a source and forwarding it to a sink, e.g. fetching data from MySQL and forwarding it to S3. It removes the need for plumbing code when setting up extract pipelines. We think you should Assess it. Suggested by: Lennard |
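As an illustration of "transmitting data over stdout", the sketch below shows a tiny tap that writes Singer-style JSON messages (SCHEMA, RECORD, STATE), one per line. The stream name, the rows and the helper names `singer_messages` and `write_messages` are hypothetical; a real tap would read from an actual source and usually build on an existing Singer helper library.

```python
import json
import sys
from typing import Any, Dict, Iterable, List, TextIO


def singer_messages(stream: str, rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build SCHEMA, RECORD and STATE messages for a hypothetical stream."""
    messages: List[Dict[str, Any]] = [{
        "type": "SCHEMA",
        "stream": stream,
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
        },
    }]
    last_id = None
    for row in rows:
        messages.append({"type": "RECORD", "stream": stream, "record": row})
        last_id = row["id"]
    # STATE is a bookmark so that a later run can resume where this one stopped.
    messages.append({"type": "STATE", "value": {stream: last_id}})
    return messages


def write_messages(messages: Iterable[Dict[str, Any]], out: TextIO = sys.stdout) -> None:
    """Write one JSON message per line, which is what a target reads on stdin."""
    for message in messages:
        out.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    write_messages(singer_messages("users", [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]))
```

Because the contract is just lines of JSON, a tap and a target can be combined with a shell pipe regardless of the language each is written in.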
27 |
Streamlit |
Languages and Frameworks |
Assess |
I am looking for an easy way to create and share interactive data plots; I currently use Django. From the Streamlit site: "Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. In just a few minutes you can build and deploy powerful data apps." We think you should Assess it. Suggested by: Jeff |
28 |
TensorFlow dataset |
Techniques |
Assess |
It is the "pandas" for unstructured data. You can process data blazingly fast (6 to 38 times faster) compared with the standard approach. It is parallel, and it is already used and asked for or required by companies (Schiphol). No more for loops: everything goes through lambda and map functions.
It is for the cool kids (are we the cool kids?). We think you should Assess it because it is being pushed more and more by TensorFlow, and some companies already use it. Suggested by: Willem |
29 |
TensorFlow Extended (TFX) |
Tools |
Assess |
Create and manage ML production pipelines with components created by Google. TFX offers proper data tracking, transformations in a computational graph, data drift detection and more. We think you should Assess it because, even though it is not realistic to build the whole pipeline at a client, it is good to know its components and, when building production ML pipelines, to try to implement some of the same functionality. Suggested by: Tim |
30 |
TileDB |
Tools |
Assess |
Sparse multidimensional array database on object storage. We think you should Assess it because it stores multidimensional array data (think dataframes but better) on cloud object storage. It is currently well suited for things like lidar and geo data. It is quite new (less than 10 years old), so treat it with caution. Suggested by: Mathijs |
31 |
Weights and Biases |
Tools |
Assess |
MLflow has become the standard for tracking ML experiments. For all large frameworks it automatically tracks the metrics it can find, and it is easy to configure even more. Weights and Biases aims to do the same and more: the reporting page is easier to customize, the storage can be handled by WandB itself, and the API for adding tracked metrics is easy to learn. We think you should Assess it because it is good to know an alternative to MLflow. Suggested by: Tim |
32 |
Fast AI |
Languages and Frameworks |
Hold |
High-, medium- and low-level API to PyTorch. We think you should Hold it. Suggested by: Laurens |
33 |
FastAPI |
Tools |
Assess |
FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints.
The key features are:
Fast: Very high performance, on par with NodeJS and Go (thanks to Starlette and Pydantic). One of the fastest Python frameworks available.
Fast to code: Increase the speed to develop features by about 200% to 300%. *
Fewer bugs: Reduce about 40% of human (developer) induced errors. *
Intuitive: Great editor support. Completion everywhere. Less time debugging.
Easy: Designed to be easy to use and learn. Less time reading docs.
Short: Minimize code duplication. Multiple features from each parameter declaration. Fewer bugs.
Robust: Get production-ready code. With automatic interactive documentation.
Standards-based: Based on (and fully compatible with) the open standards for APIs: OpenAPI (previously known as Swagger) and JSON Schema.
We think you should Assess it because it lets you set up APIs in Python without much boilerplate code, using the OpenAPI spec. FastAPI is faster than other frequently used Python alternatives. You define the interface as code using Pydantic models. Suggested by: Tjian, Robbert |
34 |
Nix |
Tools |
Hold |
Package manager for reproducible, declarative and reliable builds and deployments. We think you should Hold it because it is a good language-agnostic and OS-agnostic way to reproducibly build systems and software. You should consider it if you are, for example, writing Docker images or building code projects with native dependencies. Suggested by: Kiara |
35 |
Facebook Prophet |
Tools |
Hold |
An easy-to-use and popular time series package. We think you should Hold it because, even though it is popular, it does not provide accurate models: even a running average might beat a Prophet model in specific cases. Facebook also seems to have given up on it, judging by the fact that they launched a new time series package called 'Kats'. Suggested by: Tim |
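The running-average baseline mentioned above takes only a few lines to build, which makes it a cheap sanity check before trusting any forecasting package. The sketch below is plain Python with hypothetical helper names; it makes no claim about how Prophet scores on your data, it only shows the baseline to compare against.

```python
from typing import List, Sequence


def running_average_forecast(series: Sequence[float], window: int) -> List[float]:
    """One-step-ahead forecasts: each is the mean of the previous `window` values."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be at least 1 and shorter than the series")
    return [sum(series[i - window:i]) / window for i in range(window, len(series))]


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(predicted)


series = [1.0, 2.0, 3.0, 4.0, 5.0]
window = 2
forecasts = running_average_forecast(series, window)
baseline_mae = mean_absolute_error(series[window:], forecasts)
print(forecasts, baseline_mae)  # [1.5, 2.5, 3.5] 1.5
```

Any model under evaluation should be scored on the same hold-out points with the same error metric; if it cannot beat `baseline_mae`, the extra complexity is hard to justify.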
36 |
Flask |
Tools |
Hold |
The problem with Flask is that there is no data validation, meaning that we can pass any type of data, be it a string, tuple, number, or any character. This can break the program often, and you can imagine that if an ML model gets the wrong data types, the program will crash. You can create a data checker before passing the values further, but that adds extra work.
The error pages in Flask are simple HTML pages that can raise decoder errors when the API is called from other applications. There are other issues with Flask, such as its slowness, the lack of async and WebSocket support that could speed up processing, and the lack of automated docs generation. You need to manually design the user interface for the usage and examples of the API. All these issues are resolved in the new framework. We think you should Hold it. |
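To show what such a hand-written "data checker" looks like, and why it counts as extra work, here is a minimal standard-library sketch. The `PredictionRequest` fields and the `parse_request` helper are hypothetical; they stand in for whatever payload your model endpoint expects.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class PredictionRequest:
    """Hypothetical, already validated input for an ML model."""
    age: int
    income: float


def parse_request(payload: Dict[str, Any]) -> PredictionRequest:
    """Validate a decoded JSON payload before it reaches the model."""
    errors: List[str] = []
    age = payload.get("age")
    income = payload.get("income")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        errors.append("age must be an integer")
    if not isinstance(income, (int, float)) or isinstance(income, bool):
        errors.append("income must be a number")
    if errors:
        raise ValueError("; ".join(errors))
    return PredictionRequest(age=age, income=float(income))
```

Every new field needs another check like these, which is the bookkeeping that frameworks with declarative request models take over.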
37 |
Julia |
Languages and Frameworks |
Assess |
We think you should Assess it. Suggested by: David |
38 |
Airflow |
Tools |
Hold |
Airflow is a platform to programmatically author, schedule and monitor workflows.
Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
We think you should Hold it because, while this might be a controversial opinion, it is really noticeable that Airflow was created before the rise of modern schedulers such as Kubernetes. This results in complexity when maintaining Airflow in production on, for example, Kubernetes. Besides that, the Python-based execution model leads to conflicts when maintaining a larger number of workflows. One of the current best practices for running Airflow in production is to dockerize each individual operator; however, at that point you might as well move to a container-native workflow solution such as Argo Workflows. |
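The core idea of "tasks in a DAG, executed while following the specified dependencies" can be sketched with the Python standard library alone. This is not Airflow code: the task names and the `execution_order` helper are hypothetical, and `graphlib` requires Python 3.9 or newer.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline: Dict[str, Set[str]] = {
    "extract": set(),
    "clean": {"extract"},
    "enrich": {"extract"},
    "load": {"clean", "enrich"},
}


def execution_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return one valid order in which every task runs after its dependencies."""
    try:
        return list(TopologicalSorter(dag).static_order())
    except CycleError as exc:
        raise ValueError(f"not a DAG, dependency cycle found: {exc.args[1]}") from exc


order = execution_order(pipeline)
print(order)
```

"clean" and "enrich" do not depend on each other, so a scheduler is free to run them in parallel; the order between them in the printed list is not significant.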
39 |
AWS Glue |
Tools |
Hold |
AWS Glue is a cloud service that prepares data for analysis through automated extract, transform, load (ETL) processes. The managed service provides a simple and cost-effective method for categorizing and managing big data in the enterprise. It provides organizations with a data integration tool that formats information from disparate data sources and organizes it in a central repository, where it can be used to inform business decisions.
Glue uses ETL jobs to extract data from a combination of other Amazon Web Services and incorporates it into data lakes and data warehouses. It uses application programming interfaces (APIs) to transform the extracted data set for integration, and to help users monitor jobs.
We think you should Hold it because, while in theory Glue sounds like a great solution (after all, it removes the need to manage worker nodes and provides a lot of features to simplify ETL processing), it is too good to be true. In reality Glue is inflexible, moves your entire development cycle inside the cloud (with a hefty price tag) and doesn't provide a lot to help maintain pipelines in production. Most Glue projects we have seen that make it to production end up struggling to maintain a stable data platform. While it is nice to get started quickly, it seems to stop there. Most features seem to be aimed at data scientists wanting a quick solution for their machine learning projects, and for that case it seems well suited. But do not believe any AWS consultant who tells you this is the solution to your production data pipelines. |
40 |
CatBoost |
Tools |
Assess |
Gradient boosting tool for AI