Tech Radar

ID	Name	Category	Status
1	HashiCorp Vault	Tools	Adopt
Manage secrets and protect sensitive data. We think you should Adopt it because Vault has become a mature product over several iterations. Suggested by: David
2	CD4ML	Techniques	Adopt
Continuous Delivery for Machine Learning (CD4ML) is a technique for deploying your infrastructure, data, model and code to production easily and in a fully automated way. Suggested by: Maarten, Laurens
3	Apache Flink	Platforms	Trial
Unified streaming/batch platform for Kappa architectures. Larger companies are moving to Flink (Uber, Spotify, Alibaba). We think you should Trial it because Apache Flink scales to trillions of events and petabytes of data. It handles complex stream-processing logic such as recovery and watermarks without much effort from the application developer. It is still newer than e.g. Spark and therefore doesn't have as big an ecosystem, but its model seems better suited for real-time analytics. Suggested by: Lennard
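To make the watermark idea mentioned above a bit more concrete, here is a minimal pure-Python sketch. It is not Flink's API: the function name `tumbling_counts` and the parameters `window_size` and `max_delay` are hypothetical, and the watermark rule (largest timestamp seen minus a fixed allowed delay) is only one common strategy, assumed here for illustration.

```python
from collections import defaultdict


def tumbling_counts(events, window_size, max_delay):
    """Count events per tumbling event-time window.

    events: iterable of integer event timestamps, possibly out of order.
    The watermark is assumed to be (largest timestamp seen - max_delay).
    A window [start, start + window_size) is emitted once the watermark
    reaches its end; events arriving for such a window are dropped as late.
    """
    open_windows = defaultdict(int)  # window start -> event count
    emitted = []                     # (window start, count) in emit order
    max_seen = None

    for ts in events:
        start = ts - ts % window_size
        # Late event: the watermark already passed the end of its window.
        if max_seen is not None and start + window_size <= max_seen - max_delay:
            continue
        open_windows[start] += 1
        max_seen = ts if max_seen is None else max(max_seen, ts)
        watermark = max_seen - max_delay
        # Emit every open window whose end is at or before the watermark.
        for w in sorted(open_windows):
            if w + window_size <= watermark:
                emitted.append((w, open_windows.pop(w)))

    # End of input: flush the windows that are still open.
    for w in sorted(open_windows):
        emitted.append((w, open_windows[w]))
    return emitted


# The out-of-order event 9 is still counted; the very late event 2 is dropped.
result = tumbling_counts([1, 3, 12, 9, 25, 2], window_size=10, max_delay=5)
# → [(0, 3), (10, 1), (20, 1)]
```

In a real Flink job this bookkeeping is done by the framework, which is the point made in the entry above.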
4	Data Lake table formats	Languages and Frameworks	Trial
There are several "table format" abstractions for data in a data lake, such as Delta Lake, Apache Iceberg and Apache Hudi. These solutions provide an abstraction layer on top of traditional storage formats such as Parquet and Avro. These table formats offer functionality such as ACID transactions, schema evolution and better performance. Suggested by: Thijs
5	Deep Learning Java Library	Languages and Frameworks	Trial
Machine learning wrapper library made by Amazon together with Netflix. We think you should Trial it because it is a handy machine learning framework that is production-ready and easy to incorporate into most architectures. Suggested by: Arno
6	Delta Lake	Platforms	Trial
Delta Lake is an open-source storage layer, implemented by Databricks, that attempts to bring ACID transactions to big data processing. Suggested by: Laurens
7	Okteto	Platforms	Trial
Kubernetes as a Service for developers (KIS&S). We think you should Trial it because it is the new kid on the block if you want to try and explore Kubernetes. Suggested by: Arno
8	Serverless	Platforms	Trial
Serverless architecture. Suggested by: Robbert
9	Temporal	Tools	Trial
Workflow engine that focuses on reliability. https://www.youtube.com/watch?v=f-18XztyN6c We think you should Trial it because you can define retries, rollbacks, cleanup, and even human-in-the-loop steps in the case of failure, with end-to-end visibility across multiple services. It offers reliability, consistency, failure compensation, long-running operations, and distributed transactions for your most critical operations. Suggested by: David, Pieter, Arno
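The retry and rollback behaviour described above can be illustrated with a small in-memory sketch. This is not the Temporal SDK: `run_with_retries` and `run_saga` are hypothetical helper names, and nothing here is durable across process restarts, which is precisely what a workflow engine adds on top.

```python
def run_with_retries(action, max_attempts):
    """Call action() until it succeeds or max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return action()
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
    raise last_error


def run_saga(steps, max_attempts=3):
    """Run (action, compensation) pairs in order.

    If a step still fails after its retries, the compensations of all
    previously completed steps run in reverse order and the error is re-raised.
    """
    completed = []
    for action, compensation in steps:
        try:
            run_with_retries(action, max_attempts)
        except Exception:
            for undo in reversed(completed):
                undo()
            raise
        completed.append(compensation)
```

A caller would pass a list such as `[(reserve_stock, release_stock), (charge_card, refund_card)]`, where all four functions are placeholders for real service calls.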
10	TimescaleDB	Tools	Trial
Specialized time-series database on top of PostgreSQL. We think you should Trial it because it combines the stability, features and tooling ecosystem of PostgreSQL with efficient time-series ingestion, storage and querying. It allows combining relational and time-series data in one database. It outguns InfluxDB in performance, features and engineering quality. Suggested by: Mathijs
11	APM for CD4ML	Techniques	Assess
Know the health of your project, even after you abandoned it. Suggested by: Joep
12	Dapr	Platforms	Assess
Platform/runtime that, via a sidecar Docker container around your existing Python scripts, takes care of e.g. state management and scaling. We think you should Assess it because it is an easy way to add stateful functionality to your stateless scripts. Suggested by: Steven
13	Datastax Astra ( Cassandra as a Service )	Platforms	Assess
Cassandra is a nice database for storing very large amounts of data where it is important that the data is actually persisted. Running and maintaining Cassandra is no picnic, but this service makes it very convenient. We think you should Assess it because it is Cassandra as a Service offered by the makers themselves. Suggested by: Arno
14	Federated GraphQL	Languages and Frameworks	Assess
Federation of GraphQL endpoints. Suggested by: Pieter
15	FoundationDB	Platforms	Assess
ACID distributed key-value database with extensions. We think you should Assess it because it is a very well tested distributed ACID KV database that supports multi-model things. Their testing methods are really awesome. It is well tested in production, e.g. Snowflake runs on FoundationDB. Suggested by: Mathijs
16	GAN-based data augmentation for a richer data set for CNNs	Techniques	Assess
Using GAN models to generate images for a CNN to train on when only a small data set is available. Suggested by: Jeroen
17	HashiCorp Boundary	Tools	Assess
Secure access to hosts and services. Suggested by: David
18	HashiCorp Nomad	Platforms	Assess
"Alternative to Kubernetes". Suggested by: David
19	Kedro	Tools	Assess
Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering best practice and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning. Suggested by: Tjian
20	Open Policy Agent	Tools	Assess
Open Policy Agent (OPA) is a general-purpose policy engine. We think you should Assess it because OPA enables you to simplify security within your organisation. Where previously you would have to interact with multiple security frameworks to ensure authorization is enforced within your environment, with OPA you can create a single policy decision point to validate authorization rules. It does not matter whether you are securing an API or your database. Suggested by: Lennard
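The "single policy decision point" idea can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not OPA itself (OPA policies are written in Rego and queried by your services); the rule set, `decide`, `api_handler` and `db_query` are all hypothetical names.

```python
# Hypothetical allow-list: (role, action, resource kind) triples.
ALLOW_RULES = {
    ("analyst", "read", "api"),
    ("analyst", "read", "database"),
    ("admin", "read", "api"),
    ("admin", "write", "api"),
    ("admin", "read", "database"),
    ("admin", "write", "database"),
}


def decide(request):
    """Single decision point: deny unless an explicit rule allows the request."""
    key = (request.get("role"), request.get("action"), request.get("resource"))
    return key in ALLOW_RULES


def api_handler(role, action):
    """An API layer asks the decision point instead of embedding its own rules."""
    allowed = decide({"role": role, "action": action, "resource": "api"})
    return "200 OK" if allowed else "403 Forbidden"


def db_query(role, action):
    """A database layer asks the same decision point."""
    if not decide({"role": role, "action": action, "resource": "database"}):
        raise PermissionError(f"{role} may not {action} the database")
    return "rows"
```

Both layers share one set of rules, so changing a rule changes the behaviour of the API and the database access at the same time.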
21	Prefect	Tools	Assess
New and lightweight; turns Python functions into tasks. Not very much at the forefront. Open core. Suggested by: Jeroen
22	PyOD library for outlier detection	Tools	Assess
Not really new, but a quite up-to-date library for outlier detection (OD) that keeps being extended with state-of-the-art research methods. It ties together multiple existing libraries such as sklearn, scipy, statsmodels, keras and tensorflow. We think you should Assess it because of its ease of use, its support for many OD algorithms, and its optimization using (numba) JIT and parallelization. Suggested by: Dennis
23	Redash	Tools	Assess
As far as I know, Redash is a dashboard visualizer that runs on queries. You can visualize queries directly and then build a dashboard from all these query visualizations. Suggested by: Jeroen
24	Rust	Languages and Frameworks	Assess
Rust is a multi-paradigm programming language designed for performance and safety, especially safe concurrency. Suggested by: David
25	SHAP (SHapley Additive exPlanations)	Tools	Trial
SHAP is a game-theoretic approach to explain the output of any machine learning model. It is a Python library that can help with the explainability of ML models, based on Shapley values from game theory (https://en.wikipedia.org/wiki/Shapley_value). From the 'readthedocs': "This is an introduction to explaining machine learning models with Shapley values. Shapley values are a widely used approach from cooperative game theory that come with desirable properties." Example: https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainable%20AI%20with%20Shapley%20values.html Suggested by: Olaf
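For readers unfamiliar with Shapley values, the following sketch computes them exactly for a tiny cooperative game by averaging each player's marginal contribution over all orderings. It illustrates the game-theory definition only and does not use the shap library; `shapley_values` and the toy payoff table are hypothetical. In the ML setting, features roughly play the role of players.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values.

    players: list of hashable player names.
    value:   function mapping a frozenset of players to the coalition's payoff.
    Cost grows factorially with the number of players, so this only suits toy games.
    """
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining this coalition.
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


# Toy game: "a" alone earns 10, "b" alone earns 20, together they earn 50.
PAYOFF = {
    frozenset(): 0,
    frozenset({"a"}): 10,
    frozenset({"b"}): 20,
    frozenset({"a", "b"}): 50,
}
values = shapley_values(["a", "b"], PAYOFF.__getitem__)
# → {"a": Fraction(20, 1), "b": Fraction(30, 1)}; the values sum to the total payoff of 50
```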
26	Singer	Languages and Frameworks	Assess
Singer is a specification for transmitting data over stdout. It allows you to create interoperable Taps and Targets for fetching data from a source and forwarding it to a sink, e.g. fetching data from MySQL and forwarding it to S3. It removes the need for plumbing code when setting up extract pipelines. Suggested by: Lennard
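As a rough illustration of the tap/target idea, the sketch below writes Singer-style JSON messages (SCHEMA, RECORD, STATE), one per line, and reads them back. The stream name `users`, the row contents and the functions `tap` and `target` are hypothetical, and an in-memory buffer stands in for the stdout pipe that would normally connect two separate processes.

```python
import io
import json


def tap(rows, out):
    """Write a SCHEMA message, one RECORD per row, and a final STATE message."""
    schema = {
        "type": "object",
        "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    }
    out.write(json.dumps({
        "type": "SCHEMA",
        "stream": "users",
        "schema": schema,
        "key_properties": ["id"],
    }) + "\n")
    last_id = None
    for row in rows:
        out.write(json.dumps({"type": "RECORD", "stream": "users", "record": row}) + "\n")
        last_id = row["id"]
    # STATE carries a bookmark so a later run could resume where this one stopped.
    out.write(json.dumps({"type": "STATE", "value": {"users": last_id}}) + "\n")


def target(lines):
    """Collect records per stream and remember the last STATE value seen."""
    records = {}
    state = None
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            records.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
    return records, state


buffer = io.StringIO()  # stands in for the pipe between tap and target
tap([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}], buffer)
records, state = target(buffer.getvalue().splitlines())
```

Because both sides only agree on the message format, any tap can in principle be combined with any target.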
27	Streamlit	Languages and Frameworks	Assess
I am looking for an easy way to create and share interactive data plots; I currently use Django. From the Streamlit site: "Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. In just a few minutes you can build and deploy powerful data apps." Suggested by: Jeff
28	TensorFlow dataset	Techniques	Assess
It is the "pandas" for unstructured data. You can process data blazingly fast (6 to 38 times faster) compared with the standard approach. It is parallel. It is already used and requested/required by companies (Schiphol). No more for loops; everything goes through lambda and map functions. It is for the cool kids... (are we the cool kids??) We think you should Assess it because it is being pushed more and more by TensorFlow, and some companies already use it. Suggested by: Willem
29	TensorFlow Extended (TFX)	Tools	Assess
Create and manage ML production pipelines with components created by Google. TFX offers proper data tracking, transformations in a computational graph, data drift detection and more. We think you should Assess it because, even though it is not realistic to build the whole pipeline at a client, it is good to have knowledge of its components. When building production ML pipelines, try to implement some of the same functionalities. Suggested by: Tim
30	TileDB	Tools	Assess
Sparse multidimensional array database on object storage. We think you should Assess it because it stores multidimensional array data (think dataframes but better) on cloud object storage. It is currently well suited for things like lidar and geo data. It is quite new (less than 10 years old), so treat with caution. Suggested by: Mathijs
31	Weights and Biases	Tools	Assess
MLflow has become the standard for tracking ML experiments. For all large frameworks it automatically tracks the metrics it can find, and it is easy to configure even more. Weights and Biases aims to do the same and more: the reporting page is easier to customize, the storage can be handled by WandB itself, and the API for extending the tracked metrics is easy to learn. We think you should Assess it because it is good to know an alternative to MLflow. Suggested by: Tim
32	Fast AI	Languages and Frameworks	Hold
High-, medium- and low-level API to PyTorch. Suggested by: Laurens
33	FastAPI	Tools	Assess
FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.6+ based on standard Python type hints. The key features are: Fast: very high performance, on par with NodeJS and Go (thanks to Starlette and Pydantic); one of the fastest Python frameworks available. Fast to code: increase the speed to develop features by about 200% to 300%. Fewer bugs: reduce about 40% of human (developer) induced errors. Intuitive: great editor support, completion everywhere, less time debugging. Easy: designed to be easy to use and learn, less time reading docs. Short: minimize code duplication, multiple features from each parameter declaration, fewer bugs. Robust: get production-ready code, with automatic interactive documentation. Standards-based: based on (and fully compatible with) the open standards for APIs: OpenAPI (previously known as Swagger) and JSON Schema. We think you should Assess it because it lets you set up APIs in Python without much boilerplate code using the OpenAPI spec. FastAPI is faster than other frequently used Python alternatives. You define the interface as code using Pydantic models. Suggested by: Tjian, Robbert
34	Nix	Tools	Hold
Package manager for reproducible, declarative and reliable builds and deployments. We think you should Hold it because it is a good language-agnostic and OS-agnostic way to reproducibly build systems and software. You should consider this if you're e.g. writing Docker images or building code projects with native dependencies. Suggested by: Kiara
35	Facebook Prophet	Tools	Hold
An easy-to-use and popular time series package. We think you should Hold it because, even though it is popular, it does not provide accurate models: even a running average might beat a Prophet model in specific cases. Facebook also seems to have given up on it, judging by the fact that they launched a new time series package called 'Kats'. Suggested by: Tim
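The running-average baseline mentioned above is cheap to set up, which makes it a useful sanity check before trusting any forecasting package. The sketch below is a plain-Python illustration; `moving_average_forecast`, `mean_absolute_error` and the sample series are hypothetical, and no Prophet results are implied.

```python
def moving_average_forecast(series, window):
    """One-step-ahead forecasts.

    The prediction for series[i] is the mean of the `window` values before it,
    so the result lines up with series[window:].
    """
    if window < 1 or window >= len(series):
        raise ValueError("window must be at least 1 and shorter than the series")
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series))
    ]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


series = [10, 12, 14, 16, 18]
forecasts = moving_average_forecast(series, window=2)  # → [11.0, 13.0, 15.0]
baseline_mae = mean_absolute_error(series[2:], forecasts)  # → 3.0
```

Any candidate model can then be scored with the same error function on the same hold-out points and compared against `baseline_mae`.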
36	Flask	Tools	Hold
The problem with this approach is that there is no data validation, meaning that we can pass any type of data, be it a string, tuple, number, or any character. This can break the program often: if an ML model gets the wrong data types, the program will crash. You can create a data checker before passing the values further, but that adds extra work. The error pages in Flask are simple HTML pages that can raise decoder errors when the API is called from other applications. There are other issues with Flask, such as its slowness, the lack of async and WebSocket support that could speed up processing, and finally the absence of an automated docs generation system. You need to manually design the user interface for the usage and examples of the API. All these issues are resolved in the new framework.
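The hand-written "data checker" mentioned above could look roughly like the following sketch. It uses only the standard library; the field names in `EXPECTED_TYPES` and the function `validate_payload` are hypothetical, and this is exactly the kind of extra code that frameworks with built-in validation make unnecessary.

```python
# Hypothetical input contract for an ML endpoint: field name -> required type.
EXPECTED_TYPES = {"age": int, "income": float, "city": str}


def validate_payload(payload):
    """Return a list of problems; an empty list means the payload is valid."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
            continue
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

A request handler would call `validate_payload(request_json)` first and return an error response when the list is not empty, before any value reaches the model.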
37	Julia	Languages and Frameworks	Assess
Suggested by: David
38	Airflow	Tools	Hold
Airflow is a platform to programmatically author, schedule and monitor workflows. Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed. We think you should Hold it because, while this might be a controversial opinion, it is really noticeable that Airflow was created before the rise of modern schedulers such as Kubernetes. This results in complexity when maintaining Airflow in production on e.g. Kubernetes. Besides that, the Python-based execution model leads to conflicts when maintaining a larger number of workflows. One of the current best practices for running Airflow in production is to dockerize each individual operator; however, at that point you might as well move to a container-native workflow solution such as Argo Workflows.
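The core idea of running tasks in dependency order can be sketched with the standard library's `graphlib` (Python 3.9+). This is not Airflow's API: `run_dag`, the task names and the task bodies are hypothetical, and everything runs sequentially in one process rather than on a pool of workers.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, dependencies):
    """Run callables in an order that respects their dependencies.

    tasks:        dict mapping task name -> zero-argument callable.
    dependencies: dict mapping task name -> set of upstream task names.
    Returns the execution order and each task's result.
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    # static_order() yields every task after all of its upstream tasks.
    order = list(TopologicalSorter(graph).static_order())
    results = {name: tasks[name]() for name in order}
    return order, results


order, results = run_dag(
    tasks={
        "load": lambda: "loaded",
        "extract": lambda: "extracted",
        "transform": lambda: "transformed",
    },
    dependencies={"transform": {"extract"}, "load": {"transform"}},
)
# order → ["extract", "transform", "load"]
```

A cycle in `dependencies` makes `TopologicalSorter` raise `graphlib.CycleError`, which mirrors the "acyclic" requirement of a DAG.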
39	AWS Glue	Tools	Hold
AWS Glue is a cloud service that prepares data for analysis through automated extract, transform, load (ETL) processes. The managed service is a simple and cost-effective method for categorizing and managing big data in the enterprise. It provides organizations with a data integration tool that formats information from disparate data sources and organizes it in a central repository, where it can be used to inform business decisions. Glue uses ETL jobs to extract data from a combination of other Amazon Web Services and incorporates it into data lakes and data warehouses. It uses application programming interfaces (APIs) to transform the extracted data set for integration, and to help users monitor jobs. We think you should Hold it because, while in theory Glue sounds like a great solution (after all, it removes the need to manage worker nodes and provides a lot of features to simplify ETL processing), it is too good to be true. In reality Glue is inflexible, moves your entire development cycle inside the cloud (with a hefty price tag) and doesn't provide a lot to help maintain pipelines in production. Most Glue projects we have seen that make it to production end up struggling to maintain a stable data platform. While it is nice to get started quickly, it seems to stop there. Most features seem to be aimed at data scientists wanting a quick solution for their machine learning projects, and for that case it seems well suited. But do not believe any AWS consultant who tells you this is the solution to your production data pipelines.
40	CatBoost	Tools	Assess
Gradient boosting tool for AI