Data engineering with Alteryx
The obvious question that this book has to answer is "Why would you use Alteryx for data engineering?" It gives three main reasons:
Speed of development
Iterative workflow development
Self-documentation
Code-based approaches are also self-documenting, and I see no reason why they can't be developed iteratively either, but my experience is that the no-code approach with a visual canvas and the ability to easily inspect the data at any point in the workflow does lead to significantly faster development than code-based solutions. This may change over time as LLMs improve (see Complete Cursor and Generative AI for software development).
To those points, I would also add the ease of picking it up and the convenience of having almost everything that you need available in a single program (particularly beneficial in organizations where it is difficult to get IT approval for new programs or packages). For many companies, its advantages will outweigh its disadvantages (such as uninformative diffs in version control).
This isn't a bad book overall, but its general focus is on step-by-step instructions for how to do things in Alteryx (accompanied by a lot of screenshots), with very little time spent on how you can use Alteryx in accordance with general data engineering/DataOps principles and best practices. Given that I can just check the documentation when I want to know how to do something, I would have preferred a 20-page book that stuck to what you should do and why rather than one that spreads 20 pages' worth of information over more than 300 pages.
Part 1: Introduction
Tools:
Alteryx Designer for data transformation
Alteryx Server for automation and scheduling
Alteryx Connect for cataloguing and discovery
Suggested best practices for workflows:
Supplement the automatic annotations
Use tool containers to group logic
Use comment and explorer boxes
This chapter defines data engineering and gives some examples of using Designer, Server, and Connect. Nothing in particular stood out as worth mentioning here.
DataOps is basically DevOps applied to analytics workflows. According to the book, it promises:
Faster cycle times
Faster access to actionable insights
Improved robustness of data processes (e.g., through statistical process control)
The ability to see the entire data flow in a workflow
Strong security and confidence
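The book only mentions statistical process control in passing. As a rough illustration of the idea (not something from the book), the Python sketch below flags a workflow run whose row count falls outside mean ± 3 standard deviation control limits computed from previous runs. The row counts are made up, and `control_limits` and `out_of_control` are hypothetical helper names. A proper individuals control chart would often estimate the spread from moving ranges instead, so treat this as a simplified variant.

```python
from statistics import mean, stdev


def control_limits(history: list[float], sigmas: float = 3.0) -> tuple[float, float]:
    """Return (lower, upper) limits as the mean +/- sigmas sample standard deviations."""
    centre = mean(history)
    spread = stdev(history)  # requires at least two historical values
    return centre - sigmas * spread, centre + sigmas * spread


def out_of_control(history: list[float], latest: float, sigmas: float = 3.0) -> bool:
    """Flag the latest value if it falls outside the control limits."""
    lower, upper = control_limits(history, sigmas)
    return not (lower <= latest <= upper)


# Hypothetical row counts from previous runs of a workflow
row_counts = [10_120, 9_980, 10_045, 10_210, 9_870, 10_090, 10_005]
print(out_of_control(row_counts, 10_150))  # False: within the limits
print(out_of_control(row_counts, 4_300))   # True: a sudden drop in volume
```

In a scheduled workflow, a check like this would run after each load and raise an alert instead of silently passing bad data downstream.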
The book also discusses the principles from https://dataopsmanifesto.org/en/. There are quite a few of them, so see the link for the full list.
Part 2: Functional steps in DataOps
Much of this section covers basic functionality, which will be useful for new Alteryx users but will already be familiar to people who have been using the software for a while. It does suggest some best practices, though:
Use the Field Info tool to check for changes in file structure
Consider Dynamic Select and Dynamic Rename over Select
Consider whether blanks should actually be nulls
Use comments, containers, and annotations to make the workflow easier to follow
Remember to profile the data and use exploratory data analysis
Consider whether to impute missing values
Field types can be extracted to a .yxft file (useful for replacing Auto Field with Select)
Use the Message and Test tools to flag errors
Use relative paths or Universal Naming Convention paths rather than absolute paths
Check the Workflow Dependencies window, which not only lists dependencies, but also allows you to test them and switch them between absolute, relative, and UNC
Use a secrets file or environment variables
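Outside Alteryx, two of these practices translate roughly into the Python sketch below (an illustration, not the book's method): comparing incoming field names and types against an expected schema, which is what the Field Info and Test tools are being used for above, and reading a credential from an environment variable rather than storing it in the workflow. The schema, the `schema_differences` and `get_secret` helpers, and the `HYPOTHETICAL_DB_PASSWORD` variable name are all hypothetical.

```python
import os


def schema_differences(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """List fields that were dropped, retyped, or unexpectedly added."""
    problems = []
    for name, field_type in expected.items():
        if name not in actual:
            problems.append(f"missing field: {name}")
        elif actual[name] != field_type:
            problems.append(f"type change for {name}: {field_type} -> {actual[name]}")
    # Sort the extra fields so the report order is stable between runs
    for name in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected field: {name}")
    return problems


def get_secret(name: str) -> str:
    """Read a credential from the environment instead of hard-coding it."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


# Hypothetical schema, loosely modelled on Alteryx field type names
expected = {"customer_id": "Int64", "order_date": "Date", "amount": "Double"}
incoming = {"customer_id": "Int64", "order_date": "V_String", "amount": "Double", "notes": "V_String"}

for problem in schema_differences(expected, incoming):
    print(problem)
# type change for order_date: Date -> V_String
# unexpected field: notes
```

A connection step would then call something like `get_secret("HYPOTHETICAL_DB_PASSWORD")`, so the workflow file itself never contains the password.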
Part 3: Governance of DataOps
The book suggests using the community-developed CReW macros to complement the Message and Test tools for testing, though for some reason it capitalizes the name in two different ways, neither of which is correct.
You can use GitHub Actions to run test scripts (e.g., to check for missing metadata) and to push updates to Alteryx Server using the Alteryx Server API. The book offers no suggestions on how to use them to check that your workflows actually work.
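The book doesn't show the test scripts themselves. As a sketch under stated assumptions, a metadata check could parse each .yxmd file (which is XML) and report workflows whose description or author fields are empty. The `Properties/MetaInfo` element path is an assumption about the .yxmd layout and should be checked against your own files; the required fields, helper names, and the `check_metadata.py` file name are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical team policy: every workflow must have these fields filled in
REQUIRED_FIELDS = ("Description", "Author")


def missing_metadata(xml_source, required=REQUIRED_FIELDS) -> list[str]:
    """Return the required MetaInfo fields that are absent or empty.

    Assumes workflow metadata lives under <Properties><MetaInfo> in the .yxmd XML.
    """
    root = ET.fromstring(xml_source)
    meta = root.find("./Properties/MetaInfo")
    if meta is None:
        return list(required)
    missing = []
    for field in required:
        element = meta.find(field)
        if element is None or not (element.text or "").strip():
            missing.append(field)
    return missing


def main(folder: str) -> int:
    """Check every .yxmd file under folder; return 1 if any are missing metadata."""
    failures = 0
    for path in sorted(Path(folder).rglob("*.yxmd")):
        missing = missing_metadata(path.read_bytes())
        if missing:
            failures += 1
            print(f"{path}: missing {', '.join(missing)}")
    return 1 if failures else 0


# Only run the folder scan when a folder is passed on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

A GitHub Actions step could then run `python check_metadata.py workflows/`; the non-zero exit code fails the job when any workflow is missing metadata.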
The book concludes with chapters on security/permissions and on data cataloguing/discovery with Alteryx Connect. Like the rest of the book, these contain a lot of screenshots and material that you would expect to find in the documentation, but not much in the way of general principles.