DataOps methodology
https://www.coursera.org/learn/ibm-data-ops-methodology
This course should probably be a (short) book as the video aspect doesn't really add anything. It often felt like someone just talking through a set of lists (hence the format of the notes below), but some of the lists were useful, and there were plenty of practice questions.
It could have done with going into much more depth on certain topics. For example, there was almost nothing on how you might apply testing or continuous deployment in practice. The course tells you that you should do those things, but gives no insight into how you might go about them or what good or bad practices look like. Overall, this course was a disappointment. A bunch of lists is useful, but it hardly constitutes a course. Time to find a book instead...
It also felt designed for DataOps at a large organization rather than a small business or startup, even though most of the principles should apply to both. At times, the extensive structure suggested felt like it would conflict with the goal of reducing cycle times.
Establish DataOps: Prepare for operation
Data strategy
Elements:
Business objectives and priorities
Measures and KPIs
Data requirements
Capabilities and architecture
Some considerations:
Regulatory requirements
Risk
Data origin and architecture
Data monetization
Skills management
Data quality
Expense management
Data teams
Potential roles in a data team:
Chief Data Officer
Data Steward
Data Quality Analyst
Data Engineer
Data Scientist
The course suggests using a RACI matrix (or responsibility assignment matrix):
Responsible: Who does the work?
Accountable: Who is ultimately accountable?
Consulted: Who should be consulted?
Informed: Who needs to be kept updated?
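To make the matrix concrete, here is a minimal sketch of one held as a plain dictionary, with a small consistency check. The tasks and assignments are invented for illustration, and the "exactly one Accountable per task" rule is a common RACI convention that I'm assuming here rather than something taken from the course.

```python
from collections import Counter

# Hypothetical tasks and role assignments, for illustration only.
# Letters: R = Responsible, A = Accountable, C = Consulted, I = Informed.
raci = {
    "Define data quality rules": {
        "Data Steward": "A",
        "Data Quality Analyst": "R",
        "Data Engineer": "C",
        "Chief Data Officer": "I",
    },
    "Build ingestion pipeline": {
        "Data Engineer": "R",
        "Data Steward": "C",
        "Data Scientist": "I",
    },
}


def raci_problems(matrix):
    """Return a list of human-readable problems found in a RACI matrix."""
    problems = []
    for task, assignments in matrix.items():
        counts = Counter(assignments.values())
        # Assumed convention: exactly one Accountable per task.
        if counts["A"] != 1:
            problems.append(
                f"{task}: expected exactly one Accountable, found {counts['A']}"
            )
        # Someone has to actually do the work.
        if counts["R"] < 1:
            problems.append(f"{task}: nobody is Responsible")
    return problems


problems = raci_problems(raci)
for problem in problems:
    print(problem)
```

In this made-up example the second task has nobody accountable, so the check flags it.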
Establish DataOps: Optimize for operation
Toolchain
Aspects:
Version control
Automated processes and workflows
Data and logic tests
Continuous deployment
Communication and process management
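The course didn't show what a data test might look like, so the following is only my own rough sketch of the "data and logic tests" item, not anything from the course material. The column names (order_id, amount) and the rules are hypothetical. In a real pipeline, checks like these would probably run automatically before data is published.

```python
def run_data_tests(rows):
    """Check a batch of rows and return a list of failure messages."""
    failures = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Rule 1 (assumed): every row has a unique, non-null order_id.
        order_id = row.get("order_id")
        if order_id is None:
            failures.append(f"row {i}: order_id is missing")
        elif order_id in seen_ids:
            failures.append(f"row {i}: duplicate order_id {order_id}")
        else:
            seen_ids.add(order_id)

        # Rule 2 (assumed): amount is a non-negative number.
        amount = row.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            failures.append(f"row {i}: invalid amount {amount!r}")
    return failures


# Hypothetical sample batch containing three deliberate problems.
sample = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 1, "amount": 5.0},
    {"order_id": None, "amount": -3},
]
failures = run_data_tests(sample)
for failure in failures:
    print(failure)
```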
Baseline
Organizational maturity and readiness
System and application inventory
Standardized terminology
Governance and oversight
Models and operating standards
Data domains and stewardship
Business glossary
Data classification (internal, confidential, etc.)
Reference data (lookups/allowed values)
Catalogue
Data management and information governance
Policies and rules
These could be driven by regulatory requirements or by business needs
External influences
Reputational risk
Fines/penalties
Third parties and competition
Business priorities
Measuring value:
Speed of delivery
Meeting requirements
Return on effort
Various advice was given on data task KPIs, but a lot of the ideas felt overly bureaucratic for most tasks at most organizations. I could see them providing value for particularly large projects, but most of the time it might be better to take more of an agile and iterative approach with lighter-touch tracking.
The course recommends that about 20% of each sprint be used to reduce technical debt. It also suggests considering which tasks might work well together and putting them in the same sprint.
Iterate DataOps: Know your data
This module covered data discovery and classifying data into taxonomies. Nothing in particular stood out as worth noting.
Iterate DataOps: Trust your data
A golden record consolidates data from multiple sources into a single, reliable version, providing a single source of truth.
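As a rough illustration, the sketch below merges two records for the same customer into one golden record. The source names, fields and the survivorship rule (the most recently updated non-empty value wins) are my own assumptions. It also assumes the records have already been matched to the same entity, which is usually the harder part.

```python
from datetime import date

# Hypothetical records for the same customer from two source systems.
records = [
    {"source": "crm", "updated": date(2023, 1, 10), "customer_id": 42,
     "name": "Ada Lovelace", "email": "ada@example.com", "phone": None},
    {"source": "billing", "updated": date(2023, 6, 2), "customer_id": 42,
     "name": "A. Lovelace", "email": None, "phone": "555-0100"},
]


def golden_record(matched_records):
    """Merge already-matched records; newest non-empty value wins per field."""
    merged = {}
    # Process oldest first so that newer values overwrite older ones.
    for rec in sorted(matched_records, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if field in ("source", "updated"):
                continue  # bookkeeping fields, not part of the golden record
            if value is not None:
                merged[field] = value
    return merged


golden = golden_record(records)
print(golden)
```

Here the name and phone come from the newer billing record, while the email survives from the CRM record because billing has none.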
Iterate DataOps: Use your data
Data virtualization is a technology that allows applications to access and manipulate data without needing to know its physical location or format. It creates a single virtual layer that integrates data from multiple sources, enabling users to view and interact with the data as if it were in one place, regardless of where it actually resides.
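A toy sketch of the idea: one class exposes customers from two differently stored sources as a single stream of rows. The class name, table and fields are hypothetical, an in-memory SQLite database and a CSV string stand in for real systems, and real virtualization products do far more (query pushdown, caching, security) than this.

```python
import csv
import io
import sqlite3

# Source 1: a relational database (in-memory SQLite standing in for a real one).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")

# Source 2: a CSV export (a string standing in for a file or object store).
csv_text = "id,name\n2,Grace\n"


class VirtualCustomers:
    """Hypothetical virtual layer: callers see one set of customer rows."""

    def __init__(self, conn, csv_text):
        self._conn = conn
        self._csv_text = csv_text

    def rows(self):
        # Rows from the database source.
        query = "SELECT id, name FROM customers ORDER BY id"
        for id_, name in self._conn.execute(query):
            yield {"id": id_, "name": name}
        # Rows from the CSV source, converted to the same shape and types.
        for row in csv.DictReader(io.StringIO(self._csv_text)):
            yield {"id": int(row["id"]), "name": row["name"]}


customers = list(VirtualCustomers(conn, csv_text).rows())
print(customers)
```

The caller never needs to know that one customer lives in a database and the other in a file.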
Message-oriented movement refers to the paradigm in software architecture where communication between distributed systems is achieved through messages. This approach emphasizes asynchronous communication, allowing components to operate independently without being tightly coupled.
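A minimal in-process sketch of the pattern, using Python's standard queue and threading modules in place of a real message broker. The event name and payload are invented. The producer and consumer share only the queue, so neither calls the other directly.

```python
import queue
import threading

messages = queue.Queue()  # stands in for a message broker or topic
received = []             # order IDs the consumer has processed
STOP = object()           # sentinel telling the consumer to shut down


def producer():
    # Publish three hypothetical events, then signal that we're done.
    for i in range(3):
        messages.put({"event": "order_created", "order_id": i})
    messages.put(STOP)


def consumer():
    # Process messages as they arrive, independently of the producer.
    while True:
        msg = messages.get()
        if msg is STOP:
            break
        received.append(msg["order_id"])


consumer_thread = threading.Thread(target=consumer)
producer_thread = threading.Thread(target=producer)
consumer_thread.start()
producer_thread.start()
producer_thread.join()
consumer_thread.join()
print(received)
```

With a real broker the two sides would typically be separate services, and the consumer could be offline while messages accumulate.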
Improve DataOps
This section basically just said that you should check how well you're doing everything previously discussed.