Raoul Harris

DataOps methodology

https://www.coursera.org/learn/ibm-data-ops-methodology

This course should probably be a (short) book as the video aspect doesn't really add anything. It often felt like someone just talking through a set of lists (hence the format of the notes below), but some of the lists were useful, and there were plenty of practice questions.

It could have done with going into much more depth on certain topics. For example, there was almost nothing on how you might apply testing or continuous deployment in practice. The course tells you that you should do those things, but gives no insight into how you might go about them or what good and bad practices look like. Overall, this course was a disappointment. A bunch of lists is useful, but it hardly constitutes a course. Time to find a book instead...

It also felt designed for DataOps at a large organization rather than a small business or startup, even though most of the principles should apply to both. At times, the extensive structure it suggested felt like it would conflict with the goal of reducing cycle times.

Establish DataOps: Prepare for operation

Data strategy

Elements:

  • Business objectives and priorities

  • Measures and KPIs

  • Data requirements

  • Capabilities and architecture

Some considerations:

  • Regulatory requirements

  • Risk

  • Data origin and architecture

  • Data monetization

  • Skills management

  • Data quality

  • Expense management

Data teams

Potential roles in a data team:

  • Chief Data Officer

  • Data Steward

  • Data Quality Analyst

  • Data Engineer

  • Data Scientist

The course suggests using a RACI matrix (or responsibility assignment matrix):

  • Responsible: Who does the work?

  • Accountable: Who is ultimately accountable?

  • Consulted: Who should be consulted?

  • Informed: Who needs to be kept updated?
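
As a rough illustration (not from the course), a RACI matrix can be kept as plain data and checked automatically. The tasks and assignments below are hypothetical, and the rule that each task has exactly one Accountable role and at least one Responsible role is a common convention that I am assuming here rather than something the course specified.

```python
from collections import Counter

# Hypothetical tasks and assignments. The role names come from the list above,
# but who holds which letter is purely illustrative.
raci = {
    "Define data quality rules": {
        "Data Steward": "A",
        "Data Quality Analyst": "R",
        "Data Engineer": "C",
        "Chief Data Officer": "I",
    },
    "Build ingestion pipeline": {
        "Data Engineer": "R",
        "Data Steward": "C",
        "Data Scientist": "I",
    },
}


def check_raci(matrix):
    """Return a list of problems found in a RACI matrix.

    Assumed convention: exactly one Accountable and at least one
    Responsible role per task.
    """
    problems = []
    for task, assignments in matrix.items():
        counts = Counter(assignments.values())
        if counts["A"] != 1:
            problems.append(
                f"{task}: expected exactly one Accountable, found {counts['A']}"
            )
        if counts["R"] < 1:
            problems.append(f"{task}: no Responsible role assigned")
    return problems


for problem in check_raci(raci):
    print(problem)
```

Here the second task is flagged because nobody is accountable for it, which is the kind of gap the matrix is meant to expose.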

Establish DataOps: Optimize for operation

Toolchain

Aspects:

  • Version control

  • Automated processes and workflows

  • Data and logic tests

  • Continuous deployment

  • Communication and process management

Baseline

  • Organizational maturity and readiness

    • System and application inventory

    • Standardized terminology

  • Governance and oversight

    • Models and operating standards

      • Data domains and stewardship

      • Business glossary

      • Data classification (internal, confidential, etc.)

      • Reference data (lookups/allowed values)

      • Catalogue

    • Data management and information governance

    • Policies and rules

      • These could be driven by regulatory requirements or by business needs

  • External influences

    • Reputational risk

    • Fines/penalties

    • Third parties and competition

Business priorities

Measuring value:

  • Speed of delivery

  • Meeting requirements

  • Return on effort

Various advice was given on data task KPIs, but a lot of the ideas felt overly bureaucratic for most tasks at most organizations. I could see them providing value for particularly large projects, but most of the time it might be better to take a more agile and iterative approach with lighter-touch tracking.

The course recommends that about 20% of each sprint be used to reduce technical debt. It also suggests considering which tasks might work well together and putting them in the same sprint.

Iterate DataOps: Know your data

This module covered data discovery and classifying data into taxonomies. Nothing in particular stood out as worth noting.

Iterate DataOps: Trust your data

A golden record consolidates data from multiple sources into a single, reliable version, providing a single source of truth.
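
To make the idea concrete, here is a small sketch of building a golden record for one customer from two source systems. The source names, fields and the survivorship rule (the most recently updated non-empty value wins) are my own assumptions. Real master data tools support many other rules, such as trusting a particular source per field.

```python
from datetime import date


def build_golden_record(records):
    """Merge records for one entity into a single record.

    Assumed survivorship rule: for each field, keep the most recently
    updated non-empty value.
    """
    golden = {}
    # Process oldest first so that newer non-empty values overwrite older ones.
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record["fields"].items():
            if value not in (None, ""):
                golden[field] = value
    return golden


# Hypothetical records for the same customer in two systems.
crm = {
    "source": "crm",
    "updated": date(2024, 1, 10),
    "fields": {"name": "Ada Lovelace", "email": "ada@example.com", "phone": ""},
}
billing = {
    "source": "billing",
    "updated": date(2024, 3, 2),
    "fields": {"name": "A. Lovelace", "email": None, "phone": "555-0100"},
}

golden = build_golden_record([billing, crm])
print(golden)
```

The sketch skips the harder part of the problem, which is deciding that two records refer to the same entity in the first place.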

Iterate DataOps: Use your data

Data virtualization is a technology that allows applications to access and manipulate data without needing to know their physical location or format. It creates a single virtual layer that integrates data from multiple sources, enabling users to view and interact with the data as if they were in one place, regardless of where they actually reside.
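
The following toy sketch shows the core idea using only the Python standard library: callers ask for a dataset by logical name, and the layer hides whether it lives in a database table or a CSV file. The `VirtualLayer` class and the reader helpers are hypothetical names of my own. Real virtualization products also handle query pushdown, type mapping, caching and security, none of which is attempted here.

```python
import csv
import io
import sqlite3


class VirtualLayer:
    """Maps logical dataset names to readers that hide location and format."""

    def __init__(self):
        self._readers = {}

    def register(self, name, reader):
        self._readers[name] = reader

    def rows(self, name):
        # Callers only know the logical name; the reader decides how to fetch.
        return list(self._readers[name]())


def csv_reader(text):
    """Reader for CSV content (a string stands in for a file or object store)."""
    def read():
        return csv.DictReader(io.StringIO(text))
    return read


def sqlite_reader(conn, table):
    """Reader for a SQLite table. The table name is trusted in this sketch."""
    def read():
        cursor = conn.execute(f"SELECT * FROM {table}")
        columns = [c[0] for c in cursor.description]
        return (dict(zip(columns, row)) for row in cursor)
    return read


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")

layer = VirtualLayer()
layer.register("customers", sqlite_reader(conn, "customers"))
layer.register("orders", csv_reader("order_id,customer_id\n10,1\n"))

print(layer.rows("customers"))
print(layer.rows("orders"))
```

Note that the CSV values come back as strings while the SQLite values keep their types. Smoothing over that kind of difference is part of what a real virtual layer has to do.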

Message-oriented movement refers to the paradigm in software architecture where communication between distributed systems is achieved through messages. This approach emphasizes asynchronous communication, allowing components to operate independently without being tightly coupled.
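
A minimal sketch of the pattern, with an in-process `queue.Queue` standing in for a real message broker (an assumption for illustration; the course did not prescribe any particular technology). The producer puts messages on the queue without waiting for them to be processed, and the consumer handles them independently on its own thread.

```python
import queue
import threading


def producer(q, events):
    """Publish events, then a sentinel to signal that no more will follow."""
    for event in events:
        q.put(event)  # returns without waiting for the consumer to process it
    q.put(None)


def consumer(q, processed):
    """Handle messages as they arrive until the sentinel is received."""
    while True:
        message = q.get()
        if message is None:
            break
        processed.append(message["type"].upper())


q = queue.Queue()
processed = []

worker = threading.Thread(target=consumer, args=(q, processed))
worker.start()

# Hypothetical event payloads.
producer(q, [{"type": "order_created"}, {"type": "order_shipped"}])
worker.join()

print(processed)
```

The two sides share only the message format and the queue, so either could be replaced or scaled without changing the other.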

Improve DataOps

This section basically just said that you should check how well you're doing everything previously discussed.

Last updated 8 months ago