How to Be a Responsible Citizen Data Professional
No doubt you’ve been reminded repeatedly about the need to use caution, and for good reason. How do you move forward with your deliverables and your plans to compete using data without causing a panic?
Assuage the concerns of your IT team
First off, the panic is already happening. IT people generally have a fear of business people, and you are now “playing” in their sandbox. They are the ones getting alerts when resources are taxed, something goes down, or some other less-than-ideal situation arises. They are often responding to multiple issues simultaneously: it’s the nature of their job. And these issues come on top of the scheduled project deliverables they have to meet. You can still help mitigate the panic; there are strategies you can employ to ensure you are not melting data centers while still getting your work done, all while giving your IT counterparts peace of mind.
It’s all about responsibility
It may seem obvious, but the first hurdle to you and IT sailing away into the data horizon together is responsibility. Demonstrating responsibility is a good first step in gaining trust and creating rapport, which will grow into peer relationships between business and technology. I think of it as taking Dad’s car out for the first time – everyone is nervous about it. And bringing the car back with a dent in the side and an empty tank of gas will endear you to no one. How do you establish yourself as a responsible CDP?
Understand the environment
The first and possibly most important area of responsibility is understanding the data landscape. An easy way to get a rundown of the environment is to ask your IT person, “What’s the worst that could happen?” It is vital that you know and can speak to the impacts of “experimenting,” which is what you will be doing – constantly. That’s why it’s called data discovery. Being able to convey your comprehension of the environment will earn you miles of trust with your technology team.
Know how much data you really need
Environment is a broad term; the two important areas of concern are the source and the target. The data source (SharePoint, Salesforce, Oracle, Google AdWords, Google Analytics, and so on) may or may not throttle your consumption of resources. So, when you go to acquire that data, be sure you consider what you are asking for. Are you asking for all sales interactions since inception every time you retrieve data? Do you need to? Internally hosted applications are particularly susceptible to being overloaded, which is a big reason the technology team is nervous about granting access. Speaking to the concept of “smart queries” can be a tremendous help in gaining trust.
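To make the “smart query” idea concrete, here is a minimal sketch of an incremental extract in Python, using an in-memory SQLite table as a stand-in for a real source system. The table, columns, and watermark value are all invented for illustration; the point is that you ask the source only for rows changed since your last pull, rather than everything since inception.

```python
import sqlite3

# Hypothetical sales table with a last_modified column, standing in for a
# real source system. Names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL, last_modified TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 100.0, "2024-01-01"), (2, 250.0, "2024-02-15"), (3, 75.0, "2024-03-10")],
)

# A watermark saved from the previous run: only fetch rows modified after it,
# instead of re-reading the whole table every time.
last_watermark = "2024-02-01"
rows = conn.execute(
    "SELECT id, amount, last_modified FROM sales WHERE last_modified > ?",
    (last_watermark,),
).fetchall()

print(len(rows))  # 2 - only the rows changed since the watermark
```

After each successful run you would store the newest `last_modified` value you saw as the next watermark, so every extract stays small no matter how large the source grows.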
Use resources wisely
On the other side of things, the target also has the potential for over-consumption. Start from the beginning with a data set that is as small as possible; this will help manage performance. The more data there is, the more processing is required to get to your answer. There could be unintended consequences if your cloud data warehouse (CDW) is supporting multiple users and applications and everyone wants to run resource-intensive processes. This is exactly what your CDW is designed to accommodate, but just because we can use the massive processing power of the CDW doesn’t mean we should. Being respectful and cognizant of resources will help keep a lid on costs, too.
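One simple way to keep a discovery query small is to cap how many rows it returns while you iterate. Here is a sketch using an in-memory SQLite table as a stand-in for a CDW; the table name and sizes are made up for illustration.

```python
import sqlite3

# Stand-in "warehouse" table with 100,000 rows. While exploring, pull only
# a small sample with LIMIT instead of the full table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(100_000)])

SAMPLE_SIZE = 100  # arbitrary small cap for discovery work
sample = conn.execute(
    "SELECT id FROM events LIMIT ?", (SAMPLE_SIZE,)
).fetchall()

print(len(sample))  # 100 rows scanned back, not 100,000
```

Once the logic is right on the sample, you remove the cap and run it against the full data set, ideally during an agreed-upon window.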
Embrace the learning curve
The second hurdle to get over is gaining technical knowledge and getting up to speed. The good news is, you know more than you think you do.
First, you’ll be surprised how far your professional knowledge goes to help you through the learning curve of becoming technically proficient. All your knowledge of the systems, data and business processes, leading to your life as a CDP, will help you rapidly learn and apply new technical skills. Focus on what you know, and keep it really, really simple while you explore. Let your existing knowledge help you discover a new skill.
We put a lot of stock in learning this tool or that, and I might be forever shunned from the ETL community for what I am about to say. But in the end, you’re just moving data from point A to point B, then putting it into a format that is hopefully meaningful. That’s what ETL tools do. Matillion has loads of additional features such as a Scheduler, Versioning, Document Generation, Audit Logging, and so on, but ultimately you’ll be working with data, so that is what you should focus on: getting your data into the cloud, and transforming it.
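Stripped of the tooling, that extract-transform-load shape fits in a few lines. This toy Python sketch (every name in it is invented for illustration) is not how any real ETL tool works internally; it just shows the point-A-to-point-B idea in miniature:

```python
# Extract: raw rows from "point A" (a hypothetical source system)
source_rows = [
    {"customer": "acme", "amount": "100.50"},
    {"customer": "globex", "amount": "200.25"},
]

# Transform: fix types and shape the data into something meaningful
transformed = [
    {"customer": row["customer"].upper(), "amount": float(row["amount"])}
    for row in source_rows
]

# Load: write into "point B" (a dict standing in for a warehouse table)
warehouse = {"sales": transformed}
print(warehouse["sales"][0])  # {'customer': 'ACME', 'amount': 100.5}
```

Everything an ETL tool adds on top, like scheduling, versioning, and logging, wraps around these same three steps.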
Let your knowledge build organically – and ask questions
Learning data skills is like learning Outlook. I don’t recall ever taking a class, or having someone show me how to send an email. I learned over time, with experience and dialogue. As my needs grew, my knowledge grew.
Building rapport with the technology team members who use the tool could really be handy while you learn. Knowing how to ask specific questions (keeping in mind the responsibility tenet) can make a huge difference. “Hey <Tech Person> – I am trying to get data out of <blah system>, but am having some troubles and am concerned about overloading the <blah system>; do you have any examples you could share that might help me out?” There is very little in the programming world that is original, and chances are pretty good that someone has already done something similar to what you are trying to accomplish.
Data literacy 101
The ability to learn on the fly is a wonderful skill. That said, there are some concepts that you’ll want to get your head around before you get started.
- Database terminology – such as Table, Join, Data Type, Fact vs Dimension, and Key – is good to have a grasp of. Each of those terms preceded by “database” in a Google search will yield months of bedtime stories.
- Some basic programming concepts are worth skimming as well, such as the timestamp, which is both nerve-racking and useful.
- Batch processing is another subject worth understanding and will send you down a rabbit hole if you let it.
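To give a taste of the batch processing rabbit hole, here is a tiny Python sketch of the core idea: instead of handling records one at a time (or all at once), you process them in fixed-size chunks. The sizes and names here are arbitrary, chosen only for illustration.

```python
def batches(items, size):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 10,000 hypothetical records, processed 500 at a time
records = list(range(10_000))
chunks = list(batches(records, 500))

print(len(chunks))  # 20 batches of 500 records each
```

Real batch systems add retries, checkpoints, and scheduling on top, but the chunking idea above is the heart of it.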
No doubt you’ll be interacting with foreign topics at the onset, so it’s helpful to have some exposure to what they are at least at a high-level. Knowing the context of the data you are already an expert in will help a lot with the initial learning curve. Nothing I have mentioned so far is all that complex; it’s certainly easier than trying to interpret the tax code.
Don’t reinvent the wheel
Once you get into your CDW, chances are you’ll find data sets there that have already been created by others. You may or may not have access to read them all, but they are helpful from a reference perspective (cheat off your neighbor’s exam), and they also help you avoid redundant work. A good portion of the reference data you need to make sense of your new data might already be there. If you can get a “tour” of the existing landscape, it will give you a huge advantage when building out your content, both through re-use and as a reference for how to build things out. For example, there may be naming conventions and taxonomy that are important to know.
Know your data best practices
The last, and maybe most important, thing to study up on is best practices. Data best practices will help your “code” stand the test of time and hopefully help you avoid some headaches. Interestingly, if you start with best practices, you might pick up the relevant database and programming topics simultaneously. As you take your tour of the CDW, ask your new “tech friends” about the patterns and best practices they employ and why they use them. It could make all the difference in your experience, and theirs.
Time to get started
Data is a commodity. Everything makes data these days: stoves, cars, payroll systems, phones and so on. The next logical evolutionary step is doing something with all that data. And once you’ve gained a little knowledge and found some new friends in the data department, you’ll be amazed at how quickly you’ll be able to put all that data to work and come up with more questions – and even more answers – than you ever imagined. What’s the worst that could happen?
To learn more about how Matillion ETL can empower data teams and citizen data professionals to do more with data, request a demo.