
Occupational Hazard In Data Analytics?

Whenever I have a chance during my lectures, I advise my students, based on my experience: don’t be overconfident that the decision-makers will accept your results and conclusions without any argument or confrontation during the presentation.

We, as analysts, have to equip ourselves not just with technical skills but also with soft skills, i.e. communication and political skills. Political skill is defined as: “The ability to effectively understand others at work, and to use such knowledge to influence others to act in ways that enhance one’s personal and/or organizational objectives” (Ferris, Treadway et al., 2005).

In the context of data science, the ‘others’ are usually the end-users of your data product, and whoever will be affected by the decisions that your analysis greatly influences.

Why would we expect easy acceptance? Because our conclusions are derived from facts, from a single source of truth. We don’t assume or judge; we show patterns and insights that were unknown, hidden in huge volumes of data. The analyst doesn’t have a hidden agenda. We are just doing our job. What we present is the result of applying an algorithm. But not everybody is going to easily accept, appreciate, and like your analysis. Why not? This second why is not easy to answer. Usually it is not about you. Nothing personal. It has always been ‘business’. To answer it, I guess this is where a little reading on science politics and behavioural science will come in handy.

For example, a good case study is the Rebekah Jones case. Ms. Jones is a real person (this case is not fiction) who was the architect and manager of Florida’s COVID-19 dashboard. Her dashboard was fantastic. Created by a team of Florida Department of Health data scientists and public health officers led by Jones, it was praised by White House officials for its accessibility. Jones packaged data for academic and private researchers interested in doing predictive analytics and exploring impacts based on the collected data.

However, the US was divided between staying closed (much like Malaysia’s Movement Control Order) and reopening the states for business as usual. As Florida started to reopen, Ms. Jones announced she’d been removed from her position. Rebekah Jones said in an email to CBS12 News that her removal was “not voluntary” and that she was removed from her position because she was ordered to censor some data, but refused to “manually change data to drum up support for the plan to reopen.” For further information, please read the coverage by CBS and Florida Today.

And see what The Daily Show did…

“Data doesn’t lie, but now you can ….” – Excel: Coronavirus Edition

The Rebekah Jones case is a living example of the occupational hazard facing a data scientist who suffers as a result of doing her job, of being truthful, and of showing the bravery to uphold the highest standard of integrity. May God bless her, and I hope we will have the same courage to ‘politely refuse’ orders to ‘modify’ data, to ‘modify’ the truth.

Understanding Terms in Data Analysis

Data analysis is a process of inspecting, cleansing, transforming and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making.

Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names.

Data mining is a particular data analysis technique that focuses on statistical modeling and knowledge discovery for predictive rather than purely descriptive purposes.

Business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information.

In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA).

EDA focuses on discovering new features in the data while CDA focuses on confirming or falsifying existing hypotheses.
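
To make the EDA/CDA distinction concrete, here is a minimal sketch in Python using only the standard library. The data and the names `layout_a`, `layout_b`, `explore` and `permutation_p_value` are invented for illustration; they do not come from any real study. The exploratory step simply summarises each sample, while the confirmatory step tests a hypothesis stated in advance (that both groups share the same mean) with a simple permutation test, which is only one of many possible confirmatory techniques.

```python
import random
import statistics

# Hypothetical session durations (seconds) for two page layouts; invented data.
layout_a = [12, 15, 11, 14, 13, 16, 12, 15]
layout_b = [22, 25, 21, 24, 23, 26, 22, 25]


def explore(sample):
    """EDA step: summarise a sample without any prior hypothesis."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),
        "min": min(sample),
        "max": max(sample),
    }


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """CDA step: test H0 'both groups share the same mean' by shuffling labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.mean(left) - statistics.mean(right)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


summary_a = explore(layout_a)
summary_b = explore(layout_b)
p_value = permutation_p_value(layout_a, layout_b)

print("Layout A:", summary_a)
print("Layout B:", summary_b)
print("Permutation p-value:", p_value)
```

The summaries may suggest a pattern worth investigating; the permutation test then checks whether that pattern is plausibly due to chance. Strictly speaking, a hypothesis found during exploration should be confirmed on fresh data, which this toy example does not do.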

Predictive analytics focuses on the application of statistical models for predictive forecasting or classification.
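
As a rough illustration of predictive forecasting, the sketch below fits a straight line to a few monthly visit counts and extrapolates one month ahead. The figures and the names `months`, `visits`, `fit_line` and `forecast` are hypothetical, and ordinary least squares is just one simple choice of statistical model, not the only one.

```python
import statistics

# Hypothetical monthly website visits; invented numbers for illustration.
months = [1, 2, 3, 4, 5]
visits = [100, 120, 140, 160, 180]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, visits)

# Use the fitted model to predict a month that has not been observed yet.
forecast = intercept + slope * 6
print("Forecast for month 6:", forecast)
```

Real data is rarely this tidy, so in practice the fitted model would be checked against held-out observations before anyone relies on the forecast.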

Text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a species of unstructured data.

All of the above are varieties of data analysis.

Source: Wikipedia