AI Hype and its effects on a knowledge worker

This is not going to be a brief history of AI development and how it impacts knowledge workers, especially in the technology industry. I am going to jump directly to how it has impacted me. I am a Data Analyst by profession. The first time I experienced the “magic” of AI was when I created my OpenAI account, in early 2022, just to see what all the hype about “ChatGPT” was. I was blown away. So much so that I started using it regularly for coding projects and writing code for automating a lot of boring stuff in my job. I started reading articles about Sam Altman and watched him rise into the zeitgeist as some kind of a “prophet”. I was impressed with the guy. I gradually started using ChatGPT for pretty much everything: cooking recipes, career advice, opinions, research, ideas…

Then came the AI hype storm. 2023 and all anyone could talk about was AI. Roughly mid year 2023, and pretty much every third software service provider was selling some kind of “AI” based product. Every week there was a new model coming out from the MAANG companies. Some open source, most closed. Every week there was some new benchmark test bragging about the strength of one model over the other. By the end of 2023, I felt that I had missed the AI Hype train. Then came the LinkedIn posts, the YouTube shorts, the tutorial storm, the courses…

2024 and I had started feeling a little worried. Senior leadership in my company had started talking about “AI”. There was no meeting in which AI was not discussed. Everyone was talking about it. No one had a clue about how to incorporate it. Including me. No shame in admitting it.

The latter half of 2024, I lost my job, along with many others. Cost cutting measures, we were told. But to be honest, the AI hype had derailed our product offering to the world. Because now everyone wanted to do “something with AI”. While not a direct loss, I had lost my job to AI. In a way. At the same time, while job hunting, I was using AI to sharpen my skill set, learn new technologies, write blogs, develop projects, prepare for interviews, modify my resume, build cover letters. AI was helping me find a job and land interviews. I was also slowly realizing how powerful this tool was.

In the right hands, AI is an incredibly powerful tool. In the wrong hands, AI is an incredibly powerful weapon. Very generic statement, but it’s the truth. But let’s come back to my little world. What really worries me is that companies are going to use it to automate everything. But the true control of the tool will be limited to only a few companies. Everyone else will depend on them. These companies are OpenAI, Microsoft, Google, Amazon and Meta. Open source models will give us some hope, but not for long. Leaders of these companies will keep echoing the optimism that AI will change the world for the better. People like Sam Altman will continue to say an optimistic thing one day and a pessimistic thing another day. We will continue to hope that AI will change the world for the better. For some time.

While I do see how incredibly powerful it is, AI tools don’t seem to be solving our biggest issues. Or at least no one seems interested in solving the big problems using AI. I don’t hear about any new models being trained to detect cancers or identify new chemicals or solve healthcare insurance problems or identify the best way to reforest land or identify new ways to build buildings etc. All I hear or read about is how AI can do your job better. Or how one can use AI to earn money. Or how companies can use AI to save money. Money. That is all I ever hear about.

I am not worried about “the singularity”. I think it is a very vague term. I say, let it happen. Let it come. Sooner rather than later. Let there be a “sentient” AI. Let it happen now rather than when we are all dependent on AI for all our knowledge needs. Will it be the end of us? I don’t know. I hope not. Will it save us? I hope not. I think we should be able to do that ourselves. Also, it doesn’t seem so.

But, can it help us? Can it help us be better? I hope so.

Data Analyst Toolbox: Calculation and Use Cases of Variance and Standard Deviation

Disclaimer: I have taken the help of ChatGPT to describe some of the more technical concepts in this write-up, like definitions and explanations. Everything else is in my own words.


In this series of posts I will try to build out a toolbox for data analysts. A basic understanding of statistics is critical for any data analyst or data scientist. While statistics as a subject is vast and capable of inducing cold sweats in most people due to its seemingly complex nature, it is also misunderstood, and its most used concepts are simple enough for anyone with basic math skills.

Take today’s topic, for example: basic descriptive statistics, namely mean, median, mode, range, variance and standard deviation. For a single numeric column of a given dataset:

Mean: The average of a dataset.

Median: The middle value of a dataset.

Mode: The most frequent value in a dataset.

Range: The difference between the highest and lowest values.

Variance: A measure of how spread out the values are.

Standard Deviation: The square root of the variance.

These terms are pretty self explanatory. But Variance and Standard Deviation may need more explanation. Here are the definitions in a bit more detail.

Variance

Definition: Variance is a measure of how spread out the values in a dataset are around the mean. It quantifies the extent to which each number in the dataset differs from the mean.

Standard Deviation

Definition: Standard deviation is the square root of the variance. It gives a sense of the typical distance of a data point from the mean (strictly speaking, it is the square root of the average squared distance) and is expressed in the same units as the data.
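To make the two definitions concrete, here is a small sketch that computes both measures step by step. The function name and the numbers are made up for illustration. It also shows the one choice you have to make: dividing by n (population variance) or by n - 1 (sample variance).

```python
import math

def variance_and_std(values, sample=True):
    """Compute variance and standard deviation step by step."""
    n = len(values)
    mean = sum(values) / n
    # Squared distance of every value from the mean
    squared_diffs = [(x - mean) ** 2 for x in values]
    # Sample variance divides by n - 1, population variance by n
    divisor = n - 1 if sample else n
    variance = sum(squared_diffs) / divisor
    return variance, math.sqrt(variance)

# Made-up numbers, purely for illustration
prices = [10, 20, 30, 40, 50]
pop_var, pop_std = variance_and_std(prices, sample=False)
samp_var, samp_std = variance_and_std(prices, sample=True)
print(pop_var, samp_var)  # 200.0 250.0
```

Note that the variance is in squared units (dollars squared, for prices), which is why the standard deviation is usually the easier number to read.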

How to calculate?

These metrics are extremely easy to calculate with a few lines of Python. In my example here, I am using the House Prices dataset downloaded from Kaggle. The column we are analyzing is SalePrice, which is the price of the house. The Python code can very easily be generated by ChatGPT. Here is an example:
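A minimal sketch of such a calculation, using only Python’s standard library, might look like the following. The helper names (load_column, describe) and the file name train.csv are my assumptions, and the sample prices are made up rather than taken from the Kaggle file.

```python
import csv
import statistics

def load_column(path, column):
    """Read one numeric column from a CSV file, skipping empty cells."""
    with open(path, newline="") as f:
        return [float(row[column]) for row in csv.DictReader(f) if row[column]]

def describe(values):
    """Return the basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }

# Made-up prices for illustration only, not rows from the Kaggle dataset
sample_prices = [100000, 150000, 150000, 200000, 400000]
summary = describe(sample_prices)
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")

# With the real file (the path is an assumption):
# summary = describe(load_column("train.csv", "SalePrice"))
```

If you already work in pandas, the same numbers are available from the DataFrame column directly; the standard library version is shown here only because it has no dependencies.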

Use Case and Pitfalls

Descriptive statistics may not be as “fancy” or “complex” as other statistical methods, but they are crucial. For example, the mean value of a dataset can give us a fair idea of what the data looks like. This is especially useful in pricing and sales. In my job, I use these measures to understand how the price of our product offering compares to other similar products. Median and mode are also helpful in understanding where our product price lies if I were to lay down all similar products in the market on a table in front of me. Are we close to the middle, or are we too pricey?

One thing to take note of is the word “similar”. When I say “similar” products, it’s important to understand what that means. Let’s say we are trying to understand whether an online course I am selling is priced correctly or not. If I compare all available courses online (irrespective of the course topic), then my descriptive statistics will be misleading. Not all courses are created equal. Not all courses deliver the same value. And I cannot really compare a course which teaches you a technical skill with a course which is soft skills focused. Against such a mixed average, the price of my technical course may look too low (or too high) when it is not.

While an AI tool can help you write the code, it’s in the data gathering phase where human intellect is required. For these descriptive statistics to make sense, we must compare similar courses. And “similarity” can be complex. How deep should I categorize the data collection? Should I look at all course prices of the same subject? Or should I look at all course prices of the same topic?
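Here is a rough sketch of why the grouping matters. The course listings, categories and prices below are entirely hypothetical. The point is only that the overall mean describes neither group well.

```python
from collections import defaultdict
import statistics

# Hypothetical course listings: (category, price in dollars)
courses = [
    ("technical", 120), ("technical", 150), ("technical", 180),
    ("soft skills", 30), ("soft skills", 40), ("soft skills", 50),
]

def stats_by_category(rows):
    """Group prices by category and summarize each group separately."""
    groups = defaultdict(list)
    for category, price in rows:
        groups[category].append(price)
    return {
        category: {
            "mean": statistics.mean(prices),
            "median": statistics.median(prices),
        }
        for category, prices in groups.items()
    }

overall_mean = statistics.mean([price for _, price in courses])
per_category = stats_by_category(courses)

print("overall mean:", overall_mean)
for category, summary in per_category.items():
    print(category, summary)
```

In this made-up data the overall mean is 95, which would make a 150 dollar technical course look overpriced even though it sits exactly at the mean of its own category.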

Conversely, if you already have a dataset, then these statistics can be used to understand the quality of the data. For example, continuing with our online course prices example, if the median and mean differ massively, then we are probably looking at courses which are vastly different. If the variance or standard deviation of the course prices is too high, we are either looking at very different courses, or we have an error in our data, or we have an “outlier”. So these descriptive statistics can be used as a data quality measurement tool as well.
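One way to turn this idea into a quick check is sketched below. The function name and both thresholds are arbitrary assumptions on my part, not established rules, and the check assumes strictly positive values such as prices.

```python
import statistics

def quality_flags(values, gap_threshold=0.25, cv_threshold=1.0):
    """Flag a column whose mean/median gap or relative spread looks suspicious.

    Both thresholds are arbitrary illustrative defaults. Assumes positive values.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    std_dev = statistics.stdev(values)
    relative_gap = abs(mean - median) / median  # how far the mean is pulled from the median
    cv = std_dev / mean                         # coefficient of variation
    return {
        "mean_median_gap": relative_gap,
        "coefficient_of_variation": cv,
        "suspicious": relative_gap > gap_threshold or cv > cv_threshold,
    }

# Made-up course prices: one clean list, one with a likely data entry error
clean = [40, 45, 50, 55, 60]
with_outlier = [40, 45, 50, 55, 5000]

print(quality_flags(clean)["suspicious"])
print(quality_flags(with_outlier)["suspicious"])
```

A flag here does not say what is wrong. It only tells you that the column deserves a closer look before you trust its mean.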

Results

From a house prices perspective, these results look plausible. The mean and median are not “wildly different”, suggesting a fairly homogeneous dataset, perhaps house prices from zip codes not too far from each other.

The high value of the range suggests that the cheapest and the most expensive house have a massive difference, which can be true. But this is something to investigate.

The standard deviation is also high, suggesting that either there is a big difference between the house prices in general or there are some outliers. Further investigation is required.

What’s next?

I would probably do a histogram of the prices to understand what the dataset looks like. I already did and here are the results.

Looks like we have a significant number of houses which are above 200K. There is greater variability in prices above 200K, which could be the reason for the high standard deviation.
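If you do not have a plotting library at hand, a rough text histogram is enough for a first look. The sketch below uses made-up prices and an arbitrary bin width of 50K. It is not the chart from my analysis, just one simple way to bucket the values.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Count values per fixed-width bin and render one text line per bin."""
    counts = Counter((int(v) // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(counts):
        bar = "#" * counts[start]
        lines.append(f"{start:>7,} - {start + bin_width:>7,}: {bar}")
    return counts, lines

# Made-up prices for illustration only
prices = [95000, 120000, 135000, 140000, 160000, 175000, 210000, 260000, 340000]
counts, lines = text_histogram(prices, 50000)
print("\n".join(lines))
```

Empty bins are simply left out of the output, so gaps in the price range show up as jumps between consecutive lines.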

Hopefully this was brief enough and useful enough! See you in the next one.

More Use Cases (Thanks ChatGPT!)

Finance:

  • Variance: In finance, variance is used to measure the volatility of a stock’s returns. A higher variance indicates a more volatile stock.
  • Standard Deviation: Standard deviation is used to gauge the risk associated with an investment. A higher standard deviation means more risk as the investment returns are more spread out from the mean.

Quality Control:

  • Variance and Standard Deviation: In manufacturing, these measures help ensure product consistency. Low variance and standard deviation indicate that the product quality is consistent, with minimal deviation from the desired specifications.

Healthcare:

  • Variance and Standard Deviation: In medical research, these measures help analyze the effectiveness of treatments. They can indicate how varied patients’ responses are to a treatment.
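
As a small illustration of the finance use case above, the sketch below compares the standard deviation of simple period-to-period returns for two hypothetical price series. The prices and the helper name are made up.

```python
import statistics

def simple_returns(prices):
    """Period-over-period returns: (current - previous) / previous."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]

# Two hypothetical price series starting at the same value
steady = [100, 101, 102, 103, 104]
jumpy = [100, 110, 95, 115, 104]

steady_vol = statistics.stdev(simple_returns(steady))
jumpy_vol = statistics.stdev(simple_returns(jumpy))

print(f"steady: {steady_vol:.4f}")
print(f"jumpy:  {jumpy_vol:.4f}")
```

Both series start and end at the same prices, yet the second one has a much higher standard deviation of returns, which is what “more volatile” means here.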

The promise and threat of AI

In the winter of 2014, my uncle forwarded me a video of him sitting in a Tesla Model S reading a magazine as the car drives itself on a dark road at a speed of about 50 MPH. I can see the car changing lanes and adjusting its speed so as not to crash into the car in front of it. All this while, my uncle continues (pretending) to read his magazine. I know he is not, because the magazine is upside down. One eye is still on the road and he is nervous, but he is in no way involved in driving the car. It was driving itself. I remember my reaction to the video. I believe I said, ‘Wow, that’s cool’. But I did not dwell on it further. Cool gimmicks of expensive toys.

Fast forward three years. I am sitting in my office looking at a company-wide demo for robotic process automation, or RPA. I see how a computer program fills out around a hundred data entry forms automatically, after reading relevant data from a hundred handwritten forms scanned into the program. An activity which took a person around 90 hours to do was done by the machine in 9, without any errors. Even if there was an error, it was certain that it would not be repeated the next time. This time, I didn’t say, ‘Wow, that’s cool’. This time I started looking for online courses to learn more about similar technologies. My heart rate was slightly elevated and I felt that my resume needed to be strengthened. This time, I felt a slight churn in the bottom of my stomach. The machines are coming.

Artificial intelligence is a system’s (a machine’s) ability to correctly interpret external data and/or information, use it and learn from it in order to achieve its goals. This is the simplest definition. Intelligence is a feature we associate with animals (including humans). Artificial intelligence is when a machine demonstrates this animal capability.

We cannot be sure when the term came into being. It was definitely popularised by science fiction stories in which robots or automatons of some kind develop human-like intelligence and demonstrate the ability to feel and emote. The term has been demonised and glorified in equal measure. However, the demonisation generally tends to stick with us. How many AI centric science fiction movies do you remember where the robots are NOT trying to kill the humans?

We are nowhere near that stage though. Our best AI programs are either limited to accomplishing tasks in a narrow field, like driving cars or trucks and controlling airplanes, or they are analytical AIs which learn from past experience and predict outcomes which are affected by a limited range of parameters. But this doesn’t mean that there is no need to be alarmed. Here is one story which should give you a fair idea of how fast things are progressing.

Stockfish 8 is a chess playing computer program which has years of practice defeating chess players, nay, champions from all over the world. It was built and tuned with a multitude of chess strategies by some of the best computer programmers in the world. It has consistently been ranked as one of the most powerful chess computer programs in the world. Based on the hardware available to it, it can scale up or scale down its level. Now let’s introduce our protagonist (antagonist?). Enter AlphaZero, an algorithm which is self trained (it played only against itself). It’s a neural network based algorithm which was able to train itself, in 4 hours, to surpass the level of Stockfish 8. In a 100 game match against Stockfish 8, AlphaZero won 28 games and drew the remaining 72. Add up the numbers. It’s 100 games. No losses. 4 hours. It took 4 hours to beat the best chess playing computer in the world, without any human guidance beyond the rules of the game. Did a tingle just run down your spine?

Sensationalism always sells. It’s more popular than hope. Which is why, for many people outside the scientific community, AI is either a threat which needs to be taken care of immediately, or it’s still in its infancy and poses absolutely no threat. Both viewpoints are wrong. AI holds promise too. It is being used to revolutionary effect in the medical field, helping decode the genome, identifying genes which cause life threatening diseases and even predicting the next flu outbreak, which helps scientists and medical professionals prepare. AI holds promise in so many other fields. It is already being used as a fraud prevention tool in the finance and banking sectors. AI can one day replace mundane jobs like bank tellers, supermarket cashiers and bus conductors. Introduced within robots, AI can replace other mundane jobs like janitors and cleaners. Such robots may one day perform dangerous tasks like defusing bombs, working with radioactive waste and putting out dangerous forest fires. AI has the potential to revolutionize farming. Even in the area of arts, AI has shown promise, both as an artist (AI generated paintings and AI created songs) and as a tool (detecting fraudulent paintings).

But while the promise paints a future perfect picture, there is an ugly side of AI as well. Surprisingly, it’s not the AI’s fault. Just like any machine we humans have built, AI serves our purposes. For example, an AI which detects fraudulent transactions for a bank does so to increase the bank’s profits by reducing costs. Similarly, robots which automate a factory floor don’t do so of their own volition. They do not think that this will reduce human injury. They do so because a human made the decision to set them up in order to increase his or his company’s profit, by cutting down on costs.

Let’s consider a scenario. If tomorrow, Amazon or Google comes out with a machine which can replace the human cashier in supermarkets, for a one time investment of 100,000, and which will enable the stores to remain open 24/7 throughout the year, will not ask for health or dental benefits and will never take a day off, all retail companies will clamour to get their hands on it. Overnight, the world’s cashiers will be out of jobs! It is crueler than it sounds. Why? Because we, the middle class, will not be immediately impacted. We might be glad that the stores now remain open 24 hours and we don’t have to make small talk with the person on the cash register. Now imagine an AI which can do better business analysis than you. Imagine an AI which can code better than you. Imagine an AI which can read spreadsheets better than you. Imagine an AI which can do your job better than you. You get the picture. Chances are that such an AI is already being developed by someone.

So how can we get the most benefit out of AI? The answer is simple. By focusing our attention on the ‘WE’ in this question. How can ‘WE’ …. We have to expand our circle of inclusion. We have to look beyond ourselves, look beyond our family, our community, our city, our country and consider humanity as a whole. Can we let the machines take over our mundane jobs while we make sure that the people these machines replace can come out of their poverty traps? Can we make sure that people in third world countries are able to access AI doctors which prescribe them the right medicines for free, without letting actual doctors feel worthless? Can we build cars in a person-less factory while making sure people are still going to be able to afford them?

WE have to tell our governments to focus on such questions. WE have to make our business leaders and entrepreneurs think about such questions. We have to think about such questions every day. Before we worry about AI making humanity extinct, we have to worry about AI making humanity irrelevant. And WE have to do it fast. Because it took a machine 4 hours to be the best at chess …. the best ever. It may take just a few more for the machine to be the best at everything and make us all irrelevant.