Journey of Creating a Scraper That Tweets: A Comedy of Errors

Sun, Jun 25, 2023
5-minute read

Introduction

In a world filled with boundless possibilities, I embarked on a mission to bring something useful to life. Armed with determination, a sprinkle of coding knowledge, and a dash of naivety, I set out on an adventure that would test my patience and sanity. This is the tale of how I birthed a scraper that tweets and the failure-filled path I took to make it happen.¹

The Backstory: A Fascination with Finance and Market Sentiment

My long-lived fascination with the world of finance led me to explore different ways to track market sentiment. While browsing the internet, I stumbled upon the Fear and Greed Index, a gauge of market sentiment that seemed to dance in harmony with the stock market’s whims. Intrigued by its potential, I yearned for a simpler way to track the index without visiting CNN’s website. Thus, my desire to create something met a worthy idea, kicking off the process of coding.²

Trial and Error in the World of Technology

Fueled by determination and a thirst for knowledge, I embarked on what I thought was an unambitious quest. With dreams of efficiency and simplicity in mind, I opted for the serverless approach. Mind you, I knew how to work with a server, but I thought serverless would be more elegant. Little did I know the comedy of errors that awaited me in the world of serverless development.

First Approach: AWS Lambda – The Quickest Route to Headaches

Seeking a cost-effective and efficient solution, I delved into the realm of serverless computing with AWS Lambda. Prepared with code and an optimistic spirit, I soon discovered the daunting task of packaging dependencies³ and navigating the labyrinthine complexities of Lambda. Error after error left me scratching my head as if caught in an eternal loop of confusion. Maxing out the Lambda package size was the last straw.

Second Approach: Lambda Layer Madness – The Never-Ending Rabbit Hole

Undeterred by my struggles with AWS Lambda, I ventured into the realm of Lambda layers, which promised a larger maximum package size. Armed with optimism and an arsenal of failed attempts, I sought a glimmer of hope. I tried nearly everything imaginable, from zipping dependencies to exploring virtual environments, and even the unthinkable⁴: a Docker container with a browser, packaged for Lambda. I danced on the precipice of insanity. Each path I explored seemed to lead to yet another dead end. This time, I realized that modern browser packages tend to be larger than the maximum layer size, and at my skill level it did not seem viable to shrink the package by stripping out everything a headless⁵ browser does not need.
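
The size limits behind both Lambda dead ends can be checked locally before any upload. Below is a minimal sketch, not the code used for this project. It assumes the quotas AWS documents for zip-based functions (50 MB for a zipped direct upload, 250 MB unzipped for the function plus all of its layers), so verify the current values before relying on them. The helper names are hypothetical.

```python
import os

# Assumed quotas for zip-based Lambda functions; check the current AWS docs.
MAX_ZIPPED_BYTES = 50 * 1024 * 1024      # zipped package, direct upload
MAX_UNZIPPED_BYTES = 250 * 1024 * 1024   # function plus all layers, unzipped


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so a linked file is not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def fits_unzipped_limit(size_bytes, limit=MAX_UNZIPPED_BYTES):
    """Return True if an unzipped build of size_bytes is within the limit."""
    return size_bytes <= limit


def describe(path):
    """Return a one-line summary of the build directory's size versus the limit."""
    size = directory_size(path)
    verdict = "fits" if fits_unzipped_limit(size) else "is too large"
    return f"{path}: {size / (1024 * 1024):.1f} MB unzipped, {verdict}"
```

Calling `describe("build")` on a hypothetical `build` directory that holds the function code, the Python dependencies, and the browser binaries would have shown early on whether the bundle had any chance of fitting.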

Third Approach: Virtual Private Server – The Eventual Triumph of Practicality

With despair threatening to consume me, I made a radical decision: abandon the dream of serverless elegance and embrace the practicality of a virtual private server (VPS), the approach I knew would work. Infused with newfound realism, I set out to conquer the technical hurdles that lay ahead. First, I tried to use what AWS offers for free: a small virtual private server⁶ running on the ARM architecture. After a few hours spent trying to set up all the dependencies on the VPS, I realized that running a modern browser on an ARM server is difficult or maybe impossible. For me, it was too hard.

Finally, after countless hours spent trying to get things to work inside the AWS free tier, I resigned myself to using a VPS with the traditional x86_64 architecture, which I knew was capable of running everything I needed. With a newfound dent in my wallet, a glimmer of success appeared on the horizon. The traditional VPS solution proved to be the savior I had longed for, the one that worked!

Lessons Learned: From Hilarious Missteps to Enlightening Revelations

Through the trials and tribulations of creating a scraper that tweets, I gained some knowledge. I discovered some intricacies of serverless computing, packaging dependencies, containerization, optical character recognition, and even the quirks of large language models. I was reminded of old tools and techniques, but also witnessed the limitations of AI assistance, like ChatGPT’s propensity for hallucinations and its knack for creating for-loops that would make any programmer cringe.

The Next Project: Speed or Learning?

Equipped with newfound wisdom, I face a crucial question for my next endeavor: should I prioritize swift execution or embrace the opportunity to learn new technologies? It’s a dilemma that lies at the heart of every creator’s journey, balancing the desire for effectiveness with the thirst for knowledge. The coin of creation flips in the air, and I eagerly await the next chapter of my nerdy adventures.

Conclusion: Triumph in Chaos

My journey into the chaotic world of creating a scraper that tweets was riddled with mishaps and laughter. Through a series of failed attempts and head-scratching moments, I emerged with a working solution that brings me both joy and questionable financial insights. The experience reminded me of the importance of perseverance, adaptability, and the ability to find humor in the face of chaos. With each step forward, I embrace the joy of building, knowing that the next adventure will surely be another comedy of errors waiting to unfold.


¹ The Twitter account tweets once every weekday. Every tweet includes the latest figures for the Fear and Greed Index and a screenshot of the gauge. The index attempts to track the general market and investor sentiment by aggregating seven indicators into a single value. Historically, it has tracked the S&P 500 stock index movements quite reliably, although it is questionable whether the index is a leading or lagging indicator.
² To be clear, I had the idea to build something like this in 2018, but it took me a while to do something about it!
³ For example, Selenium, Chromedriver, Chrome, Pillow, and Twython.
⁴ Probably telling that Python developers consider this best practice!
⁵ A headless browser is a browser without a graphical user interface, meaning there is nothing to see, only some code traversing the web.
⁶ The instance type that AWS offers for free is the EC2 t4g.small.
