LLMine - Idea to Open Source in 5 hours

Hi Fellow Indie Hackers,

Open Source projects are always so fun to build. Late-night discussions with one of my best buddies led us to build LLMine, a platform for unstructured text mining using configurable LLM chains. In this post, I want to take you through our journey of conceiving the idea, trying to build it and what we learned from it.

GitHub Repo: https://github.com/LLMine/llmine_core
(Leave us your mighty GitHub Star, if you like what we are building)

That night, my friend Divyansh and I were exploring ways to build a Customer Review analysis tool. We have always wondered how a consumer-facing company keeps up with its numerous reviews. Even more complex, how do they quickly identify complaints and issues so that they can act on them? After all, nobody likes to lose a customer, right?

S01 - E1 Our Initial Idea 😇

As Tech geeks excited by the recent advancements in Generative AI, it was almost a no-brainer to try and use LLMs to tell us which reviews are important for us and in what ways. 30 minutes into the discussion we already had a plan.

Our plan was to ask GPT-3.5 / GPT-4 if a review was

A complaint, compliment or a suggestion.
If it is a complaint, we ask how severe it is.
If it is a compliment, we ask what the user found pleasurable.
Then we ask what Module/Product it is related to.
We store those data and visualize them with detailed dashboards.
Then we cluster the reviews into buckets and find what themes each cluster represents.
Finally we prepare detailed, nice-looking reports and send them across for cross-team communication.
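
A minimal sketch of the first four steps of the plan above. It assumes the "pleasurable" follow-up is asked for compliments, and the `ask` callable, the severity labels and the keyword-based `fake_ask` stand-in are all hypothetical names made up for illustration, not LLMine code. Storing, clustering and reporting are left out.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (question, review_text, allowed_labels) -> answer.
# In a real system this would wrap a call to an LLM.
Ask = Callable[[str, str, List[str]], str]


def analyze_review(review: str, ask: Ask) -> Dict[str, str]:
    """Run the conditional chain of questions against one review."""
    result: Dict[str, str] = {}
    result["kind"] = ask(
        "Is this review a complaint, a compliment or a suggestion?",
        review,
        ["complaint", "compliment", "suggestion"],
    )
    if result["kind"] == "complaint":
        # Severity labels are an assumption; the post does not name them.
        result["severity"] = ask(
            "How severe is the complaint?", review, ["low", "medium", "high"]
        )
    elif result["kind"] == "compliment":
        result["liked"] = ask("What did the user find pleasurable?", review, [])
    result["module"] = ask("Which module or product is this about?", review, [])
    return result


def fake_ask(question: str, review: str, labels: List[str]) -> str:
    """Keyword-based stand-in for an LLM call, for demonstration only."""
    text = review.lower()
    if "complaint, a compliment" in question:
        return "complaint" if "crash" in text else "compliment"
    if "severe" in question:
        return "high"
    if "pleasurable" in question:
        return "fast checkout"
    return "checkout"
```

Swapping `fake_ask` for a function that actually calls a model keeps `analyze_review` unchanged, which is the kind of decoupling the later sections move towards.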

We are young blood devs; we love to jump in and start building, especially when we can literally see what we want to build in our heads. I am sure most of you fellow hackers can relate to this "itch to build".

S01 - E2 Scratching the "Itch to Build" 🤢

The first thing we did was draw up a database schema, which was fairly simple, with around 5 or 6 tables excluding the auth flow. We were starting to see that no matter what text content you have, you can build a very similar DB schema, mine the required details via prompts and store them in the DB. In about 25 minutes we had a working schema design on our whiteboard that we could start writing code against.

We spent a few minutes on the OpenAI Playground trying out our prompt templates and finalized a few workable ones. This took us around another 20 minutes.

S01 - E3 The Developer's Fatigue of Writing YET AGAIN 🤨

Even after building countless products and having time-tested, prod-tested templates to start our projects with, we find ourselves doing the same boring things: writing models, migrations, views, and background tasks before we can soothe our soaring itch to see something in action. Django is blazing fast when it comes to building such services, and we have done it hundreds of times before. Yet we persisted (ChatGPT wrote the mundane code for us 😅), saw a little progress, and suddenly the engineers in us started to take over.

This part was the most boring so it felt like an eternity, but I assume it was something like 40 minutes to get here.

S01 - E4 The Engineering 🦾

We wanted to make our code cleaner and less coupled to the models and their corresponding prompts. So, we decided to create a table to hold all the prompts we run when a review comes in, plus a mapping table whose entries store how the LLM replied to each prompt. We realized we wanted strong type-checking against the expected text responses from the LLM. To achieve that, we defined two return types: JSON, where we specify the contract using a JSON Schema spec, and Labels, where the LLM chooses from an array of given valid labels.
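
To make the two return types concrete, here is a rough sketch of what such checks could look like. The function names are hypothetical, and `check_json` is a tiny hand-rolled check that only understands `required` and flat `properties` with `string`, `number`, `integer` and `boolean` types; it stands in for a real JSON Schema validator and is not how LLMine-Core necessarily does it.

```python
import json
from typing import Any, Dict, List

# JSON Schema type names handled by this sketch, mapped to Python types.
_TYPES = {"string": str, "number": (int, float), "integer": int}


def check_label(reply: str, valid_labels: List[str]) -> str:
    """Return the trimmed reply if it is one of the valid labels, else raise."""
    cleaned = reply.strip()
    if cleaned not in valid_labels:
        raise ValueError(f"unexpected label: {cleaned!r}")
    return cleaned


def _matches(value: Any, type_name: str) -> bool:
    """Check a value against one supported type name."""
    if type_name == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):
        # bool is a subclass of int in Python; keep it out of number/integer.
        return False
    return isinstance(value, _TYPES[type_name])


def check_json(reply: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Parse the reply as JSON and check it against a flat object schema."""
    data = json.loads(reply)  # raises a ValueError subclass on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in schema.get("required", []):
        if key not in data:
            raise ValueError(f"missing key: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key in data and not _matches(data[key], spec["type"]):
            raise ValueError(f"wrong type for key: {key}")
    return data
```

For instance, `check_label(" Yes ", ["Yes", "No", "Not Sure"])` returns `"Yes"`, while a reply such as `"Maybe"` raises a `ValueError` instead of being stored silently.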

We are lazy engineers, too lazy to type an actual customer review to test whether our quick and dirty prototype works. I had my resume open in a Google Doc, so I copied part of the text and entered it in our reviews table. 🤓 Our system asked OpenAI if it was a complaint, compliment or suggestion, and it replied "Not Sure" 😒. So we changed the prompt in the DB to "Did the candidate mention use of Spring Boot in the projects?" and the options to "Yes"/"No"/"Not Sure". OpenAI promptly replied, "Yes". Wait, what were we building again?

By this time we were already past 3.5 hours since we started discussing our idea for customer review analysis using LLMs.

S01 - E5 The Aha Moment: Realizing We Built A Framework, Not Just A Tool 🥸

You know that feeling when you're so engrossed in doing a puzzle that you forget the bigger picture? That's how we felt. And when we stepped back, we were struck by an epiphany. In our pursuit to build a Customer Review Analysis tool, we had inadvertently constructed something more universal, and more powerful—a framework for mining insights from any text using configurable LLM prompts.

We looked at our whiteboard. It no longer read “Review Analysis Tool”; in our minds, it had transformed into “LLMine: A Universal Text Mining Framework.” We could run any prompt against any text and receive insights based on that. It wasn’t limited to identifying complaints or compliments. It could tell us whether a resume mentioned the use of Spring Boot, or if an academic article referenced a specific theory. The utility was suddenly boundless.

It's funny how "lazy engineering" can lead to unexpected creativity. Our laziness in typing a genuine review led us to test with random text from my resume. And that small, almost trivial action widened the scope of our project in a way we hadn't anticipated. It made us realize that we weren't building a tool but a framework that could serve as the foundation for myriad applications, be it consumer reviews, sentiment analysis, academic research, or anything that involves understanding text.

S01 - E6 The Rename Game 👻

This was perhaps the first time we came up with a name almost instantly, loved it, and bought a domain. We added a new model, Content Pool, and renamed our Review model to InjestedTextContent (yeah, we misspelt the table name in excitement). We added an ExtracterChain model to hold a collection of related ExtracterPrompt objects (yeah, yeah, we misspell a lot). After editing only a few lines of our Celery task, we had our very first prototype of what we call LLMine-Core.
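
As a rough picture of how these models could relate, here is a sketch using plain dataclasses rather than Django models so that it runs on its own. The class names follow the post (misspellings included, with Content Pool written as `ContentPool`), but every field name, the pool-to-chain relation and the `pending_questions` helper are assumptions made for illustration, not the actual LLMine-Core schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtracterPrompt:
    question: str
    # Valid labels for a Labels-type prompt; empty for other return types.
    valid_labels: List[str] = field(default_factory=list)


@dataclass
class ExtracterChain:
    name: str
    prompts: List[ExtracterPrompt] = field(default_factory=list)


@dataclass
class ContentPool:
    name: str
    # Assumption: chains are configured per pool.
    chains: List[ExtracterChain] = field(default_factory=list)


@dataclass
class InjestedTextContent:
    pool: ContentPool
    text: str


def pending_questions(content: InjestedTextContent) -> List[str]:
    """List every question that would be asked about this piece of text."""
    return [
        prompt.question
        for chain in content.pool.chains
        for prompt in chain.prompts
    ]
```

With a "resumes" pool holding one chain and the Spring Boot prompt from earlier, `pending_questions` returns just that one question for any text ingested into the pool. Pointing the same text at a different pool changes what gets asked without touching code.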

This was the end of our 5-hour journey to open-sourcing LLMine under the Apache 2.0 license.

S02 - Teaser - What’s Next for LLMine? 🔥

Our next steps are clear. Instead of focusing solely on customer reviews, we’re going to evolve LLMine into a fully-fledged framework for text mining.

So far, we have already prepared REST APIs and Webhooks (Swagger Spec included) that devs can use today to build on top of LLMine-Core. We can also use the modified Django Admin site to configure our chains over our pools.

We currently support OpenAI models only, but we love Llama 2 as well 😊 You know what to expect. Supporting remote lambda functions as steps in our extractor chains is another feature we are looking to add soon. We have a high-level roadmap in our GitHub README.

But for now, we are in awe of the potential this "happy accident" has uncovered. As Indie Hackers, we’re fueled by such moments. It's all about that magical blend of meticulous planning and chaotic tinkering.

So, that's our journey from setting out to build a simple Review Analysis tool to ending up with LLMine, a universal text mining framework. Fellow Indie Hackers, don’t underestimate the detours; you never know where they might lead you. Check out our GitHub Repo, and if you think what we’re building is cool, please give us that sweet GitHub Star!

Thank you for reading our story. Let's keep hacking and building something awesome. We’d love to hear your thoughts, suggestions, and, yes, even your complaints (LLMine can categorize them for us!).

Signing off,
The LLMine Team 🛠️

Your journey from conceptualizing a simple Review Analysis tool to creating the versatile LLMine text mining framework is impressive. It's a testament to how creative problem-solving and iterative development can lead to unexpected and powerful outcomes.

Given the wide range of potential applications for LLMine, what do you see as the most exciting or impactful use case for this framework in the future?

premsaini · a year ago
  
  Thank you for your kind words. I think the most crucial thing LLMine does is that it simplifies otherwise complex tasks, thanks to the LLM technology itself. For example, anybody with minimal resources can now build Spam Filters, Sentiment Analyzers and so on in a matter of clicks.
  
  Personally, I am excited to see complex data-cleaning pipelines being built with LLMine chains for training AI models.
  
  I am a B2B SaaS fan, so would love to see someone ingest complex text like Sales Call Transcripts and tell what they did right and what they didn't, maybe feed the discovery insights about the customer back to CRM. Or maybe you can ingest public social media posts from your prospects and configure prompts to create cold messages that they are likely to open/reply to? (Just a thought).
  
  arpanpreneur · a year ago

I can literally relate to every bit of it. Thanks for putting it so beautifully. I couldn't agree more on the fact that we started with a different idea and it turned out to be one of the best things I've ever worked on (will be working too 😅). Let's keep working on it!!!

divyanshbhowmick · a year ago
  
  Couldn't have come up with the core idea without the direction you nudged the discussions towards. Always such a delight working with you; work doesn't seem like work @divyanshbhowmick
  
  arpanpreneur · a year ago

Great journey! From what I've experienced, evaluating LLM-based apps is critical! LLMs can be very confident even when absolutely wrong, so regular evaluation is essential.

VladN7 · 8 months ago