Ideal database for a ChatGPT clone

In ChatGPT, a single user message can have multiple GPT responses (for example, when you regenerate a response), and a single GPT response can be followed by multiple user messages (for example, when you edit your prompt). I’m building a ChatGPT clone that must fully support this branching.

I was curious how ChatGPT represents this internally, so I opened Chrome DevTools and found the request that returns all the user messages and GPT responses. Simplified, the JSON looks like this:

"mapping": {
  "message": {
    "id": "c6587e15-387b-4b14-9773-a0df62b1d92f",
    "parent": "aaa2582c-8505-433e-907c-5188dd41a2b7",
    "children": [
      "aaa27ee8-fe01-4e1d-8404-4be75cce4104",
      "aaa2e314-3cf1-4f12-b312-0a3195eb78f8",
      "aaa2be8d-5281-4059-b664-74bae761568f",
      "aaa20046-153c-4258-8f7b-e2fea392a9d9"
    ]
  }
  ... more messages ...
}

In other words, every node is a message, and messages are linked by parent-child relationships: each message has one parent and can have multiple children (the first message has a null parent ID).
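A minimal in-memory sketch of this parent/children structure, assuming a `mapping` dict keyed by message ID like the JSON above. The class and method names (`Message`, `ConversationTree`, `branch`) are hypothetical. Adding a second child under the same parent creates a branch, and the conversation currently shown on screen is recovered by following `parent` links from a leaf back to the root.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    id: str
    role: str                      # "user" or "assistant"
    content: str
    parent: Optional[str] = None   # None for the first message
    children: list[str] = field(default_factory=list)


class ConversationTree:
    """Hypothetical tree of messages keyed by ID, mirroring the "mapping" object."""

    def __init__(self) -> None:
        self.mapping: dict[str, Message] = {}

    def add(self, msg: Message) -> None:
        self.mapping[msg.id] = msg
        if msg.parent is not None:
            # A second child under the same parent is a new branch
            # (e.g. a regenerated response or an edited prompt).
            self.mapping[msg.parent].children.append(msg.id)

    def branch(self, leaf_id: str) -> list[Message]:
        """Return the messages from the root down to leaf_id."""
        path = []
        current: Optional[str] = leaf_id
        while current is not None:
            msg = self.mapping[current]
            path.append(msg)
            current = msg.parent
        path.reverse()
        return path


tree = ConversationTree()
tree.add(Message("u1", "user", "Hi"))
tree.add(Message("a1", "assistant", "Hello!", parent="u1"))
tree.add(Message("a2", "assistant", "Hey there!", parent="u1"))  # regenerated
tree.add(Message("u2", "user", "Tell me a joke", parent="a2"))

print([m.id for m in tree.branch("u2")])  # ['u1', 'a2', 'u2']
print(tree.mapping["u1"].children)        # ['a1', 'a2']
```

Storing both `parent` and `children` duplicates information; `children` can always be derived from the `parent` links, which is what a database table would typically do.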

I’m torn between a relational database (Postgres) and a NoSQL database (MongoDB) for storing the messages. MongoDB scales well horizontally and is a common choice for chat applications, which typically have few relations but a large volume of messages. The data can also be unstructured, which is convenient because GPT output may contain images, not just text.
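One way to handle output that is "not just text" is to store each message's content as a list of typed parts. The sketch below assumes hypothetical `TextPart` and `ImagePart` types, with images stored as a URL reference rather than raw bytes. The list of dicts (before `json.dumps`) maps naturally onto a nested MongoDB document, and the serialized string could equally go into a Postgres `jsonb` column.

```python
import json
from dataclasses import dataclass, asdict
from typing import Literal, Union


@dataclass
class TextPart:
    text: str
    type: Literal["text"] = "text"


@dataclass
class ImagePart:
    url: str  # hypothetical: a link to a stored image, not the image bytes
    type: Literal["image"] = "image"


Part = Union[TextPart, ImagePart]


def serialize(parts: list[Part]) -> str:
    """Turn a message's content parts into a JSON string."""
    return json.dumps([asdict(p) for p in parts])


def deserialize(raw: str) -> list[Part]:
    """Rebuild typed parts, dispatching on the "type" field."""
    parts: list[Part] = []
    for obj in json.loads(raw):
        if obj["type"] == "text":
            parts.append(TextPart(text=obj["text"]))
        elif obj["type"] == "image":
            parts.append(ImagePart(url=obj["url"]))
        else:
            raise ValueError(f"unknown part type: {obj['type']}")
    return parts


message_content = [
    TextPart("Here is the diagram you asked for:"),
    ImagePart("https://example.com/diagram.png"),  # placeholder URL
]
raw = serialize(message_content)
assert deserialize(raw) == message_content
```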

At the same time, unlike most chat applications, mine needs to support a hierarchical, one-to-many (tree-shaped) relationship between messages, so maybe Postgres would be better?
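A minimal sketch of the relational option, using Python's built-in `sqlite3` so it runs without a server. Because each message has exactly one parent, a single self-referencing `parent_id` column (an adjacency list) is enough and no join table is needed. A `WITH RECURSIVE` query walks from a leaf back to the root; PostgreSQL supports the same `WITH RECURSIVE` syntax. The table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        id        TEXT PRIMARY KEY,
        parent_id TEXT REFERENCES messages(id),  -- NULL for the first message
        role      TEXT NOT NULL,
        content   TEXT NOT NULL
    )
""")
# Index the parent link so "find all children of X" stays fast.
conn.execute("CREATE INDEX idx_messages_parent ON messages(parent_id)")

rows = [
    ("u1", None, "user", "Hi"),
    ("a1", "u1", "assistant", "Hello!"),
    ("a2", "u1", "assistant", "Hey there!"),   # regenerated: sibling of a1
    ("u2", "a2", "user", "Tell me a joke"),
    ("u3", "a2", "user", "Tell me a fact"),    # edited prompt: sibling of u2
]
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", rows)


def branch(conn, leaf_id):
    """Return (id, role, content) rows from the root down to leaf_id."""
    query = """
        WITH RECURSIVE path(id, parent_id, role, content, depth) AS (
            SELECT id, parent_id, role, content, 0
            FROM messages WHERE id = ?
            UNION ALL
            SELECT m.id, m.parent_id, m.role, m.content, p.depth + 1
            FROM messages m JOIN path p ON m.id = p.parent_id
        )
        SELECT id, role, content FROM path ORDER BY depth DESC
    """
    return conn.execute(query, (leaf_id,)).fetchall()


def children(conn, message_id):
    """Return the IDs of a message's children (in practice, order by a timestamp)."""
    cur = conn.execute(
        "SELECT id FROM messages WHERE parent_id = ? ORDER BY id", (message_id,)
    )
    return [row[0] for row in cur]


print(branch(conn, "u3"))
# [('u1', 'user', 'Hi'), ('a2', 'assistant', 'Hey there!'), ('u3', 'user', 'Tell me a fact')]
```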

What database do you think ChatGPT is using internally? Thanks!

submitted by /u/parrot15

from Software Development – methodologies, techniques, and tools. Covering Agile, RUP, Waterfall + more! https://ift.tt/qXP5rg6
