AIExplained

AIExplained

Hallucinations, Bias, and the Secrets of LLMs

Added 2023-12-05 18:00:12 +0000 UTC

Concrete methods to reduce hallucinations, and an exploration of the hidden biases of models that I am 99.8% confident you will learn something new from, but a deep dive into RepE, a stunning new method from Andy Zou, who, surprise-surprise, I managed to interview for this video! Great for practitioners and anyone looking to understand more about the secrets of LLMs.

Slides: https://docs.google.com/presentation/d/1LXPgqRs8Pu_YojuCQUdieDmaAtWHc8uILyQXOzKkFFI/edit?usp=sharing

Hallucinations, Bias, and the Secrets of LLMs

Comments

Bias and hallucination I successfully manage by using prompt engineering, retrieval-augmented generation (RAG), and fine-tuning. Data privacy I can manage using Azure AI studio. The biggest problem for me that I can't understand is people's mindset - people love doing boring, repetitive work - when they are busy they feel important, needed and safe. For 20 years I have been working in a big pharmaceutical company - this observation is based on my experience from corpo word.

Michal Babula

2023-12-14 09:57:26 +0000 UTC

To be honest, I'm not sure how the instructions are fed in to the model. Depending on how it gets fed, I'd imagine it wouldn't work very well. Like you said, I often have to remind ChatGPT of its custom instructions as well (though I've found the new custom GPTs to be a lot better than regular ChatGPT with custom instructions!).

Paulo

2023-12-08 18:47:24 +0000 UTC

Interesting! What happens if you put this in custom instructions? For me I need to remind ChatGPT sometimes that it has custom instructions

Dennis Hulsebos

2023-12-07 13:08:00 +0000 UTC

Fantastic comment and actually useable tips their Paulo, really appreciate it. A Waluigi effect almost at times.

Philip

2023-12-07 01:16:12 +0000 UTC

Thank you for taking the time to present these guidelines and sharing the tidbits from the interviews! I'll go ahead and add one trick I've found over my use of GPT-4 that I've found to reduce (or rather invert) sycophancy. I use this whenever I want the model to critically evaluate a section of text or code I'm providing. I'll create a new chat, and introduce myself as if I were a professor, mentioning that I need the model's aid to grade student submissions. I'll explain the task, then provide my own text or code. The output using this technique consistently points out more flaws in the content I provide, though I haven't measured the exact margin. My hypothesis (or rather conjecture) is that, as mentioned in the interviews and cited papers, RLHF tends to cause the model to act sycophantically - primarily toward the user. If the model somehow perceives the user to be an evaluator of provided content, instead of the creator of the provided content, its honesty is partially disinhibited. I find this specific method also avoids creating an excessively strong inverse effect, which is typically what happens if you present your content saying you don't like it (the model will begin to attack your output, including elements where you as the user may ultimately judge appropriate). Also regarding sycophancy, I ironically find that asking ChatGPT to be deliberately dishonest - especially when reviewing code - yields more useful feedback, as it becomes overtly sarcastic, pointing out mistakes that it usually doesn't point out if you simply ask it to be honest. My guess for this is that humans are much quicker to point out dishonesty than to point out honesty. I'd imagine LLMs might have a more comprehensive internal representation of dishonesty than 'regular' honesty - the latter is simply 'indirectly labelled' in far fewer occasions in the training data, I'd imagine.

Paulo

2023-12-07 01:02:17 +0000 UTC

More Creators

kaboozey

kaboozey

patreon

PolakPeasant

PolakPeasant

gumroad

CaffeinatedCorgi

CaffeinatedCorgi

gumroad

天野　翔（あまの　かける）

天野　翔（あまの　かける）

fanbox

鼻眼鏡

鼻眼鏡

fanbox

AlyonaCap

AlyonaCap

patreon

墨雪吟

墨雪吟

fanbox

GROWTHUPHERO

GROWTHUPHERO

patreon

ZFetish

ZFetish

fanbox

hoshime

hoshime

fanbox

Idylla

Idylla

patreon

Freeuseguys

Freeuseguys

fanbox

alexanderdinh

alexanderdinh

patreon

Castrodour

Castrodour

patreon

simcelebrity00

simcelebrity00

patreon

ante

ante

patreon

allandox

allandox

boosty

KID

KID

patreon

frey

frey

patreon

Tao2675

Tao2675

patreon

Simtury

Simtury

patreon

The3nderGameR

The3nderGameR

patreon

dateariane

dateariane

patreon

AiEroticaWorks

AiEroticaWorks

patreon

Just Roll With It

Just Roll With It

patreon

Tacosnnmn

Tacosnnmn

fanbox

ErlisaTakanashi

ErlisaTakanashi

patreon

ワシゴー

ワシゴー

fanbox

thebestultrahiper

thebestultrahiper

patreon

☽ Pudding ☾

☽ Pudding ☾

gumroad

olchas

olchas

patreon

Josh Cloud

Josh Cloud

gumroad

Nae22

Nae22

patreon

XTC

XTC

fanbox

mmawizzard

mmawizzard

patreon

GalaxiMonkey

GalaxiMonkey

patreon

qlinicx

qlinicx

fanbox

Ishi - アニメーター

Ishi - アニメーター

fanbox

hibernotion

hibernotion

patreon

Coyzy Movie Night

Coyzy Movie Night

patreon