overview for Blaed

Free Open-Source AI LLM Guide (lemmy.world)

submitted 2 years ago by Blaed@lemmy.world to c/selfhosted@lemmy.world

cross-posted from: https://lemmy.world/post/2219010

Hello everyone!

We have officially hit 1,000 subscribers! How exciting!! Thank you for being a member of !fosai@lemmy.world. Whether you're a casual passerby, a hobby technologist, or an up-and-coming AI developer - I sincerely appreciate your interest and support in a future that is free and open for all.

It can be hard to keep up with the rapid developments in AI, so I have decided to pin this at the top of our community to be a frequently updated LLM-specific resource hub and model index for all of your adventures in FOSAI.

The ultimate goal of this guide is to become a gateway resource for anyone looking to get into free open-source AI (particularly text-based large language models). I will be doing a similar guide for image-based diffusion models soon!

In the meantime, I hope you find what you're looking for! Let me know in the comments if there is something I missed so that I can add it to the guide for everyone else to see.

Getting Started With Free Open-Source AI

Have no idea where to begin with AI / LLMs? Try starting with our Lemmy Crash Course for Free Open-Source AI.

When you're ready to explore more resources, see our FOSAI Nexus - a hub for all of the major FOSS & FOSAI on the cutting/bleeding edges of technology.

If you're looking to jump right in, I recommend downloading oobabooga's text-generation-webui and installing one of the LLMs from TheBloke below.

Try both GGML and GPTQ variants to see which model type you prefer. See the hardware tables below to get a better idea of which parameter size you might be able to run (3B, 7B, 13B, 30B, 70B).

8-bit System Requirements

Model	VRAM Used	Minimum Total VRAM	Card Examples	RAM/Swap to Load*
LLaMA-7B	9.2GB	10GB	3060 12GB, 3080 10GB	24 GB
LLaMA-13B	16.3GB	20GB	3090, 3090 Ti, 4090	32 GB
LLaMA-30B	36GB	40GB	A6000 48GB, A100 40GB	64 GB
LLaMA-65B	74GB	80GB	A100 80GB	128 GB

4-bit System Requirements

Model	Minimum Total VRAM	Card Examples	RAM/Swap to Load*
LLaMA-7B	6GB	GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060	6 GB
LLaMA-13B	10GB	AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000	12 GB
LLaMA-30B	20GB	RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100	32 GB
LLaMA-65B	40GB	A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000	64 GB

*System RAM (not VRAM) is used to initially load a model. You can use swap space if you do not have enough RAM to load your LLM.

When in doubt, try starting with 3B or 7B models and work your way up to 13B+.
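If it helps to see the hardware tables as something you can query, here is a minimal Python sketch. The minimum-VRAM numbers are copied straight from the tables above; the function names and the nominal parameter counts (taking "7B" to mean 7 billion) are my own assumptions for illustration. The weights-only estimate is plain arithmetic (parameters x bits / 8) and ignores runtime overhead, which is one likely reason the 8-bit table lists 9.2GB used for LLaMA-7B rather than 7GB.

```python
# Minimum total VRAM (GB) per model size, copied from the 4-bit and 8-bit tables above.
MIN_VRAM_GB = {
    4: {"7B": 6, "13B": 10, "30B": 20, "65B": 40},
    8: {"7B": 10, "13B": 20, "30B": 40, "65B": 80},
}

# Assumption: nominal parameter counts in billions, taken at face value from the size label.
NOMINAL_PARAMS_B = {"7B": 7, "13B": 13, "30B": 30, "65B": 65}


def weights_only_gb(size: str, bits: int) -> float:
    """Rough size of the weights alone: parameters x bits / 8 bytes, in GB (1e9 bytes)."""
    return NOMINAL_PARAMS_B[size] * bits / 8


def largest_model_that_fits(vram_gb: float, bits: int = 4):
    """Return the largest table entry whose minimum VRAM fits, or None if nothing fits."""
    fitting = [size for size, need in MIN_VRAM_GB[bits].items() if need <= vram_gb]
    return max(fitting, key=lambda size: NOMINAL_PARAMS_B[size]) if fitting else None


if __name__ == "__main__":
    for vram in (6, 12, 24):
        print(f"{vram} GB VRAM -> 4-bit: {largest_model_that_fits(vram, 4)}, "
              f"8-bit: {largest_model_that_fits(vram, 8)}")
    print(f"LLaMA-7B 8-bit weights alone: ~{weights_only_gb('7B', 8):.1f} GB")
```

Treat the result as a starting point only: context length, the loader you use, and whatever else is occupying your GPU all change what actually fits.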

FOSAI Resources

Fediverse / FOSAI

The Internet is Healing

FOSAI Welcome Message

FOSAI Crash Course

FOSAI Nexus Resource Hub

LLM Leaderboards

HF Open LLM Leaderboard

LMSYS Chatbot Arena

LLM Search Tools

LLM Explorer

Open LLMs

Large Language Model Hub

Download Models

oobabooga

text-generation-webui - a community-favorite Gradio web UI by oobabooga for running almost any free open-source large language model downloaded from HuggingFace, including (but not limited to) LLaMA, llama.cpp, GPT-J, Pythia, OPT, and many others. Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation. It is highly compatible with many formats.

Exllama

A standalone Python/C++/CUDA implementation of Llama for use with 4-bit GPTQ weights, designed to be fast and memory-efficient on modern GPUs.

gpt4all

Open-source assistant-style large language models that run locally on your CPU. GPT4All is an ecosystem to train and deploy powerful and customized large language models that run locally on consumer-grade processors.

TavernAI

The original project that SillyTavern was forked from. This chat interface offers very similar functionality but is compatible with fewer chat and API interfaces than SillyTavern.

SillyTavern

Developer-friendly, Multi-API (KoboldAI/CPP, Horde, NovelAI, Ooba, OpenAI+proxies, Poe, WindowAI(Claude!)), Horde SD, System TTS, WorldInfo (lorebooks), customizable UI, auto-translate, and more prompt options than you'd ever want or need. Optional Extras server for more SD/TTS options + ChromaDB/Summarize. Based on a fork of TavernAI 1.2.8

Koboldcpp

A self-contained distributable from Concedo that exposes llama.cpp function bindings, allowing it to be used via a simulated Kobold API endpoint. What does that mean? You get llama.cpp with a fancy UI, persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios and everything Kobold and Kobold Lite have to offer, all in a tiny package around 20 MB in size, excluding model weights.

KoboldAI-Client

This is a browser-based front-end for AI-assisted writing with multiple local & remote AI models. It offers the standard array of tools, including Memory, Author's Note, World Info, Save & Load, adjustable AI settings, formatting options, and the ability to import existing AI Dungeon adventures. You can also turn on Adventure mode and play the game like AI Dungeon Unleashed.

h2oGPT

h2oGPT is a large language model (LLM) fine-tuning framework and chatbot UI with document question-answer capabilities. Documents help to ground LLMs against hallucinations by providing them context relevant to the instruction. h2oGPT is a fully permissive Apache V2 open-source project for 100% private and secure use of LLMs and document embeddings for document question-answer.

Models

TheBloke

TheBloke is a developer who frequently releases quantized, user-friendly conversions of open-source AI Large Language Models (LLMs) in two formats: GPTQ (aimed at GPU inference) and GGML (the llama.cpp format, aimed at CPU inference).

These conversions of popular models can be configured and installed on personal (or professional) hardware, bringing bleeding-edge AI to the comfort of your home.

Support TheBloke here.

https://ko-fi.com/TheBlokeAI
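If you'd rather script your downloads than click through HuggingFace, here is a minimal Python sketch. It assumes the model names listed below are repositories under the `TheBloke` account on HuggingFace and that you have the `huggingface_hub` package installed; the helper names and the `models/<owner>_<name>` folder layout are my own choices for illustration, so adjust them to whatever your UI expects.

```python
from pathlib import Path


def repo_id(model_name: str, owner: str = "TheBloke") -> str:
    """Build a HuggingFace repo id such as 'TheBloke/Llama-2-7B-GPTQ' (owner is an assumption)."""
    return f"{owner}/{model_name}"


def model_format(model_name: str) -> str:
    """Read the format from the name suffix used in the lists below (GGML or GPTQ)."""
    suffix = model_name.rsplit("-", 1)[-1].upper()
    return suffix if suffix in {"GGML", "GPTQ"} else "unknown"


def download(model_name: str, models_dir: str = "models") -> Path:
    """Fetch every file of the repo into models_dir/<owner>_<model_name>.

    Needs `pip install huggingface_hub` and network access. The import is done
    lazily so the two helpers above work without the package installed.
    """
    from huggingface_hub import snapshot_download

    target = Path(models_dir) / repo_id(model_name).replace("/", "_")
    snapshot_download(repo_id=repo_id(model_name), local_dir=target)
    return target


if __name__ == "__main__":
    name = "Llama-2-7B-GPTQ"
    print(repo_id(name), model_format(name))
    # download(name)  # uncomment to actually fetch the files (several GB)
```

GGML repositories tend to bundle several quantization variants in one place, so you may want to pass `allow_patterns` to `snapshot_download` and grab a single file instead of the whole repo.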

70B

Llama-2-70B-chat-GPTQ

Llama-2-70B-Chat-GGML

Llama-2-70B-GPTQ

Llama-2-70B-GGML

llama-2-70b-Guanaco-QLoRA-GPTQ

30B

30B-Epsilon-GPTQ

13B

Llama-2-13B-chat-GPTQ

Llama-2-13B-chat-GGML

Llama-2-13B-GPTQ

Llama-2-13B-GGML

llama-2-13B-German-Assistant-v2-GPTQ

llama-2-13B-German-Assistant-v2-GGML

13B-Ouroboros-GGML

13B-Ouroboros-GPTQ

13B-BlueMethod-GGML

13B-BlueMethod-GPTQ

llama-2-13B-Guanaco-QLoRA-GGML

llama-2-13B-Guanaco-QLoRA-GPTQ

Dolphin-Llama-13B-GGML

Dolphin-Llama-13B-GPTQ

MythoLogic-13B-GGML

MythoBoros-13B-GPTQ

WizardLM-13B-V1.2-GPTQ

WizardLM-13B-V1.2-GGML

OpenAssistant-Llama2-13B-Orca-8K-3319-GGML

7B

Llama-2-7B-GPTQ

Llama-2-7B-GGML

Llama-2-7b-Chat-GPTQ

LLongMA-2-7B-GPTQ

llama-2-7B-Guanaco-QLoRA-GPTQ

llama-2-7B-Guanaco-QLoRA-GGML

llama2_7b_chat_uncensored-GPTQ

llama2_7b_chat_uncensored-GGML

More Models

Any of KoboldAI's Models

Luna-AI-Llama2-Uncensored-GPTQ

Nous-Hermes-Llama2-GGML

Nous-Hermes-Llama2-GPTQ

FreeWilly2-GPTQ

GL, HF!

Are you an LLM Developer? Looking for a shoutout or project showcase? Send me a message and I'd be more than happy to share your work and support links with the community.

If you haven't already, consider subscribing to the free open-source AI community at !fosai@lemmy.world where I will do my best to make sure you have access to free open-source artificial intelligence on the bleeding edge.

Thank you for reading!

Free Open-Source AI LLM Guide (lemmy.world)

submitted 2 years ago by Blaed@lemmy.world to c/technology@lemmy.world

cross-posted from: https://lemmy.world/post/2219010

Microsoft Announces: LongNet - Scaling LLM Transformers to 1,000,000,000 Tokens & Context Length (lemmy.world)

submitted 2 years ago by Blaed@lemmy.world to c/technology@lemmy.world

cross-posted from: https://lemmy.world/post/1115513

Microsoft Announces a New Breakthrough: LongNet: Scaling AI/LLM Transformers to 1,000,000,000 Tokens & Context Length

Official Microsoft Breakthroughs:

https://arxiv.org/pdf/2307.02486.pdf

https://github.com/microsoft/unilm/tree/master

See one of the first implementations of LongNet here:

https://github.com/kyegomez/LongNet

In the realm of large language models, scaling sequence length has emerged as a significant challenge. Current methods often grapple with computational complexity or model expressivity, limiting the maximum sequence length. This paper introduces LongNet, a Transformer variant designed to scale sequence length to over 1 billion tokens without compromising performance on shorter sequences. The key innovation is dilated attention, which exponentially expands the attentive field as the distance increases.

Features

LongNet offers several compelling advantages:

Linear Computation Complexity: It maintains a linear computational complexity and a logarithmic dependency between tokens.

Distributed Trainer: LongNet can serve as a distributed trainer for extremely long sequences.

Dilated Attention: This new feature is a drop-in replacement for standard attention and can be seamlessly integrated with existing Transformer-based optimization.

(+ many others that are hard to fit here - please read the full paper here for more insights)
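To get a feel for why the cost can stay linear, here is a rough Python sketch of the sparsification idea behind dilated attention as I understand it from the paper: split the sequence into fixed-length segments, keep every r-th position inside each segment, and attend only within what is left. This is a simplified illustration with a single (segment length, dilation) pair and my own function names. It only counts query-key pairs and leaves out the attention math itself, the mixing of several segment/dilation sizes, and everything else the real implementation does.

```python
def dilated_indices(seq_len: int, segment_len: int, dilation: int):
    """Split positions 0..seq_len-1 into segments and keep every `dilation`-th position."""
    segments = []
    for start in range(0, seq_len, segment_len):
        end = min(start + segment_len, seq_len)
        segments.append(list(range(start, end, dilation)))
    return segments


def attention_pairs(seq_len: int, segment_len: int, dilation: int) -> int:
    """Count query-key pairs when attention runs only inside each sparsified segment."""
    return sum(len(seg) ** 2 for seg in dilated_indices(seq_len, segment_len, dilation))


def full_attention_pairs(seq_len: int) -> int:
    """Standard attention compares every position with every position: N^2 pairs."""
    return seq_len ** 2


if __name__ == "__main__":
    print(dilated_indices(16, 8, 2))
    for n in (1_024, 2_048, 4_096):
        print(n, attention_pairs(n, 256, 4), full_attention_pairs(n))
```

With the segment length and dilation held fixed, doubling the sequence length doubles the number of segments and therefore the pair count, while full attention quadruples.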

Experimental results show that LongNet delivers strong performance on both long-sequence modeling and general language tasks. This work paves the way for modeling very long sequences, such as treating an entire corpus or even the whole Internet as a sequence.

If computation and inference hurdles keep being overcome at the current pace, we may see near-infinite context lengths sooner than many initially thought. How exciting!

Arxiv Paper | The Abstract:

(take this graph with a grain of salt - this is not indicative of logarithmic scaling)

Scaling sequence length has become a critical demand in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, rendering the maximum sequence length restricted. In this work, we introduce LONGNET, a Transformer variant that can scale sequence length to more than 1 billion tokens, without sacrificing the performance on shorter sequences. Specifically, we propose dilated attention, which expands the attentive field exponentially as the distance grows. LONGNET has significant advantages:

It has a linear computation complexity and a logarithm dependency between tokens.

It can be served as a distributed trainer for extremely long sequences.

Its dilated attention is a drop-in replacement for standard attention, which can be seamlessly integrated with the existing Transformer-based optimization.

Experiments results demonstrate that LONGNET yields strong performance on both long-sequence modeling and general language tasks.

Our work opens up new possibilities for modeling very long sequences, e.g., treating a whole corpus or even the entire Internet as a sequence. Code is available at https://aka.ms/LongNet.

Click here to read the rest of the paper!