Agentic Weekly AI News TL;DR
Build an automated pipeline that scrapes the last 7 days of AI papers and posts (arXiv, OpenAI, Anthropic, Hugging Face, DeepLearning.AI), processes them with Unstructured’s hi_res partitioning strategy to get clean text, stores structured chunks in MongoDB, and generates both detailed summaries and an executive brief for a weekly newsletter.
Unstructured API Workflows hi_res Scraping ArXiv OpenAI Anthropic Hugging Face DeepLearning.AI S3 MongoDB Summarization
Graph RAG for Academic Papers
Learn how to build a GraphRAG system for research papers using Unstructured’s Named Entity Recognition to extract custom entities and relationships, then query them with Neo4j to answer complex questions that require understanding connections between models, datasets, and metrics.
Unstructured API Workflows Graph RAG S3 Neo4j
Unstructured API Walkthrough
This walkthrough provides you with deep, hands-on experience with the Unstructured API. As you follow along, you will learn how to use many of the Unstructured API’s features for partitioning, enriching, chunking, and embedding.
Unstructured API Workflows Workflow Endpoint Local File
Everything (from) Everywhere All at Once
Set up an AI assistant that answers questions by querying your company’s scattered data. Retrieve context from contracts in Azure, sales decks in OneDrive, and emails in Outlook through a single RAG pipeline.
Unstructured API Workflows RAG Azure Blob Storage Outlook OneDrive AstraDB
Agentic RAG with Visually Grounded Answers and Visual Citations
Learn how to build an AI-powered document processing system that extracts both text and images from PDFs in S3, generates intelligent descriptions for visual elements, and enables a searchable knowledge base that can answer questions about charts, diagrams, and product visuals.
Unstructured API Workflows S3 Image Processing Visual RAG Enterprise AI
Building a Hybrid RAG System: From Fragmented Data to Unified Intelligence
Learn how to build a comprehensive hybrid RAG system that processes multiple data sources simultaneously, combining S3 PDFs and Elasticsearch records into a unified knowledge base for enterprise AI applications.
Unstructured API Workflows S3 Elasticsearch Hybrid RAG Enterprise AI
Dropbox-to-Pinecone Connector API Quickstart
Learn how to set up and run a custom workflow that uses a free Dropbox storage location as a source and a free Pinecone serverless index as a destination, suitable for powering RAG applications.
Unstructured API Workflows Dropbox Pinecone
Getting Started with Unstructured API and PostgreSQL
Learn how to build data processing workflows using the Unstructured API and Python SDK to preprocess unstructured files from S3 and store the structured outputs in PostgreSQL for retrieval.
Unstructured API Workflows S3 PostgreSQL
Unstructured Partition Endpoint Quickstart
This notebook uses the Unstructured Python SDK to process a local file through the Unstructured Partition Endpoint.
Unstructured API Partition Endpoint Local file
Preserving Table Structure for Better Retrieval
This notebook explores using the Unstructured API to process financial documents while preserving their tabular structure in a form that downstream applications can use.
Unstructured API Workflows S3 Astra DB
Historical research about MLK with the Unstructured API
This notebook explores how you can use Unstructured to gather and process declassified historical records surrounding the assassination of Dr. Martin Luther King, Jr. These processed documents can then be analyzed by using Elasticsearch and RAG.
Unstructured API Workflows S3 VLM NER Elasticsearch MLK National Archives
RAG without Embeddings
Learn how to build a RAG pipeline without any embedding models. Use Unstructured to preprocess documents, index them into Elasticsearch, and retrieve using classic BM25 scoring.
Unstructured API Workflows Elasticsearch BM25
Getting Started with Unstructured API and Redis
Learn how to build data processing workflows using the Unstructured API and Python SDK to preprocess unstructured files from S3 and store the structured outputs in Redis Cloud for retrieval.
Unstructured API Workflows S3 Redis
Create an S3 to Qdrant Pipeline using the Unstructured API
This notebook walks through using the Unstructured Workflow Endpoint to set up a complete pipeline that pulls documents from S3, processes them using Unstructured, and stores the resulting embeddings in Qdrant for fast vector search and retrieval.
Unstructured API Workflows S3 Qdrant VLM Embeddings
Two-stage retrieval: similarity search + rerankers
Improve RAG precision with a two-stage retrieval pipeline: fast vector search followed by reranking with Cohere’s reranker models.
Unstructured API Workflows Cohere Pinecone
Create an S3 to MongoDB Pipeline using the Unstructured API
Learn how to build an end-to-end document processing pipeline that processes PDFs from S3 and stores structured results in MongoDB. Features VLM-powered partitioning, semantic chunking, and vector embeddings using the Unstructured Workflows API.
Unstructured API Workflows S3 MongoDB VLM Embeddings
Getting Started with Unstructured API and IBM watsonx.data
Learn how to create data processing workflows with the Unstructured API and its Python SDK to preprocess all of your unstructured data from Azure Blob Storage into your IBM watsonx.data instance.
Unstructured API Workflows Azure Blob Storage IBM watsonx.data
Using Unstructured with Snowflake Cortex Search for RAG
Use Snowflake Cortex and RAG to do natural-language searches across a Snowflake table that contains data provided by Unstructured. Additional Snowflake Cortex functions are also explored.
Unstructured API Snowflake Cortex RAG Search Workflows S3
Agentic RAG with LangGraph and Together AI
Build Agentic RAG with LangGraph and Together AI and compare the results with Vanilla RAG in pure Python.
Unstructured API Workflows Agents LangGraph Together AI Astra DB
Getting Started with Unstructured API and Snowflake
Learn how to create data processing workflows with the Unstructured API and its Python SDK to preprocess all of your unstructured data from Azure Blob Storage into your Snowflake table.
Unstructured API Workflows Azure Blob Storage Snowflake
Building Graph-Based RAG Applications
Learn how to use the Unstructured API to create a Graph RAG-based workflow that enriches data with named entity recognition (NER) and writes it to your Astra DB.
Unstructured API Workflows Graph RAG NER Astra DB
Getting Started with Unstructured API and Delta Tables in Databricks
Learn how to create data processing workflows with the Unstructured API and its Python SDK to preprocess all of your unstructured data into your Delta table.
Unstructured API Workflows Databricks S3
RAG for Online Documentation
Crawl websites with Firecrawl and build a RAG workflow powered by Unstructured and MongoDB Atlas vector search.
Unstructured API Workflows MongoDB
Unstructured Workflow Endpoint Quickstart
Build an end-to-end workflow in Unstructured programmatically by using the Unstructured Workflow Endpoint.
Unstructured API Workflows S3
RAG with Databricks Vector Search with Context from Multiple Sources
Build RAG with Databricks Vector Search with context preprocessed from multiple sources by Unstructured.
Databricks Introductory notebook
Agentic RAG with Hugging Face smolagents vs Vanilla RAG
Build Agentic RAG with the smolagents library and compare the results with Vanilla RAG in pure Python.
GPT-4o smolagents Agents DataStax S3 Advanced notebook
Llama 3.2 RAG evaluation on unstructured text
Evaluate Llama 3.2 for your RAG system with Unstructured, GPT-4o, Ragas, and LangChain.
GPT-4o Ragas LangChain Llama3.2 Pinecone S3 Advanced notebook
Multimodal RAG: Enhancing RAG outputs with image results
Process a file in S3 with Unstructured and return images in your RAG output
S3 FAISS GPT-4o-mini Advanced notebook
Quantitative Reasoning with tables inside PDFs
From Pixels to Insights: Seamlessly Extracting and Visualizing Table Data with Unstructured and Hex
Unstructured API Hex Advanced notebook
PII removal with GLiNER in unstructured data ETL
Remove Personally Identifiable Information (PII) as a part of unstructured data preprocessing.
Unstructured API PII GLiNER Advanced notebook
Custom metadata extraction and self-querying retrieval
Extract custom metadata and enable metadata pre-filtering in your RAG pipeline.
Unstructured API MongoDB Metadata Advanced notebook
Selecting an embedding model for custom data
End-to-end data processing pipeline using the Unstructured Serverless API.
Unstructured API Hugging Face Advanced notebook
RAG with PDFs, LangChain and Llama 3
A RAG system with the Llama 3 model from Hugging Face.
Unstructured API 🤗 Hugging Face LangChain Llama 3 Introductory notebook
Unstructured data ETL from S3 to SingleStore DB
Learn to ingest, partition, chunk, embed and load data from an S3 bucket into SingleStore DB.
Unstructured API SingleStoreDB AWS S3 Introductory notebook
Google Drive to DataStax Astra DB
Embed your Google Drive Docs in an Astra vector database with the Unstructured Serverless API.
Unstructured API Google DataStax Introductory notebook
Weaviate RAG quickstart
Embed your local documents in a Weaviate vector database with the Unstructured Serverless API.
Unstructured API OpenAI Weaviate Introductory notebook
Preprocess PDFs in AWS S3, load into Elasticsearch
Ingest PDF documents from an S3 bucket, transform them into normalized JSON with the Unstructured Serverless API, then chunk, embed, and load them into Elasticsearch.
Unstructured API AWS S3 Elasticsearch Introductory notebook
Preprocess documents in Google Drive, load into Databricks Volume
Preprocess documents from Google Drive with the Unstructured Serverless API and load them into a Databricks Volume.
Unstructured API Google Drive Databricks Introductory notebook
Source references in RAG responses
Add document source references to RAG responses based on document metadata.
Unstructured API RAG LangChain Intermediate notebook
Query processed PDF with HuggingChat
Send a PDF to Unstructured for processing, then send a subset of the processed text to HuggingChat for chatbot-style querying.
Unstructured API 🤗 Hugging Face 🤗 HuggingChat Introductory notebook
Llama 3 Local RAG with emails
Build a local RAG app for your emails with Unstructured, LangChain and Ollama.
Unstructured API LangChain Ollama Llama 3 Introductory notebook
Building RAG with PowerPoint presentations
A RAG solution that is based on PowerPoint files.
Unstructured API 🤗 Hugging Face LangChain Llama 3 Introductory notebook
Synthetic test dataset generation
Build a synthetic test dataset for your RAG system in 5 easy steps.
Unstructured API GPT-4o Ragas LangChain Advanced notebook
