The crawler is hybrid, using async Python requests and Puppeteer with uBlock Origin. Detection works by counting the number of requests uBO blocks on the page; if there are too many (the threshold is set to 5), we kick the page out, leaving only "clean" pages in the index.
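
The filtering rule can be sketched in a few lines of Python. This is a minimal illustration, not the crawler's actual code: `CrawledPage`, `is_clean` and `filter_clean` are hypothetical names, and we assume a page is rejected once its blocked-request count goes above 5 (the text does not say whether a count of exactly 5 passes).

```python
from dataclasses import dataclass

# Threshold value taken from the text above.
BLOCKED_REQUEST_THRESHOLD = 5


@dataclass
class CrawledPage:
    """Hypothetical record for one rendered page."""
    url: str
    blocked_requests: int  # requests uBO blocked while the page loaded


def is_clean(page, threshold=BLOCKED_REQUEST_THRESHOLD):
    # Assumption: a page is dropped only when it exceeds the threshold.
    return page.blocked_requests <= threshold


def filter_clean(pages, threshold=BLOCKED_REQUEST_THRESHOLD):
    """Keep only the pages that pass the blocked-request check."""
    return [page for page in pages if is_clean(page, threshold)]
```

Keeping the threshold as a parameter makes it easy to try stricter or looser cutoffs against the same crawl data.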
The crawler is also unique in that it will follow an interesting dead link to its Internet Archive page, trying its best to preserve the page in our index (you will see those results under the "Internet Archive" section).
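
One way to look up an archived copy of a dead link is the Wayback Machine availability endpoint. The sketch below assumes that endpoint is used; the text does not say how the crawler actually resolves archive pages, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint (assumed here, not confirmed
# as what the crawler uses).
WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(dead_url):
    """Build the lookup URL for a dead link."""
    return f"{WAYBACK_API}?{urlencode({'url': dead_url})}"


def closest_snapshot(payload):
    """Return the snapshot URL from a parsed availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(dead_url, timeout=10.0):
    """Query the endpoint over the network and return a snapshot URL, if any."""
    with urlopen(availability_url(dead_url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Parsing is kept separate from the network call so the snapshot logic can be checked against a saved response without hitting the archive.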
        
Content and semantic metadata are extracted using trafilatura and readability.js, while page language is detected using fastText.
        
To produce search results, three rankings, from full-text search (via Elasticsearch / Typesense) and NLP-based semantic search (using sentence transformers to produce embeddings and ScaNN to search through the vector space), are combined to produce the final ranking of results. Responses are served using FastAPI.
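
The text does not say how the individual rankings are merged. As an illustration only, the sketch below uses reciprocal rank fusion, a common stand-in technique for combining ranked lists; the function name, the constant `k=60` and the sample document IDs are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked highly by several engines rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from two full-text engines and a semantic search.
full_text_a = ["a", "b", "c"]
full_text_b = ["b", "a", "d"]
semantic = ["b", "c", "a"]
merged = reciprocal_rank_fusion([full_text_a, full_text_b, semantic])
```

Rank-based fusion avoids having to normalize raw scores, which are not directly comparable between a full-text engine and a vector search.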
        
The "Interesting Recently" section is provided by TinyGem, a content recommendation and bookmarking tool built using a similar stack.
        
Teclis also uses results, with permission, from Marginalia Search (another noncommercial search engine).