What is GridFS in MongoDB?

Published
18th Sep, 2023

    Many applications rely on file management and file storage to process their data. File storage frequently involves a third-party storage service or CDN (Content Delivery Network), such as those offered by Amazon Web Services, which complicates management. Every additional storage location is another place where retrieval can fail, so it is preferable to access all your resources from a single location rather than several different ones.

    Before GridFS was added to MongoDB, it was difficult to store files directly in the database using a single API request. This article shows how GridFS uses indexes and small chunks of data for faster retrieval, explains the methods it uses to achieve this, and explores the benefits and limitations of using GridFS.

    What is GridFS?

    GridFS is a driver specification for storing and retrieving files in MongoDB, including files larger than the 16 MB size limit of a BSON document. Rather than storing a file as a single document, it divides the file into portions, or chunks, and saves each chunk as a separate document.

    By default, each chunk is 255 KB in size. Every chunk except the last is exactly that size; the final chunk is only as large as needed, so it is equal to or smaller than 255 KB.
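
    The chunk arithmetic described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not driver code: chunkLayout is a hypothetical helper name, and the 255 KB default is taken to be 255 * 1024 bytes, which matches the chunkSize value of 261120 that appears later in this article.

```javascript
// Default GridFS chunk size: 255 * 1024 = 261120 bytes.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Describe how a file of `length` bytes would be split into chunks.
// (chunkLayout is a hypothetical helper used only for this illustration.)
function chunkLayout(length, chunkSize = DEFAULT_CHUNK_SIZE) {
  if (!Number.isInteger(length) || length < 0) {
    throw new RangeError("length must be a non-negative integer");
  }
  const chunkCount = Math.ceil(length / chunkSize);
  // Every chunk except the last is exactly chunkSize bytes long.
  const lastChunkSize =
    chunkCount === 0 ? 0 : length - (chunkCount - 1) * chunkSize;
  return { chunkCount, lastChunkSize };
}

console.log(chunkLayout(15720));       // { chunkCount: 1, lastChunkSize: 15720 }
console.log(chunkLayout(1024 * 1024)); // { chunkCount: 5, lastChunkSize: 4096 }
```

    A 1 MiB file therefore needs five chunk documents: four full chunks of 261120 bytes and a final chunk of 4096 bytes.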

    GridFS is a convenient way to store files in MongoDB alongside the rest of your data, which keeps the flexible, schema-less document model.

    Because files are split into smaller parts, it is easier to access specific areas of a file, which avoids memory-intensive work such as loading the whole file.

    When reading from GridFS, the driver reassembles the chunks as needed. This means you can read just a range of a file, for example to listen to a segment of an audio file or retrieve a segment of a video clip.
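
    To make the range-read idea concrete, here is a small sketch of how a byte range maps onto chunk numbers. It assumes the default chunk size of 255 * 1024 bytes; chunksForRange and its field names are hypothetical and only illustrate the arithmetic a driver has to perform.

```javascript
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261120 bytes

// Work out which chunks cover the byte range [start, end) of a file.
// (chunksForRange is a hypothetical helper, not part of any driver API.)
function chunksForRange(start, end, chunkSize = DEFAULT_CHUNK_SIZE) {
  if (start < 0 || end <= start) {
    throw new RangeError("expected 0 <= start < end");
  }
  return {
    firstChunk: Math.floor(start / chunkSize),
    lastChunk: Math.floor((end - 1) / chunkSize),
    // Bytes to skip at the beginning of the first chunk.
    skipInFirstChunk: start % chunkSize,
    // Bytes to keep, counted from the beginning of the last chunk.
    bytesFromLastChunk: ((end - 1) % chunkSize) + 1,
  };
}

// Bytes 300000 to 599999 of a file live in chunks 1 and 2.
console.log(chunksForRange(300000, 600000));
// { firstChunk: 1, lastChunk: 2, skipInFirstChunk: 38880, bytesFromLastChunk: 77760 }

// A matching shell query would then only need those chunks, for example:
// db.fs.chunks.find({ files_id: myFileID, n: { $gte: 1, $lte: 2 } }).sort({ n: 1 })
```

    Only two chunk documents are fetched here, no matter how large the whole file is.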

    MongoDB GridFS Indexes

    For efficiency, GridFS uses an index on each of the chunks and files collections. For convenience, drivers that conform to the GridFS specification create these indexes automatically.

    The GridFS specification defines a basic API and describes advanced features that drivers may choose to offer in their implementations. It also defines the meaning and purpose of every field in the GridFS data model, disambiguates GridFS terminology, and documents configuration options that were previously unspecified. Beyond the indexes GridFS creates, you can add as many further indexes as your application needs.

    The Chunks Index

    GridFS uses the files_id and n fields to create a unique compound index on the chunks collection. This enables efficient chunk retrieval, as shown in the following example:

    db.fs.chunks.find( { files_id: myFileID } ).sort( { n: 1 } )

    Drivers that follow the GridFS specification automatically check for the existence of this index before performing read and write operations. For the specific behavior of your driver's GridFS implementation, consult the corresponding driver documentation.

    If this index does not exist, you can create it with the MongoDB Shell (mongosh), a JavaScript and Node.js REPL environment for working with MongoDB deployments in which you can test queries and operations directly against your database:

    db.fs.chunks.createIndex( { files_id: 1, n: 1 }, { unique: true } );

    The Files Index

    GridFS uses an index on the files collection based on the filename and uploadDate fields. This index enables efficient file retrieval, as illustrated in the following example:

    db.fs.files.find( { filename: myFileName } ).sort( { uploadDate: 1 } )

    If this index does not already exist, you can create it in mongosh:

    db.fs.files.createIndex( { filename: 1, uploadDate: 1 } );

    Drivers that follow the GridFS specification automatically check for the existence of this index before performing read and write operations. For the specific behavior of your driver's GridFS implementation, consult the corresponding driver documentation.

    MongoDB GridFS Sharding 

    GridFS is divided into two collections: files and chunks.

    Chunks Collection

    The chunks collection stores the binary chunks. To shard the chunks collection, use either { files_id: 1, n: 1 } or { files_id: 1 } as the shard key index. files_id is an ObjectId and changes monotonically.

    If the MongoDB driver runs filemd5 to verify a successful upload, you cannot use Hashed Sharding for the chunks collection.

    Each document in the chunks collection represents a unique chunk of a file in GridFS. This collection's documents take the following format:

    { 
      "_id" : <ObjectId>, 
      "files_id" : <ObjectId>, 
      "n" : <num>, 
      "data" : <binary> 
    }

    The following fields are included in some or all of the documents in the chunks collection:

    • chunks._id: Unique ObjectId.
    • chunks.files_id: The _id of the parent document, as specified in the files collection.
    • chunks.n: The chunk's sequence number. GridFS assigns a number to each chunk, beginning with 0.
    • chunks.data: The payload of the chunk as a BSON Binary type.
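
    The chunk format above can be illustrated without a database. The sketch below splits a Buffer into plain objects shaped like chunks documents and then rebuilds the file by sorting on n. The helper names and the "file-1" identifier are hypothetical; a real driver uses an ObjectId for files_id and MongoDB adds the _id field on insert.

```javascript
// Split a Buffer into plain objects shaped like fs.chunks documents.
// `filesId` stands in for the parent document's _id (an ObjectId in MongoDB).
function toChunkDocs(filesId, buffer, chunkSize = 255 * 1024) {
  const docs = [];
  for (let offset = 0, n = 0; offset < buffer.length; offset += chunkSize, n += 1) {
    docs.push({
      files_id: filesId,
      n, // sequence number, starting at 0
      data: buffer.subarray(offset, offset + chunkSize),
    });
  }
  return docs;
}

// Rebuild the file: order the chunks by n, then concatenate their payloads.
function fromChunkDocs(docs) {
  const ordered = [...docs].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((doc) => doc.data));
}

const original = Buffer.from("GridFS stores files as ordered chunks");
const chunkDocs = toChunkDocs("file-1", original, 10); // tiny chunk size for the demo
console.log(chunkDocs.length); // 4

// The chunks can arrive in any order; sorting on n restores the file.
const shuffled = [...chunkDocs].reverse();
console.log(fromChunkDocs(shuffled).equals(original)); // true
```

    Sorting on n is the same ordering that the { files_id: 1, n: 1 } index provides on the server side.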

    Files Collection

    The files collection stores each file's metadata. It is small and contains only metadata. None of the keys that GridFS requires lend themselves to an even distribution in a sharded environment, so leaving the files collection unsharded lets all of the file metadata documents reside on the primary shard.

    If you need to shard the files collection, use the _id field, possibly in combination with an application field.

    Each document in the files collection represents a file in GridFS and takes the following format:

    {
      "_id" : <ObjectId>,
      "length" : <num>,
      "chunkSize" : <num>,
      "uploadDate" : <timestamp>,
      "md5" : <hash>,
      "filename" : <string>,
      "contentType" : <string>,
      "aliases" : <string array>,
      "metadata" : <any>
    }

    The following fields are included in some or all of the documents in the files collection:

    • files.length: The size of the file in bytes.
    • files._id: The unique identifier of the file. It has the data type you chose for the original document; the default type for MongoDB documents is BSON ObjectId.
    • files.chunkSize: The size of each chunk in bytes. GridFS divides the file into chunks of size chunkSize, except for the last chunk, which is only as large as needed. The default size is 255 kilobytes (kB).
    • files.uploadDate: The date the file was first stored by GridFS. The type of this value is Date.
    • files.md5: An MD5 hash of the complete file, as returned by the filemd5 command. This value is a string. The GridFS specification deprecates this field, so newer drivers may omit it.
    • files.metadata: Optional. The metadata field can hold any type of data and any additional information you choose to store. If you want to add arbitrary fields to documents in the files collection, add them to an object in the metadata field.
    • files.aliases: Optional. An array of alias strings.
    • files.contentType: Optional. A valid MIME type for the GridFS file.
    • files.filename: Optional. A human-readable name for the GridFS file.

    Example:

    {
    "_id" : ObjectId("6177da181964fd7f82e2aaa9"),
    "length" : 15720,
    "chunkSize" : 261120,
    "uploadDate" : ISODate("2021-10-26T16:06:08.091+05:30"),
    "filename" : "ishanfile.docx",
    "contentType" : "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    }
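
    As a rough illustration of how these fields fit together, the following sketch builds an object shaped like a files document for an in-memory file. toFilesDoc, the notes.txt filename, and the owner metadata field are hypothetical; a real driver also generates the _id and writes the matching chunks documents.

```javascript
// Build an object shaped like an fs.files document for an in-memory file.
// (toFilesDoc is a hypothetical helper; drivers do this work internally.)
function toFilesDoc(filename, buffer, { chunkSize = 255 * 1024, metadata } = {}) {
  const doc = {
    length: buffer.length,  // size of the file in bytes
    chunkSize,              // size of every chunk except the last
    uploadDate: new Date(), // when the file was stored
    filename,
  };
  // Application-specific fields belong inside the metadata object.
  if (metadata !== undefined) {
    doc.metadata = metadata;
  }
  return doc;
}

const filesDoc = toFilesDoc("notes.txt", Buffer.from("hello"), {
  metadata: { owner: "demo" },
});
console.log(filesDoc.length, filesDoc.chunkSize, filesDoc.metadata.owner);
// 5 261120 demo
```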

    The files collection, like the chunks collection, uses a compound index, in this case on the filename and uploadDate fields, to enable efficient file retrieval, for example:

    db.fs.files.find( { filename: fileName } ).sort( { uploadDate: 1 } )

    If this index does not exist, run the following command in mongosh:

    db.fs.files.createIndex( { filename: 1, uploadDate: 1 } );

    How to Read and Write files in MongoDB GridFS?

    To follow the rest of the tutorial, your machine must have the following software installed:

    • Node.js
    • MongoDB with MongoDB Compass
    • VS Code

    Step 1: Make a folder named mongo_grid. Launch the VS Code editor and open this folder. It becomes the workspace that holds all of the code files for this tutorial.

    Step 2: In this workspace, create two folders: filestoread, for the files that will be read from disk and saved into the database, and filestowrite, for the files read back from the database.

    Step 3: Open the VS Code terminal and run npm init -y

    This command will create a workspace package.json file with certain preset sections.

    Install gridfs-stream and mongoose using the following command:

    npm install gridfs-stream
    npm install mongoose

    Check that both packages are now listed in the package.json file. By default, npm install records them under dependencies.

    The gridfs-stream package allows you to effortlessly stream files to and from MongoDB GridFS. The mongoose package contains the MongoDB object modelling tool, which is meant to function in an asynchronous environment to conduct operations on the MongoDB database.

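    Step 4: Maintain the following project folder structure: the mongo_grid folder contains the filestoread and filestowrite folders, the package.json file, and the writefile.js and readfile.js scripts created in the steps below.

    Put a few image, video, or audio files in the filestoread folder. These files will be used for the write and read operations. This example uses a sample file named gfs.png.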

    Step 5: Open the MongoDB Compass and connect to the MongoDB Database. Create a database with the name filesDB and collection named files.

    Step 6: To write a file to GridFS, create a JavaScript file named writefile.js and add this code to it:

    // 1. Load the mongoose driver
    var mongooseDv = require("mongoose");
    // 2. Connect to MongoDB and its database
    //    (useMongoClient is a Mongoose 4.11+ option; Mongoose 5 and later no longer need it)
    mongooseDv.connect('mongodb://localhost/filesDB', { useMongoClient: true });
    // 3. The Connection Object
    var connection = mongooseDv.connection;
    if (connection) {
        // readyState: 0 = disconnected, 1 = connected, 2 = connecting, 3 = disconnecting
        console.log(connection.readyState.toString());
        // 4. The Path object
        var path = require("path");
        // 5. The grid-stream
        var grid = require("gridfs-stream");
        // 6. The File-System module
        var fs = require("fs");
        // 7. Path of the image file in the filestoread folder
        var filesrc = path.join(__dirname, "./filestoread/gfs.png");
        // 8. Establish connection between Mongo and GridFS
        grid.mongo = mongooseDv.mongo;
        // 9. Open the connection and write file
        connection.once("open", () => {
            console.log("Connection Open");
            var gridfs = grid(connection.db);
            if (gridfs) {
                // 9a. Create a stream that will be
                // used to store the file in the database
                var streamwrite = gridfs.createWriteStream({
                    // the file will be stored with this name
                    filename: "gfs.png"
                });
                // Report failures instead of leaving them unhandled
                streamwrite.on("error", function (err) {
                    console.error("write failed:", err);
                });
                // 9b. Create a read stream for the file
                // in the filestoread folder
                // and pipe it into the database
                fs.createReadStream(filesrc).pipe(streamwrite);
                // 9c. Complete the write operation
                streamwrite.on("close", function (file) {
                    console.log("successfully written in database");
                });
            } else {
                console.log("No Grid FS Object");
            }
        });
    } else {
        console.log('Not connected');
    }
    // Printed immediately: the connection and the write complete asynchronously
    console.log("done");

    The path of the file in the filestoread folder is passed to the fs module's createReadStream() function. The pipe() function of the resulting read stream accepts the write stream created with the gridfs object, so the contents of the image file are streamed into the database.

    Step 7: Run the code using node writefile

    The console prints Connection Open, followed by successfully written in database once the upload completes.

    Now open MongoDB Compass: the filesDB database contains the fs.files and fs.chunks collections, and the metadata of the uploaded file appears in fs.files.

    Step 8: To read a file, create a JavaScript file named readfile.js:

    var mongooseDv = require("mongoose");
    // useMongoClient is a Mongoose 4.11+ option; Mongoose 5 and later no longer need it
    mongooseDv.connect('mongodb://localhost/filesDB', { useMongoClient: true });
    var connection = mongooseDv.connection;
    if (connection) {
        console.log(connection.readyState.toString());
        var path = require("path");
        var grid = require("gridfs-stream");
        var fs = require("fs");
        grid.mongo = mongooseDv.mongo;
        connection.once("open", () => {
            console.log("Connection Open");
            var gridfs = grid(connection.db);
            if (gridfs) {
                // Destination file in the filestowrite folder
                var fsstreamwrite = fs.createWriteStream(
                    path.join(__dirname, "./filestowrite/gfs.png")
                );
                // Read stream for the file stored in GridFS under this name
                var readstream = gridfs.createReadStream({
                    filename: "gfs.png"
                });
                // Report failures instead of leaving them unhandled
                readstream.on("error", function (err) {
                    console.error("read failed:", err);
                });
                readstream.pipe(fsstreamwrite);
                // "finish" fires once all data has been flushed to the local file
                fsstreamwrite.on("finish", function () {
                    console.log("File Read successfully from database");
                });
            } else {
                console.log("No Grid FS Object");
            }
        });
    } else {
        console.log('Not connected');
    }
    // Printed immediately: the connection and the read complete asynchronously
    console.log("done");

    Step 9: Run the above code using node readfile

    The console prints Connection Open, followed by File Read successfully from database. The script reads the file from MongoDB GridFS and writes it to the filestowrite folder.

    When to Use the MongoDB GridFS Storage System

    The MongoDB GridFS storage system is not widely utilized, although the following conditions may demand its use: 

    • When your file system limits the number of files that can be stored in a single directory.
    • When you need to access only a portion of a large file: GridFS lets you read sections of the file without loading the entire file into memory.
    • When you want files and their metadata kept in sync across geographically distributed replica sets: MongoDB distributes the files and their metadata automatically to the mongod instances in each location.

    When Not to Use the MongoDB GridFS Storage System

    Do not use GridFS if you need to update the content of an entire file atomically. As an alternative, you can store multiple versions of each file and record the current version in the metadata. After uploading the new version of the file, you can use an atomic update to set the metadata field that indicates "latest" status, and then remove older versions if necessary.
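
    The version-selection step can be sketched on plain objects. This simplified variant assumes that the newest uploadDate marks the current version, rather than an explicit "latest" flag in the metadata; splitVersions, the _id values, and report.pdf are hypothetical.

```javascript
// Given fs.files-style documents that share a filename, pick the newest
// upload as the current version and list the older ones for removal.
// (Assumption: the newest uploadDate marks the current version.)
function splitVersions(fileDocs) {
  const newestFirst = [...fileDocs].sort((a, b) => b.uploadDate - a.uploadDate);
  const [latest, ...older] = newestFirst;
  return { latest, older };
}

const versions = [
  { _id: 1, filename: "report.pdf", uploadDate: new Date("2023-01-10T00:00:00Z") },
  { _id: 2, filename: "report.pdf", uploadDate: new Date("2023-03-05T00:00:00Z") },
  { _id: 3, filename: "report.pdf", uploadDate: new Date("2023-02-01T00:00:00Z") },
];

const { latest, older } = splitVersions(versions);
console.log(latest._id);                  // 2
console.log(older.map((doc) => doc._id)); // [ 3, 1 ]
```

    With an explicit metadata flag, the sort would be replaced by a lookup of the flagged document, and the older list would drive the cleanup.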

    If your files are all smaller than the 16 MB BSON document size limit, consider storing each file in a single document instead of using GridFS. To store binary data, you can use the BinData data type. For further information on using BinData, consult your driver's documentation.
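
    A size check like the following could drive that decision. It is only a sketch: chooseStorage and toInlineDoc are hypothetical helpers, and the 64 KB headroom is an assumed allowance for the other fields that share the document, not a value taken from MongoDB.

```javascript
const MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024; // 16 MB BSON document limit

// Decide where a file of `sizeBytes` could be stored. `headroomBytes` is an
// assumed allowance for the other fields in the same document.
function chooseStorage(sizeBytes, headroomBytes = 64 * 1024) {
  return sizeBytes + headroomBytes <= MAX_BSON_DOCUMENT_SIZE
    ? "single-document"
    : "gridfs";
}

// Shape of a document that holds a small file inline. The Node.js driver
// stores a Buffer value as BSON binary data (BinData).
function toInlineDoc(filename, buffer) {
  return { filename, length: buffer.length, data: buffer };
}

console.log(chooseStorage(2 * 1024 * 1024));  // single-document
console.log(chooseStorage(20 * 1024 * 1024)); // gridfs
console.log(toInlineDoc("icon.png", Buffer.alloc(512)).length); // 512
```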


    MongoDB GridFS Limitations

    The GridFS File System has the following restrictions:

    • Serving files alongside database content can significantly churn your RAM working set. If you don't want to disturb your working set, serve your files from a separate MongoDB server.
    • File serving performance will be slower than serving the file natively from your web server and file system. However, the additional management benefits may outweigh the slowdown.
    • GridFS does not support atomic file updates. If you need to replace the content of a file, keep multiple versions of the file and select the appropriate version.

    The power and rise of GridFS

    GridFS is a gift for developers who want to store large files in MongoDB. It lets them store big files and retrieve portions of those files as needed, which makes it useful across a wide variety of applications. Its main benefit is that you can read just a piece of a file without having to load the complete file into memory.



    Abhresh Sugandhi

    Author

    Abhresh is a corporate trainer with a decade of experience in technical training, combining virtual webinars with instructor-led sessions. He has created courses, tutorials, and articles for organizations. He is also the founder of Nikasio.com, which offers services in technical training, project consulting, and content development.
