
Data Loading for ML Projects

Updated on Aug 22, 2025
 

The input data to a learning algorithm usually has a row-by-column (tabular) structure and is often stored as a CSV file. CSV stands for comma-separated values, a simple plain-text format for storing tabular data. A CSV file can be loaded into a Pandas dataframe with the read_csv function. It can be loaded using other libraries as well, and we will look at a few approaches in this post.

Let us now look at different ways to load CSV files:

Using Python Standard Library

Python's built-in ‘csv’ module provides a reader function that can be used to read the data in a CSV file. The file is opened in read mode and the file object is passed to the reader function. Below is an example demonstrating the same:

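import csv
import numpy as np

path = "path to csv file"  # replace with the location of your CSV file
with open(path, 'r', newline='') as infile:
    reader = csv.reader(infile, delimiter=',')
    headers = next(reader)  # the first row holds the column names
    data = list(reader)     # the remaining rows hold the values
# Convert the rows of strings to a numeric array (assumes every column is numeric)
data = np.array(data).astype(float)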

The headers or the column names can be printed using the following line of code:

print(headers) 

The dimensions of the dataset can be determined using the shape attribute as shown in the following line of code:

print(data.shape)

Output:

(250, 302)

The nature of data can be determined by examining the first few rows of the dataset using the below line of code:

data[:2] 
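
As a self-contained illustration of the steps above, the following sketch reads a small in-memory CSV instead of a file on disk. The sample values and the column names f0, f1 and target are made up for this example and do not come from the dataset used in this post.

```python
import csv
from io import StringIO

import numpy as np

# Hypothetical sample data standing in for a CSV file on disk
sample = StringIO("f0,f1,target\n0.5,1.5,1\n2.0,3.0,0\n")

reader = csv.reader(sample, delimiter=",")
headers = next(reader)                        # first row: column names
data = np.array(list(reader)).astype(float)   # remaining rows: numeric values

print(headers)     # ['f0', 'f1', 'target']
print(data.shape)  # (2, 3)
print(data[:1])    # first row of values
```

Because csv.reader returns every field as a string, the conversion with astype(float) only works when all columns are numeric.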

Using numpy package

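The numpy package has a function named ‘loadtxt’ that can be used to read delimited text data, including CSV. By default it splits values on whitespace; for comma-separated data, the delimiter argument has to be set to ','. Below is an example that uses StringIO as an in-memory file holding whitespace-separated values.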

from numpy import loadtxt 
from io import StringIO 
c = StringIO("0 1 2 \n3 4 5") 
data = loadtxt(c) 
print(data.shape)

Output:

(2, 3) 
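
The example above relies on the default behaviour of loadtxt, which splits on whitespace. For comma-separated data with a header row, a minimal sketch might look like the following. The sample contents and the header names a, b and c are assumptions made for illustration.

```python
from io import StringIO

from numpy import loadtxt

# Hypothetical comma-separated sample with one header row
c = StringIO("a,b,c\n0,1,2\n3,4,5")

# delimiter="," splits on commas; skiprows=1 skips the header line
data = loadtxt(c, delimiter=",", skiprows=1)
print(data.shape)  # (2, 3)
```

Note that loadtxt discards the header here, so the column names have to be kept track of separately if they are needed.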

Using pandas package

There are a few things to keep in mind while dealing with CSV files using the Pandas package.

  • The file header is the row of column names, which describes the kind of data each column holds. If the file has a header, read_csv uses those names as the column labels; otherwise the columns have to be named manually.
  • By default, read_csv treats the first row as the header. If the file has no header row, this must be stated explicitly by passing header=None, and column names can be supplied through the names argument.
  • Comment lines in a CSV file are commonly marked with the # symbol. read_csv skips them only when told to, by passing comment='#'.
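
These points can be sketched as follows, assuming a small in-memory file that has no header row and starts with one comment line. The data values and the column names x, y and target are hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: no header row, one comment line starting with '#'
raw = StringIO("# made-up sample data\n1.0,2.0,1\n3.0,4.0,0\n")

# header=None: the first data row is not a header
# names=...:   column names supplied manually
# comment="#": everything from '#' to the end of the line is ignored
df = pd.read_csv(raw, header=None, names=["x", "y", "target"], comment="#")

print(df.columns.tolist())  # ['x', 'y', 'target']
print(df.shape)             # (2, 3)
```

Without header=None, the first data row would be consumed as column names and the dataframe would end up one row short.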

Let us look at an example to understand how the CSV file is read as a dataframe.

import numpy as np 
import pandas as pd 
#Obtain the dataset 
df = pd.read_csv("path to csv file", sep=",") 
df[:5]

Output:

   target      0      1      2  ...    295    296    297    298    299
0     1.0 -0.098  2.165  0.681  ... -2.097  1.051 -0.414  1.038 -1.065
1     0.0  1.081 -0.973 -0.383  ... -1.624 -0.458 -1.099 -0.936  0.973
2     1.0 -0.523 -0.089 -0.348  ... -1.165 -1.544  0.004  0.800 -1.211
3     1.0  0.067 -0.021  0.392  ...  0.467 -0.562 -0.254 -0.533  0.238
4     1.0  2.347 -0.831  0.511  ...  1.378  1.246  1.478  0.428  0.253
[5 rows x 302 columns]

Conclusion

In this post, we saw how input data stored in CSV files can be loaded for machine learning projects using the Python standard library, numpy, and Pandas.
