Data Engineering

Roll No................................

CD-701 (GS)

B.Tech., VII Semester

Examination, December 2024

Grading System (GS)

Time : Three Hours

Maximum Marks : 70

Note : i) Attempt any five questions.

ii) All questions carry equal marks.

iii) In case of any doubt or dispute, the English version of the question should be treated as final.

1. a) Discuss the role of a data engineer in a data-driven organization.

b) Explain the five Vs of data with suitable examples for each.

2. a) Describe the data engineer's involvement in data-driven decision-making using a real-world example.

b) Explain how modern data architecture on cloud platforms supports the processing and consumption of data.

3. a) Compare the ETL and ELT approaches in the context of data preparation.

b) Describe, with an example, the process of creating a scalable infrastructure for an organization.
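To illustrate the distinction asked about in part (a), here is a minimal Python sketch that uses the standard-library `sqlite3` module as a stand-in warehouse. The table names, the sample records, and the cleaning rule (drop rows whose amount is not numeric) are hypothetical. In the ETL path the records are cleaned in application code before loading. In the ELT path the raw records are loaded first and cleaned with SQL inside the database.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical raw source records: (order_id, amount as text, country code in mixed case)
RAW_ORDERS: List[Tuple[str, str, str]] = [
    ("o1", "120.50", "in"),
    ("o2", "80.00", "US"),
    ("o3", "n/a", "in"),  # malformed amount; both pipelines should drop it
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: extract, transform in application code, then load only clean rows."""
    cleaned = []
    for order_id, amount, country in RAW_ORDERS:
        try:
            value = float(amount)
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        cleaned.append((order_id, value, country.upper()))
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load raw rows unchanged, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               CAST(amount AS REAL) AS amount,
               UPPER(country) AS country
        FROM orders_raw
        WHERE amount GLOB '[0-9]*'  -- crude numeric check, enough for this sample
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    query = "SELECT order_id, amount, country FROM {} ORDER BY order_id"
    print(conn.execute(query.format("orders_etl")).fetchall())
    print(conn.execute(query.format("orders_elt")).fetchall())
    conn.close()
```

Both paths produce the same clean table for this sample. The main difference is where the cleaning logic lives. ETL keeps that logic in the pipeline, so the warehouse only ever sees clean data. ELT keeps the raw table in the warehouse, so the data can be re-transformed later with different rules.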

4. a) Explain the roles of data lake and data warehouse storage in modern data architecture.

b) Discuss how big data processing frameworks help in handling large-scale data processing.

5. a) Discuss the key features of Apache Spark and its advantages over traditional MapReduce for big data processing.

b) Describe Amazon EMR and its role in simplifying big data processing on the cloud.
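As background for part (a), the MapReduce model can be described as three phases: map, shuffle (group by key), and reduce. The minimal single-process Python sketch below runs these phases in memory to compute a word count. The function names and sample input are hypothetical, and nothing here is distributed. In Hadoop MapReduce, a multi-stage pipeline writes intermediate results to disk between jobs. Spark expresses the same computation as chained transformations on an RDD (for example `flatMap`, `map`, and `reduceByKey`) and can keep intermediate data in memory. This is one commonly cited advantage of Spark.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped counts for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases into one job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["Spark and Hadoop", "spark is fast"]))
```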

6. a) Discuss the key stages of the machine learning (ML) lifecycle.

b) Explain how AWS SageMaker supports the development and deployment of machine learning models at scale.

7. a) Explain, with suitable diagrams, the process of securing cloud storage.

b) Describe the importance of pre-processing and feature engineering in developing an effective ML model.
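For part (b), here is a minimal sketch in plain Python of three common pre-processing and feature-engineering steps: mean imputation of missing values, z-score standardization, and one-hot encoding of a categorical column. The function names and sample columns are hypothetical. In practice, libraries provide production versions of these steps, such as scikit-learn's `SimpleImputer`, `StandardScaler`, and `OneHotEncoder`.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed entries.

    Assumes at least one value is present.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values: List[float]) -> List[float]:
    """Z-score scaling: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # a constant column carries no signal
    return [(x - mu) / sigma for x in values]


def one_hot(values: List[str]) -> List[Dict[str, int]]:
    """Encode each category as a set of 0/1 indicator features."""
    vocabulary = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in vocabulary} for v in values]


if __name__ == "__main__":
    ages = [25.0, None, 35.0]         # hypothetical numeric column with a gap
    colours = ["red", "blue", "red"]  # hypothetical categorical column
    print(standardize(impute_mean(ages)))
    print(one_hot(colours))
```

Scaling puts numeric features on a comparable range. One-hot encoding lets models that expect numeric input use categorical data without implying an order between categories.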

8. Write short notes on any two:

a) Modern Data Strategies

b) Apache Hadoop

c) Purpose-built data ingestion tools

d) Data wrangling and data discovery