Data Science - RGPV 2024 Question Paper

Roll No.

CI-801 (CSIT) (GS)
B.Tech., VIII Semester
Examination, May 2024
Grading System (GS)
Data Science

Time : Three Hours

Maximum Marks : 70

Note: i)

Answer any five questions.
किन्हीं पाँच प्रश्नों को हल कीजिए।

ii)

All questions carry equal marks.
सभी प्रश्न के समान अंक हैं।

iii)

In case of any doubt or dispute the English version question should be treated as final.
किसी भी प्रकार के संदेह अथवा विवाद की स्थिति में अंग्रेजी भाषा के प्रश्न को अंतिम माना जायेगा।

1. a)

What are the three characteristics of Big Data and what are the main considerations in processing Big Data?
बिग डाटा की तीन विशेषताएँ क्या हैं? और बिग डाटा को संसाधित करने में मुख्य विचार क्या हैं?

Illustrate the various phases involved in Big Data Analytics with a neat diagram.
बिग डाटा एनालिटिक्स में शामिल विभिन्न चरणों को साफ-सुथरे आरेख के साथ चित्रित करें।

2. a)

Which itemsets satisfy the minimum support of 0.5?
कौन से आइटमसेट 0.5 के न्यूनतम समर्थन को पूरा करते हैं?
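
A minimal sketch of how a minimum-support check can be carried out. The transaction table for this question is not reproduced above, so the `transactions` list below is purely hypothetical, and the brute-force enumeration is an illustrative shortcut rather than the Apriori procedure itself.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    Brute-force enumeration of all candidate itemsets; fine for a
    handful of items, not a substitute for Apriori on real data.
    """
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = fraction of transactions containing the candidate.
            count = sum(1 for t in transactions if set(candidate) <= t)
            support = count / n
            if support >= min_support:
                result[candidate] = support
    return result

# Hypothetical transactions (NOT taken from the question paper).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

frequent = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in frequent.items():
    print(itemset, support)
```

With these assumed transactions, every single item and every pair reaches the 0.5 threshold, while the three-item set appears in only one of four transactions and is dropped.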

3. a)

How are interesting rules identified? How are interesting rules distinguished from coincidental rules?
दिलचस्प नियमों की पहचान कैसे की जाती है? दिलचस्प नियम संयोग के नियमों से कैसे भिन्न हैं?

In MapReduce, how is job scheduling done under the Fair Scheduler?
MapReduce में फेयर शेड्यूलर के मामले में जॉब शेड्यूलिंग कैसे की जाती है?

Use K-means clustering algorithm to divide the following data into two clusters.
निम्न डाटा को दो समूहों में विभाजित करने के लिए K-साधन क्लस्टरिंग एल्गोरिथम का उपयोग करें।

Point	1	2	3	4	5
X1	1	2	3	4	5
X2	1	1	3	2	5
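
A minimal sketch of one way to work this question, assuming each column of the table is a point (X1, X2) and assuming the first two points are taken as the initial centroids. The paper does not specify the initial centroids, so a different choice may follow a different path.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means on 2-D points using squared Euclidean distance."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return assignment, centroids

# Points read column-wise from the table: (X1, X2).
points = [(1, 1), (2, 1), (3, 3), (4, 2), (5, 5)]

# Assumption: points 1 and 2 serve as the initial centroids.
assignment, centroids = kmeans(points, [points[0], points[1]])
print(assignment)  # cluster index of each point
print(centroids)   # final centroid of each cluster
```

Under these assumptions the procedure settles with points 1 and 2 in one cluster (centroid (1.5, 1.0)) and points 3, 4 and 5 in the other (centroid approximately (4.0, 3.33)).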

4. a)

Explain the storage mechanism in HBase.
HBASE में स्टोरेज मैकेनिज्म को समझाइए।

Draw and explain the HDFS architecture. Explain the functions of the NameNode and DataNodes. What is a Secondary NameNode? Is it a substitute for the NameNode?
HDFS आर्किटेक्चर को बनाएं और समझाएं। NameNode और DataNodes के कार्यों की व्याख्या करें। द्वितीयक NameNode क्या है। क्या यह NameNode के लिए उपयुक्त है।

5. a)

What R function is used to encode a vector as a category?
वेक्टर को एक श्रेणी के रूप में एन्कोड करने के लिए किस R फंक्शन का उपयोग किया जाता है?

What is a rug plot used for in a density plot?
डेंसिटी प्लॉट में रग प्लॉट का उपयोग किस लिए किया जाता है?

6. a)

Compare and contrast the terms structured, unstructured and semi-structured data.
संरचित, असंरचित और अर्ध-संरचित डाटा की तुलना करें।

What is an analytic sandbox, and why is it important?
विश्लेषणात्मक सैंडबॉक्स क्या है, और यह क्यों महत्वपूर्ण है?

7. a)

Explain how MapReduce in Hadoop is used to perform a word count on a specified dataset.
बताएं कि कैसे Hadoop में MapReduce निर्दिष्ट डेटासेट पर शब्द गणना करता है।
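
A minimal sketch of the word-count data flow, written as a single-process Python simulation rather than an actual Hadoop job. The function names and the `sample_lines` input are hypothetical, since the paper does not reproduce the dataset; the sketch only mirrors the map, shuffle and reduce stages.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle/sort: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Hypothetical input standing in for the unspecified dataset.
sample_lines = ["big data is big", "data science uses big data"]
counts = word_count(sample_lines)
print(counts)
```

In a real cluster the mappers run in parallel on separate input splits and the framework performs the shuffle; here all three stages run in sequence purely to show what each one contributes.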

Explain the following commands with syntax and at least one example of each:
निम्नलिखित कमांड को सिंटैक्स के साथ समझाइए और प्रत्येक का कम से कम एक उदाहरण दें

copyFromLocal in Hadoop
Displaying the contents of an output file in Hadoop

Hadoop में लोकल से कॉपी करें
हड़ूप में आउटपुटफाइल की सामग्री दिखा रहा है

Explain any two of the following:
निम्नलिखित में से किन्हीं दो की व्याख्या कीजिए।

Advantages of the Apriori algorithm
Linear regression
Decision tree
Different Naive Bayes classifiers

एप्रीओरी एल्गोरिथम के लाभ
रेखीय प्रतिगमन
निर्णय वृक्ष
डिफ़रेंट Naive Bayes क्लासिफायर