From 2e268fdc6a2847b73689e184873dce993c69e262 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sat, 5 Oct 2019 20:45:32 -0400 Subject: [PATCH 01/43] Create documentation folder. Add dictionary downloaded from FEC. --- documentation/indiv_dictionary.txt | 288 +++++++++++++++++++++++++++++ 1 file changed, 288 insertions(+) create mode 100644 documentation/indiv_dictionary.txt diff --git a/documentation/indiv_dictionary.txt b/documentation/indiv_dictionary.txt new file mode 100644 index 0000000..109fee8 --- /dev/null +++ b/documentation/indiv_dictionary.txt @@ -0,0 +1,288 @@ +revised 09/29/2008 +INDIVIDUAL CONTRIBUTIONS FILE +Federal Election Commission +999 E Street, NW +Washington, DC 20463 + +DATA DESCRIPTION +Individual Contributions File + +The zipped files should be downloaded as binary and unzipped. + +Summary: The individual contributions file contains each contribution from an individual to a federal committee if the contribution was at least $200. + +Universe: All individual contributions $200 or more. + +Associated Files: + +Data File: INDIVXX.ZIP +Frequency Counts: INDIVXX.TXT +Data Dictionary: INDIV_DICTIONARY.TXT + + +The variables have been formatted in the following ways: + + + +Mnemonics +Currently +Used by the Field +CommissionVariable Columns Desc. + + +--------------------------------------------------------------- + +ITEM-FILER Filer Identification Number 1-9 9s +ITEM-AMEND Amendment Indicator 10 1s +ITEM-REPT Report Type 11-13 3s +ITEM-PGI Primary-General Indicator 14 1s +ITEM-MICRO Microfilm Location (YYOORRRFFFF) 15-25 11s +ITEM-TRANS Transaction Type 26-28 3s +ITEM-NAME Contributor/Lender/Transfer Name 29-62 34s +ITEM-CTY City/Town 63-80 18s +ITEM-ST State 81-82 2s +ITEM-ZIP Zip Code 83-87 5s +ITEM-OCCU Occupation 88-122 35s +IT-TMN Transaction Date - Month 123-124 2d +IT-TDY Transaction Date - Day 125-126 2d +IT-TCC Transaction Date - Century 127-128 2d +IT-TYY Transaction Date - Year 129-130 2d +ITEM-AMT Amount 131-137 7n +ITEM-OID Other Identification Number 138-146 9s +ITEM-RN FEC Record Number 147-153 7s + +Data Type: s = string (alpha or alpha-numeric); d = date; n = numeric + + +Variable Documentation + + + Filer Identification Number + Columns 1-9 + String + +A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission. + + --------- + Amendment Indicator + Columns 10-10 + String + +A AMENDMENT +C CONSOLIDATED +M MULTI-CANDIDATE +N NEW +S SECONDARY +T TERMINATED + +Indicates if the report being filed is new (N), an Amendment (A) to a previous report, or a termination (T) report. + + --------- + Report Type + Columns 11-13 + String + +Indicates the type of report filed. + +10D PRE-ELECTION +10G PRE-GENERAL +10P PRE-PRIMARY +10R PRE-RUN-OFF +10S PRE-SPECIAL +12C PRE-CONVENTION +12G PRE-GENERAL +12P PRE-PRIMARY +12R PRE-RUN-OFF +12S PRE-SPECIAL +30D POST-ELECTION +30G POST-GENERAL +30P POST-PRIMARY +30R POST-RUN-OFF +30S POST-SPECIAL +60D POST-ELECTION +ADJ COMP ADJUST AMEND +CA COMPREHENSIVE AMEND +M1 JANUARY MONTHLY +M10 OCTOBER MONTHLY +M11 NOVEMBER MONTHLY +M12 DECEMBER MONTHLY +M2 FEBRUARY MONTHLY +M3 MARCH MONTHLY +M4 APRIL MONTHLY +M5 MAY MONTHLY +M6 JUNE MONTHLY +M7 JULY MONTHLY +M8 AUGUST MONTHLY +M9 SEPTEMBER MONTHLY +MY MID-YEAR REPORT +Q1 APRIL QUARTERLY +Q2 JULY QUARTERLY +Q3 OCTOBER QUARTERLY +TER TERMINATION REPORT +YE YEAR-END +90S POST INAUGURAL SUPPLEMENT +90D POST INAUGURAL +48H 48 HOUR NOTIFICATION +24H 24 HOUR NOTIFICATION +. 
+ +-------- + Primary-General Indicator + Columns 14-14 + String + +C CONVENTION +G GENERAL +P PRIMARY +R RUNOFF +S SPECIAL + +This code indicates the type of election or if the committee is retiring debt. Numeric codes are for those committees that are retiring previous election cycle debt. Alpha codes are for those committees active in the current election cycle. + + --------- + Microfilm Location (YYOORRRFFFF) + Columns 15-25 + String + +Indicates the physical location of the filing. + + --------- + Transaction Type + Columns 26-28 + String + +10 NON-FEDERAL RECEIPT FROM PERSONS LEVIN (L-1A) +11 TRIBAL CONTRIBUTION +12 NON-FEDERAL OTHER RECEIPT LEVIN (L-2) +13 INAUGURAL DONATION ACCEPTED +15 CONTRIBUTION +15C CONTRIBUTION FROM CANDIDATE +15E EARMARKED CONTRIBUTION +15F LOANS FORGIVEN BY CANDIDATE +15I EARMARKED INTERMEDIARY IN +15J MEMO (FILER'S % OF CONTRIBUTION GIVEN TO JOIN +15T EARMARKED INTERMEDIARY TREASURY IN +15Z IN-KIND CONTRIBUTION RECEIVED FROM REGISTERED +16C LOANS RECEIVED FROM THE CANDIDATE +16F LOANS RECEIVED FROM BANKS +16G LOAN FROM INDIVIDUAL +16H LOAN FROM CANDIDATE/COMMITTEE +16J LOAN REPAYMENTS FROM INDIVIDUAL +16K LOAN REPAYMENTS FROM CANDIDATE/COMMITTEE +16L LOAN REPAYMENTS RECEIVED FROM UNREGISTERED EN +16R LOANS RECEIVED FROM REGISTERED FILERS +16U LOAN RECEIVED FROM UNREGISTERED ENTITY +17R CONTRIBUTION REFUND RECEIVED FROM REGISTERED +17U REF/REB/RET RECEIVED FROM UNREGISTERED ENTITY +17Y REF/REB/RET FROM INDIVIDUAL/CORPORATION +17Z REF/REB/RET FROM CANDIDATE/COMMITTEE +18G TRANSFER IN AFFILIATED +18H HONORARIUM RECEIVED +18J MEMO (FILER'S % OF CONTRIBUTION GIVEN TO JOIN +18K CONTRIBUTION RECEIVED FROM REGISTERED FILER +18S RECEIPTS FROM SECRETARY OF STATE +18U CONTRIBUTION RECEIVED FROM UNREGISTERED COMMI +19 ELECTIONEERING COMMUNICATION DONATION RECEIVE +19J MEMO (ELECTIONEERING COMMUNICATION % OF DONAT +20 DISBURSEMENT - EXEMPT FROM LIMITS +20A NON-FEDERAL DISBURSEMENT LEVIN (L-4A) VOTER R +20B NON-FEDERAL DISBURSEMENT LEVIN (L-4B) VOTER I +20C LOAN REPAYMENTS MADE TO CANDIDATE +20D NON-FEDERAL DISBURSEMENT LEVIN (L-4D) GENERIC +20F LOAN REPAYMENTS MADE TO BANKS +20G LOAN REPAYMENTS MADE TO INDIVIDUAL +20R LOAN REPAYMENTS MADE TO REGISTERED FILER +20V NON-FEDERAL DISBURSEMENT LEVIN (L-4C) GET OUT +22G LOAN TO INDIVIDUAL +22H LOAN TO CANDIDATE/COMMITTEE +22J LOAN REPAYMENT TO INDIVIDUAL +22K LOAN REPAYMENT TO CANDIDATE/COMMITTEE +22L LOAN REPAYMENT TO BANK +22R CONTRIBUTION REFUND TO UNREGISTERED ENTITY +22U LOAN REPAID TO UNREGISTERED ENTITY +22X LOAN MADE TO UNREGISTERED ENTITY +22Y CONTRIBUTION REFUND TO INDIVIDUAL +22Z CONTRIBUTION REFUND TO CANDIDATE/COMMITTEE +23Y INAUGURAL DONATION REFUND +24A INDEPENDENT EXPENDITURE AGAINST +24C COORDINATED EXPENDITURE +24E INDEPENDENT EXPENDITURE FOR +24F COMMUNICATION COST FOR CANDIDATE (C7) +24G TRANSFER OUT AFFILIATED +24H HONORARIUM TO CANDIDATE +24I EARMARKED INTERMEDIARY OUT +24K CONTRIBUTION MADE TO NON-AFFILIATED +24N COMMUNICATION COST AGAINST CANDIDATE (C7) +24P CONTRIBUTION MADE TO POSSIBLE CANDIDATE +24R ELECTION RECOUNT DISBURSEMENT +24T EARMARKED INTERMEDIARY TREASURY OUT +24U CONTRIBUTION MADE TO UNREGISTERED +24Z IN-KIND CONTRIBUTION MADE TO REGISTERED FILER +29 ELECTIONEERING COMMUNICATION DISBURSEMENT(S) + + + --------- + Name (Contributor / Lender / Transfer) + Columns 20-62 + String + +Reported name of the contributor. 
+ + --------- + City + Columns 63-80 + String + + --------- + State + Columns 81-82 + String + + --------- + US Postal ZIP Code + Columns 83-87 + String + + Note: City, State, and ZIP Code information are reported. + + --------- + Occupation + Columns 88-122 + String + + Reported occupation of donor. + + --------- + Columns 123-124 + Date + + --------- + Day + Columns 125-126 + Date + + --------- + Year + Columns 127-130 + Date + + --------- + Amount + Columns 131-137 + Numeric + + In the fixed width text file, the amounts are in COBOL format. If the value is negative, the right most column will contain a special character: ] = -0, j = -1, k = -2, l = -3, m = -4, n = -5, o = -6, p = -7, q = -8, and r = -9. + + + --------- + Other Identification Number + Columns 138-146 + String + +For contributions from individuals this variable is null. For contributions from candidates or other committees this variable will indicate that contributor. + + --------- + FEC Record Number + Columns 147-153 + String + From f5062c1642447de478946b73dfb2193e4c428d43 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 09:30:16 -0400 Subject: [PATCH 02/43] Added header file to documentation folder. --- documentation/indiv_header_file.csv | 1 + 1 file changed, 1 insertion(+) create mode 100644 documentation/indiv_header_file.csv diff --git a/documentation/indiv_header_file.csv b/documentation/indiv_header_file.csv new file mode 100644 index 0000000..50e8636 --- /dev/null +++ b/documentation/indiv_header_file.csv @@ -0,0 +1 @@ +CMTE_ID,AMNDT_IND,RPT_TP,TRANSACTION_PGI,IMAGE_NUM,TRANSACTION_TP,ENTITY_TP,NAME,CITY,STATE,ZIP_CODE,EMPLOYER,OCCUPATION,TRANSACTION_DT,TRANSACTION_AMT,OTHER_ID,TRAN_ID,FILE_NUM,MEMO_CD,MEMO_TEXT,SUB_ID From 193d1eed2b2b066437bb995c7581af60ffd145a8 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 09:41:36 -0400 Subject: [PATCH 03/43] Expanded the .gitignore coverage --- .gitignore | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 98 insertions(+), 2 deletions(-) diff --git a/.gitignore b/.gitignore index eaa4e5e..bba6e68 100644 --- a/.gitignore +++ b/.gitignore @@ -1,2 +1,98 @@ -itcont.txt -node_modules/ \ No newline at end of file +# Logs +logs +*.log +npm-debug.log* +yarn-debug.log* +yarn-error.log* +lerna-debug.log* + +# Diagnostic reports (https://nodejs.org/api/report.html) +report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json + +# Runtime data +pids +*.pid +*.seed +*.pid.lock + +# Directory for instrumented libs generated by jscoverage/JSCover +lib-cov + +# Coverage directory used by tools like istanbul +coverage +*.lcov + +# nyc test coverage +.nyc_output + +# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files) +.grunt + +# Bower dependency directory (https://bower.io/) +bower_components + +# node-waf configuration +.lock-wscript + +# Compiled binary addons (https://nodejs.org/api/addons.html) +build/Release + +# Dependency directories +node_modules/ +jspm_packages/ + +# TypeScript v1 declaration files +typings/ + +# TypeScript cache +*.tsbuildinfo + +# Optional npm cache directory +.npm + +# Optional eslint cache +.eslintcache + +# Microbundle cache +.rpt2_cache/ +.rts2_cache_cjs/ +.rts2_cache_es/ +.rts2_cache_umd/ + +# Optional REPL history +.node_repl_history + +# Output of 'npm pack' +*.tgz + +# Yarn Integrity file +.yarn-integrity + +# dotenv environment variables file +.env +.env.test + +# parcel-bundler cache (https://parceljs.org/) +.cache + +# next.js build output +.next + +# nuxt.js build output +.nuxt + 
+# gatsby files +.cache/ +public + +# vuepress build output +.vuepress/dist + +# Serverless directories +.serverless/ + +# FuseBox cache +.fusebox/ + +# DynamoDB Local files +.dynamodb/ From 507bbc425df6a246ee57c7567ae79d3156d062de Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 09:47:15 -0400 Subject: [PATCH 04/43] Removed .vscode directory, not needed for Linux and probably needs to be rebuilt after project is done. --- .vscode/launch.json | 14 -------------- .vscode/settings.json | 2 -- 2 files changed, 16 deletions(-) delete mode 100644 .vscode/launch.json delete mode 100644 .vscode/settings.json diff --git a/.vscode/launch.json b/.vscode/launch.json deleted file mode 100644 index 524e500..0000000 --- a/.vscode/launch.json +++ /dev/null @@ -1,14 +0,0 @@ -{ - // Use IntelliSense to learn about possible attributes. - // Hover to view descriptions of existing attributes. - // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387 - "version": "0.2.0", - "configurations": [ - { - "type": "node", - "request": "launch", - "name": "Launch Program", - "program": "${workspaceFolder}/readFileStream.js" - } - ] -} \ No newline at end of file diff --git a/.vscode/settings.json b/.vscode/settings.json deleted file mode 100644 index 7a73a41..0000000 --- a/.vscode/settings.json +++ /dev/null @@ -1,2 +0,0 @@ -{ -} \ No newline at end of file From 98964aec947b97fc5f0d9e44fa7f89a315d60620 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 17:21:57 -0400 Subject: [PATCH 05/43] Start to reformat FEC individual donation input to json output. --- .../reformat_sec_data_to_json.js | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 database/mongodb_version4/reformat_sec_data_to_json.js diff --git a/database/mongodb_version4/reformat_sec_data_to_json.js b/database/mongodb_version4/reformat_sec_data_to_json.js new file mode 100644 index 0000000..7d36c62 --- /dev/null +++ b/database/mongodb_version4/reformat_sec_data_to_json.js @@ -0,0 +1,17 @@ +/* This code uses the Node.js readline API to read political campaign + * donation data obtained from the United States Federal Election + * Commission (the "FEC".) Each line of input from the *.txt file + * is reformatted into an output record that is in JSON format, and + * uses the specific data types documented by the MongoDB version 4.x + * database server. + * + * The specific input files being reformatted by this code are the + * SEC records of political donations by individuals, of USD $200.00 + * or more. For example, the "indiv20.zip" file in the SEC bulk + * downloads area contains multiple *.txt files, each of which records + * a political donation of $200.00 or more by a named individual. + * The record layout is provided in the "documentation" folder appearing + * at the root folder of this repository. + * + */ + From 57c043df164b44c55c7658a64af213fda18cd9e8 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 17:37:00 -0400 Subject: [PATCH 06/43] Rename the node script, then put in the initial code. 
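For reference, a minimal sketch of the wiring this patch is aiming for, with the input path taken from the command line. Note that the hunk below passes the literal string 'process.argv[2]' to fs.createReadStream; later hunks in this series show the call without the quotes, which is what actually opens the file named on the command line.

    const fs = require('fs');
    const readline = require('readline');

    // First CLI argument = path to the pipe-delimited FEC *.txt file.
    const rl = readline.createInterface({
      input: fs.createReadStream(process.argv[2]),   // no quotes around process.argv[2]
      crlfDelay: Infinity
    });

    rl.on('line', (line) => console.log(`Line from file: ${line}`));
    rl.on('close', () => console.log('Have a great day!'));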
--- ...o_json.js => reformat_fec_data_to_json.js} | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) rename database/mongodb_version4/{reformat_sec_data_to_json.js => reformat_fec_data_to_json.js} (66%) diff --git a/database/mongodb_version4/reformat_sec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js similarity index 66% rename from database/mongodb_version4/reformat_sec_data_to_json.js rename to database/mongodb_version4/reformat_fec_data_to_json.js index 7d36c62..96c35a6 100644 --- a/database/mongodb_version4/reformat_sec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -7,7 +7,7 @@ * * The specific input files being reformatted by this code are the * SEC records of political donations by individuals, of USD $200.00 - * or more. For example, the "indiv20.zip" file in the SEC bulk + * or more. For example, the "indiv20.zip" file in the FEC bulk * downloads area contains multiple *.txt files, each of which records * a political donation of $200.00 or more by a named individual. * The record layout is provided in the "documentation" folder appearing @@ -15,3 +15,20 @@ * */ +const fs = require('fs'); +const readline = require('readline'); + +const rl = readline.createInterface({ + input: fs.createReadStream('process.argv[2]'), + crlfDelay: Infinity +}); + +rl.on('line', (line) => { + console.log(`Line from file: ${line}`); +}); + +rl.on('close', () => { + console.log('Have a great day!') +}) + + From 840983803b1bed333e0e87d68af12fed5ad487fa Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 18:08:32 -0400 Subject: [PATCH 07/43] Split the header line. --- .../reformat_fec_data_to_json.js | 27 ++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 96c35a6..056e28a 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -18,17 +18,38 @@ const fs = require('fs'); const readline = require('readline'); +//Count number of lines + +var lineCount = 0; + +//An array that holds the header line of the csv file. +var myHdr = []; + const rl = readline.createInterface({ - input: fs.createReadStream('process.argv[2]'), + input: fs.createReadStream(process.argv[2]), crlfDelay: Infinity }); rl.on('line', (line) => { - console.log(`Line from file: ${line}`); + + lineCount++ + + if (lineCount === 1) { + + myHdr = line.split('|', 3) + + console.log('The first 3 elements from the header line are ' + myHdr) + + } + + console.log(`Line from file: ${line}`) + }); rl.on('close', () => { - console.log('Have a great day!') + + console.log('Number of lines processed is ' + lineCount) + }) From 5544c01f8de8198050d8132946e39056b43dad7a Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 18:13:15 -0400 Subject: [PATCH 08/43] Test whether Niedringhaus splitting code works. 
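The two split() forms tried in the last two patches behave quite differently; a quick illustration with made-up values:

    const line = 'A|B|C|D|E';
    line.split('|', 3);   // ['A', 'B', 'C']  -- an array of at most three elements
    line.split('|')[3];   // 'D'              -- a single string, the fourth element only

The comment added a few patches later makes the same point: the [3] form skips the first three fields and keeps just the fourth.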
--- database/mongodb_version4/reformat_fec_data_to_json.js | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 056e28a..86442b8 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -36,7 +36,9 @@ rl.on('line', (line) => { if (lineCount === 1) { - myHdr = line.split('|', 3) + //Next line is how Paige Niedringhaus splits the line + + myHdr = line.split('|')[3] console.log('The first 3 elements from the header line are ' + myHdr) From 467861ebe0dc78ddbe2dfe535078128ad44952f8 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 18:24:24 -0400 Subject: [PATCH 09/43] Split the full header line. --- .../reformat_fec_data_to_json.js | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 86442b8..b6b1fce 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -26,21 +26,34 @@ var lineCount = 0; var myHdr = []; const rl = readline.createInterface({ + input: fs.createReadStream(process.argv[2]), + crlfDelay: Infinity + }); +// Split and save the first line -- treat that as the header line. + rl.on('line', (line) => { lineCount++ if (lineCount === 1) { - //Next line is how Paige Niedringhaus splits the line + /* Code by the original author splits a line using a + * technique like this: + * + * myHdr = line.split('|')[3] + * + * It has the effect of skipping the first 3 elements and + * capturing the fourth element -- and only the fourth. + * + */ - myHdr = line.split('|')[3] + myHdr = line.split('|') - console.log('The first 3 elements from the header line are ' + myHdr) + console.log('Elements from the header line are ' + myHdr) } From 17c968e0262f1a7d4c8e1598ee1ed85319491998 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 18:33:04 -0400 Subject: [PATCH 10/43] Split the transaction line. --- database/mongodb_version4/reformat_fec_data_to_json.js | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index b6b1fce..5a149a1 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -57,7 +57,13 @@ rl.on('line', (line) => { } - console.log(`Line from file: ${line}`) + if (lineCount > 1) { + + var myTrans = line.split('|') + + console.log('Elements from the transaction ' + myTrans) + + } }); From 45137da3565fd6696e5b8c190752e2b4b9ac3d09 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 18:58:26 -0400 Subject: [PATCH 11/43] Start actual reformat to json. 
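The loop added below builds the JSON text by hand, pairing each header name in myHdr with the matching field in myTrans. A sketch of the same pairing using a plain object and JSON.stringify, assuming both arrays are the same length (this plain-JSON variant does not cover the ISODate and $numberDecimal special cases handled in later patches):

    // Pair header names with field values, then let JSON.stringify do the quoting.
    const record = {};
    for (let i = 0; i < myHdr.length; i++) {
      record[myHdr[i]] = myTrans[i];
    }
    const jstring = JSON.stringify(record);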
--- database/mongodb_version4/reformat_fec_data_to_json.js | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 5a149a1..231139c 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -61,7 +61,15 @@ rl.on('line', (line) => { var myTrans = line.split('|') - console.log('Elements from the transaction ' + myTrans) + var jstring = "{ " + + for (i = 0; i < 2; i++) { + + jstring = jstring + "\"" + myHdr[i] + "\" : " + myTrans[i] + + } + + console.log(jstring) } From 1657a24210988f630fb6678e4c0a51695e9f748d Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 19:04:26 -0400 Subject: [PATCH 12/43] Build the json output. --- database/mongodb_version4/reformat_fec_data_to_json.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 231139c..346e9a5 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -65,7 +65,7 @@ rl.on('line', (line) => { for (i = 0; i < 2; i++) { - jstring = jstring + "\"" + myHdr[i] + "\" : " + myTrans[i] + jstring = jstring + "\"" + myHdr[i] + "\" : " + "\"" + myTrans[i] + "\"" } From 4cb0105c18dedda964bc8cfbf8fa3706729a4b87 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 19:07:00 -0400 Subject: [PATCH 13/43] Build the json output. --- database/mongodb_version4/reformat_fec_data_to_json.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 346e9a5..52ba867 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -65,7 +65,7 @@ rl.on('line', (line) => { for (i = 0; i < 2; i++) { - jstring = jstring + "\"" + myHdr[i] + "\" : " + "\"" + myTrans[i] + "\"" + jstring = jstring + "\"" + myHdr[i] + "\" : " + "\"" + myTrans[i] + "\"\, " } From 95147bbaf895db60951f32cc47b63033be510d43 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 19:09:09 -0400 Subject: [PATCH 14/43] Build the json output. --- database/mongodb_version4/reformat_fec_data_to_json.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 52ba867..f4c67e1 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -63,7 +63,7 @@ rl.on('line', (line) => { var jstring = "{ " - for (i = 0; i < 2; i++) { + for (i = 0; i < 3; i++) { jstring = jstring + "\"" + myHdr[i] + "\" : " + "\"" + myTrans[i] + "\"\, " From c7739dfabe93e5c5103095f9f50ad0c7a9ce969f Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 19:12:32 -0400 Subject: [PATCH 15/43] Build the json output. 
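One small point about the string literals in this loop: a backslash in front of a character with no special meaning (the commas and colons escaped as \, and \: below) is simply dropped by JavaScript, so those escapes are harmless but not required.

    console.log("\, " === ", ");   // true -- "\," is just ","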
--- database/mongodb_version4/reformat_fec_data_to_json.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index f4c67e1..1fae799 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -63,7 +63,7 @@ rl.on('line', (line) => { var jstring = "{ " - for (i = 0; i < 3; i++) { + for (i = 0; i < 5; i++) { jstring = jstring + "\"" + myHdr[i] + "\" : " + "\"" + myTrans[i] + "\"\, " From 53526a2cd98ca6a8d96c2c644b569d1a4f5b7402 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 19:16:13 -0400 Subject: [PATCH 16/43] Build the json output. --- database/mongodb_version4/reformat_fec_data_to_json.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 1fae799..2efa4b7 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -63,7 +63,7 @@ rl.on('line', (line) => { var jstring = "{ " - for (i = 0; i < 5; i++) { + for (i = 0; i < 7; i++) { jstring = jstring + "\"" + myHdr[i] + "\" : " + "\"" + myTrans[i] + "\"\, " From f14083a1b2e1b4fb0decbb3550feffb1b379c372 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sun, 6 Oct 2019 19:24:42 -0400 Subject: [PATCH 17/43] Build the json output. --- database/mongodb_version4/reformat_fec_data_to_json.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 2efa4b7..62aae35 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -63,7 +63,7 @@ rl.on('line', (line) => { var jstring = "{ " - for (i = 0; i < 7; i++) { + for (i = 0; i < 13; i++) { jstring = jstring + "\"" + myHdr[i] + "\" : " + "\"" + myTrans[i] + "\"\, " From eb9910c410a421e479e822df593ee63ec8c3d9dc Mon Sep 17 00:00:00 2001 From: BobCochran Date: Wed, 9 Oct 2019 18:23:25 -0400 Subject: [PATCH 18/43] Continue reformatting. --- database/mongodb_version4/reformat_fec_data_to_json.js | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 62aae35..ac908fb 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -48,7 +48,8 @@ rl.on('line', (line) => { * * It has the effect of skipping the first 3 elements and * capturing the fourth element -- and only the fourth. - * + * What I wish to do is different: split every field out, + * in order to reformat them into json-ified records. */ myHdr = line.split('|') @@ -63,7 +64,7 @@ rl.on('line', (line) => { var jstring = "{ " - for (i = 0; i < 13; i++) { + for (i = 0; i < 16; i++) { jstring = jstring + "\"" + myHdr[i] + "\" : " + "\"" + myTrans[i] + "\"\, " From d67b5f05f8851a3a700be5aff398551b53b2515f Mon Sep 17 00:00:00 2001 From: BobCochran Date: Wed, 9 Oct 2019 19:30:31 -0400 Subject: [PATCH 19/43] Correctly reformat transaction date to an ISO8601 date value that mongoimport will accept. 
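The transformation this patch introduces, shown on an illustrative TRANSACTION_DT value (the sample record used later in test_array2.js carries 12312018): the MMDDYYYY string is rearranged into an ISODate(...) literal, with the T00:00:00Z time part added a few patches later. A compact sketch of the same rearrangement using slice():

    const d = '12312018';   // MMDDYYYY
    const theISODt = 'ISODate("' + d.slice(4) + '-' + d.slice(0, 2) + '-' + d.slice(2, 4) + 'T00:00:00Z")';
    // -> ISODate("2018-12-31T00:00:00Z")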
--- .../reformat_fec_data_to_json.js | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index ac908fb..305dafb 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -64,7 +64,23 @@ rl.on('line', (line) => { var jstring = "{ " - for (i = 0; i < 16; i++) { + for (i = 0; i < 14; i++) { + + /* The 14th index value is the transaction date. This needs to be reformated + * from a MMDDYYYY string to a YYYY-MM-DD string that can be converted to + * ISO8601 date format acceptable to the MongoDB 'mongoimport' utility. + */ + + if (i === 13) { + + var myDateStr = myTrans[i] + + var theISODt = "ISODate\(\"" + myDateStr[4] + myDateStr[5] + myDateStr[6] + myDateStr[7] + "\-" + myDateStr[0] + myDateStr[1] + "\-" + + theISODt = theISODt + myDateStr[2] + myDateStr[3] + "\"" + "\)" + + console.log("Date value we are processing " + theISODt) + } jstring = jstring + "\"" + myHdr[i] + "\" : " + "\"" + myTrans[i] + "\"\, " From a35efd8e27a9c87298381051929c18800cc0093b Mon Sep 17 00:00:00 2001 From: BobCochran Date: Wed, 9 Oct 2019 19:46:16 -0400 Subject: [PATCH 20/43] Add ISODate value to json string. --- .../mongodb_version4/reformat_fec_data_to_json.js | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 305dafb..d7d7804 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -80,17 +80,19 @@ rl.on('line', (line) => { theISODt = theISODt + myDateStr[2] + myDateStr[3] + "\"" + "\)" console.log("Date value we are processing " + theISODt) - } + + jstring = jstring + "\"" + myHdr[i] + "\" : " + theISODt + "\"\, " + + } else { jstring = jstring + "\"" + myHdr[i] + "\" : " + "\"" + myTrans[i] + "\"\, " } - console.log(jstring) - - } + } -}); + console.log(jstring) +} } ); rl.on('close', () => { From 892ceb92dc3efe22e433d88f945a8eaed8a74655 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Wed, 9 Oct 2019 20:17:33 -0400 Subject: [PATCH 21/43] Reformatting of a single transaction is complete. --- .../reformat_fec_data_to_json.js | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index d7d7804..d46fe48 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -64,9 +64,9 @@ rl.on('line', (line) => { var jstring = "{ " - for (i = 0; i < 14; i++) { + for (i = 0; i < 21; i++) { - /* The 14th index value is the transaction date. This needs to be reformated + /* The 13th index value is the transaction date. This needs to be reformated * from a MMDDYYYY string to a YYYY-MM-DD string that can be converted to * ISO8601 date format acceptable to the MongoDB 'mongoimport' utility. */ @@ -81,8 +81,18 @@ rl.on('line', (line) => { console.log("Date value we are processing " + theISODt) - jstring = jstring + "\"" + myHdr[i] + "\" : " + theISODt + "\"\, " + jstring = jstring + "\"" + myHdr[i] + "\" : " + theISODt + "\, " + } + + /* The 20th index value is the final field to be reformatted. We want to close the + * string with a valid JSON closing brace. 
+ */ + + else if (i === 20) { + + jstring = jstring + "\"" + myHdr[i] + "\" : " + "\"" + myTrans[i] + "\"" + " \}" + } else { jstring = jstring + "\"" + myHdr[i] + "\" : " + "\"" + myTrans[i] + "\"\, " From 71d5d1f213ce27aeaf5a83e8b5019dde39fb59e4 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Wed, 9 Oct 2019 20:39:41 -0400 Subject: [PATCH 22/43] Correctly reformats one transaction to JSON. --- database/mongodb_version4/reformat_fec_data_to_json.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index d46fe48..5c5445c 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -77,7 +77,7 @@ rl.on('line', (line) => { var theISODt = "ISODate\(\"" + myDateStr[4] + myDateStr[5] + myDateStr[6] + myDateStr[7] + "\-" + myDateStr[0] + myDateStr[1] + "\-" - theISODt = theISODt + myDateStr[2] + myDateStr[3] + "\"" + "\)" + theISODt = theISODt + myDateStr[2] + myDateStr[3] + "T00\:00\:00Z\"" + "\)" console.log("Date value we are processing " + theISODt) From 4520c92dab9ed86a3f717924e115be6aa0fed138 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Thu, 10 Oct 2019 18:08:42 -0400 Subject: [PATCH 23/43] Start processing transaction amount. --- .../mongodb_version4/reformat_fec_data_to_json.js | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 5c5445c..dd76e62 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -83,6 +83,18 @@ rl.on('line', (line) => { jstring = jstring + "\"" + myHdr[i] + "\" : " + theISODt + "\, " + } + + /* The 14th index value is the transaction amount field. Reformat this into a + * $numberDecimal value (also known as Decimal128.) + */ + + else if (i === 14) { + + var myAmt = myTrans[i] + + console.log("Amount we are processing " + myAmt) + } /* The 20th index value is the final field to be reformatted. We want to close the From 7ea9898462fc7d7b6a4b62d51d11cec6403d1bc0 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Thu, 10 Oct 2019 18:22:42 -0400 Subject: [PATCH 24/43] Continue reformatting money amount --- database/mongodb_version4/reformat_fec_data_to_json.js | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index dd76e62..413ce00 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -86,14 +86,17 @@ rl.on('line', (line) => { } /* The 14th index value is the transaction amount field. Reformat this into a - * $numberDecimal value (also known as Decimal128.) + * $numberDecimal value (also known as Decimal128.) The value has to be formatted + * like so: "TRANSACTION_AMT" : {"$numberDecimal" : "120.00"} */ else if (i === 14) { var myAmt = myTrans[i] - console.log("Amount we are processing " + myAmt) + var theContr = "\{\"\$numberDecimal\" \: \"" + myAmt + "\"\}" + + console.log("Amount we are processing " + myAmt + " reformatted data is " + theContr) } From 7aab2aae812407f6377bfc05b7852fac8c7998d3 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Thu, 10 Oct 2019 18:27:46 -0400 Subject: [PATCH 25/43] Reformat the contribution amount into decimal128. 
--- database/mongodb_version4/reformat_fec_data_to_json.js | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 413ce00..96069bd 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -94,9 +94,11 @@ rl.on('line', (line) => { var myAmt = myTrans[i] - var theContr = "\{\"\$numberDecimal\" \: \"" + myAmt + "\"\}" + var theContr = "\{\"\$numberDecimal\" \: \"" + myAmt + "\.00\"\}" console.log("Amount we are processing " + myAmt + " reformatted data is " + theContr) + + jstring = jstring + "\"" + myHdr[i] + "\" : " + theContr + "\, " } From a0a296eb38d6980cdaa25186395beb791ae06099 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Thu, 10 Oct 2019 18:58:49 -0400 Subject: [PATCH 26/43] Try to process multiple lines of input. --- database/mongodb_version4/reformat_fec_data_to_json.js | 4 ---- 1 file changed, 4 deletions(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 96069bd..4aff19e 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -79,8 +79,6 @@ rl.on('line', (line) => { theISODt = theISODt + myDateStr[2] + myDateStr[3] + "T00\:00\:00Z\"" + "\)" - console.log("Date value we are processing " + theISODt) - jstring = jstring + "\"" + myHdr[i] + "\" : " + theISODt + "\, " } @@ -96,8 +94,6 @@ rl.on('line', (line) => { var theContr = "\{\"\$numberDecimal\" \: \"" + myAmt + "\.00\"\}" - console.log("Amount we are processing " + myAmt + " reformatted data is " + theContr) - jstring = jstring + "\"" + myHdr[i] + "\" : " + theContr + "\, " } From d72eb87b10565b112c09de13796805454b0cb40d Mon Sep 17 00:00:00 2001 From: BobCochran Date: Thu, 10 Oct 2019 19:23:29 -0400 Subject: [PATCH 27/43] Write reformatted output to a file. --- .gitignore | 3 +++ database/mongodb_version4/reformat_fec_data_to_json.js | 7 +++++++ 2 files changed, 10 insertions(+) diff --git a/.gitignore b/.gitignore index bba6e68..abd38aa 100644 --- a/.gitignore +++ b/.gitignore @@ -96,3 +96,6 @@ public # DynamoDB Local files .dynamodb/ + +# Reformatted output data +database/mongodb_version4/reformatted/ diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 4aff19e..f434d42 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -33,6 +33,10 @@ const rl = readline.createInterface({ }); +// Create a writeStream so we can write the reformatted output to a file + +const writeStream = fs.createWriteStream( "./reformatted/test7a.json", { encoding: "utf8"} ); + // Split and save the first line -- treat that as the header line. rl.on('line', (line) => { @@ -115,6 +119,9 @@ rl.on('line', (line) => { } console.log(jstring) + + writeStream.write(jstring) + } } ); rl.on('close', () => { From 67e3be7eaf6a1448d260c517c5685ba97c1e1713 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Thu, 10 Oct 2019 19:36:04 -0400 Subject: [PATCH 28/43] Reformat a larger file containing donations from people employed by Pfizer, Inc. 
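The output file name is hard-coded in the fs.createWriteStream call and is bumped by hand for each test run (test7a.json, test8.json, and so on); the reformatted/ directory has to exist beforehand, as the README added later in this series points out. A hypothetical tweak, not part of the actual script, that takes the output path from the command line instead:

    // Second CLI argument = output path, falling back to the hard-coded name.
    const outPath = process.argv[3] || './reformatted/test8.json';
    const writeStream = fs.createWriteStream(outPath, { encoding: 'utf8' });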
--- database/mongodb_version4/reformat_fec_data_to_json.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index f434d42..1987113 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -35,7 +35,7 @@ const rl = readline.createInterface({ // Create a writeStream so we can write the reformatted output to a file -const writeStream = fs.createWriteStream( "./reformatted/test7a.json", { encoding: "utf8"} ); +const writeStream = fs.createWriteStream( "./reformatted/test8.json", { encoding: "utf8"} ); // Split and save the first line -- treat that as the header line. From 2b48ed2c9f56f0d7d95eb2a9c435e3514e2b5f07 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Thu, 10 Oct 2019 21:11:52 -0400 Subject: [PATCH 29/43] Edit README file. --- README.md | 45 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 42 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 5834927..b031ab3 100644 --- a/README.md +++ b/README.md @@ -1,13 +1,26 @@ # Node.js Large File / Data Reading & Performance Testing -This is an example of 3 different ways to use Node.js to process big data files. One file is the Node.js' `fs.readFile()`, another is with Node.js' `fs.createReadSteam()`, and the final is with the help of the NPM module `EventStream`. +The challenge is to process a really large text file sourced from the Federal Elections Commission. The input data consists of records of monetary contributions by individuals to poltitical entities. + +Code provided in this repository is in the form of Node.js scripts. They showcase 3 different approaches to process big data files. One script utilizes the Node.js `fs.readFile()` API, another utilizes `fs.createReadSteam()`, and the final script incorporates the external NPM module `EventStream`. -There is also use of `console.time` and `console.timeEnd` to determine the performance of the 3 different implementations, and which is most efficient processing the files. +`console.time` and `console.timeEnd` are used to determine the performance of the 3 different implementations, and which is most efficient processing of the input files. ### To Download the Really Large File Download the large file zip here: https://www.fec.gov/files/bulk-downloads/2018/indiv18.zip -The main file in the zip: `itcont.txt`, can only be processed by the `readFileEventStream.js` file, the other two implementations can't handle the 2.55GB file size in memory (Node.js can only hold about 1.5GB in memory at one time).* +### To Download the Dictionary and Header Files + +The indiv18.zip contains files which are essentially in a comma separated values style. There are 21 fields. To make sense of them, you need to get additional files from the data_dictionaries folder. Download these as well: + +bulk-downloads/data_dictionaries/indiv_dictionary.txt +bulk-downloads/data_dictionaries/indiv_header_file.csv + +dictionary.txt explains the data provided in each field of a contribution record. header_file.csv is formatted as a header record in comma separated values format, with one heading for each field provided in the contribution record. + +The indiv18.zip file contains several files in the archive, some of which are quite large. The zip file alone can take 5+ minutes to download, depending on connection speed. + +The main file in the zip: `itcont.txt`, is quite large. 
It can only be processed by the `readFileEventStream.js` file, the other two scripts in this repository can't handle the 2.55GB file size in memory (Node.js can only hold about 1.5GB in memory at one time).* *Caveat: You can override the standard Node memory limit using the CLI arugment `max-old-space-size=XYZ`. To run, pass in `node --max-old-space-size=8192 .js` (this will increase Node's memory limit to 8gb - just be careful not to make it too large that Node kills off other processes or crashes because its run out of memory) @@ -21,5 +34,31 @@ Then you'll see the answers required from the file printed out to the terminal. ### To Check Performance Testing Use one of the smaller files contained within the `indiv18` folder - they're all about 400MB and can be used with all 3 implementations. Run those along with the `console.time` and `performance.now()` references and you can see which solution is more performant and by how much. +### To Put FEC Contribution Records in a MongoDB v4.x Collection +It is possible to reformat the input records to a Javascript Object Notation (JSON) format compatible with MongoDB database version 4.x. You must do some additional preparation work. + +Download and unzip the indiv18.zip file. Download the header file noted above. Make note of the path where you unzipped the contribution files to. +The header file is in comma separated values format, using actual commas ',' as the separator. You must change the separator to a pipe symbol '|'. + +`sed 's/\,/\|/g' < indiv_header_file.csv > test1.csv` + +You must append individual contribution records to this test1.csv file. For testing purposes, I like to use egrep to extract records of interest, such as contributors employed by particular companies. + +`egrep 'PFIZER' >> test1.csv` + +Navigate to the database/mongodb_version4 folder. + +Create a new folder named 'reformatted' in that folder. + +On the command line, issue + +`node reformat_fec_data_to_json.js path/to/your/test1.csv` + +The input file test1.csv is reformatted to json and the output file is in the reformatted/ folder that you created. It will have a *.json extension. You can change the name of the output file by changing the writeStream arguments in the reformat_fec_data_to_json.js script. + +You can then import this reformatted data into a MongoDB version 4.x collection using the mongoimport utility, like so: + +`mongoimport --db fecdata --collection t1 --file test1.json` +Contributor BobCochran has only tested the script with 1,563 input records. The script has not been thoroughly tested, in other words. From d8a3b589a9f613a4b4c164da855104aaaf8a0cf7 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Thu, 10 Oct 2019 21:15:40 -0400 Subject: [PATCH 30/43] Fix README formatting. --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index b031ab3..414e477 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Node.js Large File / Data Reading & Performance Testing -The challenge is to process a really large text file sourced from the Federal Elections Commission. The input data consists of records of monetary contributions by individuals to poltitical entities. +The challenge is to process a really large text file sourced from the Federal Election Commission. The input data consists of records of monetary contributions by individuals to poltitical entities. Code provided in this repository is in the form of Node.js scripts. They showcase 3 different approaches to process big data files. 
One script utilizes the Node.js `fs.readFile()` API, another utilizes `fs.createReadSteam()`, and the final script incorporates the external NPM module `EventStream`. @@ -14,6 +14,7 @@ Download the large file zip here: https://www.fec.gov/files/bulk-downloads/2018/ The indiv18.zip contains files which are essentially in a comma separated values style. There are 21 fields. To make sense of them, you need to get additional files from the data_dictionaries folder. Download these as well: bulk-downloads/data_dictionaries/indiv_dictionary.txt + bulk-downloads/data_dictionaries/indiv_header_file.csv dictionary.txt explains the data provided in each field of a contribution record. header_file.csv is formatted as a header record in comma separated values format, with one heading for each field provided in the contribution record. From 559dde7d3e5284a6bc698dfa626e2134f84c21b1 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Thu, 10 Oct 2019 21:30:09 -0400 Subject: [PATCH 31/43] Edit README.md document. --- README.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 414e477..4d75dec 100644 --- a/README.md +++ b/README.md @@ -21,12 +21,12 @@ dictionary.txt explains the data provided in each field of a contribution record The indiv18.zip file contains several files in the archive, some of which are quite large. The zip file alone can take 5+ minutes to download, depending on connection speed. -The main file in the zip: `itcont.txt`, is quite large. It can only be processed by the `readFileEventStream.js` file, the other two scripts in this repository can't handle the 2.55GB file size in memory (Node.js can only hold about 1.5GB in memory at one time).* +The main file in the zip: `itcont.txt`, is the largest in size at 2.55 GiB. It can only be processed by the `readFileEventStream.js` script file, the other two scripts in this repository can't handle the file size in memory. Node.js can only hold about 1.5GB in memory at one time.* -*Caveat: You can override the standard Node memory limit using the CLI arugment `max-old-space-size=XYZ`. To run, pass in `node --max-old-space-size=8192 .js` (this will increase Node's memory limit to 8gb - just be careful not to make it too large that Node kills off other processes or crashes because its run out of memory) +*Caveat: You can override the standard Node memory limit using the CLI arugment `max-old-space-size=XYZ`. To run, pass in `node --max-old-space-size=8192 .js` This will increase Node's memory limit to 8 GiB - just be careful not to make the value so large that Node kills off other processes or crashes because it runs out of memory. ### To Run -Before the first run, run `npm install` from the command line to install the `event-stream` and `performance.now` packages from Node. +Before the first run, run `npm install` from the command line to install the `event-stream` and `performance.now` packages from Node. You may want to check the package.json file to adjust which versions of the external modules you are installing. Add the file path for one of the files (could be the big one `itcont.txt` or any of its smaller siblings in the `indiv18` folder that were just downloaded), and type the command `node ` in the command line. @@ -36,14 +36,14 @@ Then you'll see the answers required from the file printed out to the terminal. Use one of the smaller files contained within the `indiv18` folder - they're all about 400MB and can be used with all 3 implementations. 
Run those along with the `console.time` and `performance.now()` references and you can see which solution is more performant and by how much. ### To Put FEC Contribution Records in a MongoDB v4.x Collection -It is possible to reformat the input records to a Javascript Object Notation (JSON) format compatible with MongoDB database version 4.x. You must do some additional preparation work. +It is possible to reformat the input records to a Javascript Object Notation (JSON) format compatible with MongoDB database version 4.x. You must do some additional preparation work. The instructions here assume you are familiar with the Linux command line and Linux-based utilities such as sed and egrep. Download and unzip the indiv18.zip file. Download the header file noted above. Make note of the path where you unzipped the contribution files to. The header file is in comma separated values format, using actual commas ',' as the separator. You must change the separator to a pipe symbol '|'. `sed 's/\,/\|/g' < indiv_header_file.csv > test1.csv` -You must append individual contribution records to this test1.csv file. For testing purposes, I like to use egrep to extract records of interest, such as contributors employed by particular companies. +You must append individual contribution records to this test1.csv file. For testing purposes, use egrep to extract records of interest, such as contributors employed by particular companies. `egrep 'PFIZER' >> test1.csv` @@ -61,5 +61,7 @@ You can then import this reformatted data into a MongoDB version 4.x collection `mongoimport --db fecdata --collection t1 --file test1.json` +The advantage of loading this data into a MongoDB collection is that you can then perform aggregation queries on the collection using the db.collection.aggregate() utility of MongoDB. + Contributor BobCochran has only tested the script with 1,563 input records. The script has not been thoroughly tested, in other words. From 3b7b2faa141ee21fb1b121a5ebebd7f6b29a5999 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Thu, 10 Oct 2019 21:37:03 -0400 Subject: [PATCH 32/43] Edit README.md file. --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 4d75dec..165936b 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,7 @@ dictionary.txt explains the data provided in each field of a contribution record The indiv18.zip file contains several files in the archive, some of which are quite large. The zip file alone can take 5+ minutes to download, depending on connection speed. -The main file in the zip: `itcont.txt`, is the largest in size at 2.55 GiB. It can only be processed by the `readFileEventStream.js` script file, the other two scripts in this repository can't handle the file size in memory. Node.js can only hold about 1.5GB in memory at one time.* +The main file in the zip archive: `itcont.txt`, is the largest in size at 2.55 GiB. It can only be processed by the `readFileEventStream.js` script file. The other two scripts in this repository can't handle the input file size in memory. Node.js can only hold about 1.5GB in memory at one time.* *Caveat: You can override the standard Node memory limit using the CLI arugment `max-old-space-size=XYZ`. To run, pass in `node --max-old-space-size=8192 .js` This will increase Node's memory limit to 8 GiB - just be careful not to make the value so large that Node kills off other processes or crashes because it runs out of memory. 
@@ -63,5 +63,6 @@ You can then import this reformatted data into a MongoDB version 4.x collection The advantage of loading this data into a MongoDB collection is that you can then perform aggregation queries on the collection using the db.collection.aggregate() utility of MongoDB. -Contributor BobCochran has only tested the script with 1,563 input records. The script has not been thoroughly tested, in other words. +Contributor BobCochran has only tested the script with 1,563 input records. The script has not been thoroughly tested, in other words. To test the reformatting, Node.js version 10.16.3 was used. + From fdd770f105e76af05f003c20a4beadf7645b5dc8 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Thu, 10 Oct 2019 21:40:47 -0400 Subject: [PATCH 33/43] Fix incorrect egrep example. --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 165936b..565a8d2 100644 --- a/README.md +++ b/README.md @@ -45,7 +45,7 @@ The header file is in comma separated values format, using actual commas ',' as You must append individual contribution records to this test1.csv file. For testing purposes, use egrep to extract records of interest, such as contributors employed by particular companies. -`egrep 'PFIZER' >> test1.csv` +`egrep 'PFIZER' itcont_2018_20181228_52010302.txt >> test1.csv` Navigate to the database/mongodb_version4 folder. From 8f40bf28b6f10bc35a7e2d0369f810e12ef5a702 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Fri, 11 Oct 2019 08:14:39 -0400 Subject: [PATCH 34/43] Continue updating the README.md --- README.md | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 565a8d2..70a4235 100644 --- a/README.md +++ b/README.md @@ -1,17 +1,22 @@ -# Node.js Large File / Data Reading & Performance Testing +# Node.js Read Large Files Challenge -The challenge is to process a really large text file sourced from the Federal Election Commission. The input data consists of records of monetary contributions by individuals to poltitical entities. +The challenge is to efficiently process a really large text file sourced from the Federal Election Commission. The input data consists of records of monetary contributions by individuals to poltitical entities. Code provided in this repository is in the form of Node.js scripts. They showcase 3 different approaches to process big data files. One script utilizes the Node.js `fs.readFile()` API, another utilizes `fs.createReadSteam()`, and the final script incorporates the external NPM module `EventStream`. +## Performance Testing of the Different Large File Reading Strategies + `console.time` and `console.timeEnd` are used to determine the performance of the 3 different implementations, and which is most efficient processing of the input files. -### To Download the Really Large File +### To Download the Really Large FEC File + +The text file to be processed consists of records of politcal campaign contributions by individuals during the 2018 election cycle. + Download the large file zip here: https://www.fec.gov/files/bulk-downloads/2018/indiv18.zip ### To Download the Dictionary and Header Files -The indiv18.zip contains files which are essentially in a comma separated values style. There are 21 fields. To make sense of them, you need to get additional files from the data_dictionaries folder. Download these as well: +The indiv18.zip contains files which are essentially in a comma separated values style. There are 21 fields. 
To make sense of them, you need to get additional files from the data_dictionaries folder. A "Documentation" folder is provided which contains the two files listed below. However, these files apply to the 2018 election data. If the file layouts have changed in subsequent election years, you will need to download the correct ones for the election cycle you are processing. Generally, you will want to download from the Federal Election Commission "bulk downloads" site. The data_dictionaries folder should be checked for files named like the below. Download them if needed: bulk-downloads/data_dictionaries/indiv_dictionary.txt @@ -35,7 +40,7 @@ Then you'll see the answers required from the file printed out to the terminal. ### To Check Performance Testing Use one of the smaller files contained within the `indiv18` folder - they're all about 400MB and can be used with all 3 implementations. Run those along with the `console.time` and `performance.now()` references and you can see which solution is more performant and by how much. -### To Put FEC Contribution Records in a MongoDB v4.x Collection +### Option: Put FEC Contribution Records in a MongoDB v4.x Database Collection It is possible to reformat the input records to a Javascript Object Notation (JSON) format compatible with MongoDB database version 4.x. You must do some additional preparation work. The instructions here assume you are familiar with the Linux command line and Linux-based utilities such as sed and egrep. Download and unzip the indiv18.zip file. Download the header file noted above. Make note of the path where you unzipped the contribution files to. @@ -63,6 +68,6 @@ You can then import this reformatted data into a MongoDB version 4.x collection The advantage of loading this data into a MongoDB collection is that you can then perform aggregation queries on the collection using the db.collection.aggregate() utility of MongoDB. -Contributor BobCochran has only tested the script with 1,563 input records. The script has not been thoroughly tested, in other words. To test the reformatting, Node.js version 10.16.3 was used. +Contributor @BobCochran has only tested the script with 1,563 input records. The script has not been thoroughly tested, in other words. To test the reformatting, Node.js version 10.16.3 was used. From 8a20c59b0a0c806f17dc19b7f6ccd74b5dc52db4 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Fri, 11 Oct 2019 08:35:16 -0400 Subject: [PATCH 35/43] Test transaction amount for a numeric value. --- .../reformat_fec_data_to_json.js | 20 ++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 1987113..4b6c5a3 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -35,7 +35,7 @@ const rl = readline.createInterface({ // Create a writeStream so we can write the reformatted output to a file -const writeStream = fs.createWriteStream( "./reformatted/test8.json", { encoding: "utf8"} ); +const writeStream = fs.createWriteStream( "./reformatted/test7b.json", { encoding: "utf8"} ); // Split and save the first line -- treat that as the header line. @@ -96,9 +96,23 @@ rl.on('line', (line) => { var myAmt = myTrans[i] - var theContr = "\{\"\$numberDecimal\" \: \"" + myAmt + "\.00\"\}" + /* Is the amount field a real number? 
*/ - jstring = jstring + "\"" + myHdr[i] + "\" : " + theContr + "\, " + if (!isNaN(myAmt)) { + + var theContr = "\{\"\$numberDecimal\" \: \"" + myAmt + "\.00\"\}" + + jstring = jstring + "\"" + myHdr[i] + "\" : " + theContr + "\, " + + } else { + + + var theContr = "\{\"\$numberDecimal\" \: \"0" + "\.00\"\}" + + jstring = jstring + "\"" + myHdr[i] + "\" : " + theContr + "\, " + + + } } From cd4983b2a40e4a87f118a2342126e3232a3d6ce3 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Fri, 11 Oct 2019 09:29:50 -0400 Subject: [PATCH 36/43] Try to catch nonumeric transaction amounts. --- database/mongodb_version4/reformat_fec_data_to_json.js | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index 4b6c5a3..ec0c19d 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -35,7 +35,7 @@ const rl = readline.createInterface({ // Create a writeStream so we can write the reformatted output to a file -const writeStream = fs.createWriteStream( "./reformatted/test7b.json", { encoding: "utf8"} ); +const writeStream = fs.createWriteStream( "./reformatted/test10b.json", { encoding: "utf8"} ); // Split and save the first line -- treat that as the header line. @@ -98,7 +98,7 @@ rl.on('line', (line) => { /* Is the amount field a real number? */ - if (!isNaN(myAmt)) { + if (!isNaN(myAmt) || !isEmptyOrSpaces(myAmt)) { var theContr = "\{\"\$numberDecimal\" \: \"" + myAmt + "\.00\"\}" @@ -144,4 +144,6 @@ rl.on('close', () => { }) - +function isEmptyOrSpaces(str){ + return str === null || str.match(/^ *$/) !== null; +} From 1d7d358d6c1a3e2fd8b4b7c59c4645a56d0d7716 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Fri, 11 Oct 2019 09:57:53 -0400 Subject: [PATCH 37/43] Try to catch nonumeric transaction amounts. --- database/mongodb_version4/reformat_fec_data_to_json.js | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index ec0c19d..bf5f774 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -35,7 +35,7 @@ const rl = readline.createInterface({ // Create a writeStream so we can write the reformatted output to a file -const writeStream = fs.createWriteStream( "./reformatted/test10b.json", { encoding: "utf8"} ); +const writeStream = fs.createWriteStream( "./reformatted/test10c.json", { encoding: "utf8"} ); // Split and save the first line -- treat that as the header line. @@ -98,7 +98,7 @@ rl.on('line', (line) => { /* Is the amount field a real number? */ - if (!isNaN(myAmt) || !isEmptyOrSpaces(myAmt)) { + if (!isNaN(myAmt) || !isEmptyOrSpaces(myAmt) || (typeof myAmt !== 'undefined')) { var theContr = "\{\"\$numberDecimal\" \: \"" + myAmt + "\.00\"\}" From 20571a6a8321181dd2b908cd06680c08ea44e9c1 Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sat, 26 Oct 2019 19:19:16 -0400 Subject: [PATCH 38/43] Correctly test for an empty string in the transaction amount field. 
---
 database/mongodb_version4/reformat_fec_data_to_json.js | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js
index bf5f774..d2418ca 100644
--- a/database/mongodb_version4/reformat_fec_data_to_json.js
+++ b/database/mongodb_version4/reformat_fec_data_to_json.js
@@ -35,7 +35,7 @@ const rl = readline.createInterface({
 
 // Create a writeStream so we can write the reformatted output to a file
 
-const writeStream = fs.createWriteStream( "./reformatted/test10c.json", { encoding: "utf8"} );
+const writeStream = fs.createWriteStream( "./reformatted/test8a.json", { encoding: "utf8"} );
 
 // Split and save the first line -- treat that as the header line.
 
@@ -98,12 +98,18 @@ rl.on('line', (line) => {
 
                /* Is the amount field a real number?
                */
 
-               if (!isNaN(myAmt) || !isEmptyOrSpaces(myAmt) || (typeof myAmt !== 'undefined')) {
+               if (myAmt !== "") {
 
                   var theContr = "\{\"\$numberDecimal\" \: \"" + myAmt + "\.00\"\}"
 
                   jstring = jstring + "\"" + myHdr[i] + "\" : " + theContr + "\, "
 
+                  console.log("The myTrans array " + myTrans)
+
+                  console.log("The myAmt value " + myAmt)
+
+                  console.log("The typeof for myAmt " + typeof myAmt)
+
                } else {
 

From 8d8b1f5760a93f30e40264f83fae9a45aa3b7d8e Mon Sep 17 00:00:00 2001
From: BobCochran
Date: Sat, 26 Oct 2019 19:21:31 -0400
Subject: [PATCH 39/43] Small program to split and inspect a line with no contribution amount.

---
 database/mongodb_version4/test_array2.js | 36 ++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100644 database/mongodb_version4/test_array2.js

diff --git a/database/mongodb_version4/test_array2.js b/database/mongodb_version4/test_array2.js
new file mode 100644
index 0000000..87f9ead
--- /dev/null
+++ b/database/mongodb_version4/test_array2.js
@@ -0,0 +1,36 @@
+function splitString(stringToSplit, separator) {
+  const arrayOfStrings = stringToSplit.split(separator);
+
+  console.log('The original string is: "' + stringToSplit + '"');
+  console.log('The separator is: "' + separator + '"');
+
+  if (arrayOfStrings[14] === "") {
+    arrayOfStrings[14] = 0
+    console.log("The transaction amount has been replaced.")
+  }
+
+  if (arrayOfStrings.includes(undefined)) {
+
+    console.log("There are undefined or empty elements in the arrayOfStrings")
+  }
+  console.log("The Object.values " + Object.values(arrayOfStrings))
+  console.log(Object.values(arrayOfStrings).length)
+  console.log(arrayOfStrings.length)
+  console.log('The array has ' + arrayOfStrings.length + ' elements: ' + arrayOfStrings.join('/'));
+}
+
+const tempestString = 'Oh brave new world that has such people in it.';
+const monthString = 'Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec';
+const monthString2 = 'Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep||Nov|Dec';
+const fecString = 'C00339655|N|YE|P|201901179143856769|15|IND|COCHRAN, ERNEST W|PARIS|TX|754606333|TEXAS ONCOLOGY, P.A.|PHYSICIAN SHAREHOLDER MED ONC|12312018|||201901021615-165|1305336|||4021920191640570973'
+
+const space = ' ';
+const comma = ',';
+const pipe = '|'
+
+//splitString(tempestString, space);
+//splitString(tempestString);
+//splitString(monthString2, pipe);
+splitString(fecString, pipe)
+
+

From 16b224ee0078a178b9fd9b5eec8de5e22b4e1792 Mon Sep 17 00:00:00 2001
From: BobCochran
Date: Sat, 26 Oct 2019 19:35:15 -0400
Subject: [PATCH 40/43] Test of large input file; comment out console.log statements.
--- database/mongodb_version4/reformat_fec_data_to_json.js | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js index d2418ca..0787b46 100644 --- a/database/mongodb_version4/reformat_fec_data_to_json.js +++ b/database/mongodb_version4/reformat_fec_data_to_json.js @@ -35,7 +35,7 @@ const rl = readline.createInterface({ // Create a writeStream so we can write the reformatted output to a file -const writeStream = fs.createWriteStream( "./reformatted/test8a.json", { encoding: "utf8"} ); +const writeStream = fs.createWriteStream( "./reformatted/test9a.json", { encoding: "utf8"} ); // Split and save the first line -- treat that as the header line. @@ -104,11 +104,11 @@ rl.on('line', (line) => { jstring = jstring + "\"" + myHdr[i] + "\" : " + theContr + "\, " - console.log("The myTrans array " + myTrans) +// console.log("The myTrans array " + myTrans) - console.log("The myAmt value " + myAmt) +// console.log("The myAmt value " + myAmt) - console.log("The typeof for myAmt " + typeof myAmt) +// console.log("The typeof for myAmt " + typeof myAmt) } else { @@ -138,7 +138,7 @@ rl.on('line', (line) => { } - console.log(jstring) +// console.log(jstring) writeStream.write(jstring) From 7120a77741bcf8a53f32bca423c274be5d0874ae Mon Sep 17 00:00:00 2001 From: BobCochran Date: Sat, 26 Oct 2019 20:03:17 -0400 Subject: [PATCH 41/43] Edit to indicate number of records tested, and indicate Node.js and MongoDB versions used. --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 70a4235..64ca19e 100644 --- a/README.md +++ b/README.md @@ -64,10 +64,10 @@ The input file test1.csv is reformatted to json and the output file is in the re You can then import this reformatted data into a MongoDB version 4.x collection using the mongoimport utility, like so: -`mongoimport --db fecdata --collection t1 --file test1.json` +`mongoimport --db fecdata --collection t1 --file reformatted/test1.json` -The advantage of loading this data into a MongoDB collection is that you can then perform aggregation queries on the collection using the db.collection.aggregate() utility of MongoDB. +The advantage of loading this data into a MongoDB collection is that you can then perform aggregation queries on the collection using the db.collection.aggregate() utility of MongoDB. You can also index the collection as you prefer. -Contributor @BobCochran has only tested the script with 1,563 input records. The script has not been thoroughly tested, in other words. To test the reformatting, Node.js version 10.16.3 was used. +Contributor @BobCochran has only tested the script with 271,237 input records. The script has not been thoroughly tested, in other words. To test the reformatting, Node.js versions 10.16.3 and 12.3.0 were used. The reformatted data was added to a standalone instance of MongoDB Enterprise server version 4.0.13, running in a Ubuntu version 18.04.3 LTS server. 
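
A note on the transaction-amount handling that patches 35 through 38 iterate on: the check patch 38 settles on (`if (myAmt !== "")`) is needed because `isNaN("")` is `false`, so an empty field slips past a plain `isNaN` test. As a minimal sketch only (it is not part of the repository, and the helper name `normalizeAmount` is hypothetical), the undefined, blank, and non-numeric cases can be folded into one function that mirrors the script's `"0.00"` fallback and `.00` suffix:

```js
// Minimal sketch, not taken from the repository: one helper covering the cases
// the patches above probe one at a time (undefined field, empty or blank string,
// non-numeric text). The name normalizeAmount is hypothetical.
function normalizeAmount(rawAmount) {
  // A missing or blank field becomes a zero amount, matching the "0.00"
  // fallback written by reformat_fec_data_to_json.js.
  if (rawAmount === undefined || rawAmount === null || String(rawAmount).trim() === "") {
    return "0.00";
  }
  // isNaN("") is false, which is why the blank-string test must run first;
  // anything that still fails the numeric test also falls back to zero.
  if (isNaN(rawAmount)) {
    return "0.00";
  }
  // Mirror the ".00" suffix the script appends to the raw FEC amount.
  return String(rawAmount).trim() + ".00";
}

// Example: build the same Extended JSON fragment the script writes out.
console.log('{"$numberDecimal" : "' + normalizeAmount("1000") + '"}'); // {"$numberDecimal" : "1000.00"}
console.log('{"$numberDecimal" : "' + normalizeAmount("") + '"}');     // {"$numberDecimal" : "0.00"}
```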
From e4b8f8564c41a7ed2581e0e9129020235f649a40 Mon Sep 17 00:00:00 2001
From: BobCochran
Date: Sat, 26 Oct 2019 20:06:19 -0400
Subject: [PATCH 42/43] Change writeStream to output to test1.json

---
 database/mongodb_version4/reformat_fec_data_to_json.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/database/mongodb_version4/reformat_fec_data_to_json.js b/database/mongodb_version4/reformat_fec_data_to_json.js
index 0787b46..88bf1c8 100644
--- a/database/mongodb_version4/reformat_fec_data_to_json.js
+++ b/database/mongodb_version4/reformat_fec_data_to_json.js
@@ -35,7 +35,7 @@ const rl = readline.createInterface({
 
 // Create a writeStream so we can write the reformatted output to a file
 
-const writeStream = fs.createWriteStream( "./reformatted/test9a.json", { encoding: "utf8"} );
+const writeStream = fs.createWriteStream( "./reformatted/test1.json", { encoding: "utf8"} );
 
 // Split and save the first line -- treat that as the header line.
 

From f8a2981619478f22ece533ca9c51dd077d472476 Mon Sep 17 00:00:00 2001
From: BobCochran
Date: Sat, 26 Oct 2019 20:24:26 -0400
Subject: [PATCH 43/43] Change reference to BobCochran as contributor.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 64ca19e..7c7fbee 100644
--- a/README.md
+++ b/README.md
@@ -68,6 +68,6 @@ You can then import this reformatted data into a MongoDB version 4.x collection
 
 The advantage of loading this data into a MongoDB collection is that you can then perform aggregation queries on the collection using the db.collection.aggregate() utility of MongoDB. You can also index the collection as you prefer.
 
-Contributor @BobCochran has only tested the script with 271,237 input records. The script has not been thoroughly tested, in other words. To test the reformatting, Node.js versions 10.16.3 and 12.3.0 were used. The reformatted data was added to a standalone instance of MongoDB Enterprise server version 4.0.13, running in a Ubuntu version 18.04.3 LTS server.
+Contributor BobCochran has only tested the script with 271,237 input records. To test the reformatting, Node.js versions 10.16.3 and 12.3.0 were used. The reformatted data was added to a standalone instance of MongoDB Enterprise server version 4.0.13, running on an Ubuntu version 18.04.3 LTS server.
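
The README's closing section mentions `db.collection.aggregate()` and indexing but stops short of an example. As a minimal sketch (not part of the repository): the database name `fecdata` and collection name `t1` come from the `mongoimport` command shown earlier, while the field names `STATE` and `TRANSACTION_AMT` are assumptions based on the FEC header file and should be adjusted to whatever header row was actually prepended to test1.csv. A mongo shell pipeline that totals contributions by state could look like this:

```js
// Minimal sketch for the mongo shell against a MongoDB 4.x server; field names
// STATE and TRANSACTION_AMT are assumptions taken from the FEC header file.
db.t1.aggregate([
  { $group: {
      _id: "$STATE",                              // one bucket per contributor state
      totalAmount: { $sum: "$TRANSACTION_AMT" },  // $numberDecimal values aggregate as Decimal128
      contributions: { $sum: 1 }                  // number of records in each bucket
  } },
  { $sort: { totalAmount: -1 } },                 // largest totals first
  { $limit: 10 }
])
```

Adding a `$match` stage in front of the `$group` (for example, restricting by employer) is where an index created with `db.t1.createIndex()` would be expected to help; the `$group` stage itself still examines every matching document.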