Channel: Recent Questions - Stack Overflow

Using hxselect Command in Bash: How to Store Output in Arrays?


I'm trying to fetch data from a web page in Bash, using curl to download the page and hxselect (from html-xml-utils) to parse it, with this script:

```shell
#!/bin/bash

# Define variables for URL and browser
sGCitta="fucecchio"
sGTypo="sale-houses"
sGDomain="real-estate"
url="https://www.$sGDomain.it/$sGTypo/$sGCitta"

# Get HTML content of the page
html_content=$(curl -s -L "$url")

# Use html-xml-utils to extract announcements
announcements=$(echo "$html_content" | hxselect 'li.nd-list__item.in-searchLayoutListItem')

# Connect to SQLite database
db_file="immo.db"

# Initialize arrays for prices, links, descriptions, sizes, and auction
prices=()
links=()
descriptions=()
sizes=()
auctions=()

# Loop through the announcements
while IFS= read -r announcement; do
    # Extract price
    price=$(echo "$announcement" | hxselect 'div.in-listingCardPrice span' -c | grep -oP '(?<=€ )[0-9,.]+')
    # Extract link
    link=$(echo "$announcement" | hxselect 'a.in-listingCardTitle' -s '\n' | grep -o 'href="[^"]*"' | sed 's/^href="//' | sed 's/"$//')
    # Extract description
    description=$(echo "$announcement" | hxselect 'a.in-listingCardTitle' -s '\n' | grep -oP '(?<=title=")[^"]+')
    # Extract size
    size=$(echo "$announcement" | hxselect 'div.in-listingCardFeatureList__item:nth-of-type(2) span' -c | grep -oP '[0-9]+')
    # Check if the description contains the word "auction"
    if [[ "$description" =~ "auction" ]]; then
        auction=1
    else
        auction=0
    fi
    # Add data to the arrays
    prices+=("$price")
    links+=("$link")
    descriptions+=("$description")
    sizes+=("$size")
    auctions+=("$auction")
done <<< "$announcements"

# Print the sizes of the arrays
echo "Size of the prices array: ${#prices[@]}"
echo "Size of the links array: ${#links[@]}"
echo "Size of the descriptions array: ${#descriptions[@]}"
echo "Size of the sizes array: ${#sizes[@]}"
echo "Size of the auctions array: ${#auctions[@]}"

# Insert data into the SQLite database
for ((i = 0; i < ${#prices[@]}; i++)); do
    sqlite3 "$db_file" "INSERT INTO $sGDomain (price, link, description, size, auction) VALUES ('${prices[i]}', '${links[i]}', '${descriptions[i]}', '${sizes[i]}', '${auctions[i]}')"
done
```
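A side note on the last loop of the script above: the values are pasted straight into the SQL string, so a description containing an apostrophe (common in Italian listings) ends the string literal early. Also, an unquoted table name such as `real-estate` is not a valid SQL identifier because of the hyphen. A minimal sketch, assuming the plain `sqlite3` command-line usage shown in the script: double every single quote inside a value (the standard SQL escape) and wrap the table name in double quotes. The `sql_quote` helper and the sample values are hypothetical.

```shell
#!/usr/bin/env bash
# sql_quote VALUE: hypothetical helper that prints VALUE as an SQL
# string literal, doubling every single quote inside it.
sql_quote() {
    local sq="'"
    printf "'%s'" "${1//$sq/$sq$sq}"
}

# Made-up sample values for one listing.
table="real-estate"
description="Villa all'asta"
price="120.000"

# Double-quote the table identifier so the hyphen is accepted, and
# turn each value into a safely quoted literal with sql_quote.
sql="INSERT INTO \"$table\" (price, description) VALUES ($(sql_quote "$price"), $(sql_quote "$description"));"
echo "$sql"
# In the script, this string would be passed as: sqlite3 "$db_file" "$sql"
```

This prints `INSERT INTO "real-estate" (price, description) VALUES ('120.000', 'Villa all''asta');`. It is an escaping workaround, not true parameter binding.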

Currently, all the extracted prices end up together in a single array element, prices[1]. I'd like the first price to go into prices[1], the second into prices[2], and so on, but I'm not sure how to accomplish this. Can anyone give me a hint?
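One plausible cause, sketched with made-up values rather than the real page: when `grep -o` finds several matches, `$(...)` captures all of them as one string with embedded newlines, and `prices+=("$price")` appends that whole string as a single element. The Bash builtin `mapfile -t` (Bash 4 and later) instead stores one line per element. Note that Bash arrays are zero-based, so in this demo the first price lands in `prices[0]`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the output of a grep -o that matched
# three prices at once (values are made up for illustration).
matches=$'120.000\n95.000\n230.000'

# What the loop in the question does: the whole multi-line string
# is appended as ONE array element.
prices=()
prices+=("$matches")
echo "append: ${#prices[@]} element(s)"

# mapfile -t clears the array, then stores one input line per element,
# stripping the trailing newline from each line.
mapfile -t prices <<< "$matches"
echo "mapfile: ${#prices[@]} element(s)"

# Print each element with its (zero-based) index.
for i in "${!prices[@]}"; do
    printf 'prices[%d]=%s\n' "$i" "${prices[i]}"
done
```

Running it prints `append: 1 element(s)`, then `mapfile: 3 element(s)`, followed by one `prices[i]=...` line per value. In the script, this would mean collecting a field for the whole page with `mapfile -t prices < <(...)` instead of appending inside the `while` loop. That approach is untested against the real site, and separate per-field lists can drift out of step if one listing lacks a field.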

I used the 'hxselect' command to extract data from the web page and store it in arrays. I expected each price to be stored in its own array index, following the order in which the prices appear on the page. Instead, all the prices were stored together in the same index (prices[1]).
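To keep each price aligned with its own link, description and size, it may help to loop over announcements rather than over lines. `while IFS= read -r` runs once per line, but a matched `<li>` usually spans several lines, or the whole page may sit on a single line, so lines don't correspond to announcements. Below is a minimal pure-Bash sketch that splits the announcements on each closing `</li>` tag. It assumes every announcement ends with `</li>` and contains no nested `<li>` elements. The `split_on` helper and the sample markup are hypothetical, and a Bash regex stands in for `grep -oP` so the sketch needs no external tools.

```shell
#!/usr/bin/env bash
# split_on STRING DELIM: hypothetical helper that cuts STRING after
# every occurrence of the literal DELIM and stores the pieces (each
# still ending in DELIM) in the global array "records".
split_on() {
    local s="$1" delim="$2"
    records=()
    while [[ $s == *"$delim"* ]]; do
        # Everything up to and including the first DELIM is one record.
        records+=("${s%%"$delim"*}$delim")
        # Drop that record from the remaining text.
        s=${s#*"$delim"}
    done
}

# Made-up markup standing in for the hxselect output: two
# announcements, each spread over several lines.
announcements='<li class="a">
  <span>€ 120.000</span>
</li><li class="a">
  <span>€ 95.000</span>
</li>'

split_on "$announcements" '</li>'

# One iteration per announcement, so index i refers to the same
# listing in every per-field array (prices, links, ...).
re='€ ([0-9.,]+)'
prices=()
for announcement in "${records[@]}"; do
    price=''
    if [[ $announcement =~ $re ]]; then
        price=${BASH_REMATCH[1]}
    fi
    prices+=("$price")
done

printf 'announcements: %d\n' "${#records[@]}"
printf 'price %s\n' "${prices[@]}"
```

This prints `announcements: 2`, then `price 120.000` and `price 95.000`. A listing without a price is stored as an empty string, so the arrays stay aligned. Another option, not verified against this site, is to strip the newlines from the page before `hxselect` and pass `-s '\n'`, so that each match sits on its own line and the original `while read` loop sees one announcement per iteration.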

